Senior Software Engineer in Test (AI Agentic Systems)

2 Locations
In-Office
99K-124K Annually
Senior level
Healthtech
The Role
Owner of quality for an LLM-based multi-agent claims adjudication pipeline. Build evaluation frameworks, automated LLM-grading workflows, function-call and reasoning-trace auditing using Vertex AI. Integrate CI/CD regression, Auto-SxS comparisons, and large-scale mocking to ensure compliance and accurate adjudications for DOL/DOI audits.
Summary Generated by Built In

At Collective Health, we’re transforming how employers and their people engage with their health benefits by seamlessly integrating cutting-edge technology, compassionate service, and world-class user experience design.

This is not a traditional QA role. You will be the quality owner for an LLM-based multi-agent pipeline that autonomously adjudicates health insurance claims for self-funded plan sponsors. You are building a Three-Tier Evaluation Framework to ensure our Gemini-powered agents reason correctly, call tools accurately, and produce DOL-ready outcomes.

You will work at the intersection of Vertex AI, healthcare compliance, and high-scale data engineering. Your work directly determines whether claims are paid correctly and whether the company can withstand a Department of Labor (DOL) or state DOI audit. The stakes are real, the domain is hard, and the problems are genuinely novel.

What you'll do:
  • Outcome Evaluation (The "What")
    • Golden Set Governance: Build and maintain a versioned library of "Grounding Data" results by working with senior claims examiners to define "Ground Truth."
    • Model-as-a-Judge Automation: Design automated "LLM-grading-LLM" workflows using custom rubrics to score factual grounding and policy compliance.
    • Semantic Assertion Framework: Develop testing libraries that move beyond string matching to validate semantic equivalence and numerical accuracy in agent outputs.
  • Trajectory Evaluation (The "How")
    • Function-Call Auditing: Use Vertex AI traces to programmatically verify that mandatory tools (via MCP) were invoked with correct arguments.
    • Orchestration Logic Validation: Assert that agents respect defined priorities across the four architectural layers: Data & Knowledge, Orchestration, Agentic Reasoning, and Tooling.
    • Reasoning Trace Auditing: Ensure every autonomous decision is traceable to a specific SOP sentence and a live API data point.
  • Continuous Automated Regression (The "Always")
    • CI/CD Integration: Every prompt or model update in Vertex AI Prompt Management must trigger an automated regression run.
    • Auto-SxS: Own the automated pairwise comparison process to detect logic drift between "New" and "Production" agent versions.
    • Mocking & Resilience: Build a Vertex AI/ADK mocking layer to simulate model responses, allowing for thousands of logic tests in seconds with zero API costs.
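As a rough illustration of how trajectory checks and a mocking layer can fit together, the sketch below uses only the Python standard library. Every name in it (`ToolCall`, `FakeAgentRun`, `fake_adjudicate`, the tool names, the 80% allowed-amount rule) is hypothetical and invented for the example; none of it is a Vertex AI, ADK, or MCP API, and a real harness would read tool calls from recorded traces instead.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class FakeAgentRun:
    """Hypothetical record of one agent run (not a Vertex AI type)."""
    tool_calls: list
    final_answer: dict


def fake_adjudicate(claim: dict) -> FakeAgentRun:
    """Deterministic mock of the agent: no model or network call is made."""
    calls = [
        ToolCall("check_eligibility", {"member_id": claim["member_id"]}),
        ToolCall("lookup_plan_benefit", {"cpt_code": claim["cpt_code"]}),
    ]
    # Illustrative rule only: pretend the plan allows 80% of the billed amount.
    allowed = round(claim["billed"] * 0.8, 2)
    return FakeAgentRun(calls, {"decision": "pay", "allowed_amount": allowed})


def audit_tool_calls(run: FakeAgentRun, required: dict) -> list:
    """Return human-readable violations; an empty list means the run passed."""
    problems = []
    seen = {call.name: call.args for call in run.tool_calls}
    for name, expected_args in required.items():
        if name not in seen:
            problems.append(f"missing mandatory tool call: {name}")
        elif seen[name] != expected_args:
            problems.append(f"wrong arguments for {name}: {seen[name]!r}")
    return problems


def amounts_match(actual: float, expected: float, tolerance: float = 0.01) -> bool:
    """Numerical assertion that tolerates rounding differences up to one cent."""
    return isclose(actual, expected, abs_tol=tolerance)


if __name__ == "__main__":
    claim = {"member_id": "M-001", "cpt_code": "99213", "billed": 150.00}
    run = fake_adjudicate(claim)
    required = {
        "check_eligibility": {"member_id": "M-001"},
        "lookup_plan_benefit": {"cpt_code": "99213"},
    }
    print(audit_tool_calls(run, required))
    print(amounts_match(run.final_answer["allowed_amount"], 120.00))
```

Because the mock is deterministic and makes no API calls, checks like these can run inside an ordinary pytest suite on every prompt or model change.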
To be successful in this role, you'll need:
  • Required Skills (The Core Bar)
    • Python SDET Expertise: Expert in Python and pytest, specifically building custom mocking frameworks for external APIs (Vertex AI/ADK).
    • AI/LLM Observability: Hands-on experience with Vertex AI Experiments, Auto-SxS, and Cloud Logging for trace analysis.
    • Data Literacy: Expert-level SQL (BigQuery) and Pandas skills to "diff" massive datasets and identify adjudication discrepancies.
    • Prompt Engineering for QA: Ability to analyze "System Instructions" and refine prompts based on failed test cases to close logic gaps.
    • Architectural Testing: Experience testing multi-layer systems involving RAG (Vertex AI Search), state management (LangGraph), and function calling.
  • Preferred Skills (The "Nice-to-Haves")
    • Healthcare/Claims Domain: Familiarity with claims adjudication concepts (pend reason codes, COB, eligibility, stop-loss).
    • Compliance Knowledge: Understanding of HIPAA/PHI handling and writing test evidence for regulatory bodies (DOL/DOI).
    • Human-in-the-Loop Testing: Experience in "Shadow Mode" monitoring—comparing agent decisions against human expert (MCA) baselines.
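To make the "diff" and Shadow Mode ideas concrete, here is a minimal standard-library sketch that compares agent decisions with a human baseline keyed by claim ID. The function name, the claim IDs, and the decision labels are assumptions made for illustration; at production scale the same comparison would more likely be expressed as a BigQuery join or a Pandas merge.

```python
def diff_decisions(agent: dict, baseline: dict) -> dict:
    """Compare agent decisions to human baseline decisions keyed by claim ID."""
    shared = agent.keys() & baseline.keys()
    agreed = sum(agent[claim_id] == baseline[claim_id] for claim_id in shared)
    return {
        # Claims both sides decided, but differently.
        "mismatched": sorted(c for c in shared if agent[c] != baseline[c]),
        # Claims the human baseline covers that the agent never decided.
        "missing_from_agent": sorted(baseline.keys() - agent.keys()),
        # Claims the agent decided that have no human baseline yet.
        "missing_from_baseline": sorted(agent.keys() - baseline.keys()),
        # Share of jointly decided claims where both sides agree.
        "agreement_rate": agreed / len(shared) if shared else None,
    }


if __name__ == "__main__":
    agent = {"C1": "pay", "C2": "deny", "C3": "pend"}
    baseline = {"C1": "pay", "C2": "pay", "C4": "deny"}
    print(diff_decisions(agent, baseline))
```

Reporting coverage gaps separately from mismatches keeps a missing record from being mistaken for a wrong adjudication.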
Pay Transparency Statement 

This is a hybrid position based out of our Lehi office, with the expectation of being in office at least two weekdays per week.

The actual pay rate offered within the range will depend on factors including geographic location, qualifications, experience, and internal equity. In addition to the salary, you will be eligible for 115000 stock options and benefits like health insurance, 401k, and paid time off. Learn more about our benefits at https://jobs.collectivehealth.com/benefits/.

Lehi, UT Pay Range
$99,200-$124,000 USD
Plano, TX Pay Range
$109,120-$136,400 USD
Why Join Us?
  • Mission-driven culture that values innovation, collaboration, and a commitment to excellence in healthcare
  • Impactful projects that shape the future of our organization
  • Opportunities for professional development through internal mobility, mentorship programs, and courses tailored to your interests
  • Flexible work arrangements and a supportive work-life balance

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Collective Health is committed to providing support to candidates who require reasonable accommodation during the interview process. If you need assistance, please contact [email protected].

Privacy Notice

For more information about why we need your data and how we use it, please see our privacy policy: https://collectivehealth.com/privacy-policy/.

Skills Required

  • Expert in Python and pytest, including building custom mocking frameworks for external APIs (Vertex AI/ADK).
  • Hands-on experience with Vertex AI Experiments, Auto-SxS, and Cloud Logging for trace analysis.
  • Expert-level SQL (BigQuery) and Pandas for diffing massive datasets and identifying discrepancies.
  • Prompt engineering for QA: analyze system instructions and refine prompts based on failed test cases.
  • Experience testing multi-layer systems involving RAG (Vertex AI Search), state management (LangGraph), and function calling.
  • CI/CD integration to trigger automated regression runs for prompt or model updates.
  • Build and maintain a Vertex AI/ADK mocking layer to simulate model responses at scale.
  • Familiarity with claims adjudication concepts (pend reason codes, COB, eligibility, stop-loss).
  • Understanding of HIPAA/PHI handling and writing test evidence for regulatory audits (DOL/DOI).
  • Experience with human-in-the-loop or Shadow Mode monitoring comparing agents to human expert baselines.
  • Hybrid work based out of Lehi office with expectation of being in office at least two weekdays per week.

Collective Health Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Collective Health and has not been reviewed or approved by Collective Health.

  • Healthcare Strength: Benefits include competitive medical, dental, and vision coverage with options for $0 employee-only premiums and no waiting period, with some plans offering very low out-of-pocket costs. Feedback suggests this level of coverage is a standout element of the total package.
  • Leave & Time Off Breadth: Generous time-off programs encompass ample PTO, 12 paid holidays, baby-bonding leave with a supportive return-to-work program, and a fully paid four-week sabbatical after five years. Feedback suggests these elements meaningfully enhance overall rewards beyond cash.
  • Wellbeing & Lifestyle Benefits: A $100 monthly wellness stipend, access to Headspace resources, commuter support, and optional add-on insurances provide breadth and flexibility in day-to-day benefits. Feedback suggests these lifestyle supports contribute tangible value alongside salary.

The Company
Chicago, IL
500 Employees
Year Founded: 2013

What We Do

Collective Health is a technology company simplifying employer healthcare to make health insurance work for everyone. With more than a quarter million members and over 60 enterprise clients—including Pinterest, Restoration Hardware, and more—our technical and customer experience teams are reinventing the healthcare experience for employers and their people.

Why Work With Us

Collective Health has a mighty mission, to make the American healthcare system effortless, and a culture focused on empathy, authenticity, curiosity, and a need to solve hard problems. We have a diverse, mission-driven team with doctors working alongside data scientists and nuclear engineers to reinvent the healthcare experience for everyday people.
