Photon

QA Engineer (Automation) - Dallas, TX

Posted 21 Days Ago

2 Locations

In-Office or Remote

38K-133K Annually

Expert/Leader

Agency • Information Technology

The Role

Design and build automation frameworks and evaluation pipelines for agentic AI products. Develop non-deterministic tests, golden datasets, tool-use validation, prompt regression tests, latency/token monitoring, hallucination detection, and collaborate with AI engineers to convert requirements into measurable automated test cases.

We are seeking a QA Automation Engineer who is ready to move beyond traditional "Pass/Fail" testing. In this role, you will design and build automation frameworks specifically for Agentic AI products. You will focus on evaluating the performance of autonomous agents, ensuring they follow logical reasoning paths, call the correct tools, and provide accurate, safe outputs.

Your mission is to build the "evaluations" (Evals) that define what high-quality AI behavior looks like, moving the needle from unpredictable experiments to production-grade software.

Key Responsibilities

Non-Deterministic Testing: Develop automation strategies for probabilistic outputs, using model-based evaluation to "test the tester."
Building "Eval" Pipelines: Create and maintain "Golden Datasets" to benchmark agent performance across different versions of prompts and models.
Tool-Use Validation: Build automated tests to verify that agents call the correct functions/APIs with the right parameters in complex multi-step workflows.
Regression Testing for Prompts: Monitor how subtle changes in prompt engineering or model updates (e.g., moving from GPT-4 to Claude 3.5) affect the product’s reliability.
Latency & Token Monitoring: Integrate performance testing into the CI/CD pipeline to track agent reasoning time and cost-efficiency.
Hallucination Detection: Develop automated checks to identify and report AI hallucinations, bias, or "jailbreak" attempts.
Collaboration: Work closely with AI Engineers to translate "vague" business requirements into measurable, automated test cases.
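As an illustration of the tool-use validation and golden-dataset responsibilities above, here is a minimal Python sketch. Everything in it is hypothetical and not taken from Photon's stack: the ToolCall type, the two golden cases, the tool names (get_balance, transfer), and the stub agent are assumptions made only to show the shape of such a check.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One function/API call made by an agent (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def tool_calls_match(expected: list[ToolCall], actual: list[ToolCall]) -> bool:
    """True when the agent called the same tools, in order, with the same arguments."""
    return expected == actual


# Hypothetical golden dataset: each case pairs a prompt with the expected tool trace.
GOLDEN_CASES = [
    {
        "prompt": "What is the balance of account 42?",
        "expected": [ToolCall("get_balance", {"account_id": 42})],
    },
    {
        "prompt": "Transfer 100 from account 42 to account 7.",
        "expected": [ToolCall("transfer", {"from_id": 42, "to_id": 7, "amount": 100})],
    },
]


def stub_agent_trace(prompt: str) -> list[ToolCall]:
    """Stand-in for a real agent run; returns the tool calls it 'made'."""
    if "balance" in prompt:
        return [ToolCall("get_balance", {"account_id": 42})]
    # Deliberately wrong amount, to show a failing case.
    return [ToolCall("transfer", {"from_id": 42, "to_id": 7, "amount": 10})]


def pass_rate(cases: list[dict], run_agent) -> float:
    """Fraction of golden cases whose tool trace matches exactly."""
    passed = sum(tool_calls_match(c["expected"], run_agent(c["prompt"])) for c in cases)
    return passed / len(cases)


if __name__ == "__main__":
    print(f"tool-call pass rate: {pass_rate(GOLDEN_CASES, stub_agent_trace):.0%}")
```

Exact matching on the ordered trace is the strictest option. A real suite might relax it, for example by ignoring call order or optional arguments, depending on the workflow.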

Required Skills & Qualifications

Experience: 10+ years in QA Automation, with a recent focus on AI/ML or LLM-based applications.
Python Proficiency: Expert-level Python skills (the industry standard for AI testing) and experience with testing frameworks like Pytest.
AI Testing Tools: Familiarity with AI evaluation frameworks such as LangSmith, DeepEval, RAGAS, or Promptfoo.
API & Backend Testing: Deep experience with Playwright, Selenium, or Cypress for UI testing, combined with a heavy focus on API-level testing and database validation.
Statistical Mindset: Understanding that AI testing often requires "scoring" (e.g., 85% accuracy) rather than a simple binary pass/fail.
Data Skills: Ability to work with SQL and JSON to validate data retrieved by agents during RAG (Retrieval-Augmented Generation) processes.
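To make the "scoring rather than binary pass/fail" idea concrete, here is a small Pytest-style sketch. It is an assumption-laden illustration: stub_model is a hypothetical, deterministic stand-in for an LLM call (so the example is reproducible), and the 0.85 threshold simply echoes the 85% figure mentioned above.

```python
def accuracy(outputs: list[str], expected_substring: str) -> float:
    """Fraction of outputs that contain the expected answer (case-insensitive)."""
    needle = expected_substring.lower()
    return sum(needle in out.lower() for out in outputs) / len(outputs)


def stub_model(run_index: int) -> str:
    """Hypothetical stand-in for an LLM call: gives a wrong answer on every 10th run."""
    if run_index % 10 == 0:
        return "I am not sure."
    return "The capital of France is Paris."


def test_answer_accuracy_meets_threshold():
    # Run the same prompt many times and score the batch, not each single output.
    outputs = [stub_model(i) for i in range(20)]
    # 18 of 20 runs contain "Paris", so the score is 0.9 and the check passes.
    assert accuracy(outputs, "Paris") >= 0.85
```

The test fails only when the aggregate score drops below the threshold, so a single bad output does not break the build, but a real regression in quality does.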

Preferred Qualifications

Experience testing Multi-Agent Systems (where one agent tests another).
Knowledge of Prompt Engineering and how it influences software behavior.
Background in Investment Banking or Fintech (if applicable) to understand high-stakes data accuracy.

Compensation, Benefits and Duration

Minimum Compensation: USD 38,000
Maximum Compensation: USD 133,000
Compensation is based on the candidate's actual experience and qualifications. The above is a reasonable, good-faith estimate for the role.
Medical, vision, and dental benefits, a 401k retirement plan, variable pay/incentives, paid time off, and paid holidays are available for full-time employees.
This position is not available for independent contractors.
No applications will be considered if received more than 120 days after the date of this post.

Skills Required

10+ years in QA Automation with recent focus on AI/ML or LLM-based applications
Expert-level Python proficiency and experience with Pytest
Familiarity with AI evaluation frameworks: LangSmith, DeepEval, RAGAS, or Promptfoo
Experience with Playwright, Selenium, or Cypress for UI and strong API-level testing and database validation focus
Statistical mindset for scoring probabilistic AI outputs rather than binary pass/fail
Ability to work with SQL and JSON to validate data retrieved during RAG processes
Experience testing Multi-Agent Systems
Knowledge of Prompt Engineering and its impact on behavior
Background in Investment Banking or Fintech

The Company

HQ: London

5,017 Employees

Year Founded: 2007

What We Do

Photon.com has emerged as one of the world’s largest and fastest-growing Digital Agencies. We work with 40% of the Fortune 100 on their Digital initiatives and are known for our ability to integrate Strategy Consulting, Creative Design, and Technology at scale. Please visit www.photon.com to learn more about us, how we work, and our customer case studies. Digital Transformation Starts Here.