Scaled Cognition

QA Engineer

Reposted 7 Days Ago

Country States, Pájaros Barrio, Bayamón

In-Office

Mid level

Artificial Intelligence • Information Technology

The Role

As a QA Manager, you will develop scalable QA plans for evaluating AI agents, mentor a QA team, and collaborate with engineering to enhance AI models.

Summary Generated by Built In

Scaled Cognition is the world’s only model lab dedicated exclusively to customer experience and pioneering agentic models purpose-built for reliable action-taking enterprise applications. Backed by Khosla Ventures, the company’s flagship Agentic Pretrained Transformer (APT) eliminates hallucinations, enforces enterprise policies, and increases reliability in real-world CX workflows. Founded by serial AI entrepreneurs, former Microsoft Corporate Vice President of Conversational AI Dan Roth, and UC Berkeley AI Professor Dan Klein, and built by a team of world-class PhD researchers and engineers, Scaled Cognition advances the science of agentic AI to deliver safe, policy-aligned automation that enterprises can trust.

As a QA Manager at Scaled Cognition, you will:

Develop and implement scalable QA plans for evaluating AI agents, defining key performance metrics to measure progress over time.
Collaborate with product and engineering teams to document findings, test fixes, and recommend improvements to the underlying models and conversational flows.
Lead and mentor a team of QA engineers, establishing best practices and processes for testing conversational AI agents.

Example projects could include:

Building test sets to track regressions, agent robustness, and end-to-end testing.
Reviewing and analyzing voice and chat transcripts to quickly identify conversational gaps and provide data for faster iteration on customer deployments.
Designing and automating testing pipelines to scale QA capacity across a diverse portfolio of customers and to continuously evaluate the performance of our AI agents.

Preferred Qualifications:

Intermediate-level proficiency in Python and experience building and testing conversational AI/LLM systems.
Background in implementing evaluation benchmarks and production monitoring metrics.
Experience working with libraries and tooling common in the AI/LLM ecosystem.
Demonstrated precision in documenting test plans, test cases, and bug reports, ensuring data is accurate and easily understandable by cross-functional teams.
Experience leveraging AI-powered assistants and tooling to enable rapid iteration, prototyping, and accelerated delivery.

Top Skills

Python

The Company

21 Employees

Year Founded: 2023

What We Do

Scaled Cognition is developing a new generation of rational, controllable AI models deployable as domain experts for grounded, real-world applications.