AI Quality Engineer

Posted 13 Days Ago
Atlanta, GA, USA
In-Office
Mid level
Cloud • Software • Analytics
The Role
Design and implement evaluation frameworks for AI systems, build test pipelines, track quality metrics, and collaborate with teams for optimal performance.
Job Description

Key Responsibilities

•       Design and implement evaluation frameworks (evals) to assess LLM and agentic AI system quality, including accuracy, consistency, safety, and task completion rates.

•       Build and maintain automated test pipelines for AI features, covering unit, integration, and end-to-end scenarios across agentic workflows.

•       Develop tooling to detect regressions in model behavior, prompt outputs, and agent decision-making across releases.

•       Define and track quality metrics for AI systems (e.g., hallucination rates, tool-use accuracy, latency, failure recovery) and surface findings clearly to stakeholders.

•       Collaborate with engineers and product managers to identify edge cases, adversarial inputs, and failure modes specific to multi-step agentic pipelines.

•       Contribute to prompt evaluation strategies, including red-teaming, adversarial testing, and bias/fairness assessments.

•       Participate in design and code reviews with a quality-focused lens, raising concerns about testability and reliability early.

•       Help define and document quality standards and best practices for AI/ML features across the team.

•       Other duties as assigned.

Qualifications

Required

•       Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.

•       3–5 years of professional software engineering or quality engineering experience.

•       Hands-on experience working with LLMs or agentic AI systems (e.g., GPT-4, Claude, Gemini, or open-source models).

•       Proficiency in Python for scripting, test automation, and data analysis.

•       Experience designing and running evaluations (evals) for generative AI or LLM-powered features.

•       Solid understanding of software testing principles: unit, integration, regression, and end-to-end testing.

•       Familiarity with agentic frameworks and concepts (e.g., tool use, multi-step reasoning, retrieval-augmented generation, memory).

•       Experience with CI/CD pipelines and integrating automated tests into development workflows.

•       Strong analytical skills — able to interpret probabilistic outputs and distinguish meaningful regressions from expected variance.

•       Strong written and verbal communication skills; ability to clearly document findings and present quality data to non-technical stakeholders.

•       Detail-oriented, with a structured approach to exploring edge cases and failure scenarios.

•       Ability to work in a fast-paced environment and manage multiple priorities effectively.

Nice to Have

•       Experience with prompt engineering and systematic prompt evaluation methodologies.

•       Familiarity with AI safety, alignment, or responsible AI concepts (e.g., hallucination mitigation, bias detection, guardrails).

•       Exposure to agentic orchestration frameworks (e.g., LangChain, LangGraph, AutoGen, CrewAI, or similar).

•       Experience with vector databases or RAG pipelines (e.g., Pinecone, Weaviate, pgvector).

•       Knowledge of observability and monitoring tools for AI systems (e.g., LangSmith, Weights & Biases, Arize).

•       Background in data science or ML experimentation practices.

•       Experience with version control systems (Git) and defect-tracking tools (e.g., Jira).

•       Exposure to cloud platforms (e.g., AWS, Azure, GCP) in the context of deploying or testing AI services.

What Success Looks Like

•       Builds robust eval frameworks that catch meaningful regressions in AI behavior before they reach production.

•       Reduces time-to-detection for quality issues in agentic workflows through effective automation and monitoring.

•       Contributes clear, actionable quality signals that help the team make confident release decisions.

•       Grows into a trusted voice on AI quality standards, influencing engineering practices across the team.


About Us

Momentive Software amplifies the impact of over 20,000 purpose-driven organizations in over 30 countries, with over $11 billion raised and 55 million members served to date. Mission-driven nonprofits and associations rely on Momentive’s cloud-based software and services to address their most pressing challenges – from engaging their communities to simplifying operations and growing revenue. Designed to help organizations connect more, manage more, and ultimately expect more, Momentive's solutions are built with reliability at the core and strategically focus on fundraising, learning, events, careers, volunteering, accounting, and association management. Momentive partners with organizations that believe "good enough" is never enough – so they can bring on better outcomes for everyone they serve. Learn more at momentivesoftware.com.
 

Why Work Here?

At Momentive Software, we’re a team of passionate problem-solvers, innovators, and volunteers who believe in using technology to make a real difference. We dream big, support each other, and take pride in creating solutions that help our customers drive meaningful change. If you’re looking for a place where your work matters and your ideas are valued, you’ll find it here.

•       Medical, Dental & Vision Benefits

•       401(k) Savings Plan with Company Match

•       Flexible Planned Paid Time Off

•       Generous Sick Leave

•       Inclusive & Welcoming Environment

•       Purpose-Driven Culture

•       Work-Life Balance

•       Commitment to Community Involvement

•       Employer-Paid Parental Leave

•       Employer-Paid Short-Term Disability

•       Remote Work Flexibility

Momentive Software actively embraces diversity and equal opportunity in a meaningful way. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better our work will be, which is why we do not discriminate based on race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.

All persons hired will be required to verify identity, minimum age of 18, eligibility to work in the United States (without sponsorship), and to complete the required employment eligibility verification form upon hire.

Skills Required

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience
  • 3-5 years of professional software engineering or quality engineering experience
  • Hands-on experience with LLMs or agentic AI systems
  • Proficiency in Python for scripting and test automation
  • Experience designing evaluations for generative AI or LLM-powered features
  • Understanding of software testing principles: unit, integration, regression, end-to-end testing
  • Familiarity with agentic frameworks and concepts
  • Experience with CI/CD pipelines
  • Strong analytical skills for interpreting probabilistic outputs
  • Strong communication skills for documenting findings
  • Detail-oriented with a structured approach
  • Ability to manage multiple priorities in a fast-paced environment

The Company
St. Petersburg, Florida
820 Employees
Year Founded: 2017

What We Do

Momentive Software (formerly Community Brands) amplifies the impact of over 30,000 purpose-driven organizations globally. Mission-driven organizations and associations rely on the company’s cloud-based software and services to solve their most critical challenges: engage the people they serve, simplify operations, and grow revenue. Built with reliability at the core and strategically focused on events, careers, fundraising, financials, and operations, our solutions suite is bound by a common purpose to serve the organizations that make our communities a better place to live.
