Software Principal Engineer - AI Quality

Hiring Remotely in India
Expert/Leader
Cloud • Information Technology • Productivity • Software • Automation
The leader in AI-driven automation and integration.

About Boomi and What Makes Us Special

Are you ready to work at a fast-growing company where you can make a difference? Boomi aims to make the world a better place by connecting everyone to everything, anywhere. Our award-winning, intelligent integration and automation platform helps organizations power the future of business. At Boomi, you’ll work with world-class people and industry-leading technology. We hire trailblazers with an entrepreneurial spirit who can solve challenging problems, make a real impact, and want to be part of building something big. If this sounds like a good fit for you, check out boomi.com or visit our Boomi Careers page to learn more.

How You'll Make An Impact

As a Principal AI Quality Lead, you will define and drive the quality engineering strategy for our production Generative AI and Agentic systems. You will establish automated evaluation frameworks, quality standards, and testing infrastructure that ensure our AI agents operate reliably, safely, and efficiently at scale. This is a high-impact technical leadership role where you'll build the foundation for trustworthy AI deployment, bridging AI/ML engineering expertise with quality engineering discipline. You'll architect the systems and practices that transform our approach from manual spot-checking to continuous, automated evaluation of complex agentic workflows.

What You Will Do

Quality Infrastructure & Automation

  • Architect and build automated evaluation frameworks for agentic workflows that assess behavior across effectiveness, efficiency, robustness, and safety dimensions.
  • Design continuous evaluation pipelines that automatically test multi-step reasoning, tool selection patterns, error handling, and behavioral regressions across diverse scenarios.
  • Establish observability requirements for AI agents including structured logging, trajectory tracing, and metrics collection for reasoning steps, tool calls, and execution paths.
  • Build regression detection systems that identify quality degradation when prompts, models, tools, or system components change.
  • Create synthetic test data generation pipelines and curated evaluation datasets that cover edge cases, adversarial scenarios, and real-world variability.

Quality Standards & Methodology

  • Define comprehensive quality standards and evaluation methodologies specifically designed for agentic AI systems and LLM-based applications.
  • Establish key quality metrics, SLIs, and SLOs for agent behavior including task completion rates, reasoning efficiency, cost per resolution, and safety compliance.
  • Create quality gates and acceptance criteria that balance speed-to-production with reliability requirements.
  • Develop responsible AI testing practices including bias detection, fairness evaluation, safety guardrails validation, and alignment verification.
  • Build tooling and frameworks that enable both automated evaluation at scale and targeted diagnostic testing for failure investigation.
  • Establish benchmark suites and golden datasets for continuous quality assessment across agent capabilities.
  • Architect evaluation approaches for complex AI behaviors including chain-of-thought reasoning, tool orchestration, multi-turn conversations, and context management.
  • Establish model evaluation practices including prompt testing, output validation, semantic correctness assessment, and hallucination detection.

Cross-Functional Partnership & Influence

  • Partner closely with AI Engineering, Platform, and Product teams to embed quality practices into the development lifecycle from design through deployment.
  • Serve as the technical authority on AI quality, influencing architectural decisions and advocating for quality as a foundational pillar.
  • Collaborate with Data Science and ML Engineering teams to align evaluation methodologies with model development practices.
  • Communicate quality insights, risk assessments, and recommendations clearly to technical and non-technical stakeholders.
  • Build cross-functional alignment on quality standards, evaluation criteria, and production readiness requirements.

Team Leadership & Culture

  • Mentor and develop AI quality engineers, elevating team capabilities in evaluation frameworks, AI/ML concepts, and automation practices.
  • Foster a culture of quality-first thinking, continuous improvement, and data-driven decision making.
  • Build the organizational capability for AI quality as a core competency, not a reactive testing phase.
The Experience You Bring
  • 7+ years of experience in AI/ML engineering, data science, or ML quality/evaluation roles with deep technical expertise in model development and evaluation.
  • Experience in LLM evaluation, Generative AI quality, or agentic system testing.
  • Strong understanding of transformer architectures, prompt engineering, retrieval-augmented generation (RAG), and agentic frameworks (ReAct, chain-of-thought, tool use patterns).
  • Deep knowledge of LLM failure modes including hallucinations, context limitations, prompt sensitivity, reasoning errors, and tool misuse.
  • Hands-on experience with major LLM platforms (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI) and their evaluation capabilities.
  • Proven track record building automated evaluation systems for AI/ML models or agentic workflows at scale.
  • Strong experience with ML evaluation frameworks and tools (MLflow, Weights & Biases, LangSmith, custom evaluation pipelines).
  • Expertise in designing evaluation metrics for non-deterministic systems beyond simple accuracy measures.
  • Experience with A/B testing, experimentation frameworks, and statistical analysis for model comparison.
  • Background in observability, instrumentation, and monitoring for production AI systems.
  • Advanced programming skills in Python with experience building production-quality evaluation frameworks and automation tooling.
  • Strong understanding of software architecture, APIs, distributed systems, and data pipelines.
  • Experience with test automation frameworks (pytest, unittest) and CI/CD integration.
  • Familiarity with infrastructure as code, containerization, and cloud platforms (AWS, Azure, GCP).
  • Ability to write clean, maintainable, well-documented code and technical specifications.
  • Strong analytical skills with ability to design statistically sound evaluation methodologies.
  • Experience working with large-scale datasets, data quality assessment, and synthetic data generation.
  • Understanding of experimental design, hypothesis testing, and confidence intervals for evaluation results.
  • Ability to translate business requirements into measurable quality metrics and acceptance criteria.

Communication & Collaboration

  • Excellent written and verbal communication skills with ability to explain complex technical concepts clearly.
  • Experience presenting technical recommendations and quality assessments to senior leadership.
  • Proven ability to build consensus and drive adoption of new practices across engineering organizations.
  • Strong documentation skills for creating standards, runbooks, evaluation reports, and architectural specifications.

Learning & Innovation Mindset

  • Self-directed learner who stays current with rapidly evolving AI quality practices and industry research.
  • Comfortable operating in ambiguity and building new capabilities from the ground up.
Education & Experience
  • Master's in Computer Science, Machine Learning, Data Science, Statistics, or related field (or equivalent experience).
  • 7+ years of professional experience in AI/ML and software engineering, data science, ML operations, or ML quality roles.
  • Hands-on experience with LLM evaluation, Generative AI testing, or agentic system quality.
  • Demonstrated experience building automated evaluation frameworks and quality infrastructure for production AI systems.
What Sets You Apart
  • Published research or contributions in AI evaluation, LLM quality, agent benchmarking, or responsible AI.
  • Experience building AI quality practices from the ground up in production environments.
  • Deep expertise in agentic AI architectures including multi-agent systems, tool use, and autonomous decision-making.
  • Background in both ML engineering/research and quality engineering/evaluation roles.
  • Contributions to open-source AI evaluation frameworks or benchmarking tools.
  • Hands-on experience fine-tuning or developing LLMs/SLMs with corresponding evaluation methodologies.
  • Domain expertise in specific agentic AI applications (customer support, process automation, code generation, etc.).

Be Bold. Be You. Be Boomi. We take pride in our culture and core values and are committed to being a place where everyone can be their true, authentic self. Our team members are our most valuable resources, and we look for and encourage diversity in backgrounds, thoughts, life experiences, knowledge, and capabilities.  

All employment decisions are based on business needs, job requirements, and individual qualifications.

Boomi strives to create an inclusive and accessible environment for candidates and employees. If you need accommodation during the application or interview process, please submit a request to [email protected]. This inbox is strictly for accommodation requests; please do not send resumes or general inquiries.

Top Skills

AWS
Azure
GCP
LangSmith
MLflow
pytest
Python
unittest
Weights & Biases


The Company
HQ: Conshohocken, PA
2,200 Employees
Year Founded: 2000

What We Do

Boomi, the leader in AI-driven automation, enables organizations worldwide to connect everything, automate processes, and accelerate outcomes. The Boomi Enterprise Platform — including Boomi Agentstudio — unifies integration and automation along with data, API, and AI agent management, in a single, comprehensive solution, helping organizations radically simplify the complexity of enterprise software. Trusted by over 25,000 customers, with a user community of 250,000+ users, and supported by a network of 800+ partners, Boomi is driving agentic transformation — helping enterprises of all sizes achieve agility, efficiency, and innovation at scale.

Why Work With Us

Boomi boasts an award-winning work culture with an emphasis on being transparent, innovative, accountable, true to our authentic selves, and winning together as One Boomi. As we grow rapidly, invest in talent, and cultivate careers, ample opportunity exists for professional growth and participation in our vibrant culture and employee resource groups.

Boomi Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Boomi is committed to flexible work solutions for team members, whether that means working remotely or using co-working spaces across the country.

Typical time on-site: Flexible
Boomi's Global Headquarters in Conshohocken, PA
Barcelona, Sant Martí
Barongarook, Barangaroo
Bangalore, IN
Vancouver, BC
