Senior Software Engineer, AI Evals

San Francisco, CA
In-Office
240K-280K Annually
Senior level
Cloud • Information Technology • Software • Analytics • Business Intelligence
Sentry's platform helps every developer diagnose, fix, and optimize the performance of their code.
About Sentry

Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster so we can get back to enjoying technology.

With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney, Microsoft, and Atlassian spend less time fixing bugs and more time building products.

Sentry embraces a hybrid work model across our global hubs, with Mondays, Tuesdays, and Thursdays set as in-office anchor days to encourage meaningful collaboration. If you like to selfishly build things that make your digital life better, come help us build the next generation of software monitoring tools.

About the role

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for building the evaluation infrastructure that measures the accuracy, reliability, and real-world performance of our AI systems. This role is critical to ensuring that our debugging agents and AI-powered features behave correctly, safely, and predictably as they scale. You’ll design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals, helping the team ship AI with confidence.

In this role you will
  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems

  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data

  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows

  • Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria

  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

You’ll love this job if you
  • Care deeply about correctness, rigor, and measurement in AI systems

  • Enjoy turning fuzzy product goals and model behavior into concrete tests and metrics

  • Like building foundational infrastructure that unlocks faster iteration and higher confidence for the entire AI team

  • Thrive in cross-functional environments and enjoy influencing model design through better evaluation

Qualifications
  • 5+ years of professional experience and a Bachelor’s degree in computer science, machine learning, or a related field

  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)

  • Comfort writing production-quality code (we use Python and TypeScript)

  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines

  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)

  • Bonus: experience evaluating LLMs, agentic systems, or AI-assisted developer tools

The base salary range (or hourly wage range, if applicable) that Sentry reasonably expects to pay for this position is $240,000 to $280,000 USD. A successful candidate’s actual base salary (or hourly wage) amount will be determined by a variety of relevant factors including, without limitation, the candidate’s work location, education, work and other relevant experience, skills, and job-related knowledge. A successful candidate will be eligible to participate in Sentry’s employee benefit plans/programs applicable to the candidate’s position (including incentive compensation, equity grants, paid time off, and group health insurance coverage). See Sentry Benefits for more details about the Company’s benefit plans/programs.

Equal Opportunity at Sentry

Sentry is committed to providing equal employment opportunities to its employees and candidates for employment regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, or other legally-protected characteristic. This commitment includes the provision of reasonable accommodations to employees and candidates for employment with physical or mental disabilities who require such accommodations in order to (a) perform the essential functions of their jobs, or (b) seek employment with Sentry. We strive to build a diverse team, with an inclusive culture where every teammate can thrive. Sentry is an open-source company because we believe that everyone, everywhere, should have the ability and tools to make great software. Software should be accessible. That starts with making our industry accessible.

If you need assistance or an accommodation due to a disability, you may contact us at [email protected].

Want to learn more about how Sentry handles applicant data? Get the details in our Applicant Privacy Policy.

Top Skills

AI
ML
Python
TypeScript
The Company
HQ: San Francisco, CA
241 Employees
Year Founded: 2012

What We Do

With Sentry, software teams can easily trace issues related to errors, performance problems, and trends in code quality. Sentry supports native, mobile, web, IoT frameworks, and more than 30 languages.