Applied AI Researcher, Benchmarking

Reposted 22 Days Ago
2 Locations
In-Office
Mid level
Artificial Intelligence • Software
The Role
The Applied AI Researcher will design benchmark frameworks, conduct statistical evaluations, and measure the performance of intelligent systems, using AI to redefine how enterprise software is used.
Summary Generated by Built In
About Distyl AI

Distyl AI develops production-grade AI systems to power core operational workflows for Fortune 500 companies. Powered by a strategic partnership with OpenAI, in-house software accelerators, and deep enterprise AI expertise, we deliver working AI systems with rapid time to value – within a quarter.

Our products have helped Fortune 500 customers across diverse industries, from insurance and CPG to non-profits. As part of our team, you will help companies identify, build, and realize value from their GenAI investments, often for the first time. We are customer-centric, working backward from the customer’s problem and holding ourselves accountable both for creating financial impact and for improving the lives of end users.

Distyl is led by proven leaders from top companies like Palantir and Apple and is backed by Lightspeed, Khosla, Coatue, Dell Technologies Capital, Nat Friedman (former CEO of GitHub), Brad Gerstner (Founder and CEO of Altimeter), and board members of over a dozen Fortune 500 companies.

What We Are Looking For

At Distyl, we’re pushing the envelope of AI utilization in the enterprise. This requires creative researchers who want to do more than drive incremental improvements on benchmarks or optimize an existing process: they want to redefine how software is used.

Our researchers come from many academic backgrounds, but all have strong research track records, operate in an AI-native way, and would be bored staying on the rails of a traditional research org.

Key Responsibilities
  • The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity. Their systems become the standard by which new architectures, techniques, and releases are judged.

  • Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.

Who You Are
  • Experience Designing and Running Evaluations: You’ve built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance

  • Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results

  • Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.)

  • Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you’ve done

  • Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow

  • Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI

  • Biases Towards Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize

What We Offer
  • The base salary range for this role is $130K – $250K, depending on experience, location, and level. In addition to base compensation, this role is eligible for meaningful equity, along with a comprehensive benefits package

  • 100% covered medical, dental, and vision for employees and dependents

  • 401(k) with additional perks (e.g., commuter benefits, in‑office lunch)

  • Access to state‑of‑the‑art models, generous usage of modern AI tools, and real‑world business problems

  • Ownership of high‑impact projects across top enterprises

  • A mission‑driven, fast‑moving culture that prizes curiosity, pragmatism, and excellence

Distyl has offices in San Francisco and New York. This role follows a hybrid collaboration model with 3+ days per week (Tuesday–Thursday) in‑office.

Top Skills

AI
Data Analysis
Programming
Statistical Analysis

The Company
HQ: San Francisco, California
45 Employees

What We Do

Distyl AI is on a mission to create the most customer-centric AI company that revolutionizes how enterprises thrive in the AI-assisted economy. We collaborate with leading institutions worldwide to enhance their AI readiness and build dependable, seamlessly integrated AI-driven solutions tailored to their distinct data, workflows, and employee requirements. Using our proprietary platform of in-house tools and alliances such as the one with OpenAI, our team diligently develops and deploys generative AI products that adhere to the highest standards of integrity and reliability, empowering the institutions that require them the most.

Similar Jobs

Samsara

Architect

Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
Remote or Hybrid
United States
4000 Employees
167K-281K Annually

Agero

Operational Excellence Manager

Automotive • Big Data • Insurance • Software • Transportation
Remote or Hybrid
USA
1600 Employees
100K-130K Annually

EliseAI

Engagement Lead, Future Platforms | Housing

Artificial Intelligence • Healthtech • Machine Learning • Natural Language Processing • Real Estate
In-Office
New York City, NY, USA
400 Employees
160K-230K Annually

Rapid7

Director, Product Management (SIEM)

Artificial Intelligence • Cloud • Information Technology • Sales • Security • Software • Cybersecurity
Remote or Hybrid
United States
2400 Employees
206K-278K Annually

Similar Companies Hiring

Idler
Artificial Intelligence
San Francisco, California
6 Employees
Fairly Even
Software • Sales • Robotics • Other • Hospitality • Hardware
New York, NY
Bellagent
Artificial Intelligence • Machine Learning • Business Intelligence • Generative AI
Chicago, IL
20 Employees
