hud

Research Engineer (General)

2 Locations

Hybrid

Entry level

Artificial Intelligence • Information Technology • Software

The Role

Research Engineers will build QA systems for training data, define quality standards, audit datasets, and collaborate with data vendors to enhance data generation processes.

Summary Generated by Built In

About HUD

HUD is building infrastructure to create RL training data and evals for frontier AI agents, as well as the HUD marketplace, where these are sold to frontier labs. Our platform is used by frontier labs, Fortune 500 companies, and startups. We’ve raised $16M from top VCs and were YC W25.

About the role

This is a general application for candidates who are unsure which research focus - QC Automation, Benchmarks, or Synthetic Data - would be the best fit. We would love to meet you and figure it out together. However, if you already have a focus in mind, please apply only to the posting for that focus.

We're looking for Research Engineers to build the technical foundation for training and evaluating frontier AI agents. You’ll build the systems for creating new environments, improve data quality, and translate real-world workflows into tasks and benchmarks.

Responsibilities

Build systems for creating, running, evaluating, and improving agent training environments
Design experiments to understand model behavior, agent failure modes, and data quality issues
Develop tools that help researchers, engineers, and data vendors create higher-quality tasks, trajectories, and feedback loops
Work across the full lifecycle of agent training data - task design, environment setup, trajectory collection, evaluation, and validation
Partner with external vendors to identify bottlenecks and improve the quality and throughput of HUD’s data engine
Build metrics and analyses that help us understand whether our tasks, environments, and evals are actually useful for training frontier agents

Experience

You may be a good fit if you have:

Proficiency in Python, Docker, and Linux environments
Experience working on benchmarks and evals - you can reason about what makes a task realistic, a rubric reliable, an environment usable, and a trajectory useful for RL training
Strong attention to detail and the ability to spot subtle inconsistencies in data, model behavior, or task design
Experience building tools, pipelines, experiments, or infrastructure without a fully prescribed roadmap
Early-stage startup experience and the ability to work independently in fast-paced environments

Strong candidates may also have:

Experience building internal tools, research infrastructure, or data pipelines
Experience designing metrics and validation workflows
A background in competitive programming, Olympiad medaling, research, or unusually strong independent project experience
An ability to thrive in unstructured problem spaces
Strong communication skills for remote collaboration across time zones

We prioritize technical aptitude and learning potential over years of experience. Motivated candidates are encouraged to apply even if they don't meet all criteria.

Team & company details

Team Size: ~15 people currently, mostly full-time in-person, but some remote.
Our team: 4 International Olympiad medalists (IOI, ILO, IPhO), serial AI startup founders, and researchers with publications at ICLR, NeurIPS, etc.
Company stage: We have 8 figures in funding and high revenue growth. We’re scaling profitably and quickly to meet very strong demand.

Logistics

Employment: Full-time.
Location: On-site in the San Francisco Bay Area.
Visa Sponsorship: We provide relocation and visa support for strong full-time candidates moving to the US.
Timeline: Applications are rolling. The process is 2 technical interviews and a 2-3 day work trial.

What we offer

Competitive compensation based on experience and location
100% covered top-of-the-line medical, dental, and vision from Blue Shield of CA
Lunch and dinner when you’re in the office
Company-wide holiday break (Christmas Eve to New Year’s Day) on top of PTO and paid holidays
Other perks including an Equinox membership, 401k, and commuter benefits
Unlimited* access to tokens for ChatGPT, Claude Code, Cursor, etc. *By unlimited, we mean no one on our token usage leaderboard has ever hit a limit. So we have no idea what the limit is.

Due to high volume, we may not actively respond to every application, but feel free to contact us at [email protected] or elsewhere if we missed your application!