RL Environments

Reposted 13 Days Ago
Mountain View, CA, USA
Hybrid
Mid level
Information Technology • Software
The Role
Develop systematic strategies for creating RL environments, analyze agent behavior and failures, and validate and package benchmark environments for external use.
About Bespoke Labs

Bespoke Labs is an applied AI research lab pioneering data and RL environment curation for training and evaluating agents.

Recently, we curated Open Thoughts, one of the best open reasoning datasets, used by multiple frontier labs; trained SOTA specialized models such as Bespoke-MiniChart-7B and Bespoke-MiniCheck; and taught agents to do multi-turn tool-calling with reinforcement learning.

Bespoke is uniquely positioned to capture a large market share of data and RL environment curation.

About The Role

We're looking for an RL Environment Research Engineer to accelerate how we create, evaluate, and benchmark training environments for AI agents. You'll develop systematic approaches to environment design, identify where agents fail, and turn those insights into high-quality training data and benchmarks.

This role combines research intuition with practical execution. You'll need to understand agent behavior deeply—spotting reward hacking, analyzing failure modes, and diagnosing why certain environments produce better training outcomes. Then you'll translate that understanding into repeatable processes and benchmark suites that we can showcase externally.

You're someone who enjoys both the detective work (analyzing agent rollouts, finding patterns in failures) and the building work (designing environments, creating evaluation pipelines). You can move between studying the science of what makes environments effective and actually producing those environments at scale.
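To give a flavor of the detective work, here is a minimal sketch of rollout failure analysis. The record format and failure-mode labels are hypothetical stand-ins, not Bespoke's actual trace schema; real rollouts would carry much richer traces (tool calls, tokens, rewards).

```python
from collections import Counter

# Hypothetical rollout records; real rollouts would come from the
# training pipeline with full agent traces attached.
rollouts = [
    {"env": "web_nav", "success": False, "failure": "loop"},
    {"env": "web_nav", "success": True,  "failure": None},
    {"env": "sql",     "success": False, "failure": "schema_hallucination"},
    {"env": "sql",     "success": False, "failure": "schema_hallucination"},
    {"env": "web_nav", "success": False, "failure": "loop"},
]

def failure_profile(rollouts):
    """Group failures by (env, mode) so the dominant failure pattern
    per environment family is visible at a glance."""
    return Counter(
        (r["env"], r["failure"]) for r in rollouts if not r["success"]
    )
```

On this toy data, the profile shows `web_nav` dominated by looping and `sql` by schema hallucination; that kind of per-family signal is exactly what feeds back into environment design.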

What You'll Do
  1. Develop systematic strategies and recipes for creating high-quality RL environments that effectively train and evaluate agents.

  2. Study how LLMs and agents fail across different task types, identifying patterns that inform better environment design.

  3. Create benchmark environments that test specific agent capabilities, packaging them for external release on our evaluation platform.

  4. Verify environment quality through hands-on testing—training small-scale agents, checking for reward hacking, and analyzing training dynamics.

  5. Work with our environment creation pipeline to scale production of validated environments.

  6. Analyze agent rollout data to uncover insights about what makes environments challenging, diverse, and pedagogically valuable.

  7. Collaborate with the team to ensure benchmarks integrate smoothly into our external-facing dashboards.

  8. Establish quality standards and evaluation protocols that maintain high bars as we scale environment production.
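The environment-verification work above (item 4 in particular) can be sketched with a toy example. Everything here is hypothetical: the environment interface, the "search then answer" task, and the reward values are illustrative stand-ins, not Bespoke's actual environment API. The point is the shape of a reward-hacking check: compare a degenerate policy against the intended one.

```python
class ToyToolEnv:
    """Minimal sketch of a multi-turn tool-calling environment.

    Hypothetical task: the agent should call the 'search' tool
    before 'answer'. The reward is deliberately naive so the
    hacking check below has something to catch.
    """

    def __init__(self, max_turns: int = 4):
        self.max_turns = max_turns
        self.reset()

    def reset(self):
        self.turn = 0
        self.searched = False
        self.done = False
        return {"turn": self.turn}

    def step(self, action: str):
        assert not self.done, "episode finished; call reset()"
        self.turn += 1
        reward = 0.0
        if action == "search":
            self.searched = True
        elif action == "answer":
            # Naive reward: pays out almost fully even without a
            # preceding search. A well-designed reward would not.
            reward = 1.0 if self.searched else 0.95
            self.done = True
        if self.turn >= self.max_turns:
            self.done = True
        return {"turn": self.turn}, reward, self.done


def reward_hacking_check(env_cls, episodes: int = 50) -> bool:
    """Flag the environment if a degenerate 'answer immediately'
    policy earns nearly as much as the intended policy."""
    def run(policy):
        total = 0.0
        for _ in range(episodes):
            env = env_cls()
            env.reset()
            done, i = False, 0
            while not done:
                _, r, done = env.step(policy[min(i, len(policy) - 1)])
                total += r
                i += 1
        return total / episodes

    intended = run(["search", "answer"])
    degenerate = run(["answer"])
    # Hackable if skipping the tool call costs the agent little.
    return degenerate > 0.9 * intended
```

Here `reward_hacking_check(ToyToolEnv)` returns `True`, flagging the naive reward as exploitable before the environment ever reaches a real training run.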

What We're Looking For

Research and analytical skills:

  • Strong foundation in machine learning, through a PhD or MS in ML or CS, or equivalent industry experience.

  • Deep curiosity about agent behavior and failure modes, with the ability to form hypotheses and test them systematically.

  • Experience analyzing complex systems and extracting actionable insights from data.

  • Patience and attention to detail for studying agent rollouts and identifying subtle patterns.

Technical execution:

  • Proficiency in Python and ML frameworks (PyTorch, JAX, or similar).

  • Experience with RL concepts and agent training, even if not from an RL background.

  • Ability to design experiments, run training loops, and interpret results.

  • Comfortable working with cloud platforms (GCP, AWS) for running experiments at scale.
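"Design experiments, run training loops, and interpret results" can be illustrated at its smallest scale. This is a toy epsilon-greedy bandit loop in pure Python, a stand-in for a real agent-training experiment; the arm means, epsilon, and step count are arbitrary illustrative choices.

```python
import random

def train_bandit(arm_means, steps=2000, eps=0.1, seed=0):
    """Toy experiment loop: act (epsilon-greedy), observe a noisy
    reward, update running-mean value estimates, and return the
    per-arm estimates and pull counts for interpretation."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n)                        # explore
        else:
            a = max(range(n), key=values.__getitem__)   # exploit
        r = rng.gauss(arm_means[a], 0.1)                # noisy reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]        # running mean
    return values, counts
```

With arms `[0.2, 0.8]`, interpreting the results means checking that pulls concentrate on the better arm and that its value estimate converges near 0.8; the same read-the-curves discipline applies to full-scale training runs.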

Practical engineering:

  • Can build pipelines and automation to scale research insights into production.

  • Experience with data analysis tools and creating reproducible workflows.

  • Systematic approach to quality verification and testing.

Nice to Have
  • Hands-on experience with reinforcement learning or agent training systems

  • Background in data curation, dataset creation, or evaluation benchmark design

  • Experience with AI safety, robustness testing, or adversarial evaluation

  • Publications or projects related to RL, agent evaluation, or data-centric AI

  • Understanding of how to design environments that surface specific failure modes

  • Experience shipping research artifacts (datasets, benchmarks, evaluation suites) to the community

Logistics

Location: Mountain View, CA.

Compensation: Competitive salary and equity based on experience and background.

Benefits: Health coverage, flexible work arrangements, and the opportunity to shape how the AI community evaluates and trains agents.

We encourage applications from candidates with diverse research backgrounds. If you're passionate about understanding agent behavior and creating systematic approaches to environment design, we'd love to hear from you.

Top Skills

AWS
GCP
JAX
Python
PyTorch

The Company
HQ: Mountain View, California
13 Employees

What We Do

Bespoke Labs is a venture-funded startup creating AI tools for data curation and post-training LLMs. (We are hiring!)

Similar Jobs

CSquared Labs Logo CSquared Labs

AI Research Engineer

Artificial Intelligence • Software • Business Intelligence • Generative AI • Big Data Analytics
Hybrid
2 Locations
7 Employees

Anduril Logo Anduril

Senior Program Manager

Aerospace • Artificial Intelligence • Hardware • Robotics • Security • Software • Defense
In-Office
Costa Mesa, CA, USA
6000 Employees
146K-194K Annually

Anduril Logo Anduril

Senior Program Manager

Aerospace • Artificial Intelligence • Hardware • Robotics • Security • Software • Defense
In-Office
Costa Mesa, CA, USA
6000 Employees
166K-220K Annually

Anduril Logo Anduril

Senior Supplier Quality Engineer, PCB/PCBA

Aerospace • Artificial Intelligence • Hardware • Robotics • Security • Software • Defense
In-Office
Costa Mesa, CA, USA
6000 Employees
146K-194K Annually

Similar Companies Hiring

Milestone Systems Thumbnail
Software • Security • Other • Big Data Analytics • Artificial Intelligence • Analytics
Lake Oswego, OR
1500 Employees
Fairly Even Thumbnail
Software • Sales • Robotics • Other • Hospitality • Hardware
New York, NY
Kepler  Thumbnail
Fintech • Software
New York, New York
6 Employees

Sign up now Access later

Create Free Account

Please log in or sign up to report this job.

Create Free Account