Forward Deployed Engineer, RL Environments

Posted Yesterday
7 Locations
In-Office or Remote
140K-200K Annually
Mid level
Artificial Intelligence • Information Technology • Machine Learning
Our mission is to build the best products to align with artificial intelligence.
Shape the Future of AI

At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.

About Labelbox

We're the only company offering three integrated solutions for frontier AI development:

  1. Enterprise Platform & Tools: Advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale
  2. Frontier Data Labeling Service: Specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models
  3. Expert Marketplace: Connecting AI teams with highly skilled annotators and domain experts for flexible scaling

Why Join Us

  • High-Impact Environment: We operate like an early-stage startup, focusing on impact over process. You'll take on expanded responsibilities quickly, with career growth directly tied to your contributions.
  • Technical Excellence: Work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.
  • Innovation at Speed: We celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.
  • Continuous Growth: Every role requires continuous learning and evolution. You'll be surrounded by curious minds solving complex problems at the frontier of AI.
  • Clear Ownership: You'll know exactly what you're responsible for and have the autonomy to execute. We empower people to drive results through clear ownership and metrics.

The Role

We’re hiring a Forward Deployed Engineer to own the design, development, and operationalization of reinforcement learning environments. You’ll build the sandboxed, reproducible execution environments that AI agents interact with during training and evaluation—things like terminal-based task benchmarks, browser and computer-use environments, and tool-augmented agentic workspaces.

This is a hands-on engineering role. You’ll write production-quality infrastructure code, integrate with open-source RL tooling, and work closely with our data operations team to ensure environments are robust, observable, and ready for human annotators and model agents alike. You won’t be doing ML research, but you’ll need to deeply understand how RL training loops consume environments and where the bottlenecks live.

What You’ll Do

  • Design, build, and maintain sandboxed RL environments for agentic AI training—including terminal emulators, browser automation harnesses, computer-use simulators, and tool-augmented workspaces (e.g., environments built on frameworks like TerminalBench, OSWorld, and Tau-bench)
  • Develop reproducible, containerized execution environments (Docker, VMs, lightweight sandboxes) that support deterministic task rollouts and reward signal collection
  • Integrate with and extend open-source agentic tooling and custom CLI/API harnesses to enable multi-step agent interaction
  • Build instrumentation and observability layers—structured logging, trajectory capture, state snapshotting—so training runs and human annotation sessions produce clean, auditable data
  • Collaborate with data operations to design task curricula and evaluation protocols that stress-test model capabilities across environment types
  • Own environment deployment and reliability: CI/CD pipelines, automated testing of environment configurations, and monitoring for drift or breakage across versions
  • Rapidly prototype new environment types as client and internal requirements evolve, moving from spec to working system in days, not weeks

What We’re Looking For

Required

  • 2+ years of professional software engineering experience, with strong fundamentals in Python and at least one systems-level language (Go, Rust, C++)
  • Demonstrated experience with containerization and sandboxing (Docker, Podman, Firecracker, or similar) in production or near-production contexts
  • Familiarity with RL concepts: MDPs, reward shaping, episode structure, observation/action spaces. You don’t need to have trained models, but you need to understand what an environment must provide to an RL training loop
  • Experience building or maintaining developer tooling, CLI tools, or infrastructure automation
  • Comfort working with browser automation frameworks or terminal interaction tooling
  • Strong debugging instincts—you can trace failures across process boundaries, container layers, and network calls
  • Ability to read and implement from academic papers and open-source benchmark repositories without extensive hand-holding

Preferred

  • Direct experience building or contributing to RL environments (Gymnasium/Gym, PettingZoo, or custom environment implementations)
  • Experience with agentic AI evaluation frameworks (SWE-bench, WebArena, OSWorld, TerminalBench, or similar)
  • Familiarity with GCP or AWS infrastructure (Compute Engine, ECS/EKS, Cloud Build)
  • Prior work at an AI data company, ML platform company, or AI research lab
  • Contributions to open-source projects in the RL, agents, or dev-tools space

Candidate Archetype

The ideal candidate is a strong software engineer first, with genuine curiosity about and working knowledge of reinforcement learning. You’ve probably built infrastructure or developer tooling at a startup or mid-stage company, and you’ve been pulled toward the ML/AI space, maybe through side projects, open-source contributions, or a prior role adjacent to an ML team. You’re the kind of engineer who reads an RL benchmark paper and immediately thinks about how to make the environment more robust, not how to improve the policy gradient.

You thrive in ambiguity. You can take a loosely defined project requirement—“build an environment that tests an agent’s ability to navigate a file system and execute multi-step bash workflows”—and deliver a working, tested, documented system without needing a detailed spec. You move fast, but you care about reliability because you know environments that break silently poison training data.

Why This Role Matters

  • RL environment quality is one of the biggest bottlenecks in agentic AI training today. Environments that are brittle, non-deterministic, or poorly instrumented produce noisy reward signals that directly degrade model performance. You’ll be solving one of the highest-leverage infrastructure problems in AI.
  • You’ll work across a portfolio of projects spanning different AI labs and model capabilities—no single-product monotony. The environment types you build will evolve as the frontier of agent capabilities moves.
  • Alignerr is a small, high-impact team inside a well-funded company (Labelbox). You’ll have startup-level ownership with growth-stage resources.

Alignerr Services at Labelbox

Alignerr is Labelbox’s human data organization, purpose-built to generate the high-quality training data that powers the next generation of AI models. We partner directly with leading AI labs to produce reinforcement learning environments, evaluation benchmarks, and expert-annotated datasets that push model capabilities forward. Our team sits at the intersection of software engineering, ML infrastructure, and human-in-the-loop data production.

Labelbox strives to ensure pay parity across the organization and discuss compensation transparently. The expected annual base salary range for United States-based candidates is below. This range is not inclusive of any potential equity packages or additional benefits. Exact compensation varies based on a variety of factors, including skills and competencies, experience, and geographical location.

Annual base salary range
$140,000-$200,000 USD

Life at Labelbox

  • Location: Join our dedicated tech hubs in San Francisco or Wrocław, Poland
  • Work Style: Hybrid model with 2 days per week in office, combining collaboration and flexibility
  • Environment: Fast-paced and high-intensity, perfect for ambitious individuals who thrive on ownership and quick decision-making
  • Growth: Career advancement opportunities directly tied to your impact
  • Vision: Be part of building the foundation for humanity's most transformative technology

Our Vision

We believe data will remain crucial in achieving artificial general intelligence. As AI models become more sophisticated, the need for high-quality, specialized training data will only grow. Join us in developing new products and services that enable the next generation of AI breakthroughs.

Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins. Our customers include Fortune 500 enterprises and leading AI labs.

Your Personal Data Privacy: Any personal information you provide Labelbox as a part of your application will be processed in accordance with Labelbox’s Job Applicant Privacy notice.

Any emails from Labelbox team members will originate from a @labelbox.com email address. If you encounter anything that raises suspicions during your interactions, we encourage you to exercise caution and suspend or discontinue communications.

Top Skills

AWS
C++
Docker
GCP
Go
Gymnasium
PettingZoo
Python
Rust
VMs
The Company
HQ: San Francisco, CA
115 Employees
Year Founded: 2017

What We Do

Labelbox is the data factory for generative AI, providing the highest quality training data for frontier and task-specific models. Labelbox’s comprehensive platform combines on-demand labeling services with the industry-leading data labeling platform. The Boost labeling service is powered by the Alignerr community of highly-educated experts, who span all major languages and a diverse range of advanced subjects. They are available on-demand to rapidly generate new data for supervised fine-tuning, RLHF, and more. Labelbox’s software-first approach delivers unmatched control and transparency into the labeling process, leading to the generation of high-quality, consistent data at scale. Customers include Fortune 500 enterprises and leading AI labs, and Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins.

Why Work With Us

Labelboxers are driven to master their craft and build the best products for AI, which is quickly becoming one of the most significant technologies of our time. Join us in supporting our customers as they create AI breakthroughs.
