Applied Reinforcement Learning Engineer

Hiring Remotely in Redmond, WA, USA
In-Office or Remote
150K-300K Annually
Mid level
Artificial Intelligence
The Role
Design and build reinforcement learning environments for enterprise workflows, post-train LLM-based agents, and create scalable training pipelines.
Summary Generated by Built In

About Centific

Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem—comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets—to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About the Job

Role: Applied Reinforcement Learning Engineer

Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)

About the Team

Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. Our mission is to transform data, signals, and human insight into next-generation intelligent systems that redefine enterprise intelligence.

We're building a governed RL environment platform that enables enterprises to safely iterate and improve AI agent workflows through simulation-based learning, bridging human-labeled signal creation with automated RL training for high-stakes operations.

Role Overview

As an Applied RL Engineer, you will design and build RL environments that simulate complex enterprise workflows and train intelligent agents within them. You'll work at the intersection of RL research and production systems, translating customer requirements into bespoke simulation environments and post-training pipelines that deliver measurable improvements to AI agent performance.

This role requires deep expertise in both classical RL methodologies and modern LLM-based agent architectures. You'll shape our product direction and help make RL accessible to enterprise customers who need safe, compliant ways to improve their AI systems.

Core RL Competencies

Foundational RL

• MDPs & value methods: State/action spaces, Q-learning, DQN, Double DQN, Dueling DQN

• Policy gradient methods: REINFORCE, Actor-Critic, A2C/A3C, variance reduction

• Advanced optimization: PPO, TRPO, SAC, trust regions, entropy regularization

• TD learning: TD(0), TD(λ), eligibility traces, bootstrapping methods

LLM Alignment & Post-Training

• RLHF pipelines: Reward model training, preference learning, human feedback integration

• Direct optimization: DPO, IPO, KTO, offline preference optimization

• Group-based methods: GRPO, RLOO, sample-efficient policy improvement

• Reward modeling: Bradley-Terry models, reward hacking mitigation, KL constraints

Environment Design

• Gymnasium/OpenAI Gym: Custom environments, observation/action spaces, wrapper patterns

• Reward engineering: Sparse vs. dense rewards, potential-based shaping, intrinsic motivation

• Verifier design: Programmatic reward functions, outcome verification, ground-truth evaluation

• Simulation: Sim-to-real transfer, domain randomization, multi-agent dynamics

Advanced Techniques

• Offline RL: CQL, BCQ, IQL for learning from fixed datasets without environment interaction

• Model-based RL: World models, Dreamer, MuZero, learned dynamics

• Hierarchical RL: Options framework, goal-conditioned policies, temporal abstraction

• Imitation & exploration: Behavioral cloning, GAIL, curiosity-driven exploration, UCB

Key Responsibilities

• Design and build custom RL environments (digital twins) simulating enterprise workflows: document processing, compliance, onboarding, support automation

• Post-train LLM-based agents on domain-specific tasks using PPO, GRPO, DPO, and RLHF

• Build end-to-end pipelines converting human-labeled traces into RL training data

• Architect multi-step reasoning agents with tool-calling and closed learning loops

• Design reward functions, verifiers, and validation frameworks for pre-deployment testing

• Translate cutting-edge RL research into production systems; contribute to publications

Required Qualifications

• Deep RL expertise: 3+ years of hands-on experience with environment design, reward engineering, policy optimization

• LLM post-training: Experience fine-tuning LLMs using RLHF, DPO, PPO, or similar

• Production skills: Software engineering beyond research prototypes, including scalable pipelines and training infrastructure

• Agentic AI: Experience with LLM-based agents, tool use, multi-step reasoning

• Technical stack: Strong Python; Gymnasium, RLlib, Stable Baselines; PyTorch/JAX/TensorFlow

• Education: MS/PhD in CS, ML, or related field (or equivalent experience)

Preferred Qualifications

• Publications at NeurIPS, ICML, ICLR, ACL, or similar venues

• Enterprise workflow experience in healthcare, finance, logistics, or compliance

• Open-source contributions to CleanRL, TRL, veRL, or agent frameworks

• Experience with world models, synthetic data generation, and simulation

• Distributed training and large-scale RL experimentation

Why Join Centific

• Lead the frontier: Shape a new discipline at the intersection of RL, simulation, and enterprise AI

• Ship your science: See your research power real systems across healthcare, finance, and safety

• Collaborate with leaders: Work alongside NVIDIA, Microsoft, and the global AI community

• Build what matters: Create governed, compliant AI systems enterprises can trust.

Salary: $150K - $300K Annually

Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.

Centific Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Centific and has not been reviewed or approved by Centific.

  • Healthcare Strength: Feedback suggests core medical, dental, and vision coverage is comprehensive, with mental-health support included. U.S. offerings also include HSA/FSA and are described as solid insurance options.
  • Flexible Benefits: Feedback suggests flexible hours and work-from-anywhere options are widely promoted, with flexible PTO available in some contexts. Learning stipends and development programs add adaptable elements to the package.
  • Parental & Family Support: Feedback suggests paid parental leave is explicitly available to all parents. This positions family leave as an accessible component of the overall package.

The Company
HQ: Redmond, WA
2,900 Employees

What We Do

Zero-distance innovation for GenAI creators and industries. Expertly engineering platforms and curating multimodal, multilingual data, we empower the ‘Magnificent Seven’ and enterprise clients with safe, scalable AI deployment. We are a team of over 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We bring platforms, partners, and 1.8 million vertical domain experts to create high-quality pre-trained datasets, fine-tuned industry-specific LLMs, and RAG pipelines supported by vector databases. These innovations can reduce GenAI costs by up to 80% and bring GenAI solutions to market 50% faster in 230 locales.

Similar Jobs

BuildOps

Sales Manager

Cloud • Mobile • Software
Remote or Hybrid
United States
500 Employees
250K-270K Annually

Zeta Global

Senior Associate, Data Cloud Applications

AdTech • Artificial Intelligence • Marketing Tech • Software • Analytics
Remote or Hybrid
United States
2429 Employees
70K-80K Annually

Apollo.io

Application Security Engineer

Artificial Intelligence • Enterprise Web • Information Technology • Productivity • Sales • Software • Database
Remote
2 Locations
850 Employees
218K-273K Annually

Affirm

Senior Manager, Accounting Policy & Reporting

Big Data • Fintech • Mobile • Payments • Financial Services
Remote
United States
2200 Employees
164K-245K Annually

Similar Companies Hiring

Idler
Artificial Intelligence
San Francisco, California
6 Employees
Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
