Research Engineer, Agents

Posted 13 Days Ago
San Francisco, CA, USA
In-Office
$200K-$400K Annually
Mid level
Artificial Intelligence • Software
The Role
As a Research Engineer, you'll design and build systems for Decagon agents, focusing on runtime efficiency and model orchestration. Responsibilities include optimizing systems for performance and reliability, conducting experiments, and improving visibility in real-time systems.
Summary Generated by Built In

About Decagon

Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences.

Our technology enables industry-defining enterprises like Avis Budget Group, Block’s Cash App and Square, Chime, Oura Health, and Hunter Douglas to deploy AI agents that power personalized, deeply satisfying interactions across voice, chat, email, SMS, and every other channel.

We’re building a future in which customer experience moves from support tickets and hold music to faster resolutions, richer conversations, and deeper relationships. We’re proud to be backed by world-class investors who share that vision, including a16z, Accel, Bain Capital Ventures, Coatue, and Index Ventures, along with many others.

We’re an in-office company, driven by a shared commitment to excellence and velocity. Our values — Just Get It Done, Invent What Customers Want, Winner’s Mindset, and The Polymath Principle — shape how we work and grow as a team.

About the Team

The Agent Orchestration team builds the runtime and model orchestration layer that powers Decagon’s agents in production. This layer turns workflows, tools, and guardrails into a reliable, low-latency, and delightful experience for end users.

At the core of this work is the agent harness: the routing, execution logic, tool orchestration, and control-plane systems that determine how an agent behaves in a live conversation. The team owns the full execution lifecycle of each conversation—from selecting workflows and orchestrating multiple models (e.g., router/planner/supervisor patterns), to coordinating tool calls, enforcing safety constraints, and communicating back to the user.

The team operates across both real-time systems (e.g., voice interactions with strict latency requirements) and longer-horizon execution (supporting more complex reasoning and workflows). Our research shows that an agent’s task execution reliability increasingly depends on the orchestration layer that wraps around it.

This is highly experimental, frontier-style engineering. The team continuously analyzes real-world failures, builds feedback loops through offline evaluation and online experimentation, and iterates quickly to improve quality, reliability, and capability. As model capabilities evolve, the team regularly rethinks system design to push agent performance forward in production.

 

About the Role

As a Research Engineer on the Agent Orchestration team, you will design and build the systems that govern how Decagon agents operate in real-world environments.

You will own complex, distributed systems that sit at the heart of the agent runtime: execution frameworks, model orchestration logic, and experimentation platforms that ensure agents are fast, reliable, and continuously improving. Your work will directly impact how agents reason, take actions, and deliver outcomes across millions of interactions.

This role operates in a fast-moving, ambiguous space with tight feedback loops. You’ll move fluidly between diagnosing production issues, designing new system abstractions, and running experiments to improve agent behavior. You’ll collaborate closely with Research, Infra, and Product teams to ship improvements safely and at scale.

 

In this role, you will

  • Design and evolve agent harnesses that power different product experiences

  • Build core runtime systems, including Agent Operating Procedure (AOP) execution and multi-model orchestration

  • Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees

  • Optimize agent systems for latency, reliability, and production correctness

  • Analyze real-world failures and use data to drive iterative improvements

  • Build and operate online experimentation (A/B testing) and contribute to offline evaluation frameworks

  • Improve observability, testing, and simulation systems to ensure safe, measurable progress

  • Contribute to voice and real-time systems (e.g., transcription pipelines, turn-taking, latency improvements)

  • Continuously adapt orchestration systems as model capabilities evolve

 

Your background looks something like this

  • Strong experience building distributed systems or backend platforms in production environments

  • Comfort working in ambiguous, fast-moving environments with rapid iteration cycles

  • Experience owning systems end-to-end, from design through production and iteration

  • Familiarity with experimentation, evaluation, or data-driven product improvement loops

  • A track record of improving system reliability, performance, and observability

  • Ability to debug complex systems and identify root causes of failures

 

Even better

  • You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks

  • You think in terms of control planes, feedback loops, and system-level optimization, not just features

  • You’re excited about diagnosing failure modes and iterating toward measurable improvements

  • You care deeply about production quality—not just making systems work, but making them reliable, safe, and scalable

  • You’re motivated by pushing the frontier of how intelligent systems behave in the real world

     

Compensation

$200K – $400K, plus equity

This range reflects the expected compensation for this role. Compensation within the range is determined based on experience, skills, and the scope of responsibilities, with flexibility for candidates who demonstrate exceptional impact.

 

In addition to base salary, we offer competitive equity. Final compensation may vary based on location within the United States.

Benefits

We proudly offer the following benefits for our full-time employees:

  • Take-what-you-need vacation policy (subject to local requirements; UK employees receive 25 days of statutory leave)

  • Medical, Dental, and Vision benefits for you and your family

  • Life Insurance and Disability Benefits

  • Retirement Plan (e.g., 401(k), pension)

  • Parental Leave

  • Fertility and family building benefits through Carrot

  • Daily lunches and snacks in the office to keep you at your best

These benefits are described in more detail in Decagon’s policies, may vary by location, and can change at any time according to applicable compensation and benefits plans.


The Company
HQ: San Francisco, California
49 Employees

What We Do

Trusted by world-class companies, Decagon is the most advanced AI platform for customer support.
