Staff / Principal Machine Learning Engineer, Serving - USA

Mountain View, CA, USA
Hybrid
270K-500K Annually
Expert/Leader
Software
The Role
This role involves optimizing inference performance, managing high-performance systems, scaling models, and full-cycle ownership from research to production in a fast-paced environment.
Summary Generated by Built In

About Inworld

Inworld is a research lab of top researchers and engineers, building the world’s top-ranked realtime voice models.

Today our models are the #1 ranked realtime voice models in the world. They power the largest consumer-facing AI applications available, across categories like health, fitness, learning, therapy, companions, customer experience and media, representing hundreds of millions of end users. Our work spans research and development of state-of-the-art models, optimizing realtime inference, and creating best-in-class APIs and products that allow developers to engage their users.

We’ve raised more than $125M from Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, Founders Fund, Meta and Stanford, among others. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn’s Top 10 Startups in the USA.

Who We're Looking For

A year ago, reliably working agentic systems and sub-second multimodal inference at scale barely existed. Nobody has a decade of experience here. So we're not screening for a resume template — we're looking for strong people from varied backgrounds who learn fast, thrive in ambiguity, and can show us what they've built, broken, and understood.

Experience We Find Useful

You don't need all of this. But you need enough to make a case.

  • Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM.

  • Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.

  • High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs.

  • Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections.

  • Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups.

  • Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production.

  • Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.

Who Thrives Here

  • You don’t need a roadmap to start walking; you’re comfortable picking a direction and building the map as you go.

  • You believe engineering isn't finished until it’s shipped and stable. You have a bias for impact over purely theoretical optimizations.

  • You don't just ship code; you obsess over the why. You’re the first to question an architecture if you think there’s a better way to solve the core latency or throughput problem.

  • You aren't satisfied with "the PM said so." You thrive on deep context and want to understand the fundamental logic behind every decision we make.

What Working Here Is Like

We hand you unclear problems and expect you to make them clear. We value engineers who say "I don't know yet" and then design the benchmark or prototype that finds out. We treat performance, latency, and reliability as first-class product features, not a box to check before launch. Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward. Your work should be visible. Flat structure, fast iterations, minimal process theater.

We believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our Mountain View office.

The base salary range for this full-time position is $270,000 - $500,000, plus bonus, equity, and benefits.

Skills Required

  • PhD in CS, Physics, Math or equivalent practical experience building backend or ML systems
  • Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
  • Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
  • Proficiency in C++, CUDA, Rust, or highly optimized Python
  • Experience with Kubernetes, Ray, and multi-GPU/multi-node inference
  • Experience with systems programming projects or open-source contributions to major inference engines

The Company
Mountain View, CA
58 Employees
Year Founded: 2021

What We Do

Inworld is a fully integrated platform for AI characters that goes beyond large language models (LLMs) by adding configurable safety, knowledge, memory, narrative controls, multimodality, and more. Inworld uses advanced AI to build generative characters whose personalities, thoughts, memories, and behaviors are designed to mimic the deeply social nature of human interaction. The Inworld platform lets you create characters with personality and contextual awareness to keep them in-world and on brand. Integrations make it easy for developers to deploy characters into immersive experiences, while scale and performance are optimized for real-time experiences.

We are a team of experts who have pioneered conversational AI platforms and generative models at API.AI (acquired by Google and renamed Dialogflow), Google and DeepMind. We are continuing to build out our incredibly talented team, with experts in generative language models, emotions, speech synthesis, multimodal interaction, design, and 3D animation.

Inworld is backed by top-tier investors including Section 32, Intel Capital, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, First Spark Ventures, The Venture Reality Fund, CRV, Meta, Microsoft’s M12 fund, Micron Ventures, LG Technology Ventures, NTT Docomo Ventures, and SK Telecom Venture Capital. Inworld was one of six companies selected for the 2022 Disney Accelerator. Prominent angels include Twitch Co-Founder Kevin Lin; Oculus Co-Founder Nate Mitchell; Animoca Brands Co-Founder Yat Siu; The Sandbox Co-Founder Sebastien Borget; and NaHCO3, the family office of Riot Games Co-Founder Marc Merrill.
