ML Platform Engineer

Reposted 6 Days Ago
San Francisco, CA, USA
In-Office
183K-310K Annually
Mid level
Robotics • Database
Visualize, debug, and manage multimodal data in one purpose-built platform for robotics and embodied AI development.
The Role
Design, deploy, and scale ML systems for production at Foxglove, focusing on data infrastructure for robotics, optimizing inference, and building evaluation workflows.
Summary Generated by Built In

Build the data infrastructure for robots operating in the real world.

Robotics is moving from research labs into production across factories, warehouses, vehicles, and field deployments. When robots fail, behave unexpectedly, or need to be improved, engineers rely on data to understand what actually happened.

At Foxglove, we build the observability, visualization, and data infrastructure that makes that possible. Our tools are used by robotics and autonomous systems teams to ingest, store, query, replay, and analyze massive volumes of multimodal sensor data from live systems and from production fleets.

About the Role

We're looking for an ML Platform Engineer with deep infrastructure instincts to help design, deploy, and scale the systems that power Foxglove's data platform. This is a platform-first role: you'll own the infrastructure layer that makes ML possible in production, not just the models that run on top of it.

You'll be responsible for the reliability, scalability, and performance of the ML platform itself, from inference serving and pipeline orchestration to training infrastructure and evaluation frameworks. The problems are real and urgent: petabyte-scale multimodal robotics data, high-throughput retrieval and embedding pipelines, and the internal ML flywheel that lets our team ship fast. This is a hands-on infrastructure role, not research.

Key Responsibilities

  • Design, deploy, and operate production inference infrastructure — including model serving, autoscaling, load balancing, and cost optimization across cloud environments

  • Own the platform architecture for embedding and retrieval pipelines that power semantic search over multimodal robotics data (image, video, point cloud, and timeseries)

  • Build and maintain the training and evaluation infrastructure that enables rapid iteration on model performance — including job orchestration, experiment tracking, and dataset versioning

  • Drive cloud infrastructure decisions (AWS/GCP) that directly impact latency, throughput, reliability, and cost at scale

  • Define platform abstractions and internal tooling that let product engineers ship ML-powered features without needing to manage infrastructure themselves

  • Evaluate, integrate, and operationalize third-party ML infrastructure components; establish clear build vs. buy frameworks for the team

What We're Looking For

  • Deep, hands-on experience owning production ML infrastructure: inference serving (e.g., vLLM, Triton, TorchServe), model optimization, orchestration, and cloud cost management

  • Strong foundation in distributed systems and cloud infrastructure (AWS/GCP) — you think in terms of system reliability, failure modes, and operational burden, not just model accuracy

  • Experience architecting and operating retrieval systems at scale, including vector databases (e.g., Pinecone, Lance, turbopuffer, pgvector) and embedding pipelines over large, heterogeneous datasets

  • A platform engineer's mindset: you build systems that other engineers depend on, and you take that responsibility seriously

  • Proven ability to operate with high ownership — you can make hard infrastructure tradeoffs independently and move fast without breaking things

  • Strong communication skills; you can explain infrastructure tradeoffs clearly to both ML and non-ML engineers

Bonus Points

  • Familiarity with fine-tuning and domain adaptation techniques for LLMs or embedding models (e.g., SFT, PEFT)

  • Familiarity with data mining or hybrid search workflows, especially as applied to robotics, autonomous vehicles, or physical AI

  • Prior experience building ML platforms, evaluation frameworks, or data management tooling from the ground up

What We Offer

  • $300 monthly budget toward commuter benefits or, for remote employees only, building your personal workspace

  • Competitive equity grant in a Series B company

  • Medical, Dental, Vision, and Term Life insurance coverage at 100% for employees and 75% for dependents

  • 401(k) matching up to 4%

  • 4 weeks of vacation, plus holidays and winter break

  • All-expenses-paid company off-sites twice a year

Why Join Us
  • Impact: Own growth at a fast-growing, high-leverage moment for the company.

  • Mission: Accelerate the development of the next generation of robotics and embodied AI.

  • Team: Work with world-class engineers, designers, and researchers passionate about open-source and developer tools.

  • Ownership: Drive initiatives end-to-end, with high autonomy and visibility.


The Company
HQ: San Francisco, CA
24 Employees
Year Founded: 2021

What We Do

At Foxglove, we’re building powerful tools to accelerate robotics development. We believe that robotics will have a massive impact on our daily lives and the world economy over the coming decade, and that better software tooling will significantly accelerate this trend. Our team’s years of experience working in the robotics and self-driving industries mean we are uniquely positioned to bring the advanced tools built in-house at larger companies to the growing number of startups in this space, across a wide range of verticals. Our first product, Foxglove Studio, is an open-source visualization and diagnosis platform designed specifically for working with robotics and sensor data. It lets you inspect sensor inputs such as images, point clouds, and time series data in a highly customizable 2D and 3D environment.

Similar Jobs

ServiceNow

Senior Machine Learning Engineer

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Hybrid
Mountain View, CA, USA
28000 Employees

42dot

Platform Engineer

Artificial Intelligence • Software • Transportation
Hybrid
Sunnyvale, CA, USA
739 Employees
133K-254K Annually

Ōura

Senior Data Engineer

Artificial Intelligence • Information Technology • Machine Learning • Marketing Tech • Software • Biotech • Design
In-Office
San Francisco, CA, USA
850 Employees
148K-203K Annually

Autodesk

Principal Engineer

Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
In-Office
San Francisco, CA, USA
13285 Employees
165K-296K Annually

Similar Companies Hiring

Apptronik
Software • Robotics • Machine Learning • Hardware • Computer Vision
Austin, TX
180 Employees
Doodle Labs
Wearables • Robotics • Internet of Things • Hardware • Automation • App development • Aerospace
SG
50 Employees
Fairly Even
Hardware • Other • Robotics • Sales • Software • Hospitality
New York, NY
30 Employees