ML Infrastructure Engineer

Posted 2 Days Ago
8 Locations
In-Office or Remote
Senior level
Artificial Intelligence • Robotics
The Role
Design, build, and operate ML infrastructure powering data, compute, artifacts, and orchestration across cloud and on-prem. Own backend services, storage, observability, security, and developer tools; collaborate with cloud/compute providers and lead reliability and scaling efforts.
Summary Generated by Built In
Company Overview

Maven Robotics is building the world’s leading general-purpose robots and providing physical AI solutions for the most challenging industrial autonomy tasks.

Operating in stealth, we are assembling a team of world-class innovators who think from first principles. Our mission is to achieve human-level task success rates in complex environments, even when faced with limited fine-tuning data or evolving robotic hardware. We value unwavering truth-seeking, humility, and relentless determination.

Role Description

We are looking to recruit an exceptional Infrastructure Engineer to own and build the backend systems that power machine learning at Maven Robotics. In this role, you will design and scale the core infrastructure used by our AI and robotics teams to manage data, run compute workloads, store artifacts, monitor systems, and support rapidly growing engineering workflows.

You should be excited about distributed systems, backend services, data infrastructure, GPU compute, and high-reliability internal platforms. The ideal candidate has successfully built and operated similar systems before and can independently drive complex infrastructure projects from architecture through production operation. The underlying systems may be sophisticated, but the interfaces and workflows they expose should be reliable, intuitive, and easy for engineers to use.

In this role you will:

  • Own the architecture, implementation, reliability, and evolution of Maven's machine learning infrastructure.
  • Build backend services and platforms for managing data, artifacts, jobs, logs, metadata, and compute resources across cloud and on-premise environments.
  • Design scalable systems for workload orchestration, storage, observability, security, and infrastructure automation.
  • Build intuitive internal tools and abstractions that make complex infrastructure easy for engineers to use.
  • Lead technical and commercial discussions with cloud and ML compute providers, including capacity planning, performance, reliability, and cost.

Qualifications

Must-have:

  • Significant experience designing, building, and operating production backend, distributed, or compute infrastructure.
  • A track record of independently owning complex infrastructure projects from architecture through deployment and ongoing operation.
  • Strong programming ability in Python, Go, Rust, C++, or a similar backend or systems language.
  • Experience operating GPU compute infrastructure and orchestrating distributed workloads using Kubernetes, Ray, ZenML, or similar systems.
  • Experience designing and operating storage systems, observability platforms, infrastructure-as-code, and secure access controls.
  • Experience managing large-scale GPU fleets or hybrid cloud and on-premise compute environments.
  • Experience building internal developer platforms, CLIs, SDKs, or other self-service infrastructure tools.
  • Strong technical judgment, leadership, and communication skills, with the ability to drive decisions across teams and external partners.
  • Self-starter attitude with the ability to identify priorities and deliver durable solutions in a fast-paced startup environment.

Nice-to-have:

  • Familiarity with GPU architecture, accelerator-aware software design, and profiling compute-intensive workloads.
  • Exposure to infrastructure supporting large-scale robot learning workloads, including policy training, simulation, and multimodal data pipelines.
  • Familiarity with SOC 2 controls, security practices, and audit readiness.


The Company
Santa Clara, CA
14 Employees
Year Founded: 2024

What We Do

Maven Robotics is building the world’s leading general-purpose AI robots.


