Maven Robotics

ML Infrastructure Engineer

8 Locations

In-Office or Remote

Senior level

Artificial Intelligence • Robotics

The Role

Design, build, and operate ML infrastructure powering data, compute, artifacts, and orchestration across cloud and on-prem. Own backend services, storage, observability, security, and developer tools; collaborate with cloud/compute providers and lead reliability and scaling efforts.

Company Overview

Maven Robotics is building the world’s leading general-purpose robots and providing physical AI solutions for the most challenging industrial autonomy tasks.

Operating in stealth, we are assembling a team of world-class innovators who think from first principles. Our mission is to achieve human-level task success rates in complex environments, even when faced with limited fine-tuning data or evolving robotic hardware. We value unwavering truth-seeking, humility, and relentless determination.

Role Description

We are looking to recruit an exceptional Infrastructure Engineer to own and build the backend systems that power machine learning at Maven Robotics. In this role, you will design and scale the core infrastructure used by our AI and robotics teams to manage data, run compute workloads, store artifacts, monitor systems, and support rapidly growing engineering workflows.

You should be excited about distributed systems, backend services, data infrastructure, GPU compute, and high-reliability internal platforms. The ideal candidate has successfully built and operated similar systems before and can independently drive complex infrastructure projects from architecture through production operation. The underlying systems may be sophisticated, but the interfaces and workflows they expose should be reliable, intuitive, and easy for engineers to use.

In this role you will:

Own the architecture, implementation, reliability, and evolution of Maven's machine learning infrastructure.
Build backend services and platforms for managing data, artifacts, jobs, logs, metadata, and compute resources across cloud and on-premises environments.
Design scalable systems for workload orchestration, storage, observability, security, and infrastructure automation.
Build intuitive internal tools and abstractions that make complex infrastructure easy for engineers to use.
Lead technical and commercial discussions with cloud and ML compute providers, including capacity planning, performance, reliability, and cost.

Qualifications

Must-have:

Significant experience designing, building, and operating production backend, distributed, or compute infrastructure.
A track record of independently owning complex infrastructure projects from architecture through deployment and ongoing operation.
Strong programming ability in Python, Go, Rust, C++, or a similar backend or systems language.
Experience operating GPU compute infrastructure and orchestrating distributed workloads using Kubernetes, Ray, ZenML, or similar systems.
Experience designing and operating storage systems, observability platforms, infrastructure-as-code, and secure access controls.
Experience managing large-scale GPU fleets or hybrid cloud and on-premises compute environments.
Experience building internal developer platforms, CLIs, SDKs, or other self-service infrastructure tools.
Strong technical judgment, leadership, and communication skills, with the ability to drive decisions across teams and external partners.
Self-starter attitude with the ability to identify priorities and deliver durable solutions in a fast-paced startup environment.

Nice-to-have:

Familiarity with GPU architecture, accelerator-aware software design, and profiling compute-intensive workloads.
Exposure to infrastructure supporting large-scale robot learning workloads, including policy training, simulation, and multimodal data pipelines.
Familiarity with SOC 2 controls, security practices, and audit readiness.
