Machine Learning Infrastructure Engineer

Reposted Yesterday
Palo Alto, CA, USA
In-Office
Entry level
Artificial Intelligence • Robotics • Industrial • Manufacturing
The Role

At Mind Robotics, we’re building generalized physical AI—robotic systems capable of dexterous, adaptive, and reasoning-intensive work in real-world industrial environments. Our ability to iterate quickly on large-scale models depends on world-class ML infrastructure.

We’re looking for a Machine Learning Infrastructure Engineer to build the core systems that enable fast, reliable, and scalable model training—powering everything from experimentation to production deployment.

Responsibilities
  • Design and implement scalable systems for training large ML models

  • Enable efficient workflows for data ingestion, training, and iteration

  • Develop and optimize distributed training systems across hundreds of GPUs

  • Implement strategies for parallelization, sharding, and efficient compute utilization

  • Improve training efficiency through techniques such as attention optimizations, kernel fusion, and memory management

  • Partner closely with modeling teams to accelerate iteration speed and reduce training costs

  • Build internal tools for experiment tracking, monitoring, and debugging

  • Implement systems for tracking training performance, failures, and resource utilization

  • Debug and resolve bottlenecks across the training stack

  • Provide lightweight infrastructure support for deploying and running models in production environments

  • Optimize inference performance and reliability where needed

  • Support core cloud infrastructure needs for training workloads (without heavy DevOps overhead)

  • Manage compute resources efficiently across training jobs

Qualifications
  • Strong experience building infrastructure for large-scale ML training

  • Deep understanding of how modern LLM/VLM systems are trained and scaled

  • Proven experience setting up and scaling distributed training across hundreds of GPUs

  • Strong understanding of parallelization strategies (data, model, and pipeline parallelism)

  • Strong proficiency in Python programming

  • Expert-level proficiency in PyTorch and/or JAX

  • Strong understanding of techniques like attention optimization, kernel fusion, and efficient memory usage

Nice to Have
  • Experience supporting inference systems in production

  • Familiarity with robotics or embodied AI workloads

  • Experience building tools for experiment management and researcher productivity

Skills Required

  • Experience with PyTorch or JAX
  • Knowledge of distributed training and core ML infrastructure
  • Ability to work with hundreds of GPUs

The Company
20 Employees
Year Founded: 2025

What We Do

Mind Robotics builds intelligent, AI-driven robotic systems for industrial deployment, focusing on creating collaborative platforms for manufacturing environments.
