Dyna Robotics

ML Infrastructure Engineer, Training

Reposted 22 Days Ago

Redwood City, CA, USA

In-Office

220K-320K Annually

Senior level

Robotics

We are at the forefront of revolutionizing robotic manipulation.

Dyna Robotics builds general-purpose robots powered by a proprietary embodied AI foundation model with top-in-industry generalization and real-world performance. Already deployed with customers across multiple industries, our robots do commercial-grade work in the physical world. Our team comes from Google DeepMind, Meta, and Cruise, and we're backed by CRV, First Round, and other leading investors.

The Role

As an ML Training Infrastructure Engineer, you will architect and build the systems that turn our multi-cloud GPU fleet into a training engine our researchers love. Your charter is singular and broad: own training infrastructure end-to-end so that every GPU is busy, every run is reproducible, and every researcher's next experiment is one command away.

What You’ll Do

Scale Distributed Training: Architect and own the infrastructure for large-scale GPU clusters. You’ll implement sharding, activation checkpointing, and memory optimization (ZeRO, FSDP) to enable the training of massive multimodal models.
Optimize Researcher Ergonomics: Build a research codebase and job scheduling system (Kubernetes/SLURM) that prioritizes fast iteration, automated retries, and seamless failure recovery.
High-Performance Data Handling: Design high-throughput pipelines to ingest and transform terabytes of multimodal robot data (video, proprioception, 3D signals), ensuring dataloaders never starve the GPUs.
Production Inference: Build low-latency inference pipelines for real-time robot control. You’ll apply quantization, distillation, and model compilation (TensorRT, Triton) to move models from the lab to the physical world.
Deep Systems Profiling: Dive into the weeds of GPU utilization, I/O bottlenecks, and memory fragmentation to squeeze every bit of performance out of our expanding compute fleet.

What You’ll Bring

7+ Years of Engineering: A track record of leading technical projects in high-performance computing (HPC) or ML infrastructure.
ML Systems Mastery: Deep experience with PyTorch and distributed training frameworks (DeepSpeed, Accelerate). You understand the nuances of mixed precision and gradient accumulation.
Infrastructure Expertise: Hands-on experience managing cloud GPU environments (GCP/AWS) and container orchestration (Kubernetes).
Low-Level Intuition: A fundamental understanding of distributed systems, including race conditions, memory management, and NCCL/inter-node communication.
Ownership Mindset: You don't just "deploy" code; you design, build, and operate systems end-to-end to unblock fast-moving research.

Bonus Points For

Experience with robotics data formats (MCAP, Protobuf) or multimodal models such as vision-language-action models (VLAs).
Deep ML systems experience: custom kernels (Triton), compilers, or runtime optimization.
Experience as a founding or early-stage infrastructure hire.

At Dyna Robotics, we build technology for the real world, which requires a team as diverse as the environments our robots inhabit. We are an equal opportunity employer committed to technical rigor and mutual respect.

Don’t let a checklist stop you. Data shows that underrepresented groups often apply only if they meet 100% of the criteria. We value problem-solving and grit over keyword matching. If you’re passionate about the intersection of geometry and robotics, we want to hear from you, even if you don't check every box.

Skills Required

Bachelor's degree or higher in Computer Science or related field
At least 7 years of professional experience in the software industry
Minimum 2 years in a tech lead role
Proven experience with high-performance computing environments
Hands-on experience with job scheduling systems and managing cloud GPU environments
Deep understanding of distributed computing concepts
Hands-on experience in ML model tuning for performance
Strong analytical and problem-solving skills

The Company

What We Do

Our mission is to empower businesses by automating repetitive, stationary tasks with affordable, intelligent robotic arms. Leveraging the latest advancements in foundation models, we're driving the future of general-purpose robotics, one manipulation skill at a time.