In this role, you will:
- Design, build, and optimize a petabyte-scale, in-house HPC storage infrastructure, ensuring high performance and reliability for our machine learning workloads across both cloud and on-premises data centers.
- Drive GPU efficiency by strategically co-locating storage and compute and by architecting a storage layer that keeps tens of thousands of GPUs fully utilized and prevents bottlenecks.
- Drive key initiatives in training and storage optimization by partnering with ML practitioners, applying your deep understanding of frameworks such as PyTorch and TensorFlow to meet their evolving demands.
- Investigate and adopt new distributed system paradigms and cutting-edge technologies to ensure our infrastructure can scale to meet ever-growing computational and storage demands.
- Create production-grade web service APIs, SDKs, and other essential tools to deliver a world-class developer experience for all software teams at Zoox.
Qualifications:
- Experience designing and building high-performance, distributed storage systems (object/file) for large-scale, GPU-bound workloads.
- Proficiency in Python, Java, or similar languages for developing data-intensive, high-performance applications.
- Hands-on experience with cloud platforms (AWS, GCP, Azure), using their storage, GPU, and observability services to provide usage showback for ML practitioners.
- Bachelor's degree in Computer Science or a related field with a strong foundation in data structures and systems design.
Bonus Qualifications:
- Experience with parallel filesystems (e.g., Lustre, FSx) and their integration with container orchestrators via Kubernetes CSI drivers.
- Deep knowledge of ML frameworks like PyTorch and TensorFlow, and workload schedulers such as SLURM or Kubernetes.
- Familiarity with emerging AI paradigms, including agentic systems, and observability tools like OpenTelemetry.
What We Do
Zoox is an autonomous mobility company that was founded to provide a safer, cleaner, and more enjoyable future on the road. To achieve that goal, the company has spent the past 10 years creating a purpose-built robotaxi that gives the world a better way to ride.
Why Work With Us
At Zoox, we are working to solve one of the greatest technological challenges of our generation.
From the beginning, we have been focused on our goal of reimagining transportation from the ground up. We are a mission-driven community of innovators working together to create a safer, cleaner, and more enjoyable future on the road.