The Role
As a Senior Distributed Systems Engineer, you'll collaborate with researchers to build platforms for next-gen ML models, optimize system performance, and enhance distributed computing across massive GPU clusters.
Summary Generated by Built In
We are looking for people with strong ML and distributed systems backgrounds. This role sits within our Research team and collaborates closely with researchers to build the platforms for training our next generation of foundation models.
Responsibilities
- Work with researchers to scale up the systems required for our next generation of models trained on multi-thousand GPU clusters.
- Profile and optimize our model training codebase to achieve best-in-class hardware efficiency.
- Build systems to distribute work across massive GPU clusters efficiently.
- Design and implement methods to robustly train models in the presence of hardware failures.
- Build tooling to help us better understand problems in our largest training jobs.
Experience
- 5+ years of work experience.
- Experience working with multimodal ML pipelines, high-performance computing, and/or low-level systems.
- Passion for diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability.
- Experience building stable and highly efficient distributed systems.
- Strong generalist Python and software engineering skills, including significant experience with PyTorch.
- Nice to have: experience working with high-performance C++ or CUDA.
Compensation
- The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.
Your application is reviewed by real people.
Top Skills
C++
CUDA
Python
PyTorch
The Company
What We Do
Luma is a multimedia platform that delivers personalized movie and TV program selections from a range of sources to its viewers.