The Role
We are looking for a Senior Distributed Systems Engineer to work within the Research team, collaborating with researchers to build platforms for training next-generation foundation models. The role requires 5+ years of work experience, including experience with multi-modal ML pipelines, high-performance computing, and/or low-level systems. Candidates should have a passion for systems implementations and experience building stable distributed systems. Strong skills in Python and PyTorch are essential; experience with C++ or CUDA is preferred.
We are looking for people with strong ML and distributed systems backgrounds. This role sits within our Research team and collaborates closely with researchers to build the platforms for training our next generation of foundation models.
Responsibilities
- Work with researchers to scale up the systems required for our next generation of models trained on multi-thousand GPU clusters.
- Profile and optimize our model training codebase to achieve best-in-class hardware efficiency.
- Build systems to distribute work across massive GPU clusters efficiently.
- Design and implement methods to robustly train models in the presence of hardware failures.
- Build tooling to help us better understand problems in our largest training jobs.
Experience
- 5+ years of work experience.
- Experience working with multi-modal ML pipelines, high-performance computing, and/or low-level systems.
- Passion for diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability.
- Experience building stable and highly efficient distributed systems.
- Strong generalist Python and software engineering skills, including significant experience with PyTorch.
- Nice to have: experience working with high-performance C++ or CUDA.
- Please note that this role is not intended for recent graduates.
Your application is reviewed by real people.
Top Skills
C++
Python
The Company
What We Do
Luma is a multimedia platform that delivers personalized movie and TV program selections from a range of sources to its viewers.