Senior Distributed Systems Engineer

Posted 8 Days Ago
Palo Alto, CA
180K-250K Annually
5-7 Years Experience
Digital Media
The Role
Looking for a Senior Distributed Systems Engineer to work within the Research team, collaborating with researchers to build platforms for training next-generation foundation models. Requires 5+ years of experience with multimodal ML pipelines, high-performance computing, and low-level systems. Must have a passion for systems implementation and experience building stable distributed systems. Strong skills in Python and PyTorch are essential; experience with C++ and CUDA is preferred.
Summary Generated by Built In

We are looking for people with strong backgrounds in ML and distributed systems. This role will work within our Research team, collaborating closely with researchers to build the platforms for training our next generation of foundation models.

Responsibilities

  • Work with researchers to scale up the systems required to train our next generation of models on clusters of several thousand GPUs.
  • Profile and optimize our model-training codebase to achieve best-in-class hardware efficiency.
  • Build systems to distribute work across massive GPU clusters efficiently.
  • Design and implement methods to robustly train models in the presence of hardware failures.
  • Build tooling to help us better understand problems in our largest training jobs.

Experience

  • 5+ years of work experience.
  • Experience working with multimodal ML pipelines, high-performance computing, and/or low-level systems.
  • Passion for diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability.
  • Experience building stable and highly efficient distributed systems.
  • Strong generalist Python and software engineering skills, including significant experience with PyTorch.
  • Experience with high-performance C++ or CUDA is a plus.
  • Please note this role is not meant for recent grads.


Top Skills

C++
Python
The Company
Minneapolis, MN
0 Employees
On-site Workplace

What We Do

Luma is a multimedia platform that delivers personalized movie and TV program selections from a range of sources to its viewers.

Jobs at Similar Companies

JuiceMedia.AI

Business Development Manager - Mobile applications

AdTech • Agency • Digital Media • Machine Learning • Marketing Tech • Analytics • Big Data Analytics
Hybrid
Marina del Rey, CA, USA
50 Employees
102K-167K Annually

Artlist

Brand & Marketing Designer

Digital Media • Music • Other • Social Media
IL
450 Employees

Effectv

Advertising Operations Analyst - Digital

AdTech • Digital Media • Marketing Tech
Remote
Pennsylvania, USA
2157 Employees

Similar Companies Hiring

JuiceMedia.AI
Marketing Tech • Machine Learning • Digital Media • Big Data Analytics • Analytics • Agency • AdTech
Marina del Rey, CA
50 Employees
Effectv
Marketing Tech • Digital Media • AdTech
New York, NY
2157 Employees
Artlist
Social Media • Other • Music • Digital Media
Tel Aviv, IL
450 Employees
