Distributed Training Engineer

Hiring Remotely in Menlo Park, CA
In-Office or Remote
Mid level
Artificial Intelligence • Hardware • Information Technology • Robotics
From bits to atoms.
The Role
Optimize and develop large-scale distributed LLM training systems, support reinforcement learning workflows, and contribute to open-source frameworks.

About Periodic Labs

We are an AI + physical sciences lab building state-of-the-art models to make novel scientific discoveries. We are well funded and growing rapidly. Team members are owners who identify and solve problems without boundaries or bureaucracy. We eagerly learn new tools and new science to push forward our mission.

About the role

You will optimize, operate, and develop large-scale distributed LLM training systems that power AI scientific research. You will work closely with researchers to bring up, debug, and maintain mid-training and reinforcement learning workflows. You will build tools and directly support frontier-scale experiments to make Periodic Labs the world’s best AI + science lab for physicists, computational materials scientists, AI researchers, and engineers. You will contribute to open-source, large-scale LLM training frameworks.

You might thrive in this role if you have experience with:

  • Training on clusters with ≥5,000 GPUs

  • 5D parallel LLM training

  • Distributed training frameworks such as Megatron-LM, FSDP, DeepSpeed, TorchTitan

  • Optimizing training throughput for large-scale Mixture-of-Experts models

Top Skills

DeepSpeed
FSDP
Megatron-LM
TorchTitan
The Company
32 Employees
Year Founded: 2025

What We Do

We're building AI scientists and the autonomous laboratories for them to operate.
