Senior Machine Learning Engineer

Las Vegas, NV
In-Office
Senior level
Artificial Intelligence • Cloud • Software
The Role
Manage distributed machine learning workloads using Slurm and Kubernetes, keep cluster operations reliable, and mentor engineers in best practices.

Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.

About the role

We are seeking a Senior Machine Learning Engineer to build and operate the core systems that power large-scale ML training and inference across TensorWave’s GPU platform.

This role spans workload orchestration, cluster operations, performance optimization, and developer enablement for production ML workloads.

Responsibilities

  • Design, operate, and improve ML infrastructure systems supporting distributed training and inference workloads

  • Build reliable, repeatable workload execution and orchestration patterns across shared GPU environments

  • Troubleshoot performance, reliability, and scalability issues across the ML stack

  • Partner with ML, systems, and platform teams to improve developer experience and operational efficiency

Required Experience

  • Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

  • Expertise supporting production ML systems using Slurm and Kubernetes

  • Strong understanding of GPU-accelerated workloads and distributed systems concepts

  • Solid Linux fundamentals and experience debugging infrastructure-level issues

  • Ability to build automation and tooling in languages such as Python or Go

Preferred Experience

  • Experience working across schedulers, orchestration platforms, or cluster managers

  • Familiarity with large-scale GPU environments or HPC-style systems

  • Experience improving infrastructure reliability, utilization, or performance at scale

What We Bring

  • Mission-driven company

  • Competitive Salary

  • Stock Options

  • 100% paid Medical, Dental, and Vision insurance

  • Flexible PTO

  • Paid Holidays

  • 401(k)

  • Parental Leave

  • Flexible Spending Account

  • Short Term Disability Insurance

  • Life and Voluntary Supplemental Insurance

  • Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team: people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and to literally build the future.

TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.

Top Skills

C10D
Kubernetes
Megatron
MPI
Python
PyTorch
Slurm

The Company
HQ: Las Vegas, Nevada
56 Employees

What We Do

TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.
Send us a message to try it for free.
