OS / K8s Systems Engineer

Posted 4 Days Ago
2 Locations
Hybrid
165K-330K Annually
Senior level
Software

ABOUT BASETEN

Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to when shipping AI products.

THE ROLE

As an OS / K8s Systems Engineer at Baseten, you’ll build the automation and systems that turn raw GPU hardware into production-ready compute. From provisioning to orchestration, you’ll own the software layer that makes our infrastructure reproducible, scalable, and reliable across data centers.

This is a senior, hands-on role focused on building systems, not operating them. You’ll work close to the metal: designing OS images, building provisioning pipelines, and automating cluster bring-up from scratch. Your work will define how quickly we can turn new capacity into usable compute.

EXAMPLE INITIATIVES

  • Zero-to-cluster automation: Build workflows that take new hardware from unprovisioned to a fully operational cluster.

  • Provisioning systems: Design PXE-based or equivalent systems for imaging and lifecycle management.

  • Reproducible infrastructure: Ensure clusters deploy consistently across data centers.

RESPONSIBILITIES

  • Own the end-to-end automation of cluster bring-up and lifecycle management.

  • Build and maintain OS images, provisioning systems, and configuration pipelines.

  • Deploy and operate cluster orchestration platforms (Kubernetes, Slurm, or similar).

  • Design systems for reproducibility across sites and hardware generations.

  • Automate upgrades, rollouts, and failure recovery.

  • Optimize system performance, including GPU utilization and networking.

  • Partner with hardware and network teams to validate and improve system behavior.

REQUIREMENTS

  • Experience building and operating automated infrastructure systems.

  • Strong programming skills (e.g., Python, Go, or similar).

  • Deep familiarity with Linux systems, including boot processes, drivers, and performance.

  • Experience with provisioning systems (PXE, imaging, configuration management).

  • Experience with Kubernetes.

  • Strong debugging skills across system layers (hardware → OS → network).

  • Experience working with GPU or high-performance workloads is a plus.

BENEFITS

  • Competitive compensation, including meaningful equity.

  • 100% coverage of medical, dental, and vision insurance for employees and dependents

  • Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)

  • Paid parental leave

  • Fertility and family-building stipend through Carrot

  • Company-facilitated 401(k)

  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).


The Company
59 Employees

What We Do

At Baseten we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes, and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. We also utilize horizontally scalable services that take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best in class doesn't mean breaking the bank. Run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.
