Senior Software Engineer — LLM Post-Training Platform

Posted Yesterday
Bellevue, WA, USA
In-Office
200K-288K Annually
Senior level
Artificial Intelligence • Big Data • Cloud • Machine Learning • Software • Database • Analytics
Let's build a world where data and AI turn possibilities into reality.
The Role
Build and scale an LLM post-training platform: design the public training APIs and SDKs, the control plane, and the GPU data plane; implement multi-tenant scheduling and capacity-aware routing; optimize end-to-end performance and throughput; and productionize research components for reliable enterprise-scale training and inference.
Summary Generated by Built In

At Snowflake, we are powering the era of the agentic enterprise. To usher in this new era, we seek AI-native thinkers across every function who are energized by the opportunity to reinvent how they work. You don’t just use tools; you possess an innate curiosity, treating AI as a high-trust collaborator that is core to how you solve problems and accelerate your impact. We look for low-ego individuals who thrive in dynamic and fast-moving environments and move with an experimental mindset — who rapidly test emerging capabilities to discover simpler, more powerful ways to deliver results. At Snowflake, your role isn't just to execute a function, but to help redefine the future of how work gets done.

The Snowflake ML Platform team's mission is to let customers run their most demanding ML/AI workloads inside Snowflake. Cortex Training is our LLM post-training platform: it turns scarce, expensive GPU capacity into a simple, composable service, so customers can adapt open-weight foundation models to their own business problems while we handle the hard distributed-systems parts, including scheduling, orchestration, multi-node training and inference, fault tolerance, and throughput.

The platform already runs post-training at scale. Under the hood, it decouples GPU computation from the training loop and exposes it as primitive APIs that compose into everything from SFT to full RL workflows. You'll work alongside a team that ships fast and sweats reliability, as well as the researchers behind DeepSpeed. We're looking for an engineer who thrives in the ML infrastructure layer and brings a solid understanding of LLMs and post-training to help us scale and grow the platform.

YOU WILL:
  • Design and build across the full stack — from the public training APIs and SDK through the control plane to the GPU data plane.

  • Scale the distributed systems that make GPU compute serverless — multi-tenant scheduling, placement, and capacity-aware routing across regional GPU pools, with fault tolerance built in.

  • Drive end-to-end performance at scale — keep the training, inference, and RL loops fast and the data plane responsive under heavy concurrent load, with GPUs kept saturated.

  • Productionize research building blocks — partner with Snowflake Research to turn state-of-the-art training and inference techniques into reliable, composable components customers can run at enterprise scale.

QUALIFICATIONS:
  • 5+ years building and shipping production ML systems.

  • Strong distributed systems and infrastructure foundation — designing scalable, fault-tolerant services and operating them on Kubernetes in production.

  • Familiarity with GPU and LLM infrastructure — e.g., PyTorch, DeepSpeed/FSDP, Ray, CUDA/NCCL, vLLM; able to debug across the data, infrastructure, and GPU layers.

  • Demonstrated ability to harden complex systems for reliability, throughput, and cost efficiency.

  • BS in Computer Science or a related field (MS/PhD a plus).

  • (Bonus) Hands-on LLM post-training / modeling experience — the strongest candidates pair deep infra skills with real post-training intuition.

Snowflake is growing fast, and we’re scaling our team to help enable and accelerate our growth. We are looking for people who share our values, challenge ordinary thinking, and push the pace of innovation while building a future for themselves and Snowflake.

How do you want to make your impact?

For jobs located in the United States, please visit the job posting on the Snowflake Careers Site for salary and benefits information: careers.snowflake.com

Skills Required

  • 5+ years building and shipping production ML systems
  • Strong distributed systems and infrastructure foundation; design scalable, fault-tolerant services and operate them on Kubernetes in production
  • Familiarity with GPU and LLM infrastructure (e.g., PyTorch, DeepSpeed/FSDP, Ray, CUDA/NCCL, vLLM); ability to debug across data, infrastructure, and GPU layers
  • Demonstrated ability to harden complex systems for reliability, throughput, and cost efficiency
  • BS in Computer Science or a related field
  • MS/PhD in Computer Science or a related field (a plus)
  • Hands-on LLM post-training or modeling experience (bonus)

Snowflake Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Snowflake and has not been reviewed or approved by Snowflake.

  • Equity Value & Accessibility: Equity grants (RSUs) and an ESPP are central to total compensation and are described as highly valuable. Feedback suggests many see equity as a major satisfaction driver with meaningful upside potential.
  • Fair & Transparent Compensation: Pay is considered competitive and accompanied by clear communication on salary, equity, and advancement. Feedback suggests pay practices emphasize fairness and transparency.
  • Parental & Family Support: Paid parental leave, fertility benefits, adoption assistance, and family planning resources are notably comprehensive. Feedback suggests these programs materially support major life events.

The Company
HQ: Bozeman, MT
9,023 Employees
Year Founded: 2012

What We Do

Snowflake powers the end-to-end data lifecycle – from ingesting and processing data to analyzing and modeling it, to building and sharing data and AI applications – helping engineers, analysts, and leaders innovate faster and achieve more with their data. We're on a mission to empower every enterprise to achieve its full potential through data and AI.

Why Work With Us

Snowflake is where data does more, and so do you. More innovating, more growing, and more collaborating. Here, you’ll find the sweet spot between building big and moving fast, in technology and your career.

Similar Jobs

MetLife

Customer Care Advocate AMS Service - Omaha, NE 9.21.26 - 18275

Fintech • Information Technology • Insurance • Financial Services • Big Data Analytics
Remote or Hybrid
United States
43000 Employees
42K-42K Annually

MetLife

Customer Care Advocate Disability Intake - Cary, NC 9.14.26 - 18272

Fintech • Information Technology • Insurance • Financial Services • Big Data Analytics
Remote or Hybrid
United States
43000 Employees
42K-42K Annually

MetLife

Customer Care Advocate Disability Intake - Cary, NC 9.21.26 - 18274

Fintech • Information Technology • Insurance • Financial Services • Big Data Analytics
Remote or Hybrid
United States
43000 Employees
42K-42K Annually
Remote or Hybrid
United States
240 Employees
210K-275K Annually

Similar Companies Hiring

Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees