Backend Software Engineer (ML Infra)

Posted 3 Days Ago
San Francisco, CA, USA
In-Office
Junior
Agency • Artificial Intelligence • HR Tech • Professional Services
The Role
Build and scale backend systems and cloud-native infrastructure for large-scale ML workloads. Implement distributed training/inference pipelines, developer tools, and observability for GPU-heavy jobs while collaborating with ML engineers.
Summary Generated by Built In

Rockstar is recruiting for a fast-growing startup that is building the AI backbone for the next generation of intelligent products. The startup helps AI companies design, fine-tune, evaluate, deploy, and maintain specialized models across text, vision, and embeddings. Think of it as “AWS for AI models”: not data or raw compute, but a full-stack backend for fine-tuning, reinforcement learning, inference, and long-term model maintenance. Its customers are Series A–C AI companies building enterprise-grade products. Its promise is simple: it makes your AI system better.

The startup is hiring a Backend Software Engineer (ML Infrastructure) to help design, build, and scale the core systems that power large-scale model training and deployment.

The candidate will work on distributed training pipelines, cloud-native infrastructure, and internal developer platforms that support fine-tuning, reinforcement learning, and inference at scale. This role sits at the intersection of backend engineering and ML systems—the candidate will collaborate closely with ML engineers while owning production-grade infrastructure.

This is an ideal role for an early-career engineer who wants to work on real distributed systems, GPU workloads, and modern ML infrastructure—not dashboards or CRUD apps.

What You’ll Do

Build & Scale Core Infrastructure

- Design and implement backend systems that support large-scale ML workloads, including fine-tuning and reinforcement learning.

- Build distributed training and inference pipelines that are efficient, fault-tolerant, and observable.

- Develop internal developer tools and platforms that make it easier for ML engineers to train, evaluate, and deploy models.

Cloud & Systems Engineering

- Work on cloud-native systems using containers and orchestration (e.g., Kubernetes).

- Optimize systems for performance, reliability, and cost efficiency, especially for GPU-heavy workloads.

- Implement monitoring, logging, and observability for long-running training jobs and production services.

Collaborate with ML Engineers

- Partner closely with ML engineers to support evolving model architectures, training workflows, and evaluation needs.

- Translate ML requirements into scalable backend and infrastructure solutions.

Who You Are

Required

- 1–3 years of backend engineering experience, ideally working on production systems.

- Strong fundamentals in distributed systems, networking, and backend architecture.

- Experience building systems that scale under real load.

- Comfortable working in Python and/or Go (or similar backend languages).

- Excited to work on-site in San Francisco with a fast-moving early-stage team.

Strongly Preferred

- Experience with or exposure to ML infrastructure or ML platforms.

- Familiarity with GPU workloads, training pipelines, or inference systems.

- Experience with containerization and orchestration (Docker, Kubernetes).

- Contributions to or deep familiarity with ML infrastructure libraries such as:

  - Ray

  - vLLM

  - SGLang

  - or similar distributed ML systems

Bonus

- Computer science background from a top-tier program or equivalent demonstrated excellence.

- Open-source contributions, research projects, or side projects in systems or ML infrastructure.

- A track record of high ownership and technical curiosity.

Skills Required

  • 1–3 years of backend engineering experience
  • Strong fundamentals in distributed systems, networking, and backend architecture
  • Experience building systems that scale under real load
  • Comfortable working in Python and/or Go
  • Willingness to work on-site in San Francisco
  • Experience with or exposure to ML infrastructure or ML platforms
  • Familiarity with GPU workloads, training pipelines, or inference systems
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Familiarity with Ray, vLLM, SGLang, or similar distributed ML systems
  • Open-source contributions, research projects, or side projects in systems or ML infrastructure
  • Computer science background from a top-tier program or equivalent demonstrated excellence
  • Track record of high ownership and technical curiosity

The Company
6,000 Employees
Year Founded: 1998

What We Do

Rockstar is a full-service recruitment company that leverages a blend of human expertise and artificial intelligence to help businesses hire better and faster at a lower cost. They offer comprehensive recruitment services across a wide range of professional roles, utilizing proprietary AI to efficiently match candidates to job descriptions and conducting custom screening calls to ensure high-quality hires.
