ML Platform & Infrastructure Engineer

Posted 3 Days Ago
San Francisco, CA, USA
In-Office
Mid level
Information Technology
Designing everyday AGI
The Role
As an ML Platform & Infrastructure Engineer, you'll design CI/CD pipelines for ML workflows, build evaluation infrastructure, and develop SDKs and tools to enhance experimentation. You'll track and visualize model performance while optimizing resource usage.
Summary Generated by Built In
Think Different. Build the Future. 🚀

Our Mission

Build everyday AGI. Trustworthy, consumer-grade agents that redefine human–AI collaboration for millions. Software shouldn’t wait for commands; it should partner with you, amplifying what you can do every single day.

Why AGI, Inc.

We’re a stealth team of elite founders and AI researchers, with backgrounds spanning Stanford, OpenAI, and DeepMind. We’re industry leaders in mobile and computer-use agents, bringing these capabilities to consumer scale.

Grounded in years of agent research, our AI is designed with trustworthiness and reliability as core pillars, not afterthoughts.

We are supported by tier-1 investors who funded the first generation of AI giants; now they’re backing us to build the next: everyday AGI.

If you see possibility where others see limits, read on.

What You’ll Do

Training Automation: Design and implement robust CI/CD pipelines for machine learning workflows. Automate nightly and on-demand training runs, including data ingestion, job orchestration, checkpointing, and artifact management, with reliability as a first-class requirement.

Evaluation Infrastructure: Build scalable evaluation harnesses that automatically benchmark models on every merge. Optimize latency and resource usage so that experimentation stays fast and performance regressions are caught immediately.

Research Tooling: Develop internal SDKs, CLIs, and lightweight UIs (e.g., Streamlit, Retool) that empower researchers to:

  • Inspect trajectories and traces

  • Visualize model failures

  • Curate and manage datasets

  • Iterate without friction

You’ll make experimentation ergonomic.

Observability & Performance: Implement comprehensive tracking for:

  • Model latency, throughput, and error rates

  • GPU utilization and cluster health

  • Inference cost and unit economics

Build dashboards and alerting systems that give real-time visibility into system performance and reliability.

Minimum Qualifications
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience

  • 3+ years in Software Engineering, MLOps, or ML Infrastructure

  • Strong Python proficiency

  • Experience building internal developer tools, CLIs, or dashboards

  • Experience with cloud infrastructure (AWS or GCP) and containerization (Docker, Kubernetes)

Preferred Qualifications
  • Experience designing CI/CD pipelines specifically for ML workflows

  • Familiarity with LLM serving stacks such as vLLM or TGI

  • Experience managing GPU clusters and optimizing distributed workloads

Why This Role Matters

Great research without great infrastructure slows to a crawl.
Great infrastructure multiplies the impact of every researcher.

You will define how experiments scale, how reliability is measured, and how quickly we can ship improvements to real users. The systems you build will directly shape the speed and quality of our progress toward everyday AGI.

Our Culture

🏢 All in, in person — work moves faster face-to-face
🚀 Ship by default — novel and polished can coexist; speed is the feature
🤝 One band, one sound — radical candor, zero politics, help each other win

Perks

🏥 Competitive company-sponsored medical, dental, and vision insurance
✈️ Top-tier relocation and immigration support

How to Apply

Send us:

  • A link — or 60-second video — of something you built and why it matters

  • Your resume or LinkedIn

  • Two sentences on the hardest problem you've cracked

Every exceptional candidate hears back within 48 hours.
If you see possibility where others see limits, we'd love to meet you.

Top Skills

AWS
Docker
GCP
Kubernetes
Python

The Company
HQ: San Francisco, CA
28 Employees
Year Founded: 2025

What We Do

Designing everyday AGI
