AI Platform Engineer

Sorry, this job was removed at 02:10 a.m. (CST) on Friday, Jun 05, 2026
San Francisco, CA, USA
Hybrid
Artificial Intelligence • Productivity • Software • Design
The Role
About Noon

We are on a mission to reinvent how designers work in the AI era by building the next-generation AI design tool for product teams. We’re backed by top investors including First Round, Chemistry, Homebrew, and Scribble, as well as senior leaders from OpenAI, Meta, Google, Ramp, Stripe, and more.

About the Role

We’re hiring an AI Platform Engineer to own how our models run in production. You’ll build the inference stack that delivers sub-second responses to designers at scale, optimize latency and cost, and own the reliability of every AI capability in the product. This is the role for someone who lives in serving infrastructure and treats GPU utilization like a craft.

You’ll own the platform layer end-to-end: serving, autoscaling, observability, deployment, and the cost-and-latency economics of running models at scale.

What You’ll Do
  • Architect and operate the inference platform: serving stack, autoscaling, multi-tenancy, observability

  • Optimize end-to-end latency (TTFT, TPOT, p95) with quantization, batching, KV-cache management, and speculative decoding

  • Design multi-GPU parallelism strategies (DP / TP / PP) and own GPU utilization and cost economics

  • Build a hybrid local + cloud serving architecture — small models on the user’s Mac for fast paths, larger models in the cloud for slow paths

  • Own canary deployment, rollback automation, and SLO/SLA-driven reliability for all AI features

  • Build production observability: latency, drift, quality, and cost dashboards

  • Evaluate and integrate inference engines (vLLM, Triton, TGI, TensorRT, MLX) for cloud and on-device paths

  • Take fine-tuned models from research artifacts to production traffic

Must-Have Requirements
  • 8+ years of software engineering experience

  • 2+ years deploying ML or LLM systems at production scale

  • Deep, demonstrable experience with one or more inference serving systems (vLLM, Triton, TGI, TensorRT, ONNX Runtime)

  • Concrete production wins on latency and throughput engineering (p50/p95/p99, GPU utilization, cost-per-token)

  • Reliability engineering depth: canary deployment, rollback, SLO-driven ops, on-call readiness

  • Cloud and Kubernetes-based ML deployment experience

  • Multi-GPU parallelism experience (FSDP, DDP, TP, PP) is a strong plus

Nice to Have
  • On-device inference experience (MLX, Core ML, ONNX Runtime on consumer hardware)

  • Production experience with quantization, distillation, and mixed-precision inference

  • Experience with streaming inference and real-time AI UX

  • Background running inference at startup scale — comfortable with cost-per-user economics, not just raw throughput

What You’ll Build
  • The inference platform powering every AI feature in the product

  • Sub-second response paths for high-frequency design actions

  • A hybrid local + cloud serving architecture, with intelligent routing between fast and slow paths

  • Observability infrastructure: latency, drift, quality, and cost

  • Multi-model orchestration with on-device fast paths and cloud slow paths

  • Reliable, measurable, real-time streaming AI experiences

Benefits
  • Salary: $300,000-$400,000 base

  • Equity: Meaningful stock options

  • Health Insurance: Best-in-class coverage for the employee and their entire family

  • Location: San Francisco HQ

 


The Company
0 Employees
Year Founded: 2024

What We Do

Noon is an AI-native product design platform that provides a dual-canvas tool for product designers. By integrating design and production-ready code, it eliminates the gap between the two, allowing designers to create, iterate, build, test, and ship products directly from a single canvas. Founded in 2024, the company aims to redefine product design workflows through AI-driven, code-centric solutions that work in seconds rather than minutes.
