AI Platform Engineer

Noon

This job was removed at 02:10 a.m. (CST) on Friday, Jun 05, 2026.

San Francisco, CA, USA

Hybrid

Artificial Intelligence • Productivity • Software • Design

The Role

About Noon

We are on a mission to reinvent how designers work in the AI era. We’re backed by top investors, including First Round, Chemistry, Homebrew, and Scribble, as well as senior leaders from OpenAI, Meta, Google, Ramp, Stripe, and more. We’re building the next-generation AI design tool for product teams.

About the Role

We’re hiring an AI Platform Engineer to own how our models run in production. You’ll build the inference stack that delivers sub-second responses to designers at scale, optimize its latency and cost, and be accountable for the reliability of every AI capability in the product. This is the role for someone who lives in serving infrastructure and treats GPU utilization as a craft.

You’ll own the platform layer end-to-end: serving, autoscaling, observability, deployment, and the cost-and-latency economics of running models at scale.

What You’ll Do

Architect and operate the inference platform: serving stack, autoscaling, multi-tenancy, observability
Optimize end-to-end latency, including time to first token (TTFT), time per output token (TPOT), and p95, using quantization, batching, KV-cache management, and speculative decoding
Design multi-GPU parallelism strategies (data, tensor, and pipeline parallelism; DP / TP / PP) and own GPU utilization and cost economics
Build a hybrid local + cloud serving architecture — small models on the user’s Mac for fast paths, larger models in the cloud for slow paths
Own canary deployment, rollback automation, and SLO/SLA-driven reliability for all AI features
Build production observability: latency, drift, quality, and cost dashboards
Evaluate and integrate inference engines (vLLM, Triton, TGI, TensorRT, MLX) for cloud and on-device paths
Take fine-tuned models from research artifacts to production traffic

Must-Have Requirements

8+ years of software engineering experience
2+ years of experience deploying ML or LLM systems at production scale
Deep, demonstrable experience with one or more inference serving systems (vLLM, Triton, TGI, TensorRT, ONNX Runtime)
Concrete production wins on latency and throughput engineering (p50/p95/p99, GPU utilization, cost-per-token)
Reliability engineering depth: canary deployment, rollback, SLO-driven ops, on-call readiness
Cloud and Kubernetes-based ML deployment experience
Multi-GPU parallelism experience (FSDP, DDP, TP, PP) is a strong plus

Nice to Have

On-device inference experience (MLX, Core ML, ONNX Runtime on consumer hardware)
Production experience with quantization, distillation, and mixed-precision inference
Experience with streaming inference and real-time AI UX
Background running inference at startup scale — comfortable with cost-per-user economics, not just raw throughput

What You’ll Build

The inference platform powering every AI feature in the product
Sub-second response paths for high-frequency design actions
A hybrid local + cloud serving architecture, with intelligent routing between fast and slow paths
Observability infrastructure: latency, drift, quality, and cost
Multi-model orchestration with on-device fast paths and cloud slow paths
Reliable, measurable, real-time streaming AI experiences

Benefits

Salary: $300,000-$400,000 base
Equity: Meaningful stock options
Health Insurance: Best-in-class coverage for the employee and their entire family
Location: San Francisco HQ

The Company

Year Founded: 2024

What We Do

Noon is an AI-native product design platform that provides a dual-canvas tool for product designers. By integrating design and production-ready code, it eliminates the gap between the two, allowing designers to create, iterate, build, test, and ship products directly from a single canvas. Founded in 2024, the company aims to redefine product design workflows through AI-driven, code-centric solutions that work in seconds rather than minutes.