ML Infrastructure Engineer

Reposted 22 Days Ago
Menlo Park, CA
In-Office
180K-200K Annually
Senior level
Artificial Intelligence • Information Technology • Consulting
Talent Solutions for the AI Era
The Role
The ML Infrastructure Engineer will design distributed systems for ML training, optimize inference, build automation pipelines, and monitor production performance.
Summary Generated by Built In

Menlo Park, CA | On-Site | Full-Time/Direct Hire


Looking for ML infrastructure experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, VLLMs, and LLM inference. The focus is purely on language models, not vision or audio.

Client Opportunity | Through Phizenix

Phizenix, a certified minority and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models—built for faster generation, multimodal integration, and scalable enterprise deployment.

We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.

Responsibilities
  • Design and manage distributed infrastructure for ML training at scale

  • Optimize model serving systems for low-latency inference

  • Build automated pipelines for data processing, model training, and deployment

  • Implement observability tools to monitor performance in production

  • Maximize resource utilization across GPU clusters and cloud environments

  • Translate research requirements into robust, scalable system designs

Must-Haves
  • Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)

  • Strong foundation in software engineering, systems design, and distributed systems

  • Experience with cloud platforms (AWS, GCP, or Azure)

  • Proficient in Python and at least one systems-level language (C++/Rust/Go)

  • Hands-on experience with Docker, Kubernetes, and CI/CD workflows

  • Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective

  • Understanding of GPU programming and high-performance infrastructure

Nice-to-Haves
  • Experience with large-scale ML training clusters and GPU orchestration

  • Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)

  • Experience with distributed training strategies (e.g., data/model/pipeline parallelism)

  • Familiarity with orchestration tools like Kubeflow or Airflow

  • Background in performance tuning, system profiling, and MLOps best practices

At Phizenix, we’re committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let’s build the future—together.

California Pay Range
$180,000-$200,000 USD

Top Skills

Airflow
AWS
Azure
C++
CI/CD
CUDA
Docker
GCP
Go
GPU Optimization
Kubeflow
Kubernetes
LLM Inference
ONNX Runtime
Python
PyTorch
Rust
TensorFlow
TensorRT
vLLM
VLLMs

The Company
HQ: Livermore, CA
9 Employees
Year Founded: 2025

What We Do

We provide Talent Solutions for the AI Era. Our mission is to connect businesses with exceptional talent and consulting solutions that align with their culture and values. We offer AI consulting services that help businesses leverage cutting-edge artificial intelligence: we help discover, design, and deploy AI solutions that streamline operations, boost productivity, and unlock new growth opportunities. Our team of AI experts, strategists, and technology specialists works closely with organizations to integrate AI-driven solutions that align with their unique goals and challenges. From automation and data analytics to predictive modeling and AI-based customer experiences, we provide end-to-end support for businesses embarking on their AI transformation journey.
