Staff Software Engineer, ML Infrastructure

Posted Yesterday
San Francisco, CA, USA
Hybrid
220K-260K Annually
Senior level
Artificial Intelligence • Security • Software
The Role
The Staff Software Engineer will lead ML infrastructure, architect systems for model training and deployment, and mentor engineers on best practices.
Who We Are

Voxel is building the future of Computer Vision and Machine Learning for operations, risk, and safety. We use computer vision and AI to enable existing security cameras to automatically detect hazards and high-risk activities, keeping people safe and driving operational efficiencies. Our technology addresses the key cost drivers for workers’ compensation, general liability, and property damage, which cost US employers over $500 billion annually. Our customers include Fortune 500 companies across grocery, retail, manufacturing, food and beverage, logistics, and pharmaceutical distribution. We’ve passed $10M ARR with strong expansion revenue. We are based in SF and backed by industry-leading VCs.

 
About the Role

Voxel’s perception system is the technical core of everything we ship. Our models detect human activity, equipment interactions, environmental hazards, and operational state in real time across thousands of cameras in manufacturing, logistics, retail, and pharmaceutical environments. Safety was our wedge; it proved our platform works. Now customers are pulling us into operations: equipment utilization, workflow compliance, process efficiency. Every new use case runs through the perception team.

We're hiring a Staff Software Engineer to own ML Infrastructure at Voxel. Our applied ML team is shipping vision models into production every week, across thousands of cameras at Fortune 500 customers, and the infrastructure underneath determines how fast we can move. You'll set the technical direction for how we train, track, and ship vision models, build the foundational systems that the applied ML team relies on, and shape the architectural decisions that will define our ML stack for the next several years.

This is a hands-on role. You'll write code, make architecture calls, and own outcomes end to end. You'll partner closely with applied CV engineers, the ML Data team, and the Platform team, and you'll be the technical voice in the room when ML infrastructure tradeoffs come up.

What You'll Do
  • Set the technical direction for ML infrastructure at Voxel: what we build, what we buy, and how the pieces fit together as the team and model portfolio scale

  • Architect and build the training infrastructure that lets the applied ML team run multiple experiments concurrently and iterate quickly on new architectures (PyTorch, AWS)

  • Own the train-to-deploy handoff: export trained models to optimized inference formats (TensorRT, ONNX), quantify accuracy and latency impact, and partner with Platform on production deployment

  • Pick and roll out the experiment tracking and lifecycle stack (Weights & Biases, MLflow, ClearML, or similar) so researchers can run, compare, and reproduce experiments efficiently

  • Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely

  • Mentor engineers across Vision & AI on ML infrastructure best practices, raising the bar for how the org thinks about training, evaluation, and deployment

  • Anticipate where the infrastructure needs to be in 12 to 18 months, including the upcoming move to on-device inference, and architect for that future

What We're Looking For
  • 7+ years building and shipping large-scale software systems, with at least 3 years focused on ML infrastructure or large-scale data infrastructure

  • A track record of being the person who decides the architecture, not just the person who implements it. You've owned tool selection, framework choices, and build-vs-buy calls for systems other engineers depend on

  • Deep fluency in PyTorch and the modern ML training stack. You know what good experiment tracking looks like, what makes a training pipeline reliable at scale, and where the failure modes live

  • Strong Python. Performant, maintainable code that holds up in production

  • A pragmatic shipping orientation. You can tell the difference between architectural decisions that need to be right and ones that can be revisited later, and you don't over-engineer the latter

  • Strong communication skills. You can explain complex tradeoffs clearly to ML researchers, infra peers, and leadership

Nice to Have
  • Production experience on AWS (S3, EC2, EKS, or similar) for ML workloads

  • Hands-on experience with model export and inference optimization (TensorRT, ONNX, or similar), including measuring accuracy and latency tradeoffs against training-time baselines

  • Experience with modern ML orchestration tools (Ray, Sematic, Flyte, Metaflow, Prefect, or similar)

  • Familiarity with GPU performance profiling and optimization (Nsight, PyTorch profiler, or similar)

  • Background in computer vision model training

Compensation & Benefits
  • Equity through Voxel’s Equity Incentive Plan

  • Total compensation includes base salary, annual bonus, and equity

  • Comprehensive health, dental, and vision insurance

  • Competitive paid parental leave

  • Unlimited PTO and flexible work arrangements

  • Daily meals in-office, team events, annual company onsite

The Company
HQ: San Francisco, California
62 Employees
Year Founded: 2020

What We Do

Voxel uses computer vision and AI to enable security cameras to automatically identify hazards and high-risk activities in real-time, keeping people safe and driving operational efficiencies. Our technology targets the key drivers for workers’ compensation, general liability, and property costs while providing full site visibility. The Voxel platform works by sending real-time notifications of safety violations and risky behaviors to on-site personnel and providing detailed reports with analysis of past incidents.
