Machine Learning Operations Manager

Sorry, this job was removed at 06:25 p.m. (CST) on Monday, Aug 04, 2025
Cambridge, MA
In-Office
Artificial Intelligence • Robotics
The Role
Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.

Who we are looking for:
We are seeking a Machine Learning Operations (ML-Ops) Manager who is both technically adept and an effective leader. In this role, you will lead a small team of engineers while also being hands-on in designing, building, and maintaining infrastructure that supports the entire lifecycle of Machine Learning (ML) projects. If you have a passion for building scalable ML infrastructure, mentoring engineers, and collaborating with world-class researchers, this is the role for you!

What You Will Do

  • Technical Leadership & Strategy: Drive the design, development, and maintenance of company-wide MLOps platforms and tools, leveraging Kubernetes infrastructure for ML and data processing applications.
  • Team Management & Mentorship: Manage and mentor a small team of engineers, providing technical guidance, setting priorities, and fostering a collaborative team culture.
  • Scalability & Performance: Enable self-service access to ML-compute resources across on-prem and cloud environments, ensuring workload scalability, fault tolerance, and efficient job scheduling.
  • Monitoring & Observability: Enhance system observability through integrations with tools and services such as FluentD, Prometheus, Grafana, and DataDog to improve reliability and debugging.
  • Experiment & Model Lifecycle Management: Integrate ML applications with experiment tracking and model management services such as Weights and Biases.
  • Best Practices & Collaboration: Champion engineering best practices, drive improvements in CI/CD, infrastructure automation, and reproducibility. Work closely with ML Engineers, Data Engineers, DevOps teams, and researchers to accelerate research and deployment.

What You Will Bring

  • BS or MS in Computer Science, Engineering, or equivalent
  • 5+ years of experience in an ML-Ops, DevOps, ML Engineering, or software engineering role
  • 2+ years of experience managing engineers (can be formal management or technical leadership)
  • Strong, hands-on experience with Kubernetes for ML applications
  • Experience developing ML-Ops platforms (covering data/artifact management, reproducibility, fault tolerance, experiment tracking, and model serving)
  • Proficiency in Python, Docker, and environment management tools (pip, poetry, uv, or similar)
  • Familiarity with CI/CD tools (GitHub Actions, ArgoCD) and Infrastructure as Code (Terraform)

Skills We Value

  • Experience with job scheduling mechanisms like Kueue
  • Hands-on experience with workflow orchestration tools (Airflow, Metaflow, Argo Workflows)
  • Experience managing cloud infrastructure (GCP, AWS) and hybrid-cloud environments
  • Knowledge of scalable AI/ML platforms like Ray or PyTorch Lightning
  • Experience with logging & monitoring tools (FluentD, Prometheus, Grafana, DataDog, or similar)
  • Exposure to ML model serving frameworks (TorchServe, ONNX Runtime, or similar)
  • Previous experience collaborating with research teams in academic or industrial settings

We provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

The Company
HQ: Cambridge, MA
279 Employees
Year Founded: 2022

What We Do

We aim to solve the most important and fundamental problems in robotics and AI. (Formerly The AI Institute)
