MLOps Lead

27 Locations
Remote
Senior level
Artificial Intelligence • Software
The Role
Lead and grow an MLOps team, define the ML infrastructure roadmap, and architect scalable automated pipelines, low-latency model serving, feature stores, monitoring, and CI/CD to move research into production.
Summary Generated by Built In
About Fundamental

Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS – the world's most powerful Large Tabular Model (LTM) – purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.

At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.

Key responsibilities
  • Lead and mentor a team of MLOps engineers, fostering technical growth and a culture of operational excellence

  • Define and drive the MLOps roadmap, aligning infrastructure capabilities with research, engineering, and product objectives

  • Establish best practices, standards, and processes for ML infrastructure, deployment, and operations

  • Own technical decision-making for ML infrastructure architecture and tooling choices

  • Architect and oversee scalable, automated machine learning pipelines, CI/CD workflows, and orchestration frameworks

  • Drive the design and implementation of robust model serving infrastructure using platforms like Triton, TorchServe, TensorFlow Serving, and KServe

  • Define inference architecture strategy optimized for ultra-low latency and high throughput

  • Design and maintain feature stores, robust data pipelines, and scalable storage solutions to efficiently handle large volumes of data

  • Collaborate with research teams to bridge the gap between experimentation and production

  • Define logging, alerting, and monitoring strategy to track model performance, drift, and system reliability

Must have
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)

  • 7+ years of experience in MLOps, with 3+ years in a technical leadership role

  • Strong software engineering skills in Python, with experience in Bash and/or Go

  • Proven track record of building and leading high-performing MLOps or infrastructure teams

  • Experience building and designing MLOps infrastructure from the ground up

  • Deep experience with MLOps platforms (MLflow, WandB, etc.) and ML frameworks (PyTorch, TensorFlow, etc.)

  • Deep experience with model serving frameworks (Triton, TorchServe, TensorFlow Serving, KServe) for highly scalable, low-latency inference

  • Experience building and managing data pipelines to support both model training and inference

  • Good experience with Kubernetes on a major cloud provider (AWS, GCP, or Azure) and with infrastructure as code (Terraform, Helm, GitOps)

  • Proficiency with observability and monitoring tools (Prometheus, Grafana, Datadog, OpenTelemetry)

  • Excellent communication skills, with the ability to translate between research and production contexts

Nice to have
  • Experience with workflow orchestration tools (Kubeflow, Airflow, Argo Workflows)

  • Experience with FastAPI and backend applications

  • Familiarity with data platforms like Databricks or Snowflake

  • Experience with LLM/foundation model serving and optimization

  • Exposure to SRE practices or cloud security certifications

  • Experience scaling ML infrastructure for AI startups

Benefits
  • Competitive compensation with salary and equity

  • Comprehensive health coverage for you and your dependents

  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys

  • Relocation support for employees moving to join the team in one of our office locations

  • A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action


The Company
HQ: Sparks, MD
54 Employees
Year Founded: 2024

What We Do

For decades, companies have relied on archaic tools to inform decisions and make bets on the future. Until now. Fundamental empowers businesses to turn gambles into guarantees and determine their future with far greater accuracy than ever before. Built by DeepMind alumni and trusted by Fortune 100 enterprises, NEXUS is our most powerful Large Tabular Model (LTM). By revealing the hidden language of tables, NEXUS unlocks trillions of dollars of value by giving businesses the Power to Predict™.

