MLOps Engineer

Posted Yesterday
2 Locations
Remote or Hybrid
Senior level
Artificial Intelligence
The Role
Build and maintain reproducible data pipelines, experiment orchestration, CI/CD for models, Terraform-based ML infrastructure, observability, security controls, and automation to deploy and operate ML systems in production.
Summary Generated by Built In
Position Overview

As the first dedicated ML Ops Engineer, you’ll own the tooling and infrastructure that make our ML engineers wildly productive, so that we can iterate efficiently on ML models, prompts, and datasets and deploy our AI systems into a predictable production environment. You’ll bridge the gap between research and DevOps by designing reproducible dataset pipelines, automated experiment workflows, and Terraform-based cloud deployments that scale.

Key Responsibilities

Dataset Management

• Design version-controlled data pipelines (feature stores, data registries) using tools such as Delta Lake or Apache Iceberg.
• Implement systems for data validation, lineage tracking, and automated quality checks (e.g., Great Expectations).

Experiment Execution & Tracking

• Build and maintain experiment orchestration with platforms like MLflow, TorchX, and Apache Airflow.
• Provide ML Engineers with templated systems and tools that make it easy to launch training, evaluation, and data-processing systems.
• Automate hyper-parameter sweeps and A/B tests, exposing clear dashboards for results.

CI/CD for Models/Agents

• Build workflows that package, test, and promote models and agents through staging to production.
• Implement canary deployments and rollbacks for model/agent services.

Terraform Infrastructure-as-Code

• Author and maintain Terraform modules for all ML infra—networking, GPU/TPU clusters, object storage, secrets, monitoring.
• Enforce best practices for state management, workspaces, and automated plan/apply stages via CI.

Observability & Reliability

• Integrate logging, tracing, and metric collection (Prometheus, Grafana, Datadog) across data pipelines and model endpoints.
• Set SLIs/SLOs for data freshness and model latency; implement alerts and runbooks.

Security & Compliance

• Work with Security to implement IAM least-privilege, key rotation, and data-encryption policies.
• Support audit requirements (SOC 2, GDPR, HIPAA where applicable).

Minimum Qualifications
  • 5+ years combined experience in DevOps, Data Engineering, or ML Ops roles.

  • Strong Terraform skills; ability to craft reusable modules and navigate complex state.

  • Production experience with at least one cloud provider (AWS, GCP, or Azure).

  • Proficiency in Python and containerization (Docker); familiarity with Kubernetes or serverless batch systems.

  • Hands-on knowledge of ML experiment platforms (MLflow, Kubeflow, Weights & Biases, or similar).

  • Experience with workflow execution frameworks (Kubeflow, Apache Airflow).

  • Understanding of modern data-versioning/feature-store concepts and tools.

  • Solid grasp of CI/CD principles, Git workflows, and infrastructure testing.

  • Excellent communication skills—capable of partnering with Data Scientists, Software Engineers, and Security teams.

Preferred (Nice-to-Have)
  • Experience with GPU orchestration (NVIDIA DGX, Karpenter, or Ray).

  • Familiarity with IaC security scanning (Checkov, tfsec).

  • Exposure to policy-as-code (OPA/Gatekeeper).

  • Prior work in real-time streaming (Kafka, Flink) and online feature serving.

  • Contributions to open-source ML Ops projects.

Reporting Structure

Reports to: Director of Infra


The Company
HQ: Palo Alto, CA

What We Do

The Enterprise Web Agent company. Our AI agents run complex business workflows at web scale millions of times to deliver measurable outcomes.
