Staff ML Ops Engineer

Hiring Remotely in Oakland, CA, USA
In-Office or Remote
Senior level
Artificial Intelligence • Software
The Role
As a Staff ML Ops Engineer, you'll architect high-performance systems for AI/ML, manage Kubernetes infrastructure, and develop scalable backend APIs, enabling scientific discoveries across the materials science industry.
Summary Generated by Built In

Albert’s mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.

About the role

As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers—the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here. 


We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities—whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before—accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.


What you'll do

Infrastructure & Kubernetes: 

  • Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads 
  • Manage containerized services, autoscaling, networking, and resource optimization 

Backend Development: 

  • Design and build high-performance Python APIs and services using FastAPI or similar frameworks 
  • Architect backend systems for scalability, reliability, and low latency 
  • Build integrations between AI/ML systems and the broader Albert platform 

Distributed Systems: 

  • Build and operate distributed systems that handle compute-intensive and high-throughput workloads 
  • Design for fault tolerance, graceful degradation, and horizontal scalability 
  • Implement async workflows, job queues, and task orchestration as needed 

Data Infrastructure: 

  • Architect and maintain data pipelines and storage systems supporting AI/ML workflows 
  • Work with vector databases, caches, and other data stores as required by ML systems 
  • Ensure efficient data access patterns for training and inference workloads 

Reliability & Operations: 

  • Implement observability including logging, metrics, tracing, and alerting 
  • Own system reliability—troubleshoot issues, conduct post-mortems, and continuously improve 
  • Design CI/CD pipelines and promote automation best practices 
  • Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools 

Collaboration: 

  • Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure 
  • Translate ML prototypes and research code into scalable, maintainable systems 
  • Contribute to technical decisions that shape the team's architecture 

You will have
  • Deep expertise in Python backend development and distributed systems 
  • Strong Kubernetes and cloud infrastructure experience 
  • A builder's mindset—you want to create foundational systems that others build on 
  • Genuine interest in science and technology; curiosity about how your work enables scientific discovery 
  • A commitment to building systems that are reliable, maintainable, and scalable 


Key competencies
  • A Bachelor's degree in Computer Science or a related field with 7+ years of industry experience in software engineering, or a Master's or PhD with 5+ years 
  • Experience supporting AI/ML teams or deploying ML systems in production 
  • Experience with GPU workloads and scheduling 
  • Advanced proficiency in Python including async programming and performance optimization 
  • Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting 
  • Strong background in distributed systems and microservices architecture 
  • Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code 
  • Proficiency in REST API development using FastAPI, Flask, or similar 
  • Experience with containerization and CI/CD pipelines 
  • Track record of operating production systems at scale  


Preferred/Bonus Points
  • Familiarity with scientific computing or research environments 
  • Background in or curiosity about chemistry, materials science, or related fields 
  • Familiarity with data engineering tools (Airflow, Dagster, or similar) 
  • Experience with vector databases or search infrastructure 
  • Expertise in observability tools (Prometheus, Grafana, Datadog) 
  • Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ) 
  • Contributions to open-source projects 
  • Experience mentoring engineers 

Why Albert?

We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D-printed medical devices.

We love distributed teams. Albert’s home base is in the California Bay Area, but we have multiple offices and employees around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.

We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.

We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.

We’re always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!




Top Skills

Airflow
Argo CD
AWS
Azure
Dagster
Datadog
FastAPI
GCP
Grafana
Helm
Kafka
Kubernetes
Prometheus
Pulumi
Python
RabbitMQ
Redis
Terraform
The Company
HQ: Bay Area, California
158 Employees
Year Founded: 2022

What We Do

Meet Albert, the partner for the scientist of the future. Enterprises partner with Albert to reimagine how they invent. From digitalizing R&D and managing change at scale, to unlocking entirely new business models with AI, chemical and materials science leaders rely on Albert to achieve digital transformation — and ensure it delivers lasting value. Every day, scientists in 30+ countries use Albert to accelerate R&D with AI trained like a chemist, bringing better products to market, faster. Your science moves the world forward. We move everything else out of your way.
