Senior AI Ops Engineer

Ann Arbor, MI, USA
In-Office
135K-229K Annually
Senior level
Hardware
The Role
The AI Ops Engineer will design and implement production-grade ML pipelines, manage experiment tracking and CI/CD processes, and optimize GPU training workflows.
Summary Generated by Built In

Company Overview

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA invests more heavily than average in innovation, putting 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division

The KLA Services team headquartered in Milpitas, CA is our service organization that consists of Service Sales and Marketing, Spares Supply Chain management, Field Operations, Engineering, Product Training, and Technical Support. The KLA Services organization partners with our field teams and customers in all business sectors to maintain the high performance and productivity of our products through a flexible portfolio of services. Our comprehensive services include: proactive management of tools to identify and improve performance; expertise in optics, image processing and motion control with worldwide service engineers, 24/7 technical support teams and knowledge management systems; and an extensive parts network to ensure worldwide availability of parts.

Job Description/Preferred Qualifications

We seek a highly skilled and passionate Senior AI Ops Engineer to join our team. This role will be pivotal in architecting and delivering the automation layer that enables fast, reproducible, and scalable model development, spanning end-to-end experiment management, model fine-tuning pipelines, and Reinforcement Learning from Human Feedback (RLHF). We encourage you to apply if you’re a systems-minded engineer who loves turning research workflows into reliable production-grade pipelines, setting standards, and mentoring others to raise the bar across the organization.

Key Responsibilities:

  • Implement and operate experiment tracking, lineage, and reproducibility standards (datasets, code, configs, artifacts, metrics) using MLflow/W&B or equivalents.
  • Build CI/CD for ML: tests (unit/integration), packaging, reproducibility checks, policy gates, automated deployment and rollback strategies.
  • Design workflow orchestration for large-scale ML jobs (scheduled runs, triggered retrains, parameter sweeps, gated releases) using tools such as Airflow/Kubeflow/Argo or equivalents.  
  • Architect, build, and own automated pipelines for model training, fine-tuning (e.g., PEFT/LoRA), evaluation, and promotion across environments (dev → staging → production).
  • Establish standardized training “recipes” (configs, templates, golden paths) to reduce time-to-first-experiment and improve consistency across teams.
  • Enable and optimize distributed GPU training (throughput, reliability, and cost), including checkpointing, mixed precision, fault tolerance, and spot/preemptible handling where applicable.
  • Develop evaluation harnesses and automated benchmark suites (quality, safety, latency, and cost) with clear, repeatable reporting to compare runs and releases.

Qualifications:

  • Strong proficiency in Python and experience building robust automation frameworks and production-grade services for ML workloads.
  • Hands-on experience with experiment tracking and model lifecycle tooling (e.g., MLflow, Weights & Biases) and reproducible ML workflows.
  • Practical experience fine-tuning modern deep learning models (e.g., Transformers) and familiarity with parameter-efficient approaches (LoRA/PEFT).
  • Working knowledge of RLHF concepts and pipelines (preference data, reward models, policy optimization) and how to operationalize human-in-the-loop workflows.
  • Experience with containerization (Docker), orchestration (Kubernetes), and operating GPU workloads reliably at scale.
  • Experience with CI/CD, version control (Git), and Infrastructure-as-Code (Terraform/Bicep or equivalent).
  • Excellent problem-solving skills across distributed systems (training jobs, pipelines, compute infrastructure) and strong communication to partner with research and engineering teams.
  • Prior experience in a similar industry and/or operating ML platforms with stringent IP/security requirements is a plus.
  • Bachelor’s degree in Computer Science, Software Engineering, or related field.
  • 5+ years of experience in MLOps/Platform Engineering/DevOps/ML Engineering (or demonstrated equivalent impact), including owning production systems and leading cross-team initiatives.

Minimum Qualifications

  • Master's Level Degree and related work experience of 6 years; OR Bachelor's Level Degree and related work experience of 8 years; OR equivalent work experience

Base Pay Range: $134,800.00 - $229,200.00 Annually

Primary Location: USA-MI-Ann Arbor-KLA

KLA’s total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits including but not limited to: medical, dental, vision, life, and other voluntary benefits, 401(K) including company matching, employee stock purchase program (ESPP), student debt assistance, tuition reimbursement program, development and career growth opportunities and programs, financial planning benefits, wellness benefits including an employee assistance program (EAP), paid time off and paid company holidays, and family care and bonding leave.

Interns are eligible for some of the benefits listed. Our pay ranges are determined by role, level, and location. The range displayed reflects the pay for this position in the primary location identified in this posting. Actual pay depends on several factors, including state minimum wage rates, location, job-related skills, experience, and relevant education level or training. We are committed to complying with all applicable federal and state minimum wage requirements. If applicable, your recruiter can share more about the specific pay range for your preferred location during the hiring process.

               

KLA is proud to be an Equal Opportunity Employer. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us at [email protected] or at +1-408-352-2808 to request accommodation.

Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched KLA’s Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews in person or on video conferencing with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to [email protected] to confirm the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.

Skills Required

  • Strong proficiency in Python
  • Experience with ML lifecycle tooling and reproducible ML workflows
  • Fine-tuning modern deep learning models and knowledge of RLHF
  • Experience with containerization and orchestration of GPU workloads
  • Experience with CI/CD and Infrastructure-as-Code
  • Bachelor's degree in Computer Science or related field
  • 5+ years of experience in MLOps/Platform Engineering/DevOps

KLA Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about KLA and has not been reviewed or approved by KLA.

  • Retirement Support: Retirement offerings include a 401(k) plan with company matching and financial planning support. Student debt assistance and related financial benefits reinforce long-term savings and security.
  • Equity Value & Accessibility: Ownership programs include an Employee Stock Purchase Plan and broad-based RSU participation that extend equity beyond a narrow group. These elements complement competitive pay and bonuses to strengthen total rewards.
  • Leave & Time Off Breadth: Time-off programs span paid time off, paid company holidays, and paid volunteer time. Family care and bonding leave and back-up care services add flexibility during life events.

The Company
HQ: Milpitas, CA
10,001 Employees

What We Do

KLA develops industry-leading equipment and services that enable innovation throughout the electronics industry. We provide advanced process control and process-enabling solutions for manufacturing wafers and reticles. In close collaboration with leading customers across the globe, our expert teams of physicists, engineers, data scientists and problem-solvers design solutions that move the world forward.
