Senior MLOps & AI Infrastructure Engineer

Posted 2 Days Ago
San Jose, CA, USA
In-Office
149K-216K Annually
Senior level
Artificial Intelligence • Internet of Things • Machine Learning
The Role
Design, build, and operate scalable ML pipelines and infrastructure across cloud and on‑prem HPC. Implement MLOps tooling (tracking, registries, feature stores), CI/CD/CT, containerized GPU orchestration, model optimization (LLMs, GNNs, RL), data and versioning pipelines, and monitoring. Mentor engineers to productionize ML for EDA and simulation workloads.
Summary Generated by Built In
Job Details:

Job Description:

About Altera

At Altera™, our independence as the world’s largest pure‑play FPGA solutions provider gives us the focus, speed, and agility to innovate without compromise. With more than four decades of industry‑leading FPGA expertise, our singular mission is to deliver the programmable technologies that help customers differentiate, innovate, and scale across rapidly evolving markets like AI, cloud, networking, and edge. As an independent company, we move faster, invest deeper, and partner more closely—empowering our teams to drive breakthrough innovation and shape the future of the FPGA industry.

About the Role

We are looking for a Senior MLOps & AI Infrastructure Engineer to architect, build, and operationalize machine learning systems at scale. This role sits at the intersection of data science, software engineering, and infrastructure — combining deep ML expertise with the DevOps/MLOps discipline required to ship models reliably into production.

You will partner closely with software, data, and infrastructure teams to design end-to-end ML pipelines, automate model lifecycle management, and deliver AI-powered capabilities across our EDA, HPC, and cloud environments.

Key Responsibilities:

ML Platform & Pipeline Engineering

•    Design, build, and maintain scalable ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC environments

•    Build MLOps infrastructure including experiment tracking, model registry, feature stores, and automated retraining workflows

•    Implement CI/CD/CT (Continuous Training) pipelines for ML models using tools such as Kubeflow, MLflow, Airflow, or similar

•    Containerize ML workloads with Docker and orchestrate at scale using Kubernetes and GPU node pools

Model Development & Optimization

•    Develop, fine-tune, and deploy large-scale models including LLMs, GNNs, and reinforcement learning agents for EDA and chip design applications

•    Apply advanced techniques: transfer learning, quantization, pruning, distillation, and RLHF for production-grade model efficiency

•    Implement A/B testing frameworks and shadow deployments for safe model rollout

•    Benchmark and optimize model inference performance on GPU/TPU clusters

Data Engineering & Feature Management

•    Build and maintain data pipelines for large-scale structured and unstructured datasets (terabyte-scale)

•    Collaborate with data teams to design feature engineering systems and maintain data quality for ML training

•    Implement data versioning and lineage tracking (DVC, Delta Lake, or similar)

Infrastructure & Operations

•    Manage cloud ML infrastructure on AWS (SageMaker), Azure (AML), or GCP (Vertex AI) with cost and performance optimization

•    Automate infrastructure provisioning using Terraform or CloudFormation for GPU-backed ML environments

•    Build monitoring, alerting, and observability systems for model performance drift, data quality, and system health

•    Support HPC schedulers (LSF, Slurm) for large-scale distributed training jobs

Collaboration & Leadership

•    Partner with research scientists to productionize experimental models with engineering rigor

•    Mentor junior engineers and define ML engineering best practices across the organization

•    Drive adoption of AI/ML solutions within semiconductor, EDA, and simulation workflows

Technology Stack

ML Frameworks:

PyTorch • TensorFlow • JAX • Hugging Face • scikit-learn • XGBoost

MLOps & Pipelines:

MLflow • Kubeflow • Airflow • Weights & Biases • DVC • Feast

Infrastructure & Cloud:

AWS SageMaker / GCP Vertex AI / Azure ML • Terraform • Docker • Kubernetes • Slurm / LSF

Languages:

Python • Bash • Go • SQL

Monitoring & Observability:

Prometheus • Grafana • ELK Stack • Evidently AI • Arize

Key Competencies

•    Strong ownership mindset — you drive ML initiatives from prototype to production without being asked

•    Bias toward automation: if you do it twice, you automate it

•    Ability to bridge research and engineering — translating papers into production-grade systems

•    Thrives in fast-paced, ambiguous environments typical of deep-tech and semiconductor companies

•    Clear communicator who can explain complex ML concepts to non-technical stakeholders

Salary Range

The pay range below is for the Bay Area, California only. Actual salary may vary based on a number of factors, including job location and job-related knowledge, skills, experience, and training. We also offer incentive opportunities that reward employees based on individual and company performance.

$149,100 - $215,925 USD

We use artificial intelligence to screen, assess, or select applicants for the position. Applicants must be eligible for any required U.S. export authorizations.

Qualifications:

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Statistics, or a related field, and 10+ years of industry experience

  • 10+ years of experience across ML engineering, data science, and MLOps — including frameworks (PyTorch, TensorFlow, JAX, Hugging Face) and production model deployment at scale

  • 8+ years of experience with parallelism strategies (FSDP, DeepSpeed, data/model parallelism)

  • 10+ years of experience and proficiency in Python programming

  • 8+ years of experience in cloud ML platforms (AWS, GCP, Azure), Docker/Kubernetes, and CI/CD pipelines

  • 5+ years of hands-on experience with MLflow, W&B, or Neptune for tracking and reproducibility

Preferred Qualifications

  • PhD in Computer Science, Machine Learning, Statistics, or a related field

  • Experience applying ML/AI to semiconductor, EDA, or chip design domains (e.g., timing prediction, place & route optimization, DRC closure)

  • Familiarity with HPC schedulers such as LSF or Slurm and GPU cluster management for training workloads

  • Knowledge of LLM fine-tuning, Retrieval-Augmented Generation (RAG) architectures, and AI agent frameworks such as LangChain or AutoGen

  • Experience with graph neural networks (GNNs) or geometric deep learning for circuit and netlist analysis

  • Background in reinforcement learning for optimization problems

  • Exposure to zero-trust security, DevSecOps, and compliance automation for ML systems

  • Experience working with large-scale simulation pipelines and synthetic data generation

  • Experience at organizations such as NVIDIA, AMD, Intel, Google DeepMind, or similar AI/HPC-focused companies

  • Published research or open-source contributions in ML, MLOps, or AI for EDA

  • Experience building AI-powered developer tools or copilot-style products

  • Familiarity with Synopsys, Cadence, or Siemens EDA toolchains and associated data formats

Job Type: Regular

Shift: Shift 1 (United States of America)

Primary Location: San Jose, California, United States

Additional Locations:

Posting Statement: All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Skills Required

  • Bachelor's or Master's degree in Computer Science, Machine Learning, Statistics, or related field and 10+ years industry experience
  • 10+ years experience across ML engineering, data science, and MLOps including PyTorch, TensorFlow, JAX, Hugging Face and production model deployment at scale
  • 8+ years experience with parallelism strategies (FSDP, DeepSpeed, data/model parallelism)
  • 10+ years proficiency in Python programming
  • 8+ years experience with cloud ML platforms (AWS, GCP, Azure), Docker/Kubernetes, and CI/CD pipelines
  • 5+ years hands-on experience with MLflow, Weights & Biases, or Neptune for tracking and reproducibility
  • Experience with HPC schedulers (LSF, Slurm) and GPU cluster management
  • Experience applying ML/AI to semiconductor, EDA, or chip design domains
  • Knowledge of LLM fine-tuning, RAG architectures, RLHF, and AI agent frameworks (e.g., LangChain, AutoGen)
  • Experience with graph neural networks (GNNs) or geometric deep learning
  • Familiarity with zero-trust security, DevSecOps, and compliance automation for ML systems
  • Experience building AI-powered developer tools or copilot-style products; published research or OSS contributions

Altera (altera.com) Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Altera (altera.com) and has not been reviewed or approved by Altera (altera.com).

  • Retirement Support: Feedback suggests retirement programs are robust, with offerings such as a 401(k) and a pension. This breadth supports long-term financial security.
  • Leave & Time Off Breadth: Feedback suggests time-off policies are generous, including PTO, paid sick days, and paid holidays. Wellness initiatives like gym memberships further support balance.
  • Parental & Family Support: Feedback suggests parental leave is generous. Family-building support, including fertility benefits and adoption reimbursement, is highlighted as part of the package.


The Company
HQ: San Jose, California
1,612 Employees
Year Founded: 1983

What We Do

Altera: Accelerating Innovators. Altera provides leadership programmable solutions that are easy to use and deploy in applications from cloud to edge, offering limitless AI possibilities. Our broad end-to-end portfolio of products, including FPGAs, CPLDs, Intellectual Property, development tools, System on Modules, SmartNICs, and IPUs, provides the flexibility to accelerate innovation. Altera is helping to shape the future through pioneering innovation that unlocks extraordinary possibilities for everyone on the planet.
