MLOps Engineer

Oxford, UK
Hybrid
Senior level
Artificial Intelligence • Hardware • Machine Learning • Semiconductor
The Opportunity

Lumai is redefining how the world computes. We are an ambitious, venture-backed UK startup pioneering a breakthrough AI accelerator for data centers that uses 3D optical compute. Our radical technology uses light to perform computation orders of magnitude faster and at far greater scale than previously possible, while consuming far less energy than traditional approaches.

Lumai is unlocking performance and efficiency gains that could transform the economics of AI and compute infrastructure and reshape how intelligence scales globally.

If you are passionate about bringing groundbreaking technology to market, and want to be part of a team pushing the boundaries of what is physically possible, Lumai is where you can make it happen.

 
About Lumai

Founded in 2022, Lumai is a University of Oxford spinout using optical processing to accelerate large language models (LLMs) and other transformer-based AI systems. The team combines expertise in optical computing, machine learning, and physics.

Lumai has already secured over $15 million in funding, from leading deep-tech investors such as Constructor Capital, IP Group, and PhotonVentures as well as government grants, and is scaling rapidly to deploy the fastest optical compute currently available globally.

 
The Role

We are building custom AI hardware and the full-stack software ecosystem to run it. As our first dedicated MLOps Engineer, you will own the infrastructure that takes models from research to silicon-validated production — designing, building, and operating the pipelines, tooling, and platforms that let our AI and hardware teams move fast without breaking things. This is a high-impact, high-ownership role at the intersection of ML research, compiler stacks, and novel hardware.

 
What You'll Do
  • Design and operate end-to-end ML pipelines: data ingest, training, evaluation, quantisation, and deployment onto custom AI accelerator hardware

  • Build and maintain experiment tracking, model registry, and versioning infrastructure (e.g. MLflow, W&B, or equivalent) tuned to our hardware-in-the-loop workflows

  • Own CI/CD for ML: automated testing of model correctness, numerical accuracy, and on-chip performance after every change to models, compilers, or firmware

  • Develop and maintain tooling for benchmarking model inference on custom silicon, including latency, throughput, power, and utilisation metrics

  • Collaborate closely with ML researchers, compiler engineers, and hardware architects to identify and remove bottlenecks across the model-to-chip workflow

  • Instrument and monitor production inference deployments; design alerting and rollback strategies appropriate to hardware-accelerated serving

  • Manage compute resource scheduling across on-premises accelerator clusters and cloud (GPU/CPU) for training and simulation workloads

  • Drive infrastructure-as-code practices: containerisation, orchestration (Kubernetes/Slurm), and reproducible environment management

  • Contribute to the internal developer platform: self-service tooling, documentation, and runbooks that raise engineering productivity across the company

 
What We're Looking For

Must-Have

  • 5+ years of software or infrastructure engineering experience, with at least 2 years in an ML or AI-adjacent role

  • Strong Python skills and familiarity with major ML frameworks (PyTorch or JAX); comfortable reading and modifying model code

  • Hands-on experience building and operating ML pipelines in production: data pipelines, training orchestration, evaluation, and serving

  • Experience with experiment tracking and model lifecycle management tools (MLflow, W&B, DVC, or similar)

  • Solid understanding of containerisation (Docker) and orchestration (Kubernetes or Slurm) for distributed compute workloads

  • Infrastructure-as-code mindset: Terraform, Ansible, or equivalent; CI/CD pipelines (GitHub Actions, Jenkins, or similar)

  • Experience with hardware-accelerated compute (CUDA/GPU workflows, profiling, performance tuning) — even if not on custom silicon

  • Strong debugging and observability skills: distributed tracing, logging, metrics dashboards

  • Ability to work effectively in a fast-moving, ambiguous environment where the hardware and software are both being built simultaneously

Strong Preference For

  • Experience with custom or novel accelerator hardware (FPGAs, ASICs, NPUs, or research chips)

  • Familiarity with ML compiler stacks: MLIR, LLVM, TVM, XLA, or vendor-specific compilers (NVCC, TensorRT, etc.)

  • Experience with model optimisation techniques: quantisation (INT8/INT4/FP8), pruning, distillation, or mixed-precision training

  • Background in on-chip performance profiling and roofline analysis

  • Exposure to chip bring-up workflows: running early software stacks on pre-silicon simulation or first-silicon hardware

  • Contributions to open-source ML infrastructure or compiler tooling

  • Experience in a deeptech, semiconductor, or hardware startup environment

 
Compensation & Benefits
  • Highly Competitive Salary: We are not saying our salary is a blank check, but let's just say it won't be a source of your stress

  • Share Option Scheme: We are all in this together! We believe in shared success while we build the Lumai of tomorrow

  • Pension Scheme: Plan for retirement with AVIVA

  • Private Health Insurance: We firmly believe that you come first, and a happy you is a healthy you! Look after yourself and your loved ones with AXA

  • Cycle to Work: Spread the cost of a bike, a bike and accessories, or just accessories, and save on tax

  • L&D Allowance: Stay at the forefront of your field with a £500 annual development budget

  • Subsidised On-site Lunches: Enjoy on-site healthy meals at half the price, as Lumai covers 50% of the cost

  • Holidays: Enjoy some deserved "me time" with 25 days paid holiday (plus bank holidays) per year

  • Socials: Be part of an inclusive community enjoying occasional all-company off-sites, lunches and socials

 
Interview Process

Our process has four stages: an initial conversation with our HR team to understand what you want from the role and what we want from it; two technical sessions with members of our engineering team; and a final HR-team session covering scope, terms, and any final questions. We aim to move fast on candidates we are excited about; expect roughly three to four weeks end to end.

Lumai is an equal opportunity employer. We make hiring decisions on merit, scope-fit, and the strength of the working relationship we expect to build with each hire. We welcome applications from candidates of any background. If you are not sure whether you are a fit, send a note anyway.

The Company
Year Founded: 2022

What We Do

Lumai is an optical compute company building the next generation of AI infrastructure for the inference era. By utilizing 3D optical computing, the company develops energy-efficient AI processors that surpass the limitations of silicon-based architectures, delivering significantly higher performance and lower power consumption to unlock sustainable intelligence at scale.
