Software Inference Deployment Engineer

Oxford, MS, USA
Hybrid
Mid level
Artificial Intelligence • Hardware • Machine Learning • Semiconductor
The Role
Integrate and harden the Lumai Iris software stack for data-center deployments, support model onboarding and conversion, work with disaggregated prefill/decode runtimes, troubleshoot software issues in the field, enable and train customer ML and infra teams, and relay field feedback into product and engineering.
The Opportunity

Lumai is redefining how the world computes. We are an ambitious, venture-backed UK startup pioneering a breakthrough AI accelerator for data centres that uses 3D optical compute. Our radical technology uses light to perform computation orders of magnitude faster and at far greater scale than ever before, all whilst consuming far less energy than traditional approaches.

Lumai is unlocking performance and efficiency gains that could transform the economics of AI and compute infrastructure and reshape how intelligence scales globally.

If you are passionate about bringing groundbreaking technology to market, and want to be part of a team pushing the boundaries of what is physically possible, Lumai is where you can make it happen.

 
About Lumai

Founded in 2022, Lumai is a University of Oxford spinout using optical processing to accelerate large language models (LLMs) and other transformer-based AI systems. The team combines expertise in optical computing, machine learning, and physics.

Lumai has already secured over $15 million in funding from leading deep-tech investors such as Constructor Capital, IP Group, and PhotonVentures, as well as from government grants, and is scaling rapidly to deploy the fastest optical compute currently available globally.

 
The Role

We are bringing the world's first optical AI compute platform to market. As we move from development into field deployment, we are looking for a Software Inference Deployment Engineer to own the software-side integration and customer support of Lumai Iris servers in third-party data centre environments.

You will begin by working alongside our software and engineering teams - helping integrate the Iris software stack, supporting model onboarding through the toolchain, and getting hands-on with the disaggregated prefill/decode runtime. This is intentional: the best way to develop deep expertise in a novel platform is to build with it. As deployments go live, you will take ownership in the field - supporting customer integration into their inference stacks, troubleshooting software issues, and acting as a primary technical contact for customer ML and infrastructure engineering teams.

This is an opportunity to work at the cutting edge of efficient AI inference - deploying a genuinely novel compute platform into production for the first time, and playing a central role in how it reaches the world.

What You'll Do
  • Work alongside Lumai's software and engineering teams to integrate, test, and harden the Iris software stack ahead of deployment

  • Support model onboarding through the Iris toolchain - loading, conversion, and framework integration

  • Develop hands-on familiarity with the disaggregated prefill/decode runtime, including how Iris servers operate alongside decode processors

  • Support customer integration of Lumai Iris into their own frameworks

  • Own software-side troubleshooting in the field, acting as the first line of response post-deployment

  • Train and enable customer ML and infrastructure engineering teams on the Iris software platform

  • Feed field findings, integration issues, and customer feedback back into product and engineering

 
What We're Looking For

Must-Have

  • Hands-on software engineering experience in AI infrastructure, inference serving, accelerator integration, or comparable deep-tech hardware-software environments

  • Strong Python skills and familiarity with major ML frameworks (PyTorch in particular)

  • Practical experience with model deployment workflows - loading, format conversion, quantisation, or framework integration

  • Comfortable working with inference serving stacks (for example vLLM, TensorRT-LLM, or similar)

  • Familiarity with Linux, containerisation (Docker), and cluster environments

  • Comfortable in a customer-facing role, able to communicate clearly with ML and infrastructure engineering teams

  • Comfortable working in a fast-moving, early-stage environment where the product and the deployment approach are both still being developed

Strong Preference For

  • Experience integrating accelerator hardware (GPUs, FPGAs, ASICs, NPUs, or novel architectures) into customer inference workflows

  • Familiarity with the NVIDIA inference stack - CUDA, TensorRT, Triton

  • Exposure to disaggregated inference architectures, prefill/decode separation, or KV cache management

 
Compensation & Benefits
  • Highly Competitive Salary: We are not saying our salary is a blank check, but let's just say it won't be a source of your stress

  • Share Option Scheme: We are all in this together! We believe in shared success while we build the Lumai of tomorrow

  • Pension Scheme: Plan for retirement with AVIVA

  • Private Health Insurance: We firmly believe that you come first, and a happy you is a healthy you! Look after yourself and your loved ones with AXA

  • Cycle to Work: Spread the cost of a bike, a bike and accessories, or just accessories, and save on tax

  • L&D Allowance: Stay at the forefront of your field with a £500 annual development budget

  • Subsidised On-site Lunches: Enjoy on-site healthy meals at half the price, as Lumai covers 50% of the cost

  • Holidays: Enjoy some deserved "me time" with 25 days paid holiday (plus bank holidays) per year

  • Socials: Be part of an inclusive community enjoying occasional all-company off-sites, lunches and socials

 
Interview Process

Our process has four stages: an initial conversation with our HR team to understand what you want from the role and what we want from it; two technical sessions with our Product and Leadership team; and, finally, an HR-team session covering scope, terms, and any final questions. We aim to move fast on candidates we are excited about; expect roughly three to four weeks end to end.

Lumai is an equal opportunity employer. We make hiring decisions on merit, scope-fit, and the strength of the working relationship we expect to build with each hire. Applications are welcome from candidates of any background. If you are not sure whether you are a fit, send a note anyway.


The Company
Year Founded: 2022

What We Do

Lumai is an optical compute company building the next generation of AI infrastructure for the inference era. By utilizing 3D optical computing, the company develops energy-efficient AI processors that surpass the limitations of silicon-based architectures, delivering significantly higher performance and lower power consumption to unlock sustainable intelligence at scale.
