Engineering Manager - Training & Inference Platform

London, England
In-Office
Mid level
Artificial Intelligence • Transportation
The Role
Lead the Training & Inference Platform team, giving hundreds of ML researchers and engineers self-serve access to GPU capacity and scheduling for model training and inference.

At Wayve, we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law.

About us   

Founded in 2017, Wayve is the leading developer of Embodied AI technology.  Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.

Our vision is to create autonomy that propels the world forward. Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.

In our fast-paced environment, big problems ignite us: we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future.

At Wayve, your contributions matter.  We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact.  

Make Wayve the experience that defines your career!  

The Role

This role sits at the core of our ability to scale and deploy advanced machine learning solutions. Your leadership directly affects hundreds of ML researchers and engineers by providing seamless access to GPU resources and intelligent scheduling tools that accelerate model training and inference workflows. By building robust, efficient, and reliable platforms, you will enhance productivity, foster innovation, and enable rapid experimentation.

As the leader shaping our ML infrastructure, you will help drive the future direction and success of the company. You'll oversee two closely aligned functions:

  1. Training Platform
    • Maintain and enhance the existing training scheduler (fair-share, preemption, checkpoint/restore).
    • Provide training introspection (W&B integration, model FLOPs utilization (MFU) metrics) and debug-node tooling for rapid iteration.
  2. Inference Platform
    • Deliver and optimize large-scale GPU inference capacity (persistent & burst).
    • Enhance Flyte-driven smart scheduling, multi-model inference pipelines, and throughput for hundreds of petabytes of labeling workloads.

Your leadership will ensure that training and inference demand from hundreds of ML engineers is met with fast, self-serve platform capabilities.

Challenges You Will Own
  1. Team Leadership & Roadmap
    – Grow and mentor the team, which will expand to 8+ engineers across both functions.
    – Define and drive a unified roadmap, balancing near-term demand spikes with long-term platform resilience.
  2. Scheduling & Orchestration
    – Evolve the scheduler with smart-scheduling features across training and inference workloads.
    – Develop advanced analytics and self-service interfaces to empower ML engineers to configure and monitor their inference workloads effectively.
  3. Operational Excellence
    – Implement observability & alerting to maintain 99%+ uptime for both platforms.
    – Improve platform efficiency through intelligent scheduling and automatic cancellation of non-convergent training jobs.
    – Partner with SRE to automate scaling, failover, and incident response.
  4. Talent Development
    – Recruit and develop platform engineers with a broad range of experience (junior through staff).
    – Foster a culture of ownership, cross-team collaboration, and continuous learning.

What We Are Looking For in Our Candidate

Essential:

  • Proven Leadership: Strong software engineering background and experience managing platform engineering teams.
  • Technical Expertise: Hands-on experience with Flyte (or a comparable orchestration framework) and GPU cluster management (e.g., Kubernetes/AKS).
  • Collaboration: Exceptional communication skills for partnering with AI researchers and the data platform and SRE teams.
  • Talent Development: Track record of recruiting, mentoring, and retaining high-performing engineers.
  • Education: BS/MS in Computer Science, Engineering, or related field.
  • Strategic Vision: Strong judgment and vision in defining multi-phase platform roadmaps.

Desirable:

  • Scaling ML Systems: Demonstrated success delivering software to support hundreds of petaflop-hours of training or millions of inference hours.
  • Experience optimizing data locality and multi-region workflows.
  • Familiarity with W&B, MLflow, or similar introspection tooling.
  • Prior work with Flyte, Ray, or other large-scale orchestration frameworks.

We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply.

For more information, visit Careers at Wayve.

To learn more about what drives us, visit Values at Wayve.

DISCLAIMER: We will not ask about marriage or pregnancy, care responsibilities or disabilities in any of our job adverts or interviews. However, we do look to capture information about care responsibilities and disabilities, among other diversity information, as part of an optional DEI monitoring form to help us identify areas for improvement in our hiring process and ensure that the process is inclusive and non-discriminatory.



Top Skills

PyTorch

The Company
HQ: London
200 Employees
Year Founded: 2017

What We Do

We're Wayve, a leading developer of embodied intelligence for autonomous vehicles. We use AI to pioneer a next-generation approach to self-driving: AV2.0, which enables fleet operators to unlock the benefits of AV technology at scale.

Founded in 2017, Wayve is made up of a diverse team of experts in machine learning and robotics. We were the first to deploy AVs on public roads with end-to-end deep learning. Today, our teams are based in London and California, and we're testing AVs in cities across the UK.

Inspired by our vision for a smarter, safer, more sustainable world, we're looking for people who are passionate about building breakthrough solutions to some of the world’s most important challenges. If you're looking for an exciting opportunity with a dynamic team, get in touch!

Similar Jobs

Simply Business

Infrastructure Engineer

Fintech • Information Technology • Insurance • Software
Hybrid
2 Locations
1100 Employees
30K-90K Annually

Wise

Senior Software Engineer

Fintech • Mobile • Payments • Software • Financial Services
Hybrid
London, Greater London, England, GBR
6500 Employees
50K-120K Annually

BlackRock

Data Engineer

Fintech • Information Technology • Financial Services
In-Office
London, Greater London, England, GBR
25000 Employees

BlackRock

Real Estate Portfolio Modelling and Analytics - Associate - London

Fintech • Information Technology • Financial Services
In-Office
London, Greater London, England, GBR
25000 Employees

Similar Companies Hiring

Credal.ai
Software • Security • Productivity • Machine Learning • Artificial Intelligence
Brooklyn, NY
Standard Template Labs
Software • Information Technology • Artificial Intelligence
New York, NY
10 Employees
Scotch
Software • Retail • Payments • Fintech • eCommerce • Artificial Intelligence • Analytics
US
25 Employees