Runware

Senior Machine Learning Engineer

Reposted 16 Days Ago

Hiring Remotely in United Kingdom

Remote

Senior level

Artificial Intelligence • Information Technology • Software

The Role

Lead end-to-end ML initiatives: integrate and fine-tune models, optimize GPU inference for latency and throughput, build evaluation and monitoring tooling, collaborate on scalable serving systems, and mentor engineers.

Summary Generated by Built In

Join Runware as a Senior Machine Learning Engineer and be at the forefront of developing innovative AI solutions across various media modalities including text, image, video, 3D, and audio. We're building a powerful AI media creation platform designed to revolutionize how content is generated.

As a Senior Machine Learning Engineer, you’ll take the lead on critical projects, guiding the end-to-end lifecycle from research and experimentation to production deployment and performance monitoring. Your work will help shape the capabilities of our platform and enhance the experiences of users who rely on our cutting-edge AI technologies.

What You'll Be Doing

Integrate open-source and third-party models into our inference platform
Lead fine-tuning initiatives (LoRA, adapters, PEFT, domain adaptation)
Optimise inference workloads for latency, batching, memory efficiency, and throughput
Benchmark model quality vs cost vs performance across modalities
Improve inference startup times and stability under high load
Build evaluation frameworks and internal tooling for model validation
Work closely with Infrastructure and Backend teams on scalable serving systems
Monitor production performance and drive continuous optimisation
Mentor engineers and help raise the ML engineering bar across the team

Requirements: What We’re Looking For

Proven experience delivering ML systems to production environments
Strong, low-level Python skills and deep hands-on experience with PyTorch
Experience working with diffusion models, LLMs, or multimodal architectures
Practical experience fine-tuning large models (LoRA, PEFT, adapters, etc.)
Experience optimising inference workloads in GPU environments
Strong understanding of model evaluation, experimentation, and monitoring
Ability to debug performance, memory, and reliability issues in production
Strong systems thinking, with an understanding of how ML decisions impact infrastructure
High ownership and comfort operating in a fast-paced startup environment

Nice to have

Experience with vLLM or custom inference servers
Experience with Kubernetes or containerised ML workloads
Experience working in high-throughput distributed systems
Background in AI media generation (image, video, audio)
Experience building internal ML tooling or developer-facing APIs
Experience with kernels in CUDA/C++

Benefits

We’re a remote-first team that comes together in person twice a year to plan, collaborate, and celebrate wins. Day to day we keep a few core hours for teamwork, but outside of that you set the schedule that helps you do your best work.

Our environment is fast-moving and ambitious. Big pushes are part of building category-defining products, but we balance that with flexible working, generous time off, and regular retreats so the team can stay sharp and motivated.

Generous paid time off – vacation, sick days, public holidays
Meaningful stock options – share in the upside you create
Remote-first setup – work from home anywhere we can employ you
Flexible hours – own your schedule outside core collaboration blocks
Family leave – paid maternity, paternity, and caregiver time
Company retreats – twice-yearly gatherings in inspiring locations

The Company

HQ: London

19 Employees

Year Founded: 2023

What We Do

Runware delivers AI-as-a-Service at 5–10x lower cost and higher speed than competitors. Built for scale, the service has already powered 4 billion+ creations for 100K+ developers and 250M+ end users worldwide. Founded in 2023 and headquartered in San Francisco, Runware is backed by Insight Partners and a16z.