Staff Software Engineer - Inference & Performance

Posted 12 Days Ago
Hiring Remotely in United Kingdom
Remote
Senior level
Artificial Intelligence • Information Technology • Software
The Role
The Staff Software Engineer will lead the architecture for AI inference platform performance, focusing on latency and throughput across systems and GPU execution.

We’re looking for a Staff Engineer to take technical ownership of latency, throughput, and reliability across Runware’s AI inference platform.

This is a senior technical leadership role for someone who obsesses over performance at scale, from request ingress through GPU execution to result delivery, and who can consistently turn ambitious targets such as sub-one-second inference into production reality.

As a Staff Engineer, you will define and drive the architecture, standards, and execution needed to make Runware one of the fastest and most reliable inference platforms in the market. You will work deeply across backend services, distributed systems, GPU workloads, and infrastructure, partnering closely with product, ML, and platform teams.

This role is ideal for someone who enjoys operating at the intersection of systems design, performance engineering, and real-world scale, and who wants clear ownership over outcomes that matter directly to customers.

What you’ll do
  • Own end-to-end inference performance across the platform, with clear responsibility for latency, throughput, and reliability targets
  • Lead the architecture and design of core inference systems, including request routing, async execution, queuing, GPU scheduling, and result delivery
  • Drive the platform toward sub-one-second inference where feasible, identifying bottlenecks across networking, services, storage, and GPU execution
  • Make high-impact architectural decisions with performance, scalability, and operational simplicity as first-class concerns
  • Partner with ML and model teams to ensure models are production-ready from a performance perspective (cold starts, batching, memory usage, concurrency)
  • Define performance budgets, SLAs, and success metrics, and ensure they are measured, visible, and actively improved
  • Lead deep-dive investigations into latency spikes, throughput degradation, and system-level performance issues
  • Influence and mentor engineers across teams on performance engineering, distributed systems thinking, and operational excellence
  • Improve tooling, observability, and profiling capabilities to make performance issues easier to detect and reason about
  • Advocate for pragmatic engineering best practices around testing, benchmarking, rollouts, and documentation

Requirements
What We’re Looking For
  • Extensive experience in software engineering, with a strong focus on backend and systems development (PHP, Python, Go, Rust, or similar)
  • Proven experience building and operating high-performance, low-latency distributed systems in production
  • Deep understanding of asynchronous processing, queues, concurrency models, and back pressure
  • Strong intuition for performance trade-offs across CPU, GPU, networking, storage, and application layers
  • Experience making and defending critical architectural decisions in complex systems
  • Hands-on experience troubleshooting real production issues under load (latency, saturation, cascading failures)
  • Familiarity with modern cloud infrastructure, CI/CD, and observability stacks (metrics, tracing, profiling)
  • Ability to communicate clearly and influence across teams in a remote-first environment
  • Strong mentorship mindset and a desire to raise the technical bar across the organisation

Nice to haves
  • Experience working on AI/ML inference platforms, GPU-backed workloads, or performance-critical compute systems
  • Knowledge of model optimisation techniques (batching, quantisation, warm-starts, memory management)
  • Experience with infrastructure-as-code and DevOps practices
  • Background in startups or fast-paced environments where speed, ownership, and pragmatism matter
  • Prior ownership of latency or throughput SLOs at scale

Benefits

We’re a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.

Our release cycles are fast and intense, but they’re followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back stronger than ever and ready for the next leap.

  • Generous paid time off – vacation, sick days, public holidays
  • Meaningful stock options – share in the upside you create
  • Remote-first setup – work from home anywhere we can employ you
  • Flexible hours – own your schedule outside core collaboration blocks
  • Family leave – paid maternity, paternity, and caregiver time
  • Company retreats – twice-yearly gatherings in inspiring locations

Please note: We are unable to offer visa sponsorship in the UK at this time. Candidates must have existing right to work in the UK.

Top Skills

CI/CD
Cloud Infrastructure
Go
Observability Stacks
PHP
Python
Rust
The Company
19 Employees
Year Founded: 2023

What We Do

Runware delivers AI-as-a-Service at 5–10x lower cost and with higher speed than competitors. Built for scale, the service has already powered 4 billion+ creations for 100K+ developers and 250M+ end users worldwide. Founded in 2023 and headquartered in San Francisco, Runware is backed by Insight Partners and a16z.