Eventual

Software Engineer, High Performance Computing

Reposted 15 Days Ago

San Francisco, CA, USA

In-Office

150K-250K Annually

Mid level

Artificial Intelligence • Machine Learning • Software

The Role

The Software Engineer will build core products, collaborate with teams, design reliable features, and optimise GPU workloads using Kubernetes and cloud tech.

Summary Generated by Built In

About Eventual

Every breakthrough Physical AI system — humanoid robots, autonomous vehicles, video generation models — is trained on petabytes of video, lidar, radar, and sensor data. But today's data platforms (Databricks, Snowflake) were built for spreadsheet-like analytics, not the multimodal corpora that power AI. Robotics and video-AI teams now lose 20-40% of their training time to dataloading alone. GPU bandwidth has grown 2-3× per generation. Storage and pipelines haven't. The gap widens every year.

Eventual was founded in 2022 to close it. Our open-source engine, Daft, is the distributed data engine purpose-built for multimodal AI — already running 2 PB/day at Amazon, 60-100 PB at another FAANG company, and in production at Mobileye, TogetherAI, and CloudKitchens. We are building a video-native index on top of our engine for Physical AI that streams curated datasets to GPUs at line rate. Saturates B200s today. Aimed at NVL72 and Vera Rubin tomorrow.

We're building this in partnership with the top Physical AI labs and public AI infrastructure companies. We have raised $30M from Felicis, CRV, Microsoft M12, Citi, Essence, Y Combinator, Caffeinated Capital, Array.vc, and angel investors including co-founders of Databricks and Perplexity. We've assembled a world-class team from AWS, Render, Pinecone, and Tesla. We have spent our careers powering the last generation of Physical AI in self-driving, and are excited to now do this for the next.

Join our small (but powerful!) team working together 4 days/week in our SF Mission district office.

Your Role

As a Systems Engineer on the Dataloading team, you'll build the layer that turns multi-petabyte video corpora into dict[str, Tensor] already on the GPU at line rate. We work with the top labs training Physical AI on the newest generation hardware — H100, B200, GB200, NVL72, with Vera Rubin on the horizon — on billions of dollars worth of compute, in collaboration with partners that are the largest public AI companies on Earth. Our job is to keep those GPUs fed: rank-aware sampling, NVMe caching, video and sensor co-loading, random access into clips, decode pipelining. Streaming alone can already saturate a B200; the hard part is enabling the complex sampling patterns researchers actually need without giving up a single percentage point of MFU.

This is a systems engineering role for someone who feels physical pain when a system is slow. You won't need GPU experience on day one — we'll uplevel you on NVL72, CUDA, and SLURM. We will need you to bring real expertise on what happens between NVMe, network, memory, and CPU, and a deep instinct for where bytes go.

Key Responsibilities

Design and build the video-native dataloader: rank-aware, NVMe-cached, random-access into clips, returns tensors directly to the GPU.
Profile and optimize the full data path from object store → NVMe → page cache → host RAM → device RAM. Eliminate every avoidable copy and stall.
Saturate the latest hardware (B200, GB200, NVL72) on real customer training jobs. Push toward Vera Rubin bandwidth requirements.
Own performance benchmarks against customer baselines (custom DataLoaders, DALI, decord, LeRobot) and against our own historical numbers — regressions get caught at PR time.
Partner with researchers at our partner labs to land the loader in their training stack and measure MFU end-to-end.
Work cross-team with Storage Infrastructure on the index/format boundary and with Visual Understanding on the model-output ingestion path.
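
To give a flavor of what "rank-aware" means in the responsibilities above, here is a minimal sketch of one common approach, not a description of Eventual's actual loader. Every training rank derives the same shuffled order of clip indices from a shared seed and epoch, then takes a disjoint strided slice of it, so no two GPUs read the same clip in an epoch. The function name rank_shard and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def rank_shard(num_clips: int, rank: int, world_size: int,
               seed: int, epoch: int) -> list[int]:
    """Return the clip indices one rank would load for one epoch.

    Hypothetical helper for illustration only; a real loader would also
    handle caching, decode, and random access inside each clip.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")

    # Every rank builds the same permutation from the same (seed, epoch),
    # so no coordination between ranks is needed at sampling time.
    order = list(range(num_clips))
    random.Random(seed * 1_000_003 + epoch).shuffle(order)

    # Drop the tail so all ranks get the same number of clips and
    # finish the epoch on the same step.
    usable = num_clips - num_clips % world_size

    # Strided slice: rank r takes positions r, r + world_size, ...
    return order[rank:usable:world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, rank_shard(num_clips=10, rank=r, world_size=4, seed=0, epoch=0))
```

Dropping the tail trades a few unseen clips per epoch for equal step counts across ranks; padding by repeating clips is the usual alternative when every sample must be visited.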

What we look for

Obsession with systems-level performance. You can recite Jeff Dean's "numbers every programmer should know" in your sleep. You eat flamegraphs for breakfast.
Strong opinions on io_uring — love it or hate it, you've earned the opinion.
Live and breathe Rust, C++, or C. You reach for them when it matters and you know why.
Strong familiarity with operating systems — page cache, scheduling, syscalls, NUMA, memory hierarchies.
A sense for where bytes actually go: NVMe vs. memory vs. network vs. PCIe vs. NVLink, and the throughput and latency budgets of each.

Nice to have

Experience working with GPUs is a plus, but you don't need it on day one.
Experience working with SLURM, Kubernetes for GPU workloads, or other HPC schedulers.
Hands-on CUDA experience.
Deep expertise in memory and caching subsystems: page cache tuning, hugepages, NUMA pinning, GPUDirect Storage.
Experience with video decode pipelines (PyAV, decord, NVDEC) or PyTorch DataLoader internals.
Contributions to open-source systems projects in Rust/C++.

Perks & Benefits

In-person, tight-knit team — 4 days/week in our SF Mission office.
Competitive comp and meaningful startup equity.
Catered lunches and dinners for SF employees.
Commuter benefit.
Team-building events and poker nights.
Health, vision, and dental coverage.
Flexible PTO.
Latest Apple equipment.
401(k) plan with match.

If slow systems evoke emotional pain for you and you want to spend the next few years making the most expensive GPU clusters on the planet earn their keep, we'd love to talk.

Skills Required

3+ years of experience working with complex infrastructure projects, ideally involving GPUs
Experience supporting ML/AI workloads
Experience optimising GPU utilisation through scheduler extensions
Familiarity with cloud technologies, e.g. AWS S3
Experience taking a product from ground zero to production

The Company

HQ: San Francisco, California

20 Employees

What We Do

Eventual is building a data warehouse from the ground up, designed to handle traditional data engineering and analytics alongside modern ML/AI workloads. Eventual has raised over $2.5M from investors including Y Combinator, Array VC, Caffeinated Capital, and top Silicon Valley executives and founders from companies such as Meta, Lyft, and Databricks.