Software Engineer, High Performance Computing

Reposted 13 Days Ago
San Francisco, CA, USA
In-Office
150K-250K Annually
Mid level
Artificial Intelligence • Machine Learning • Software
The Role
The Software Engineer will build core products, collaborate with teams, design reliable features, and optimise GPU workloads using Kubernetes and cloud tech.
Summary Generated by Built In
About Eventual

Every breakthrough Physical AI system — humanoid robots, autonomous vehicles, video generation models — is trained on petabytes of video, lidar, radar, and sensor data. But today's data platforms (Databricks, Snowflake) were built for spreadsheet-like analytics, not the multimodal corpora that power AI. Robotics and video-AI teams now lose 20-40% of their training time to dataloading alone. GPU bandwidth has grown 2-3× per generation. Storage and pipelines haven't. The gap widens every year.

Eventual was founded in 2022 to close it. Our open-source engine, Daft, is the distributed data engine purpose-built for multimodal AI. It already runs 2 PB/day at Amazon and 60-100 PB at another FAANG company, and is in production at Mobileye, TogetherAI, and CloudKitchens. On top of that engine we are building a video-native index for Physical AI that streams curated datasets to GPUs at line rate. It saturates B200s today and is aimed at NVL72 and Vera Rubin tomorrow.

We're building this in partnership with the top Physical AI labs and public AI infrastructure companies today. We have raised $30M from Felicis, CRV, Microsoft M12, Citi, Essence, Y Combinator, Caffeinated Capital, Array.vc, and angel investors including the co-founders of Databricks and Perplexity. We've assembled a world-class team from AWS, Render, Pinecone, and Tesla. We have spent our careers powering the last generation of Physical AI in self-driving and are excited to now do this for the next.

Join our small (but powerful!) team working together 4 days/week in our SF Mission district office.

Your Role

As a Systems Engineer on the Dataloading team, you'll build the layer that turns multi-petabyte video corpora into dict[str, Tensor] already on the GPU, at line rate. We work with the top labs training Physical AI on the newest generation of hardware (H100, B200, GB200, NVL72, with Vera Rubin on the horizon) on billions of dollars' worth of compute, in collaboration with partners that are the largest public AI companies on Earth. Our job is to keep those GPUs fed: rank-aware sampling, NVMe caching, video and sensor co-loading, random access into clips, and decode pipelining. Streaming alone can already saturate a B200; the hard part is enabling the complex sampling patterns researchers actually need without giving up a single percentage point of MFU (model FLOPs utilization).
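To make "rank-aware sampling" concrete, here is a minimal Python sketch, assuming it means the scheme PyTorch's DistributedSampler uses: every data-parallel rank derives the same seeded shuffle of clip indices and keeps an equal-sized, strided slice of it. The function name rank_shard and its parameters are hypothetical illustrations, not Eventual's or Daft's API.

```python
import random
from typing import List


def rank_shard(num_clips: int, world_size: int, rank: int,
               seed: int, epoch: int) -> List[int]:
    """Clip indices that `rank` should load during `epoch` (hypothetical helper)."""
    if num_clips <= 0 or world_size <= 0:
        raise ValueError("num_clips and world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    order = list(range(num_clips))
    # Same (seed, epoch) on every rank -> identical permutation everywhere,
    # with no communication between ranks; changing epoch reshuffles.
    random.Random(f"{seed}:{epoch}").shuffle(order)
    # Pad by wrapping around so every rank gets the same number of clips.
    per_rank = -(-num_clips // world_size)  # ceiling division
    padded = [order[i % num_clips] for i in range(per_rank * world_size)]
    # Strided slice: rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return padded[rank::world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, rank_shard(num_clips=10, world_size=4, rank=r, seed=0, epoch=0))
```

Padding by wrapping keeps every rank's shard the same length, so no rank runs out of batches early and stalls a collective; the trade-off is that a few clips are seen twice per epoch when the clip count is not a multiple of the world size.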

This is a systems engineering role for someone who feels physical pain when a system is slow. You won't need GPU experience on day one; we'll uplevel you on NVL72, CUDA, and SLURM. We do need you to bring real expertise in what happens between NVMe, network, memory, and CPU, and a deep instinct for where bytes go.

Key Responsibilities
  • Design and build the video-native dataloader: rank-aware, NVMe-cached, random-access into clips, returns tensors directly to the GPU.

  • Profile and optimize the full data path from object store → NVMe → page cache → host RAM → device RAM. Eliminate every avoidable copy and stall.

  • Saturate the latest hardware (B200, GB200, NVL72) on real customer training jobs. Push toward Vera Rubin bandwidth requirements.

  • Own performance benchmarks against customer baselines (custom DataLoaders, DALI, decord, LeRobot) and against our own historical numbers — regressions get caught at PR time.

  • Partner with researchers at our partner labs to land the loader in their training stack and measure MFU end-to-end.

  • Work cross-team with Storage Infrastructure on the index/format boundary and with Visual Understanding on the model-output ingestion path.
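The "NVMe-cached" part of the loader can be pictured as a read-through, least-recently-used cache, bounded in bytes, that sits between the object store and the training job. This is a minimal sketch under stated assumptions: ReadThroughCache and the fetch callback are hypothetical names, and an in-memory OrderedDict stands in for chunk files on local NVMe.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical byte-bounded LRU cache in front of an object store.

    A real loader would keep chunks as files on local NVMe; an in-memory
    OrderedDict stands in for that disk tier here.
    """

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch = fetch  # stand-in for a GET against the object store
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = self.fetch(key)
        # Chunks larger than the whole cache are passed through uncached.
        if len(data) <= self.capacity:
            # Evict least recently used chunks until the new one fits.
            while self.used + len(data) > self.capacity:
                _, old = self.entries.popitem(last=False)
                self.used -= len(old)
            self.entries[key] = data
            self.used += len(data)
        return data
```

Eviction here is purely by recency; because a seeded sampler fixes the epoch's access order in advance, a production loader might instead prefetch or pin the clips it knows will be requested next.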

What we look for
  • Obsession with systems-level performance. You can recite Jeff Dean's "numbers every programmer should know" in your sleep. You eat flamegraphs for breakfast.

  • Strong opinions on io_uring — love it or hate it, you've earned the opinion.

  • Live and breathe Rust, C++, or C. You reach for them when it matters and you know why.

  • Strong familiarity with operating systems — page cache, scheduling, syscalls, NUMA, memory hierarchies.

  • A sense for where bytes actually go: NVMe vs. memory vs. network vs. PCIe vs. NVLink, and the throughput and latency budgets of each.

Nice to have
  • Experience working with GPUs is a plus, but you don't need it on day one.

  • Experience working with SLURM, Kubernetes for GPU workloads, or other HPC schedulers.

  • Hands-on CUDA experience.

  • Deep expertise in memory and caching subsystems: page cache tuning, hugepages, NUMA pinning, GPUDirect Storage.

  • Worked on video decode pipelines (PyAV, decord, NVDEC) or PyTorch DataLoader internals.

  • Contributed to open-source systems projects in Rust/C++.

Perks & Benefits
  • In-person, tight-knit team — 4 days/week in our SF Mission office.

  • Competitive comp and meaningful startup equity.

  • Catered lunches and dinners for SF employees.

  • Commuter benefit.

  • Team-building events and poker nights.

  • Health, vision, and dental coverage.

  • Flexible PTO.

  • Latest Apple equipment.

  • 401(k) plan with match.

If slow systems evoke emotional pain for you and you want to spend the next few years making the most expensive GPU clusters on the planet earn their keep, we'd love to talk.

Skills Required

  • 3+ years of experience working with complex infrastructure projects, ideally involving GPUs
  • Experience supporting ML/AI workloads
  • Experience optimising GPU utilisation through scheduler extensions
  • Familiarity with cloud technologies, e.g. AWS S3
  • Experience taking a product from ground zero to production

The Company
HQ: San Francisco, California
20 Employees

What We Do

Eventual is building a data warehouse from the ground up, designed to tackle the challenges of running traditional data engineering and analytics alongside modern ML/AI workloads. Eventual has raised over $2.5M from investors including Y Combinator, Array VC, Caffeinated Capital, and top Silicon Valley executives and founders from companies such as Meta, Lyft, and Databricks.

Similar Jobs

In-Office
Sunnyvale, CA, USA
8879 Employees
160K-225K Annually

SpaceX

Software Engineer

Aerospace • Other
In-Office
Sunnyvale, CA, USA
8879 Employees
135K-185K Annually
In-Office
2 Locations
2700 Employees
125K-229K Annually
In-Office
Berkeley, CA, USA
39 Employees
140K-200K Annually

Similar Companies Hiring

Bellagent
Artificial Intelligence • Machine Learning • Business Intelligence • Generative AI
Chicago, IL
20 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees