Software Engineer, Multimodal Storage Infrastructure

Posted 2 Days Ago
San Francisco, CA, USA
In-Office
150K-250K Annually
Mid level
Artificial Intelligence • Machine Learning • Software
The Role
Design and build the storage and indexing layer for petabyte-scale multimodal datasets (video, sensors, embeddings). Implement predicate and projection pushdown, efficient file formats and random-access into clips, versioning and schema evolution, and integrations with lakehouse formats and dataloaders to minimize bytes read. Partner with visual understanding and dataloading teams to land model outputs into the index and optimize query paths for content-aware reads.
Summary Generated by Built In
About Eventual

Every breakthrough Physical AI system — humanoid robots, autonomous vehicles, video generation models — is trained on petabytes of video, lidar, radar, and sensor data. But today's data platforms (Databricks, Snowflake) were built for spreadsheet-like analytics. They don't know how to index a clip by content, co-locate sensors on the same row as video, version multimodal datasets, or push predicates down to a corpus of MP4s. Robotics and video-AI teams build the missing layer themselves: stitching together five to eight tools, organizing disorganized video and sensor data, building schemas and versioning that don't exist. "It was rebuilding what Databricks built 15 years ago for analytics — just for AI data."

Eventual was founded in 2022 to ship that layer once. Our open-source engine, Daft, is the distributed data engine purpose-built for multimodal AI — already running 2 PB/day at Amazon, 60-100 PB at another FAANG company, and in production at Mobileye, TogetherAI, and CloudKitchens. We are building a multimodal warehouse on top of our engine for Physical AI: video, sensors, and sim outputs co-indexed on the same row, aligned on timecode, and versioned — with a content-aware query layer on top.

We're building this in partnership with the top Physical AI labs and public AI infrastructure companies today. We have raised $30M from Felicis, CRV, Microsoft M12, Citi, Essence, Y Combinator, Caffeinated Capital, Array.vc, and angels including the co-founders of Databricks and Perplexity. We've assembled a world-class team from AWS, Render, Pinecone, and Tesla. We have spent our careers powering the last generation of Physical AI in self-driving, and are excited to now do this for the next.

Join our small (but powerful!) team working together 4 days/week in our SF Mission District office.

Your Role

As a Storage Infrastructure Engineer, you'll take everything we know about modern databases and apply it to the world of Physical AI. Our warehouse co-indexes video, sensors, embeddings, and sim outputs on the same row, versioned, with a third query layer (not row/column, not vector/semantic) — content-aware queries over what's inside clips. Your job is to make that layer fast: the right indices for petabyte-scale video, predicate pushdowns that elide whole files, file formats that respect random access into clips, and a query path that turns "left-arm grasp failures on deformable objects" into the smallest possible read.

You should believe, in your bones, that the best read is the read elided.
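
As a rough illustration of what "predicate pushdowns that elide whole files" can look like, here is a minimal sketch that prunes files using per-file min/max statistics before any bytes are read. It is not Eventual's or Daft's implementation; `FileStats`, `prune_files`, the timestamp column, and the file paths are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min/max of a timestamp column."""
    path: str
    min_ts: int
    max_ts: int
    size_bytes: int


def prune_files(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


# Three made-up clip files with non-overlapping timestamp ranges.
files = [
    FileStats("clips/part-000.mp4", 0, 999, 4_000_000),
    FileStats("clips/part-001.mp4", 1000, 1999, 4_000_000),
    FileStats("clips/part-002.mp4", 2000, 2999, 4_000_000),
]

# A query for timestamps 1500..1800 only needs to open the middle file.
kept = prune_files(files, lo=1500, hi=1800)
bytes_skipped = sum(f.size_bytes for f in files) - sum(f.size_bytes for f in kept)
```

The decision is made from metadata alone, so the two non-matching files are never opened. That is the sense in which the best read is the one that never happens.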

Key Responsibilities
  • Design and build the storage and indexing layer: row groups, column chunks, secondary indices, vector indices, and the metadata that lets queries skip everything that doesn't matter.

  • Push the query engine harder — predicate pushdown, projection pushdown, late materialization — across multimodal columns including video, embeddings, and sensor streams.

  • Choose, extend, or build on top of modern open formats (Parquet, Iceberg, Delta, etc.), and build our own or contribute upstream where it makes sense.

  • Build versioning and schema evolution for multimodal datasets so customer data stays reproducible across months of experimentation.

  • Partner with the Dataloading team on the format-to-loader boundary so an iceberg.scan(...) translates into the absolute minimum of bytes hitting NVMe.

  • Partner with the Visual Understanding team to land model outputs in the index without an external glue layer.
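
For readers less familiar with late materialization, mentioned in the responsibilities above, here is a minimal sketch of the idea: evaluate the predicate on a cheap column first, then load the expensive payload (here, clip bytes) only for the rows that matched. The function names, the label values, and the in-memory "columns" are hypothetical stand-ins, not part of any real engine's API.

```python
def late_materialize(labels, load_clip, predicate):
    """Filter on the cheap column, then load heavy payloads only for matching row ids."""
    matching = [i for i, label in enumerate(labels) if predicate(label)]
    return {i: load_clip(i) for i in matching}


# Cheap column: one small label per row (made-up values).
labels = ["grasp_ok", "grasp_fail", "grasp_ok", "grasp_fail"]

# Record which rows actually trigger an expensive load.
loads = []


def load_clip(row_id):
    """Stand-in for an expensive read of video bytes from storage."""
    loads.append(row_id)
    return f"<clip bytes for row {row_id}>"


result = late_materialize(labels, load_clip, lambda label: label == "grasp_fail")
```

Only rows 1 and 3 reach `load_clip`. The other clips are never fetched, which matters most when the payload column is orders of magnitude larger than the column being filtered.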

What we look for
  • You love thinking about indices. B+ trees, LSM trees, bitmap indices, vector indices, learned indices — you have favorites and you have grudges.

  • You love thinking about query engines. Predicate pushdown makes you happy. Late materialization makes you happier.

  • Strong familiarity with the storage hierarchy: cloud object stores, NVMe, block storage, spinning disk, RAM, GPU memory — and the latency and cost of moving between them.

  • Strong opinions about Parquet — love it or hate it, you've earned the opinion. Same for Iceberg, Delta, Lance, and the other lakehouse formats.

  • A real love for databases and query systems. You read database papers for fun.

  • You believe the best read is the read elided.

Nice to have
  • Background from a storage or table-format team — Lance, Iceberg, Delta, Hudi, Spiral, Snowflake, BigQuery, Databricks Photon, DuckDB, ClickHouse, or similar.

  • You've attempted to build your own database before. Or, at minimum, fantasized about it in detail.

  • Experience with Rust or modern C++ for storage engines.

  • Hands-on time with vector indices (HNSW, IVF, ScaNN) or hybrid retrieval systems.

  • Comfort with the OLAP/lakehouse ecosystem: catalogs, file layout, compaction, manifest formats, time travel.

Perks & Benefits
  • In-person, tight-knit team — 4 days/week in our SF Mission office.

  • Competitive comp and meaningful startup equity.

  • Catered lunches and dinners for SF employees.

  • Commuter benefit.

  • Team-building events and poker nights.

  • Health, vision, and dental coverage.

  • Flexible PTO.

  • Latest Apple equipment.

  • 401(k) plan with match.

If you've ever read a Parquet footer for fun and thought "this is so close to what video needs, but yet so far" — we should talk.

Skills Required

  • Deep experience with storage and indexing systems (B+ trees, LSM, bitmap, vector indices)
  • Experience designing query-engine optimizations (predicate pushdown, projection pushdown, late materialization)
  • Strong familiarity with storage hierarchy and tradeoffs (cloud object stores like S3, NVMe, block, HDD, RAM, GPU memory)
  • Practical knowledge of columnar and lakehouse formats (Parquet, Iceberg, Delta, Lance) and file-layout design
  • Deep interest and background in databases, query systems, and large-scale data platforms
  • Ability to work in person four days/week in the San Francisco Mission office
  • Background on storage or table-format teams (Iceberg, Delta, Lance, Hudi, Snowflake, BigQuery, Databricks Photon, DuckDB, ClickHouse)
  • Experience building databases or storage engines, or significant systems programming experience (Rust or modern C++)
  • Hands-on experience with vector indices (HNSW, IVF, ScaNN) and hybrid retrieval systems
  • Familiarity with OLAP/lakehouse concepts: catalogs, file layout, compaction, manifest formats, time travel

The Company
HQ: San Francisco, California
20 Employees

What We Do

Eventual is building a data warehouse from the ground up, designed to handle traditional data engineering and analytics alongside modern ML/AI workloads. Eventual has raised over $2.5M from investors including Y Combinator, Array VC, Caffeinated Capital, and top Silicon Valley executives and founders from companies such as Meta, Lyft, and Databricks.
