Distributed Systems Engineer

San Francisco, CA
In-Office
250K-500K Annually
Senior level
Artificial Intelligence
Code interpreting for your AI apps | Safely run AI-generated code in E2B sandbox
The Role
The role involves building a cloud platform for AI applications, managing distributed systems, ensuring efficient sandbox operations, and implementing observability.

💻 Languages: 5+ years of experience in Go, Terraform

Skills: Go, Building and managing large clusters, Linux, Networking, Kubernetes, Virtualization

👉 Who we are

E2B is a fast-growing Series A startup with 7-figure revenue. We've raised over $32M in total since our founding in 2023 and are supported by great investors like Insight Partners. Our customers include companies like Perplexity, Hugging Face, Manus, and Groq. We're building the next hyperscaler for AI agents.

👉 About the role

You will be building the next cloud platform for running AI software - a cloud where AI apps are building other software apps.

Your job will be:

  1. Building a distributed system for millions and billions of AI agents running on E2B

  2. Building an orchestrator for placing sandboxes in the right nodes

  3. Adding support for sandbox live migrations

  4. Making sure our self-hosting DX is as smooth as possible (we’re open-source)

  5. Keeping sandbox start time under 200ms, measured from the moment the user hits enter

  6. Scaling to millions and later billions of sandboxes running at the same time

  7. Building an observability stack starting at the kernel level of virtual machines

We’re looking for an infrastructure engineer passionate about making things run fast and efficiently, and running A LOT of them at the same time.

If you aren’t afraid of going into the kernel of a VM and words like Firecracker, eBPF, UFFD, block device, L4 load balancing, noisy neighbor problem, or hugepages sound exciting to you, we want to hear from you!

👉 What we're looking for
  • 7+ years building distributed systems - You've operated infrastructure at serious scale (100K+ RPS, multi-region, PB-scale data) and understand the trade-offs between consistency, availability, and partition tolerance in practice, not just theory

  • Deep Linux internals expertise - You're comfortable working at the kernel level. You've debugged performance issues using eBPF, understand CPU scheduling and memory management, and can explain the difference between cgroups v1 and v2 without looking it up

  • VM hypervisor experience - You've worked with Firecracker, QEMU, KVM, or similar. You understand virtio, know what a hypercall is, and have opinions about nested virtualization trade-offs

  • Systems programming skills - Strong in at least one of: Go, Rust, C/C++. You've written performance-critical code and know when to reach for lock-free data structures, memory-mapped files, or io_uring

  • Production orchestration experience - You've built or operated orchestration systems (Kubernetes, Nomad, or custom). You understand bin-packing algorithms and resource scheduling, and have dealt with noisy neighbor problems at scale

  • Performance obsession - You've shaved milliseconds off hot paths, understand CPU caches and memory locality, and have profiled production systems under load. You know what "p99 latency" means and care deeply about making it better

  • Networking expertise - Strong understanding of L4/L7 load balancing, network namespaces, iptables/nftables, and how to build secure, isolated network topologies for multi-tenant systems

  • Located in San Francisco or willing to relocate - We work in person as a team and believe in the magic that happens when engineers collaborate face-to-face on hard problems

  • Excited about open source - Comfortable with our code and infrastructure being public. You contribute to discussions, write clear documentation, and help the community succeed with self-hosting

👉 Bonus points for:
  • Experience with userfaultfd (UFFD), copy-on-write mechanisms, or lazy loading

  • GPU passthrough or PCIe device virtualization experience

  • Built or maintained infrastructure for AI/ML workloads

  • Contributions to Firecracker, Cloud Hypervisor, or similar open source projects

  • Experience with observability at scale (distributed tracing, kernel-level metrics)

Top Skills

Go
Kubernetes
Linux
Networking
Terraform
Virtualization
The Company
HQ: San Francisco, CA
17 Employees
Year Founded: 2023
