Tooling Engineer

San Francisco, CA, USA
In-Office
220K-300K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Infrastructure as a Service (IaaS)
The Role
You will lead the build system and CI/CD processes at SFCompute, focusing on improving reproducibility, hermeticity, and speed. Your role includes auditing current systems, migrating to a new build system, ensuring effective CI, and collaborating with teams on new infrastructure developments.
Summary Generated by Built In

We're building the company that will de-risk the largest infrastructure build-out in history.

When people finance GPU clusters, the datacenters housing them, and the infrastructure powering them, they need "offtake" - meaning someone has signed a contract to lease the cluster for a period of time before it's even built.

Financing a GPU cluster is inherently risky, since margins are thin and volumes are huge. Lenders don't want to take on the risk that cluster developers can't repay their loan, and cluster developers really don't want to risk not selling their cluster. As a result, risk is offloaded to the customer using fixed-price long-term contracts.

If you don't mitigate this customer risk, there's a bubble. This isn't SaaS anymore - application layer companies sign multi-year contracts for compute and inference, but sell to customers on monthly subscriptions. If you mess up a purchase, it's game over: a minor shift in your revenue growth rate might mean the difference between profit and bankruptcy. But what if companies could exit their contract by selling it back to the market?

Otherwise, as AI scales, compute only becomes available to folks who can effectively take on that risk. A 2-person startup in a San Francisco Victorian can't realistically sign a 5-year take-or-pay contract on $100m supercomputers. But they may be able to buy the month of liquidity that someone else sold back.

So that's what we make: a liquid market for GPU offtake.

About the Tooling Team

We are a small team focused on making SFCompute engineering faster, more observable, and more reliable. Our work spans data infrastructure, developer experience, pre-production environments, and AI tooling. The common thread is not any single domain. It is that we find the problems nobody else owns and turn them into solved problems.

We act as internal field engineers. Our job is to maximize the speed and effectiveness of everyone else at the company. The team is kept deliberately small and independent so it can respond directly to its internal customers without waiting on outside approval. We own a graph of metrics and we are driven by goals, not by a ticket queue.

Everyone here wears many hats. You will work across the stack, collaborate with every part of engineering, and regularly take on problems that do not fit neatly into a job description. If you want a narrow scope and a clear ticket queue, this team is not it. If you want a large, legible impact on a small team building serious infrastructure, read on.

About SFCompute

The San Francisco Compute Company runs large-scale GPU clusters (H100s, H200s, B300s) on contracts you can exit. Need 256 H100s for three days? Buy them at market price, cancel what you don't use. We operate the stack from UEFI up, so you are never paying a reseller markup or waiting on a support ticket. Customers include NVIDIA, MIT, Liquid AI, and Roboflow. We are a small team that has managed over $1B of hardware and is building what we think will be the defining infrastructure marketplace for the AI era.

The Role

We are looking for a generalist Tooling Engineer to own the systems that sit underneath every engineer's daily work. You will embed with the people you serve, watch how they actually work, and fix what slows them down. Some weeks that means build pipelines and infrastructure as code. Other weeks it means a staging environment, a per-engineer developer sandbox, an internal data pipeline, or better tooling for AI coding agents. You decide what matters most by talking to your internal customers, not by waiting for a spec.

This is not a "build dashboards and wait for requests" role. The team owns the software development lifecycle, and the gaps in it are yours to close. You will need to scope your own work, ship it, and then stand up in front of the company every two weeks and show what improved.

What You'll Do
  • Embed with engineers across the company, learn their workflows, and find the bottlenecks nobody owns

  • Build and operate the systems beneath daily engineering work: build pipelines, infrastructure as code, internal services, and the production platform

  • Pick up problems across our focus areas as priorities shift: pre-production and staging environments, isolated developer and agent sandboxes, internal data and observability pipelines, and AI coding tooling

  • Drive ambiguous problems to clear outcomes, deciding what to build, not just how to build it

  • Demo your work to the whole company on a regular cadence and take candid feedback well

  • Track the metrics that show whether engineering is getting faster and more reliable (deployment frequency, lead time for changes, change failure rate, time to restore)

What We're Looking For
  • Ability to scope your own work and operate without a spec. The first job is figuring out what the problem actually is

  • Strong communication. You can explain your work clearly, demo it, and write it down

  • Genuine curiosity about how other people work and a habit of walking up to anyone to investigate their workflow

  • Comfort with ambiguity and a small ego, in the sense of caring more about the outcome than about owning it

  • Solid engineering fundamentals and the intellectual honesty to say when you do not know something

  • Nice to have: experience with CI/CD, infrastructure as code (Terraform, Helm, or similar), Kubernetes, ETL and analytical data stores, or AI coding tools such as Claude Code; familiarity with marketplace or infrastructure business models

Why This Role

The tooling team is small, trusted, and independent. You will have direct access to the engineers you serve and to leadership, and the backing to fix things the right way rather than just document them. The systems you build sit underneath everyone's daily work, so when they get better, the whole company feels it. The work you produce is a real artifact, not a presentation deck, and on a team this size your impact is immediate and legible.

Benefits

Generous equity grant

Team members are offered a competitive salary along with equity in the company

Visa Sponsorships

Yes, we sponsor visas and work permits

Retirement matching

We match 401(k) plans up to 4%

Medical, dental & vision

We offer competitive medical, dental, and vision insurance for employees and dependents and cover 100% of premiums

Time off

We offer unlimited paid time off as well as 10+ observed holidays

Parental leave

We offer biological, adoptive, and foster parents paid time off to spend quality time with family

Daily lunch

We cover lunch daily for employees

Unlimited office book budget

You can buy as many books for the office as you want

The San Francisco Compute Company is committed to maintaining a workplace free from discrimination and harassment.

We make employment decisions based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, belief, national origin, social or ethnic origin, age, physical, mental, or sensory disability, sexual orientation, gender identity or expression, marital status, civil union or domestic partnership status, past or present military service, HIV status, family medical history or genetic information, family or parental status including pregnancy, or any other status protected by law.

We welcome the opportunity to consider qualified applicants with prior arrest or conviction records. Our commitment to diversity includes hiring talented individuals regardless of their criminal history, in accordance with local, state, and federal laws, including San Francisco’s Fair Chance Ordinance and California’s ban-the-box laws.

Skills Required

  • Senior or staff-level experience running Bazel, Buck2, Pants or comparable build systems
  • Experience operating remote execution and remote caching in production
  • Comfortable across language ecosystems, specifically TypeScript and Rust
  • Strong opinions on determinism and reproducibility
  • CI ops experience in queue health and build time budgets
  • Able to scope your own work without defined specifications
  • Experience moving codebases onto Bazel or off of it (nice to have)
  • Experience with polyglot or protobuf-heavy monorepos (nice to have)
  • Prior work on developer infrastructure at an autonomy, robotics, or systems company (nice to have)

The Company
30 Employees
Year Founded: 2023

What We Do

San Francisco Compute Company operates a marketplace for large-scale GPU clusters, enabling users to buy and sell compute contracts with flexible terms. They aim to make AI compute more accessible and affordable by creating a liquid market for GPU offtake.
