Head of Site Reliability Engineering at ChipStack

Sorry, this job was removed at 12:11 a.m. (CST) on Monday, Sep 08, 2025
San Jose, CA
In-Office
Artificial Intelligence • Semiconductor
The Role

Locations • San Jose, CA – On‑site • Full‑time • Engineering

About ChipStack

Chips power everything, yet chip‑design tooling hasn’t kept up with the exploding complexity. ChipStack reinvents verification with AI‑native software already in use at 10+ semiconductor innovators. Backed by Khosla Ventures, Cerberus, and Clear Ventures, our small, fast team ships at the intersection of AI, EDA, and systems engineering.

The Opportunity

We need rock‑solid, low‑latency deployments—often inside customer data centers with no internet egress. As our first dedicated reliability owner, you’ll design, automate and operate these hybrid/on‑prem environments so customers experience “five nines” availability without touching the underlying plumbing.

What You’ll Do
  • Own end‑to‑end reliability – architect, deploy, and monitor production clusters (on‑prem & cloud) running our Python/TypeScript micro‑services, LLM workloads and GPU back‑ends.

  • Automate the stack – build IaC pipelines (Terraform), GitOps workflows and zero‑downtime rollout strategies.

  • Observe & respond – instrument apps with Prometheus/Grafana, set SLOs/SLIs, lead incident response, perform root‑cause analysis, and harden runbooks.

  • Secure & comply – implement network segmentation, secrets management, RBAC and vulnerability scanning to satisfy strict semiconductor‑industry requirements.

  • Collaborate – pair with product engineers on performance profiling, scalability bottlenecks and customer issue triage.

  • Continually improve – champion best practices in testing, CI/CD, and chaos drills to push our “ship fast, ship quality” culture.

Must‑Have Skills
  • 5+ years building and operating production systems as an SRE / DevOps / Platform Engineer.

  • Hands‑on expertise with Kubernetes and Docker in hybrid or bare‑metal setups.

  • Strong Python for automation tooling; proficiency reading TypeScript services.

  • Deep Linux administration knowledge (kernel tuning, networking, storage, security hardening).

  • Proven track record delivering 99.9%+ uptime for latency‑sensitive services.

  • Observability stack experience (Prometheus, Grafana, Loki / ELK, Alertmanager).

  • Proficiency with Terraform (or equivalent IaC) and Git‑based workflows.

  • Excellent communication and a bias for action when facing vague, first‑of‑its‑kind problems.

Nice‑to‑Have
  • Experience running GPU workloads, ML inference or EDA toolchains in production.

  • Familiarity with air‑gapped / restricted‑network deployments and data‑center operations.

  • Exposure to security certifications (SOC 2, ISO 27001) or semiconductor customer audits.

  • Prior work at an early‑stage startup.

Our Culture (What You’ll Thrive In)
  • Challenge status‑quo

  • Strong opinions, loosely held

  • Ship fast, ship quality

  • Proud of our craft

Ready to harden the infrastructure that will redefine chip design? Apply now and keep ChipStack running flawlessly for the world’s most advanced silicon teams.

The Company
HQ: Campbell, CA
21 Employees

What We Do

Reimagining how we design chips.
