Site Reliability Engineer

Palo Alto, CA
In-Office
120K-140K Annually
Senior level
Hardware • Manufacturing
The Role
As an SRE, you'll maintain service reliability, operate monitoring tools, automate tasks in Python, and manage incident responses.

PsiQuantum’s mission is to build the first useful quantum computers—machines capable of delivering the breakthroughs the field has long promised. Since our founding in 2016, our singular focus has been to build and deploy million-qubit, fault-tolerant quantum systems. 

Quantum computers harness the laws of quantum mechanics to solve problems that even the most advanced supercomputers or AI systems will never reach. Their impact will span energy, pharmaceuticals, finance, agriculture, transportation, materials, and other foundational industries. 

Our architecture and approach are based on silicon photonics. By leveraging the advanced semiconductor manufacturing industry, including partners like GlobalFoundries, we use the same high-volume processes that already produce billions of chips for telecom and consumer electronics. Photonics offers natural advantages for scale: photons don’t feel heat, are immune to electromagnetic interference, and integrate with existing cryogenic cooling and standard fiber-optic infrastructure.

In 2024, PsiQuantum announced government-funded projects to support the build-out of our first utility-scale quantum computers in Brisbane, Australia, and Chicago, Illinois. These initiatives reflect a growing recognition that quantum computing will be strategically and economically defining—and that now is the time to scale. 

PsiQuantum also develops the algorithms and software needed to make these systems commercially valuable. Our application, software, and industry teams work directly with leading Fortune 500 companies—including Lockheed Martin, Mercedes-Benz, Boehringer Ingelheim, and Mitsubishi Chemical—to prepare quantum solutions for real-world impact. 

Quantum computing is not an extension of classical computing. It represents a fundamental shift—and a path to mastering challenges that cannot be solved any other way. The potential is enormous, and we have a clear path to make it real. 

Come join us. 

Job Summary: 

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you’ll own the day‑to‑day operation of our monitoring stack—Grafana, Prometheus, Loki, and Tempo—crafting dashboards that surface golden signals and drive real‑time insight. You’ll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world‑class uptime across both on‑prem and AWS environments. 

Responsibilities: 

  • Define, implement, and iterate on Service Level Indicators & Service Level Objectives (SLIs/SLOs) and error budgets for critical services, with a focus on network reliability and data centre interconnects. 
  • Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation), extending coverage to network telemetry such as packet loss, jitter, bandwidth utilization, and BGP/EVPN stability. 
  • Operate and tune the observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low-latency telemetry ingestion and alerting for networking as well as compute layers. 
  • Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions—particularly for network-related outages, congestion, or misconfigurations. 
  • Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks, including network monitoring and diagnostics. 
  • Collaborate with Platform, Product, and Networking teams on capacity planning, performance testing, traffic engineering, and change management. 
  • Improve CI/CD health checks and release safety nets within GitLab, with attention to network dependencies in deployments. 
  • Contribute to Infrastructure as Code (Terraform, Ansible) for monitoring stack deployments and upgrades, including network observability tooling and configuration.

Experience/Qualifications: 

  • Bachelor’s Degree or higher in Computer Science, Engineering, or related technical field. 
  • 5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production. 
  • Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent). 
  • Proven track record designing dashboards and alerts around golden signals and USE/RED methodologies, extended to network utilization, saturation, and error metrics. 
  • Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines. 
  • Operational experience with Kubernetes and containerized workloads. 
  • Strong working knowledge of AWS services, data centre networking fundamentals, routing protocols, load balancing, and network overlays (e.g., VXLAN/EVPN). 
  • Experience running incident response and writing actionable post-mortems, including for network-related events. 
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management. 
  • Exposure to regulated environments, multi-region networking architectures, and hybrid on-prem/cloud topologies is a plus. 
  • Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, networking, application, and data layers. 

 

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to [email protected].

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target ranges for a new hire base salary. One is for the Bay Area (within 50 miles of HQ, Palo Alto); the second (if applicable) is for elsewhere in the US (beyond 50 miles of HQ, Palo Alto). If there is only one range, it is for the specific location where the position will be based. Actual compensation may fall outside these ranges and is dependent on various factors, including but not limited to a candidate's qualifications (including relevant education and training), competencies, experience, geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range
$120,000-$140,000 USD
Bay Area Pay Range
$145,000-$165,000 USD

Top Skills

Ansible
AWS
Bash
GitLab
Grafana
Kubernetes
Loki
Prometheus
Python
Tempo
Terraform
The Company
HQ: Palo Alto, California
265 Employees
Year Founded: 2015

What We Do

Quantum computing will be a world-changing technology with the potential to unlock powerful advances in medicine, energy, finance and beyond. At PsiQuantum, we’re focused on building the world’s first useful quantum computer.

A useful quantum computer requires at least 1,000,000 qubits and error correction. We believe photonics is the only path to building a useful quantum computer.

Our team at PsiQuantum is a mix of quantum physicists; semiconductor, systems, and software engineers; system architects; and more. Error correction is at the centre of everything we do, and we focus on solving real-world problems.

If you’re interested in joining our team, we are always open to hearing from exceptional people interested in working on one of the defining technologies of our lifetime.
