Site Reliability Engineer

Posted 5 Days Ago
San Francisco, CA
Hybrid
140K-185K Annually
Mid level
Artificial Intelligence • Healthtech
The Role
Participate in incident response, improve operational reliability, manage Kubernetes and cloud infrastructure. Automate tasks, enhance observability, and support safe changes while collaborating with engineering teams.
Who We Are

Healthcare needs a better rhythm: one that keeps care continuous and deeply human. Heidi is building an AI Care Partner that works alongside clinicians to make that possible.

We’re a team of doctors, engineers, designers, researchers, and creatives building tools that help clinicians stay focused on what matters most: their patients.

In just 18 months, Heidi has given back more than 18 million hours to healthcare professionals — supporting 73 million patient visits in 116 countries. Today, more than two million patient visits each week are powered by Heidi worldwide.

Backed by nearly $100 million in funding, we’re growing in the US, UK, Canada, and Europe, partnering with leading health systems including the NHS, Beth Israel Lahey Health, and Monash Health.

What you’ll do
  • Participate in on-call and incident response:

    Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.

  • Improve operational reliability:

    Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.

  • Own parts of the production environment:

    Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.

  • Strengthen observability:

    Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.

  • Reduce operational toil:

    Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.

  • Support safe change:

    Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.

  • Contribute to operational practices:

    Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.

  • Collaborate closely with engineers:

    Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.

What we’re looking for

  • 3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.

  • Experience supporting production systems and participating in on-call rotations.

  • Comfortable debugging live systems under pressure.

  • Experience operating cloud infrastructure (AWS preferred).

  • Working knowledge of Kubernetes and containerised workloads.

  • Infrastructure as Code experience (Terraform or similar).

  • Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).

  • Scripting or automation experience (Python, Bash, or similar).

The way we work

1. Build to Last

We design for safety and reliability so clinicians, patients, and our teams can trust what we build every day.

2. Own Your Practice

Ideas rise on merit, not title, and everyone shares responsibility for the standards we set together.

3. Move Fast, Stay Steady

We move quickly but never at the cost of trust. Progress only matters if people can depend on what we make.

4. Make Others Better

Honest feedback, steady support, and shared growth keep our teams improving together.

Why you will flourish with us 🚀

  • In office to collaborate with like-minded professionals

  • Healthcare, Dental, Vision benefit options

  • 401k with 3% match

  • Personal development budget of $500 per annum

  • Become an owner, with shares (equity) in the company: if Heidi wins, we all win

  • The rare chance to create a global impact as you immerse yourself in one of the leading healthtech startups

  • The opportunity to fast track your startup career!

Heidi is dedicated to creating an equitable, inclusive, and supportive work environment that brings people together from diverse backgrounds, experiences, and perspectives. Our strength is in our differences. We're proud to be an equal opportunity employer and welcome all applicants as we're committed to promoting a culture of opportunity for all.

Top Skills

AWS
Bash
Datadog
Kubernetes
Prometheus
Python
Terraform
The Company
HQ: Cremorne, Victoria
112 Employees
Year Founded: 2019

What We Do

Heidi Health is the team behind the world’s most loved AI scribe, used daily by tens of thousands of clinicians in over 50 countries to scribe millions of consults every month. Where other scribes end at transcription, Heidi is just getting started. Heidi’s real power is its ability to personalize notes with customized templates, create any healthcare document with a simple prompt, enable seamless team collaboration through shared sessions for multi-disciplinary care, and more. From solo practitioners to large hospital networks, and from primary care to neurology to OBGYN, Heidi adapts to unique workflows across all specialties. Heidi is safe for every clinician to use, with HIPAA and NHS compliance fortified by SOC2 and ISO 27001 security. Join the revolution at www.heidihealth.com – scribing is free, and it’s just the beginning.
