Staff/Lead Site Reliability Engineer (SRE)

Posted 12 Hours Ago
San Francisco, CA, USA
In-Office
201K-251K Annually
Senior level
Healthtech
Transforming Cardiovascular Care Through Innovation
The Role
The Staff SRE will lead the design and operation of cloud infrastructure, improve observability, mentor engineers, and enhance incident response systems.

Heartflow is a medical technology company advancing the diagnosis and management of coronary artery disease, the #1 cause of death worldwide, using cutting-edge technology. The flagship product, the Heartflow FFRCT Analysis, is an AI-driven, non-invasive cardiac test supported by the ACC/AHA Chest Pain Guidelines. It provides a color-coded 3D model of a patient’s coronary arteries indicating the impact blockages have on blood flow to the heart. Heartflow is the first AI-driven non-invasive integrated heart care solution across the CCTA pathway that helps clinicians identify stenoses in the coronary arteries (RoadMap™ Analysis), assess coronary blood flow (FFRCT Analysis), and characterize and quantify coronary atherosclerosis (Plaque Analysis). Our pipeline of products is growing and so is our team; join us in helping to revolutionize precision heart care.

Heartflow is a publicly traded company (HTFL) that has received international recognition for exceptional strides in healthcare innovation, is supported by medical societies around the world, is cleared for use in the US, UK, Europe, Japan and Canada, and has been used for more than 500,000 patients worldwide.

Heartflow is transforming cardiovascular care with cutting-edge, non-invasive technology. We are launching a massive Platform Modernization initiative to power the next generation of our life-saving medical products.

We're looking for an experienced Site Reliability Engineer (SRE) to join our cloud-native infrastructure team. You will work closely with our Platform engineers and development teams to ensure our critical systems are highly available, scalable, observable, and performant. If you thrive on eliminating toil, automating complex operations, and defining the standards for production excellence, we want to talk to you.

Job Responsibilities

As our Staff SRE, you'll be the primary expert responsible for our entire compute ecosystem, operating at the highest level of technical expertise and influence. You won't just solve problems; you'll prevent them at a fundamental level across organizational boundaries. Your key responsibilities will include:

  • Lead the design, implementation, and operation of reliable, scalable cloud infrastructure.
  • Define and begin rollout of SLI/SLO standards across microservices.
  • Develop self-service instrumentation tooling that enables engineering teams to own observability.
  • Establish observability and monitoring using an OSS toolchain.
  • Serve as a technical escalation point for critical incidents, perform deep-dive root cause analyses (RCAs), and implement robust corrective measures to prevent recurrence.
  • Enhance our monitoring, logging, and tracing systems to provide comprehensive visibility into system health.
  • Set the technical direction and best practices for the entire SRE and engineering organization. Mentor mid-level and senior engineers on design patterns, operational rigor, and reliability principles.

We're looking for a leader and a deep technical expert with a proven track record of solving the hardest scaling and reliability challenges.

Required Qualifications
  • 8+ years of progressive experience in Site Reliability Engineering, Production Engineering, or a closely related role.
  • Deep expertise with:
    • AWS 
    • Kubernetes, Helm
    • Observability stack (Prometheus, Grafana, Mimir, Loki, Pixie, Tempo)
    • CI/CD systems (ArgoCD, Harness)
  • Fluency in at least one major scripting/programming language for automation and tooling (e.g., Python, Go, or Java).
  • Hands-on engineering mindset: able to instrument services directly, not just configure tooling.
  • Track record of building or significantly improving incident detection and response systems.
  • Deep technical familiarity with Kubernetes ecosystems, containerization technologies, and modern IaC tooling (e.g., Terraform, Crossplane, or Operators), so you can effectively guide the team's technical decisions.
  • Exceptional communication skills, capable of explaining complex technical issues to both technical and non-technical audiences.
Nice-to-Have
  • Experience implementing Service Mesh technologies (e.g., Istio, Linkerd).
  • A strong understanding of security principles and practices in a cloud environment.
  • Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer).

A reasonable estimate of the base salary range is $200,750 to $250,922, plus cash bonus and equity.


Heartflow is an Equal Opportunity Employer. We are committed to a work environment that supports, inspires, and respects all individuals and do not discriminate against any employee or applicant because of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law. This policy applies to every aspect of employment at Heartflow, including recruitment, hiring, training, relocation, promotion, and termination.
 
Positions posted for Heartflow are not intended for or open to third-party recruiters or agencies. Any unsolicited resumes submitted for these positions will be considered free referrals.
 
Heartflow has become aware of a fraud scheme in which unknown entities pose as Heartflow recruiters in an attempt to obtain personal information from individuals as part of our application or job offer process. Before providing any personal information to outside parties, please verify the following: A) all legitimate Heartflow recruiter email addresses end with “@heartflow.com” and B) the position described is found on our careers site at www.heartflow.com/about/careers/.

Top Skills

ArgoCD
AWS
Crossplane
Go
Grafana
Harness
Helm
Java
Kubernetes
Loki
Mimir
Pixie
Prometheus
Python
Tempo
Terraform

The Company
Austin, TX
650 Employees
Year Founded: 2010

What We Do

Heartflow is the global leader in AI-driven coronary artery disease (CAD) management, transforming how CAD — the world’s leading cause of death — is diagnosed and treated. Our advanced technology generates personalized, precision 3D heart models from a single CT scan, providing clinicians with the clarity and confidence to deliver earlier, more effective treatments — transforming CAD into a disease that can be managed for life. Heartflow One is the only complete, non-invasive, precision coronary care platform providing patient insights throughout the guideline-directed CCTA pathway. The AI-driven platform — including Roadmap™ Analysis, FFRCT Analysis and Plaque Analysis — is supported by the ACC/AHA Chest Pain Guideline and backed by more than 600 peer-reviewed publications. With over 400,000 patients treated, more than 1,400 leading institutions adopting our solution, and 99.5% of U.S. lives covered — Heartflow is redefining the standard of coronary care. We're a global company, with employees across the United States, Europe and Japan. Our headquarters are in Mountain View, California, with additional offices in California, Texas, the UK, and Japan. We believe CAD shouldn’t be a silent threat. By making it screenable, diagnosable, and manageable, we’re changing the story of CAD, empowering clinicians to save lives and giving patients more time for what matters most.

Why Work With Us

Join Us to Rewrite the Story of CAD.
