Senior SRE

Sorry, this job was removed at 04:13 a.m. (CST) on Tuesday, May 27, 2025
3 Locations
In-Office or Remote
Artificial Intelligence • Healthtech
The Role

Who is Heidi?

Heidi is on a mission to halve the time it takes to deliver world-class care. 

We believe that by 2050, every clinician will practice with AI systems that free them from administrative burdens and increase the quality and accessibility of care to patients across the world. 

Heidi is built for clinicians, by clinicians, and at its core are its people. We are an eclectic bunch of inventors, builders, scientists, nurses, doctors, mathematicians, designers, creatives, and high-agency executors.

We achieve in 6 months what it takes our competitors 4 years to do. In just 12 months, 20 million patient consults were supported by Heidi, and we’re now powering more than 1 million consults every week.

With our most recent $16.6MM round of funding from leading VC firms, we’re geared up to supercharge our ambitious global growth, starting with the US, Canada, UK and Europe - and we need great people like you to get there. Ready for the challenge?

The Role

As a Senior Site Reliability Engineer at Heidi, you'll be instrumental in establishing and scaling our reliability practices while ensuring robust, secure, and observable systems.

You'll work closely with our engineering team to implement comprehensive monitoring, incident management, and reliability processes for our AI-powered healthcare solutions.

Primary Responsibilities:

Observability & Monitoring

  • Design and implement comprehensive observability strategies using Datadog, or other tooling you can convince us to adopt!

  • Implement OpenTelemetry instrumentation across our backend and frontend services

  • Set up real user monitoring (RUM) and application performance monitoring (APM) to ensure end-to-end visibility

  • Create and maintain dashboards that provide meaningful insights for different stakeholders (technical teams, support, management)

  • Monitor and optimise third-party service integrations, particularly for critical services

Incident Management & Response

  • Establish and implement incident management processes from the ground up

  • Evaluate and implement appropriate incident management tools that integrate with our observability stack

  • Create and maintain incident response playbooks and automated runbooks

  • Lead post-incident reviews and foster a blameless culture

  • Implement and maintain on-call rotations and escalation policies

SLA & SLO Management

  • Define and implement SLOs that align with business requirements and customer expectations

  • Set up error budgets and tracking mechanisms

  • Create comprehensive SLA reporting for enterprise customers

  • Design and implement SLI metrics that provide meaningful insights into service health

Cost Optimisation & Efficiency

  • Optimise observability costs through efficient logging and metrics collection

  • Implement log management and retention strategies

  • Fine-tune alerting to minimise alert fatigue while maintaining service reliability

  • Evaluate and recommend cost-effective tooling solutions

Key Requirements:

  • Extensive experience with observability platforms (Datadog preferred) and understanding of observability architecture

  • Strong knowledge of OpenTelemetry and modern instrumentation practices

  • Experience implementing APM and RUM in Python and React/React Native environments

  • Track record of establishing incident management processes and fostering a blameless culture

  • Experience defining and implementing SLAs/SLOs for enterprise customers

  • Strong background in monitoring distributed systems and third-party service integrations

  • Experience with cloud infrastructure (AWS required, Azure and GCP beneficial)

  • Proven track record in implementing SRE practices and reliability improvements

Preferred Qualifications:

  • Experience with chaos engineering practices

  • Knowledge of automated runbook implementation

  • Healthcare industry experience

  • Understanding of HIPAA or similar healthcare compliance frameworks

What we will look for:

  • Problem-solving mindset with a focus on reliability and scalability

  • Strong communication skills to work with cross-functional teams

  • Ability to balance technical requirements with business needs

  • Experience in fast-paced startup environments

  • Dedication to maintaining high standards in a regulated environment

What do we believe in?

  • We create unconventional solutions to difficult problems and we build them fast. We want you to set impossible goals and make them happen: think landing a rocket, but the medical version.

  • You'll be surrounded by a world-class team of engineers, medicos and designers to do your best work, inspired by our shared beliefs:

    • We will stop at nothing to improve patient care across the world.

    • We design user experiences for joy and ship them fast.

    • We make decisions in a flat hierarchy that prioritises the truth over rank.

    • We provide the resources for people to succeed and give them the freedom to do it.

Why will you flourish with us? 🚀

  • Flexible hybrid working environment, with 3 days in the office.

  • Additional paid day off for your birthday and wellness days

  • Special corporate rates at Anytime Fitness in Melbourne (Sydney TBC)

  • A generous personal development budget of $500 per annum

  • Learn from some of the best engineers and creatives, joining a diverse team

  • Become an owner, with shares (equity) in the company: if Heidi wins, we all win

  • The rare chance to create a global impact as you immerse yourself in one of Australia’s leading healthtech startups

  • The opportunity to fast-track your startup career if you make an impact quickly!


The Company
HQ: Cremorne, Victoria
112 Employees
Year Founded: 2019

What We Do

Heidi Health is the team behind the world’s most loved AI scribe, used daily by tens of thousands of clinicians in over 50 countries to scribe millions of consults every month. Where other scribes end at transcription, Heidi is just getting started. Heidi’s real power is its ability to personalize notes with customized templates, create any healthcare document with a simple prompt, enable seamless team collaboration through shared sessions for multi-disciplinary care, and more. From solo practitioners to large hospital networks, and from primary care to neurology to OBGYN, Heidi adapts to unique workflows across all specialties. Heidi is safe for every clinician to use, with HIPAA and NHS compliance fortified by SOC 2 and ISO 27001 security. Join the revolution at www.heidihealth.com. Scribing is free, and it’s just the beginning.
