Lead SRE- Observability

2 Locations
Remote
143K-243K Annually
Senior level
Healthtech • Information Technology • Telehealth
Curing complexity to simplify the practice of care.
The Role
Lead the design, build, and operation of scalable observability and telemetry platforms. Implement IaC and automation, support monitoring/alerting, troubleshoot production distributed systems, participate in incident response/on-call, and mentor engineers while driving platform reliability and cross-team technical decisions.
Summary Generated by Built In

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Position Summary
athenahealth is seeking a Lead Site Reliability Engineer to help evolve our observability and telemetry platform. In this role, you will design and operate scalable, resilient infrastructure that enables engineering teams to monitor, troubleshoot, and run distributed services reliably at scale.

You will partner across engineering organizations to improve reliability, operational efficiency, and developer productivity while advancing SRE and automation best practices across athenahealth.

Why This Role Matters
This role is central to improving the reliability, visibility, and scalability of the systems that support athenahealth’s engineering and cloud environments. The work directly enables better incident response, stronger operational performance, and more reliable healthcare technology.

About the Team
The Observability Engineering team builds and operates the telemetry, monitoring, and reliability platforms that support athenahealth’s cloud infrastructure and engineering organizations. The team processes large volumes of logs, metrics, traces, and events that help teams develop, troubleshoot, and operate highly available healthcare applications.

The team works closely with Cloud Engineering & Operations and R&D to improve observability, operational efficiency, and platform reliability through scalable infrastructure and automation-first engineering practices.

Essential Job Responsibilities:

  • Observability platform engineering

    • Build and operate scalable observability and telemetry platforms that process logs, metrics, traces, and events across production environments.

    • Support monitoring, alerting, and instrumentation strategies that improve service visibility and operational insight.

    • Partner with engineering teams to strengthen telemetry collection and overall observability.

  • Infrastructure and automation

    • Design resilient, automated infrastructure and platform services that improve reliability, scalability, and efficiency.

    • Develop Infrastructure as Code and automation solutions that reduce toil and improve consistency.

    • Lead technical initiatives from architecture through implementation with attention to performance, reliability, security, and maintainability.

  • Production support and incident response

    • Troubleshoot complex production issues involving distributed systems, Linux infrastructure, networking, cloud services, and telemetry pipelines.

    • Participate in incident response and on-call processes.

    • Help drive operational excellence, root cause analysis, and continuous improvement.

  • Technical leadership and mentoring

    • Mentor engineers on SRE best practices, observability strategy, and scalable systems design.

    • Contribute to long-term platform strategy and reliability improvements.

    • Influence technical decisions across engineering organizations.

Expected Education & Experience:

  • 7+ years of experience operating and engineering large-scale production infrastructure and distributed systems.

  • Strong expertise in Linux systems engineering, cloud infrastructure, and SRE practices.

  • Proven experience designing and operating observability and telemetry platforms.

  • Hands-on experience with tools and technologies such as OpenSearch/Elasticsearch, Kafka, Prometheus, Grafana, Vector, Fluentd, OpenTelemetry, ClickHouse, or similar.

  • Experience building Infrastructure as Code solutions using Terraform, CloudFormation, or equivalent tooling.

  • Strong automation and software engineering skills using Python, Golang, or Bash.

  • Experience troubleshooting large-scale distributed systems in production with a focus on availability, performance, scalability, and resiliency.

  • Experience operating services in cloud-native environments, including AWS and containerized platforms.

  • Strong understanding of monitoring strategy, telemetry pipelines, incident response, root cause analysis, and operational excellence.

  • Ability to communicate effectively across engineering organizations and influence technical decision-making.

Preferred Experience:

  • Experience operating high-scale telemetry or analytics platforms with large ingestion volumes.

  • Experience with Kubernetes, Docker, CI/CD systems, and modern platform engineering practices.

  • Strong networking and troubleshooting experience using tools such as tcpdump and Wireshark.

  • Experience leading cross-functional engineering efforts and mentoring within SRE or infrastructure organizations.

  • Familiarity with healthcare technology or other highly regulated production environments.


Expected Compensation

$143,000 - $243,000

The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates. Base pay is only one part of our competitive Total Rewards package. Depending on role eligibility, we offer both short- and long-term incentives through an annual discretionary bonus plan, a variable compensation plan, and equity plans.


About athenahealth

Our vision: In an industry that becomes more complex by the day, we stand for simplicity. We offer IT solutions and expert services that eliminate the daily hurdles preventing healthcare providers from focusing entirely on their patients — powered by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Our company culture: Our talented employees (or athenistas, as we call ourselves) spark the innovation and passion needed to accomplish our vision. We are a diverse group of dreamers and do-ers with unique knowledge, expertise, backgrounds, and perspectives. We unite as mission-driven problem-solvers with a deep desire to achieve our vision and make our time here count. Our award-winning culture is built around shared values of inclusiveness, accountability, and support.

Our DEI commitment: Our vision of accessible, high-quality, and sustainable healthcare for all requires addressing the inequities that stand in the way. That's one reason we prioritize diversity, equity, and inclusion in every aspect of our business, from attracting and sustaining a diverse workforce to maintaining an inclusive environment for athenistas, our partners, customers and the communities where we work and serve.

What we can do for you:

Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces. Some offices even welcome dogs.

We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.

In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. We provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued. 

Learn more about our culture and benefits here: athenahealth.com/careers  

https://www.athenahealth.com/careers/equal-opportunity

athenahealth Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about athenahealth and has not been reviewed or approved by athenahealth.

  • Healthcare Strength: Health coverage is described as comprehensive, including medical, dental, and vision options alongside additional protections like accident and critical illness coverage. Mental health support and EAP-style counseling resources are also part of the package.
  • Leave & Time Off Breadth: Time-away offerings include PTO that covers vacation and sick time, paid holidays, and options for leaves of absence and sabbaticals. Flexible time off is positioned as a meaningful part of the overall rewards package for some roles.
  • Retirement Support: Retirement benefits include a 401(k) plan with employer matching, supported by broader financial wellbeing resources. Equity and performance bonuses are also referenced as part of total rewards.

The Company
HQ: Boston, MA
7,200 Employees
Year Founded: 1997

What We Do

athenahealth strives to cure complexity and simplify the practice of healthcare. Our innovative technology includes electronic health records, revenue cycle management, and patient engagement solutions that help healthcare providers, administrators, and practices eliminate friction for patients while getting paid efficiently. athenahealth partners with practices with purpose-built software backed by expertise to produce the insights needed to drive better clinical and financial outcomes. We’re inspired by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all. For more information, please visit www.athenahealth.com

Why Work With Us

We are here to make an impact on the healthcare industry at scale. We enable our diverse teams to move fast, grapple with interesting technical challenges, and innovate at every level. We are on a modernization journey and build on the hybrid cloud. We deliver best-in-class solutions to help every patient receive the best possible care.
