Staff Site Reliability Engineer - Observability

Posted 9 Days Ago
6 Locations
In-Office
118K-284K Annually
Senior level
Fitness • Healthtech • Retail • Pharmaceutical
The Role
The Staff Site Reliability Engineer will lead the design and optimization of observability systems, collaborate on monitoring solutions, and mentor junior engineers, with a focus on edge computing environments.
Summary Generated by Built In

At CVS Health, we’re building a world of health around every consumer and surrounding ourselves with dedicated colleagues who are passionate about transforming health care.

As the nation’s leading health solutions company, we reach millions of Americans through our local presence, digital channels and more than 300,000 purpose-driven colleagues – caring for people where, when and how they choose in a way that is uniquely more connected, more convenient and more compassionate. And we do it all with heart, each and every day.

Position Summary

The PCW (Pharmacy & Consumer Wellness) Edge SRE team is seeking a Staff Site Reliability Engineer (SRE) with a primary focus on observability. This role will lead the design, implementation, and optimization of observability systems to ensure the reliability, performance, and scalability of our environment, with an emphasis on edge environments. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

Key Responsibilities:

  • Observability Strategy & Implementation:

    • Design and implement comprehensive observability solutions tailored for edge computing environments, including monitoring, logging, tracing, and metrics collection, to provide deep visibility into system performance and health across distributed remote facilities.

    • Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs to measure and enhance system reliability in edge and centralized infrastructure.

    • Build and optimize dashboards, visualizations, and alerting systems to enable real-time insights and rapid incident response for edge nodes and remote facilities.

    • Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing environments.

  • System Reliability & Performance in Edge Computing:

    • Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind, incorporating best practices for instrumentation and monitoring in resource-constrained environments.

    • Drive proactive identification of issues in edge facilities through advanced observability tools, reducing Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) across distributed systems.

    • Lead incident postmortems, analyzing root causes specific to edge environments and implementing observability-driven improvements to prevent recurrence.

  • Tooling & Automation for Edge Environments:

    • Develop and maintain tools, scripts, and automation to enhance observability pipelines, optimizing for the unique challenges of edge computing, such as bandwidth limitations and intermittent connectivity.

    • Evaluate and integrate industry-standard observability tools (e.g., Prometheus, Grafana, ELK Stack, OpenTelemetry) and recommend solutions tailored for edge computing use cases.

    • Optimize observability data storage, retention, and querying to balance performance, cost, and scalability across a large number of remote facilities.

  • Leadership & Collaboration:

    • Mentor and guide junior SREs and engineers on observability best practices for edge computing, fostering a culture of reliability and proactive monitoring.

    • Partner with solution, engineering, and business teams to align observability efforts with business objectives, ensuring seamless operation of edge and centralized systems.

    • Lead cross-functional initiatives to improve observability, reliability, and operational efficiency across distributed edge infrastructure.

  • Continuous Improvement:

    • Stay current with emerging observability trends, tools, and methodologies, particularly those suited for edge computing and distributed systems, and advocate for their adoption.

    • Contribute to the development of observability standards, runbooks, and documentation tailored for edge environments to ensure consistency and scalability.

    • Drive cost optimization for observability infrastructure while maintaining high-quality monitoring and alerting capabilities across remote facilities.

Required Qualifications

  • 7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field.

  • 5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar.

  • 3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments.

Preferred Qualifications

  • Experience implementing AIOps.

  • Demonstrated ability to handle observability challenges in environments with intermittent connectivity, high latency, or geographically dispersed infrastructure.

  • Strong proficiency in programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments.

  • Expertise working in edge computing environments with a large number of remote facilities, managing observability for distributed, high-latency, or resource-constrained systems.

  • Experience with OpenTelemetry or other open-source observability frameworks optimized for edge computing.

  • Familiarity with chaos engineering principles to validate observability systems in edge environments.

  • Certifications in cloud platforms (e.g., Google Cloud Professional certification) or Kubernetes.

  • Strong problem-solving skills with a proactive, analytical mindset, particularly for addressing edge computing challenges.

  • Excellent communication and collaboration skills to work effectively with cross-functional teams across centralized and remote locations.

  • Ability to mentor and lead technical initiatives with a focus on observability and reliability in edge environments.

  • Comfortable working in a fast-paced, dynamic environment with a focus on delivering customer value.

  • Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie) tailored for distributed systems.

  • Deep understanding of monitoring, logging, and tracing concepts, including metrics collection, log aggregation, and distributed tracing for edge and centralized systems.

  • Familiarity with cloud infrastructure, CI/CD pipelines, and edge-specific deployment patterns.

Education

Bachelor’s degree, or equivalent experience (HS diploma + 4 years relevant experience)

Business Overview
Bring your heart to CVS Health

Every one of us at CVS Health shares a single, clear purpose: Bringing our heart to every moment of your health. This purpose guides our commitment to deliver enhanced human-centric health care for a rapidly changing world. Anchored in our brand, with heart at its center, our purpose sends a personal message that how we deliver our services is just as important as what we deliver.

Our Heart At Work Behaviors™ support this purpose. We want everyone who works at CVS Health to feel empowered by the role they play in transforming our culture and accelerating our ability to innovate and deliver solutions to make health care more personal, convenient and affordable.

We strive to promote and sustain a culture of diversity, inclusion and belonging every day.

CVS Health is an affirmative action employer, and is an equal opportunity employer, as are the physician-owned businesses for which CVS Health provides management services. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.

We proudly support and encourage people with military experience (active, veterans, reservists and National Guard) as well as military spouses to apply for CVS Health job opportunities.

Pay Range

The typical pay range for this role is:

$118,450.00 - $284,280.00


This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in our comprehensive and competitive mix of pay and benefits – investing in the physical, emotional and financial wellness of our colleagues and their families to help them be the healthiest they can be. In addition to our competitive wages, our great benefits include:

  • Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.

  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.

  • Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.

For more information, visit https://jobs.cvshealth.com/us/en/benefits

We anticipate the application window for this opening will close on: 11/28/2025

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

Top Skills

Docker
ELK
Grafana
Java
Kubernetes
OpenTelemetry
Prometheus
Python
Splunk

The Company
HQ: Woonsocket, RI
119,959 Employees
Year Founded: 1963

What We Do

CVS Health is the leading health solutions company that delivers care in ways no one else can. We reach people in more ways and improve the health of communities across America through our local presence, digital channels and our nearly 300,000 dedicated colleagues – including more than 40,000 physicians, pharmacists, nurses and nurse practitioners.

Wherever and whenever people need us, we help them with their health – whether that’s managing chronic diseases, staying compliant with their medications, or accessing affordable health and wellness services in the most convenient ways. We help people navigate the health care system – and their personal health care – by improving access, lowering costs and being a trusted partner for every meaningful moment of health. And we do it all with heart, each and every day.
