Senior Software Development Engineer (Site Reliability)

Posted 3 Days Ago
Richardson, TX, USA
In-Office
93K-204K Annually
Senior level
Fitness • Healthtech • Retail • Pharmaceutical
The Role
Ensure reliability, availability, performance, and scalability of the myPBM platform using SRE practices: define SLIs/SLOs, implement observability (AppDynamics, Splunk), lead incident response and RCAs, support CI/CD and deployment guardrails, manage Azure/AKS infrastructure, automate with IaC and scripting, enforce security/compliance, participate in on-call rotation, and collaborate with engineering, DevOps, infrastructure, and security teams.
Summary Generated by Built In

We’re building a world of health around every individual — shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger – helping to simplify health care one person, one family and one community at a time.

Position Summary

The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, performance, and operational scalability of the myPBM platform. This role applies software engineering practices to operations, with a focus on automation, observability, incident management, and continuous improvement to support the stable, scalable delivery of client-facing services.

The SRE partners closely with DevOps, Engineering, Infrastructure, and Security teams to balance system reliability with delivery velocity while maintaining compliance with enterprise standards.

*We prefer that this person work on a hybrid schedule in Richardson, TX; Northbrook, IL; or Scottsdale, AZ.
 

Primary Responsibilities

1. Reliability Engineering and Operations

  • Ensure high availability, resiliency, and performance of myPBM applications and infrastructure.

  • Define and manage SLIs, SLOs, and SLAs for critical services.

  • Monitor production systems and proactively identify issues before customer impact.

  • Lead incident response, triage, and root cause analysis (RCA).

  • Drive continuous improvement to reduce repeat incidents and operational toil.

2. Monitoring, Observability, and Alerting

  • Implement and maintain end-to-end observability across UI, APIs, and infrastructure layers.

  • Build and manage monitoring solutions using:

      • AppDynamics (APM, RUM, synthetic monitoring)

      • Splunk (logs, dashboards, and error tracking)

  • Design actionable alerts and escalation workflows using tools such as xMatters and MIR3.

  • Standardize dashboards and ensure data accuracy and visibility.

  • Continuously optimize alerting to reduce noise and improve signal quality.

3. DevSecOps and Release Engineering

  • Support and enhance CI/CD pipelines, including GitHub Actions and enterprise pipeline solutions.

  • Enforce deployment guardrails, release governance, and production readiness checks.

  • Support build and deployment failure triage and rollback strategies.

  • Partner with development teams to improve deployment reliability and automation.

  • Ensure adherence to change management (CAB/ServiceNow) and release policies.

4. Infrastructure Engineering and Platform Stability

  • Manage and support cloud infrastructure, including AKS, compute, storage, and networking.

  • Ensure platform health, capacity monitoring, and performance optimization.

  • Support infrastructure provisioning and environment setup.

  • Drive disaster recovery (DR) readiness and failover validation, including RTO and RPO objectives.

  • Enable application onboarding onto standardized enterprise platforms.

5. Security and Compliance

  • Implement continuous security monitoring and vulnerability remediation.

  • Manage secrets, certificates, and identity integration, including IAM onboarding.

  • Ensure compliance with CVS security standards, audit requirements, and production readiness controls.

  • Enforce shift-left security practices in CI/CD pipelines.

6. Incident Management and Support Model

  • Participate in 24x7 on-call rotation and incident response.

  • Partner with Production Support to resolve incidents.

  • Ensure monitoring and alerting gaps are identified and closed.

  • Maintain incident documentation and improve standard operating procedures.

  • Support the full issue detection, triage, resolution, and prevention lifecycle.

7. Automation and Continuous Improvement

  • Automate repetitive operational tasks to reduce toil.

  • Implement infrastructure as code (IaC) practices.

  • Continuously improve deployment pipelines, monitoring, and observability.

  • Enable predictive insights and proactive issue prevention.

8. Collaboration and Platform Enablement

  • Work closely with engineering, DevOps, infrastructure, and security teams.

  • Enable a shared ownership model for reliability and operations.

  • Provide guidance on production readiness and operational best practices.

Required Qualifications

5+ years of experience in site reliability engineering, DevOps, or platform engineering, including the following:

  • Monitoring and observability tools such as Splunk and AppDynamics

  • Cloud platforms, preferably Azure, including AKS and Kubernetes

  • CI/CD pipelines such as GitHub Actions, Jenkins, or similar tools

  • Strong understanding of incident management and root cause analysis; monitoring, alerting, and logging practices; and infrastructure and networking fundamentals

  • Scripting experience with Python, Bash, or PowerShell.

Preferred Qualifications

  • Experience in healthcare or other regulated environments.

  • Knowledge of site reliability engineering principles, including SLIs, SLOs, and error budgets.

  • Familiarity with DevSecOps practices and compliance requirements.

  • Experience supporting large-scale distributed systems.

Education

Bachelor's degree or equivalent experience.

Anticipated Weekly Hours

40

Time Type

Full time

Pay Range

The typical pay range for this role is:

$92,700.00 - $203,940.00

This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above.
 

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.

This full‑time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.


Additional details about available benefits are provided during the application process and on Benefits Moments.

We anticipate the application window for this opening will close on: 07/19/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

Skills Required

  • 5+ years of experience in site reliability engineering, DevOps, or platform engineering
  • Experience with monitoring and observability tools such as Splunk and AppDynamics
  • Cloud platform experience, preferably Azure, including AKS and Kubernetes
  • Experience with CI/CD pipelines such as GitHub Actions, Jenkins, or similar tools
  • Strong understanding of incident management and root cause analysis, monitoring, alerting, and logging practices
  • Infrastructure and networking fundamentals
  • Scripting experience with Python, Bash, or PowerShell
  • Bachelor's degree or equivalent experience
  • Familiarity with change management tools/processes (CAB/ServiceNow) and release governance

The Company
HQ: Woonsocket, RI
119,959 Employees
Year Founded: 1963

What We Do

CVS Health is the leading health solutions company that delivers care in ways no one else can. We reach people in more ways and improve the health of communities across America through our local presence, digital channels and our nearly 300,000 dedicated colleagues – including more than 40,000 physicians, pharmacists, nurses and nurse practitioners. Wherever and whenever people need us, we help them with their health – whether that’s managing chronic diseases, staying compliant with their medications, or accessing affordable health and wellness services in the most convenient ways. We help people navigate the health care system – and their personal health care – by improving access, lowering costs and being a trusted partner for every meaningful moment of health. And we do it all with heart, each and every day.
