Lead Director, Site Reliability Engineering - Client Experience

Richardson, TX, USA
In-Office
144K-288K Annually
Senior level
Fitness • Healthtech • Retail • Pharmaceutical
The Role
The Lead Director for SRE will lead teams to enhance reliability and performance for cloud platforms, define best practices, manage incidents, and ensure system stability across Azure and GCP environments.
Summary Generated by Built In

We’re building a world of health around every individual, shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold themselves accountable and prioritize safety and quality in everything they do. Join us and be part of something bigger: helping to simplify health care one person, one family and one community at a time.

Position Summary

The Lead Director, Site Reliability Engineering - Client Experience, is responsible for building, leading, and scaling hands‑on SRE teams supporting Adjudication and Client Experience platforms across on-premises, Azure, and GCP environments.

This role owns end‑to‑end reliability engineering—from defining SLOs and error budgets to designing resilient cloud architectures, automating operations, and embedding reliability directly into the SDLC. The ideal candidate is a deeply technical leader who has personally designed, operated, and scaled highly available distributed systems and can coach teams to do the same.

You will work closely with engineering, architecture, product, infrastructure, and security teams to shift operations from reactive to predictive, reduce operational toil, and ensure platform stability at enterprise scale.

Key Responsibilities

  • Lead and grow hands‑on SRE teams responsible for reliability, scalability, performance, and availability of Tier‑1 services across Azure and GCP

  • Establish and enforce SRE best practices, including SLIs, SLOs, error budgets, toil reduction, and automation‑first operations

  • Review and influence architecture, reliability designs, and failure modes for critical platforms and services

  • Drive cloud‑native reliability patterns, including autoscaling, graceful degradation, resilience testing, and disaster recovery

  • Own incident management, serving as an escalation leader and championing blameless post‑mortems and systemic fixes

  • Lead root cause analysis and ensure corrective actions result in measurable reliability improvements

  • Define and standardize monitoring, alerting, and observability across distributed systems using metrics, logs, and traces

  • Implement predictive operations and AI‑Ops capabilities, including anomaly detection, automated triage, and remediation

  • Lead reliability engineering for multi‑cloud environments (Azure & GCP), including Kubernetes platforms (AKS, GKE)

  • Ensure pre‑season readiness and year‑round capacity planning based on historical usage and growth forecasts

  • Drive consistency in CI/CD, deployment strategies, and rollback mechanisms across teams

  • Embed reliability into the SDLC, shifting accountability left into design, development, and testing

  • Reduce operational toil through automation, self‑service platforms, and standardized runbooks

  • Lead modernization initiatives that replace manual operations with engineering‑driven reliability solutions

  • Communicate platform health, risks, and improvements using data‑driven reliability metrics

  • Ensure systems meet security, compliance, and regulatory requirements

Required Qualifications

  • 10+ years of progressive experience in engineering or SRE organizations

  • 5+ years of experience managing senior engineers and leaders

  • 5+ years of hands‑on experience designing, deploying, and operating systems in cloud environments (Azure and/or GCP)

  • Proven experience building or scaling SRE practices, including SLOs, SLIs, incident response, and post‑mortems

  • Strong background in distributed systems, microservices, APIs, and cloud‑native architectures

  • Experience leading teams through platform modernization or reliability transformation initiatives

Preferred Qualifications

  • Deep expertise with Kubernetes‑based platforms (AKS, GKE; OpenShift a plus)

  • Experience implementing AI‑Ops, automation, and predictive reliability solutions

  • Strong understanding of observability platforms and modern monitoring strategies

  • Track record of reducing outages, improving MTTR, and scaling reliability at enterprise scale

  • Ability to operate with a startup mindset while navigating complex enterprise environments

  • Excellent communication and stakeholder management skills with the ability to influence at all levels

Education

Bachelor’s degree or equivalent experience

Pay Range

The typical pay range for this role is:

$144,200.00 - $288,400.00


This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on factors including experience, education, and geography. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve, and we are committed to fostering a workplace where every colleague feels valued and feels that they belong.

Great benefits for great people

We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.

This full‑time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.


Additional details about available benefits are provided during the application process and on Benefits Moments.

We anticipate the application window for this opening will close on: 06/29/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.


The Company
HQ: Woonsocket, RI
119,959 Employees
Year Founded: 1963

What We Do

CVS Health is the leading health solutions company that delivers care in ways no one else can. We reach people in more ways and improve the health of communities across America through our local presence, digital channels and our nearly 300,000 dedicated colleagues – including more than 40,000 physicians, pharmacists, nurses and nurse practitioners. Wherever and whenever people need us, we help them with their health – whether that’s managing chronic diseases, staying compliant with their medications, or accessing affordable health and wellness services in the most convenient ways. We help people navigate the health care system – and their personal health care – by improving access, lowering costs and being a trusted partner for every meaningful moment of health. And we do it all with heart, each and every day.
