Sr. Site Reliability Engineer - Incident Response

Atlanta, GA
Hybrid
99K-165K Annually
Senior level
Automotive • Cloud • Greentech • Information Technology • Other • Software • Cybersecurity
Empowering people today to build a better future for the next generation.
The Role
The SRE - Incident Response accelerates incident resolution and enhances management processes, partnering with engineering teams to troubleshoot issues and analyze incident response effectiveness.
The Site Reliability Engineer - Incident Response is a critical enterprise-level role responsible for accelerating incident resolution and enhancing the overall incident management process. This individual partners with engineering teams during active incidents to troubleshoot issues using monitoring and logging tools, and post-incident, delivers executive-level summaries that clearly communicate impact, root cause, and resolution. The SRE - Incident Response also plays a key role in analyzing incident response effectiveness and identifying opportunities for systemic improvements.
Core Competencies
  • Engineering/Tooling: Demonstrates the ability to design, build, and maintain engineering solutions and tools that enhance reliability, automate incident response, and reduce operational toil.
  • Incident Troubleshooting: Skilled in interpreting logs, metrics, and traces to assist in identifying root causes during live incidents.
  • Monitoring & Observability: Proficient in tools such as Datadog, Splunk, New Relic, or similar platforms.
  • AI-Centric Engineering: Effectively leverages artificial intelligence (AI) and machine learning (ML) tools to automate, optimize, and enhance daily engineering and incident response tasks.
  • Executive Communication: Ability to distill complex technical issues into concise, business-relevant summaries for senior leadership.
  • Analytical Rigor: Strong attention to detail in validating incident data and identifying trends or gaps in response.
  • DevOps & Architecture Knowledge: Understands full-stack systems, CI/CD pipelines, caching, scaling, and cloud-native infrastructure.
  • Metrics & Reporting: Capable of calculating and interpreting key metrics like MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve).

Key Responsibilities of This Role
Here is how the role typically looks when not tied to active on-call duty:
Post-Incident Review Development
  • Draft and deliver executive summaries post-incident.
  • Develop and coach teams on blameless postmortems.
  • Create templates, train facilitators, and help guide root cause analysis (e.g., 5 Whys, fishbone diagrams).
  • Maintain a central library of learnings and cross-cutting themes.

Incident Process Improvement
  • Actively support engineering teams during incidents by helping diagnose and resolve issues quickly.
  • Navigate and analyze data from observability platforms to make informed inferences about root causes.
  • Analyze the effectiveness of incident response to identify systemic reliability gaps.
  • Standardize incident response workflows (incident roles, comms, escalation paths).
  • Create or refine runbooks, incident command frameworks, and severity classification guides.

Metrics and Insights
  • Build dashboards around incident frequency, MTTR, MTTA, and recurrence rates.
  • Use incident data to drive reliability OKRs or engineering investments.
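
As a rough illustration of the metrics named above, the sketch below computes MTTA and MTTR from a list of incident timestamps. The `Incident` record, its field names, and the sample data are hypothetical and not taken from any particular incident tool. It also assumes MTTR is measured from the moment the incident is opened to the moment it is resolved; teams define the start point differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; real field names depend on the incident tool.
    opened: datetime
    acknowledged: datetime
    resolved: datetime


def mtta(incidents):
    """Mean Time to Acknowledge: average of (acknowledged - opened)."""
    seconds = mean((i.acknowledged - i.opened).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def mttr(incidents):
    """Mean Time to Resolve: average of (resolved - opened), by assumption."""
    seconds = mean((i.resolved - i.opened).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


# Made-up sample data for illustration only.
SAMPLE = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 4), datetime(2024, 1, 1, 11, 0)),
    Incident(datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 6), datetime(2024, 1, 2, 14, 30)),
]

if __name__ == "__main__":
    print("MTTA:", mtta(SAMPLE))
    print("MTTR:", mttr(SAMPLE))
```

With the sample data, acknowledgement takes 4 and 6 minutes and resolution takes 60 and 30 minutes, so the averages come to 5 and 45 minutes respectively.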

Tooling & AI Solutions
  • Partner with engineering teams to identify repetitive or high-impact tasks suitable for automation.
  • Develop, implement, and continuously improve custom scripts, bots, and AI-driven workflows for monitoring, alerting, and incident triage.
  • Evaluate and integrate emerging AI/ML technologies to optimize detection, root cause analysis, and reporting.
  • Ensure all tools and automations are secure, maintainable, and aligned with organizational standards and SRE best practices.
  • Document and socialize new tools and AI solutions, enabling adoption and knowledge sharing across teams.

Cross-Team Collaboration
  • Collaborate with Engineering Managers and Incident Commanders to gather and validate incident data.
  • Partner with product teams, infra, and leadership to socialize reliability best practices.
  • Act as a reliability "consultant" to squads that have impactful incidents.
  • Recommend enhancements to monitoring, alerting, and response processes to reduce future incident impact.

Compensation:
Compensation includes a base salary of $99,000.00 - $165,000.00. The base salary may vary within the anticipated base pay range based on factors such as the ultimate location of the position and the selected candidate's knowledge, skills, and abilities. Position may be eligible for additional compensation that may include an incentive program.
Benefits:
The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company's needs, and its obligations; seven paid holidays throughout the calendar year; and up to 160 hours of paid wellness annually for their own wellness or that of family members. Employees are also eligible for additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave.

Top Skills

AI
Datadog
ML
New Relic
Splunk

The Company
HQ: Atlanta, GA
50,000 Employees
Year Founded: 1898

What We Do

For well over a century, Cox Enterprises has been shaping the future with daring ideas and values-driven thinking.

Since our founding in 1898, our relentless spirit of innovation has driven us to disrupt industries and enhance the quality of life in the communities we serve. Through our major divisions — Cox Communications, Cox Automotive and Cox Farms — our people have countless opportunities to grow and make an impact in the communications and automotive industries, as well as in new ventures in agriculture, cleantech, digital media and more.

As a privately-held, family-owned business, we know that people are our most valuable asset. We offer a supportive and inclusive environment with flexible career growth, amazing benefits and work-life balance at the forefront.

Our mission, our ways of working and our commitment to people are what make our workplace culture remarkably flexible and resilient. Join us to build a better future and make your mark.

Why Work With Us

At our core, Cox is a technology company that values human relationships. We know people feel most empowered when their work has meaning, when they feel respected and have opportunities to grow. “Career satisfaction” is not enough at Cox — we’re here to help you find balance, live well and achieve your career goals even as they change over time.

Cox Enterprises Teams

  • Product & Tech
  • B2B & Cloud Sales

Cox Enterprises Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Every person has different working styles and preferences — and we aim to empower teams to work where they are most comfortable. Some roles require in-person work, but for those that can be performed remotely, we offer flexibility.

Typical time on-site: Flexible

HQ: Atlanta, GA
Austin, TX
Burlington, VT
Foothill Ranch, CA
Las Vegas, NV
North Hills, NY
Oklahoma City, OK
Omaha, NE
Phoenix, AZ
Raleigh, NC
San Diego, CA
South Jordan, UT
