Manager of Site Reliability Engineering (SRE)

Birmingham, AL, USA
In-Office
Senior level
Automotive • Hardware • Logistics
The Role
The Manager of Site Reliability Engineering leads a team to enhance cloud infrastructure reliability, automate processes, and collaborate with various teams to improve service delivery and operations.

SUMMARY

The Manager of Site Reliability Engineering leads and develops a team of SRE practitioners focused on delivering highly reliable, scalable, and performant cloud-based infrastructure and services. This role ensures the implementation of SRE principles, drives automation, observability, and incident management practices to enhance system reliability, and collaborates across development and operations teams to support continuous delivery and robust cloud platform operations.

You must be eligible to work in the US without visa sponsorship.

JOB DUTIES

• Lead, mentor, and grow a high-performing team of Site Reliability Engineers, fostering a culture of ownership, continuous improvement, and operational excellence.

• Implement and champion Site Reliability Engineering principles and DevOps best practices within the team to ensure service reliability, availability, and performance.

• Define and track key SRE metrics such as service uptime and incident response and resolution times.

• Drive automation efforts including CI/CD pipeline enhancements, infrastructure-as-code practices, and self-service infrastructure provisioning to increase deployment velocity while reducing manual toil.

• Own and continuously improve observability practices including system monitoring, logging, alerting, and diagnostics to ensure rapid issue detection and resolution.

• Participate in incident response processes including incident management, root cause analysis, post-mortems, and continuous improvement to enhance system resilience.

• Partner closely with software engineering, product management, architecture, and security teams to embed reliability and security early in the software development lifecycle (SDLC).

• Oversee the management and scalability of cloud infrastructure environments, primarily on Google Cloud Platform (GCP), with a focus on Kubernetes, container orchestration, and hybrid cloud integrations.

• Advocate for and apply best practices in performance tuning, capacity planning, and system design for high availability.

• Develop and execute a long-term roadmap for our hybrid cloud platform, aligning with evolving business objectives and technology trends.

• Establish and monitor key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) to drive system health and stability.

EDUCATION & EXPERIENCE

Typically requires a bachelor's degree and 7 years of experience in a technology and/or software engineering role, or an equivalent combination of education and experience.

KNOWLEDGE, SKILLS, ABILITIES

Experience & Leadership

• Proven experience working in large, complex enterprise environments (Fortune 500 or equivalent).

Site Reliability Engineering & DevOps Practices

• Strong understanding and demonstrated implementation of Site Reliability Engineering (SRE) principles at scale.

• Hands-on experience with infrastructure-as-code (IaC) tools such as Terraform and ArgoCD.

• In-depth knowledge and practical experience with CI/CD pipelines and automation of software delivery.

• Experience championing DevOps practices and embedding reliability early in the SDLC.

• Significant hands-on experience in Site Reliability Engineering or related roles focused on cloud infrastructure reliability.

• Strong software engineering background with proficiency in infrastructure-as-code tools (e.g., Terraform, ArgoCD) and CI/CD automation.

• Deep knowledge of cloud platforms, specifically Google Cloud Platform (GCP), Kubernetes, container orchestration, and cloud-native architecture.

• Familiarity with monitoring and observability tools such as Dynatrace, Datadog, or equivalents.

• Experience managing high-availability systems in 24/7 operational environments.

• Ability to collaborate cross-functionally and drive alignment across engineering, product, and security teams.

Tools & Monitoring

• Experience with monitoring, logging, and observability platforms.

• Familiarity with incident management and performance monitoring tools, including Dynatrace and Datadog.

• Proficient in cloud deployment tooling and automation frameworks.

• Experience with Azure DevOps (ADO) or equivalent CI/CD tools.

Core Technical Skills

• Strong software engineering and infrastructure background.

• Solid understanding of Kubernetes, container orchestration, cluster management, and elastic scalability.

• Experience with API-driven, event-driven, and microservices architectures.

• Skilled in performance diagnostics, capacity planning, tuning, and system architecture for high-availability systems.

Not the right fit? Let us know you're interested in a future opportunity by joining our Talent Community on jobs.genpt.com, or create an account to set up email alerts as new job postings that match your interests become available.

GPC conducts its business without regard to sex, race, creed, color, religion, marital status, national origin, citizenship status, age, pregnancy, sexual orientation, gender identity or expression, genetic information, disability, military status, status as a veteran, or any other protected characteristic. GPC's policy is to recruit, hire, train, promote, assign, transfer and terminate employees based on their own ability, achievement, experience and conduct and other legitimate business reasons.

Skills Required

  • Bachelor's degree and 7 years of experience in technology or software engineering
  • Proven experience in large complex enterprise environments
  • Demonstrated implementation of SRE principles at scale
  • Hands-on experience with infrastructure-as-code tools like Terraform and ArgoCD
  • Familiarity with monitoring and observability tools such as Dynatrace or Datadog
  • Proficient in cloud deployment tooling and automation frameworks

Genuine Parts Company Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Genuine Parts Company and has not been reviewed or approved by Genuine Parts Company.

  • Retirement Support: Retirement programs include a 401(k) with company match and an Employee Stock Purchase Plan, with profit sharing and pension plans also mentioned. These elements indicate strong long-term financial support alongside ownership opportunities.
  • Healthcare Strength: Benefits encompass medical, dental, and vision coverage with HSA and FSA options plus income-protection coverages like life, AD&D, and disability. This breadth suggests a robust core health and protection offering.
  • Parental & Family Support: Paid maternity and paternity leave are provided in addition to short-term disability, and an Employee Assistance Program supports families with counseling and life tools. These programs reinforce family support alongside standard PTO.

The Company
Marietta, GA
4,400 Employees
Year Founded: 1928

What We Do

Genuine Parts Company (GPC), founded in 1928, is a global service organization engaged in the distribution of automotive and industrial replacement parts. We serve hundreds of thousands of customers from a network of more than 10,000 locations in 14 countries and have approximately 50,000 employees.
