Site Reliability Engineer (SRE) Manager

Posted 2 Days Ago
Monterrey, Nuevo León
Expert/Leader
Software
The Role
The SRE Manager will lead the Site Reliability Engineering team, managing production system reliability, incident response, and CI/CD practices while ensuring alignment with business objectives.

Location: Hybrid in Monterrey, MX, with 8 days a month on-site.
A travel or relocation stipend may be available.
Type of Employment: Contract to hire. A 1-3 month remote contract, followed by full-time employment.
Requirement: Must be legally authorized to work for any Mexican employer without sponsorship, now or in the future. 

About Us
Concord isn't your typical consulting firm; we're an execution-focused company passionate about delivering results. Our mission is to help clients enhance customer experiences, optimize operations, and revolutionize product offerings through seamless integration, optimization, and activation of technology and data.
Our services and solutions include Digital Experience (Salesforce, Headless Commerce, UI/UX), Data and Analytics (Snowflake, Databricks, Martech Analytics), and Engineering and Application Services (Application Modernization, Greenfield Apps, Portal Buildout, etc.).

About the Role
We are seeking a strategic, technically adept, and hands-on SRE Manager to lead the reliability, scalability, and operational excellence of our production systems. This role is ideal for a leader who thrives in high-pressure environments, excels at debugging complex production issues, and is passionate about building and mentoring high-performing teams.
The SRE Manager will be responsible for hiring and managing a team of SREs, driving incident response and postmortem processes, and collaborating with multiple product teams to build and maintain robust CI/CD pipelines and deployment practices. This role demands a strong sense of ownership, a deep understanding of cloud-native infrastructure, and the ability to lead by example.
Business Alignment
The SRE Manager will partner with business stakeholders to ensure reliability goals support customer experience, compliance, and growth targets. This includes aligning SRE initiatives with broader business objectives such as revenue protection, innovation, and regulatory adherence.

Key Responsibilities

  • Build and lead a high-performing Site Reliability Engineering team.
  • Create individualized development plans for SREs, encourage participation in industry conferences, and support certification programs.
  • Debug and resolve complex production issues, ensuring minimal downtime and rapid recovery.
  • Own the incident lifecycle, including coordination, communication, and creation of detailed postmortem documentation.
  • Implement blameless postmortems and maintain a library of runbooks for common incident types.
  • Follow up with product teams to ensure resolution and implementation of long-term fixes.
  • Partner with internal product and engineering teams to understand infrastructure needs and deliver scalable, secure, and reliable solutions.
  • Drive the design, implementation, and automation of cloud infrastructure using Azure, Terraform, and Kubernetes (AKS).
  • Lead the adoption and management of tools such as Argo CD, Argo Workflows, Azure DevOps, and Octopus Deploy.
  • Architect and manage API Gateways, WAFs, Service Mesh, and multi-cloud networking (VNets, private networks).
  • Establish and enforce deployment best practices, including documentation, versioning, rollback strategies, and environment management.
  • Collaborate with product teams to build and maintain CI/CD pipelines, ensuring reliable and repeatable deployments.
  • Foster a culture of ownership, accountability, and continuous improvement across the team.
  • Define and track key performance indicators (KPIs) for system reliability and team effectiveness.
  • Define and manage Service Level Objectives (SLOs) and error budgets for all critical services.
  • Lead the adoption of advanced observability tools for proactive reliability management.
  • Collaborate with security, compliance, and architecture teams through joint reviews, shared dashboards, and audits to ensure infrastructure meets enterprise standards.

Required Qualifications

  • 10+ years of experience in infrastructure, DevOps, or SRE roles, with 3+ years in a technical leadership or management capacity.
  • Proven experience debugging and resolving production issues in large-scale systems.
  • Experience building and scaling cloud-native infrastructure on Azure.
  • Deep expertise in Kubernetes (AKS), CI/CD pipelines, and Infrastructure as Code (Terraform).
  • Strong understanding of networking, VNets, private cloud connectivity, and multi-cloud architectures.
  • Hands-on experience with Argo CD, Argo Workflows, and Azure DevOps.
  • Demonstrated ability to hire, mentor, and lead engineering teams.
  • Excellent communication and stakeholder management skills.
  • Strong problem-solving mindset with a bias for action and ownership.
  • Ability to create and maintain detailed deployment documentation and lead by example in operational excellence.
  • Advanced English proficiency (C1 or C2) with proven success collaborating in global, English-speaking environments.

Preferred Qualifications

  • Experience supporting internal product teams or platform engineering organizations.
  • Familiarity with FinOps, cost optimization, and cloud governance.
  • Exposure to compliance frameworks (SOC 2, ISO, HIPAA).
  • Experience with service mesh technologies (Istio, Linkerd).
  • Knowledge of emerging technologies such as AI/ML ops, edge computing, and sustainability practices.

What Success Looks Like

  • A high-performing SRE team that operates with autonomy and accountability.
  • Internal customers view the SRE team as a trusted partner in delivering reliable, scalable systems.
  • Infrastructure is automated, observable, and resilient by design.
  • Incidents are rare, well-managed, and always lead to learning and improvement.
  • CI/CD pipelines are robust, well-documented, and consistently deliver high-quality deployments.

Top Skills

Argo CD
Argo Workflows
Azure
Azure DevOps
Kubernetes
Octopus Deploy
Terraform

The Company
Atlanta, Georgia
642 Employees
Year Founded: 2003

What We Do

Concord is a technology consultancy building connected customer experiences backed by powerful AI & analytics and underpinned by secure IT foundations.

Digital Experience | Data & Analytics | Engineering & Applications
