This role is for one of Weekday's clients
Salary range: Rs 50,00,000 - Rs 65,00,000 (i.e., INR 50-65 LPA)
Experience: 14+ years
Location: Bangalore
Job Type: Full-time
We are seeking a highly experienced SRE & DevOps Architect to lead enterprise-wide reliability, platform engineering, and DevOps transformation initiatives. This strategic leadership role is responsible for defining and driving the organization's Site Reliability Engineering (SRE), DevOps, DevSecOps, and Platform Engineering vision while establishing scalable frameworks, governance models, and best practices that enable high-performing, resilient, secure, and cost-effective technology platforms.
As a key technology leader, you will work closely with Engineering, Cloud, Security, Infrastructure, and Business stakeholders to build and scale a DevOps & Reliability Center of Excellence (CoE). You will be responsible for driving cloud-native adoption, operational excellence, observability, automation, platform modernization, and continuous improvement across the software delivery lifecycle.
The ideal candidate combines deep technical expertise with strong architectural leadership, enterprise-scale transformation experience, and a proven ability to influence stakeholders across multiple teams and geographies. This role requires a balance of strategic thinking, technical governance, hands-on architectural guidance, and mentorship to enable engineering teams to deliver reliable and scalable solutions at speed.
Requirements
Key Responsibilities
Architecture & Strategy
- Define enterprise-wide SRE, DevOps, and Platform Engineering strategies, standards, and reference architectures.
- Design scalable, highly available, resilient, and fault-tolerant platforms across cloud-native and hybrid environments.
- Establish reliability engineering practices including Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, capacity planning, and resilience frameworks.
- Lead adoption of Internal Developer Platforms (IDPs) and platform engineering best practices to improve developer productivity and operational efficiency.
- Create reusable architectural patterns, frameworks, and governance standards that can be leveraged across engineering teams.
- Build, scale, and manage the DevOps and Site Reliability Engineering Center of Excellence.
- Define operating models, governance frameworks, maturity assessments, and key performance metrics for DevOps and SRE practices.
- Establish enterprise standards for CI/CD, Infrastructure as Code, observability, release management, and automation.
- Act as a trusted advisor to engineering and product teams by providing architectural guidance and implementation best practices.
- Drive organization-wide adoption of reliability engineering and automation-first principles.
- Architect and govern solutions across AWS, Azure, and GCP environments.
- Lead Infrastructure as Code initiatives using Terraform, CloudFormation, Ansible, ARM templates, or equivalent technologies.
- Enable and optimize containerized environments using Docker, Kubernetes, Helm, OpenShift, and service mesh technologies.
- Establish cloud governance, scalability frameworks, and FinOps practices to optimize cloud spending and resource utilization.
- Drive infrastructure modernization and automation initiatives across enterprise environments.
- Design and implement enterprise observability solutions using tools such as Prometheus, Grafana, ELK, Splunk, Datadog, and OpenTelemetry.
- Establish proactive monitoring, alerting, incident management, root cause analysis, and reliability improvement processes.
- Lead blameless postmortems and continuous service improvement programs.
- Integrate DevSecOps practices throughout the software development lifecycle.
- Collaborate with security teams to implement secrets management, policy-as-code, compliance controls, and vulnerability management processes.
- Partner with engineering leadership, cloud teams, security organizations, and enterprise architects to align technology strategies.
- Mentor DevOps and SRE engineers and establish capability-building programs across the organization.
- Influence executive stakeholders through operational metrics, reliability improvements, cost optimization outcomes, and business value realization.
- Lead large-scale transformation initiatives involving multiple teams, business units, and geographic locations.
Qualifications
- 15–20 years of technology experience, including at least 6 years in architecture, platform engineering, SRE leadership, or DevOps Center of Excellence roles.
- Deep expertise in Site Reliability Engineering, including SLIs, SLOs, error budgets, incident management, capacity planning, and operational excellence practices.
- Strong experience leading enterprise DevOps, DevSecOps, and platform engineering transformations.
- Proven track record of building and managing DevOps and SRE Centers of Excellence.
- Expertise in cloud platforms including AWS, Microsoft Azure, and Google Cloud Platform.
- Hands-on experience with Kubernetes, Docker, Helm, OpenShift, and modern container orchestration technologies.
- Strong proficiency in Infrastructure as Code and automation tools such as Terraform, CloudFormation, and Ansible.
- Extensive experience implementing CI/CD platforms using Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps, and Argo CD.
- Advanced knowledge of observability platforms including Prometheus, Grafana, ELK, Splunk, Datadog, and OpenTelemetry.
- Experience integrating security controls into software delivery pipelines and implementing enterprise DevSecOps practices.
- Strong scripting and automation skills in Python and shell scripting; Go is preferred.
- Experience driving enterprise-scale cloud modernization and multi-team transformation programs.
- Excellent stakeholder management, communication, and leadership skills.
- Ability to influence teams without direct authority while balancing governance, compliance, and developer productivity.
- Strong mentoring mindset with a passion for building engineering excellence and high-performing teams.
- Professional certifications in AWS, Azure, GCP, Kubernetes, SRE, or DevOps disciplines are highly desirable.
Skills Required
- 14+ years of technology experience
- 6+ years in architecture, platform engineering, SRE leadership, or DevOps Center of Excellence roles
- Deep expertise in Site Reliability Engineering (SLIs, SLOs, error budgets, incident management, capacity planning)
- Experience leading enterprise DevOps, DevSecOps, and platform engineering transformations
- Proven track record of building and managing DevOps and SRE Centers of Excellence
- Expertise in cloud platforms: AWS, Microsoft Azure, Google Cloud Platform
- Hands-on experience with containers and orchestration: Kubernetes, Docker, Helm, OpenShift
- Proficiency in Infrastructure as Code and automation: Terraform, CloudFormation, Ansible, ARM templates
- Experience with CI/CD platforms: Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps, Argo CD
- Experience with observability and monitoring tools: Prometheus, Grafana, ELK, Splunk, Datadog, OpenTelemetry
- Experience integrating security controls into software delivery pipelines, secrets management, and policy-as-code
- Strong scripting and automation skills using Python and shell scripting
- Proven stakeholder management, communication, leadership, and mentoring skills
- Experience driving enterprise-scale cloud modernization and multi-team transformation programs
- Hands-on experience with Go
- Professional certifications in AWS, Azure, GCP, Kubernetes, SRE, or DevOps disciplines
What We Do
Weekday is an AI-powered recruitment platform that helps startups hire top-tier engineering and product talent. By leveraging a large database of white-collar professionals and advanced outreach tools, the company streamlines hiring through automated sourcing, AI-driven resume screening, and white-glove contingency services. Its mission is to modernize recruitment by enabling companies to discover and engage passive candidates efficiently, ensuring high-quality hires for critical roles.