Site Reliability Engineer (AWS & Kubernetes), VP

3 Locations
In-Office
Senior level
Fintech • Payments • Financial Services
The Role
As a Vice President Site Reliability Engineer, you'll enhance system reliability and performance using SRE principles on AWS and Kubernetes, manage incidents, and drive operational excellence.

Join us as a Site Reliability Engineer

  • In this key role, you’ll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services
  • You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way
  • This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
  • We're offering this role at vice president level
What you'll do

As a Senior Site Reliability Engineer, you’ll act as a hands‑on expert responsible for ensuring the reliability, availability and performance of critical production platforms.

You’ll lead the adoption of SRE practices, embedding resilience, observability and operational excellence into distributed systems running on AWS and Kubernetes. You’ll also take ownership of 24/7 production support models, ensuring systems are highly available and incidents are effectively managed and learned from.

In addition to this, you’ll be:

  • Designing and operating highly resilient AWS-based Kubernetes platforms (EKS) aligned to enterprise standards
  • Owning and improving production reliability, availability, and SLA/SLO frameworks
  • Leading incident management, escalation and 24/7 on-call practices, including post-incident reviews
  • Embedding SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams
  • Implementing infrastructure and platform automation using Terraform and GitOps methodologies
  • Driving self-healing, auto-scaling and failure recovery mechanisms using tools such as Karpenter
  • Building secure, scalable networking and service communication (e.g. Cilium)
  • Defining and operating observability platforms using Grafana, Prometheus, Loki, Tempo
  • Partnering with DevOps and engineering teams to ensure production readiness and operational excellence
  • Leading complex troubleshooting across distributed systems and cloud-native environments
  • Developing reusable “golden paths”, operational runbooks and reliability patterns
  • Ensuring platforms meet regulatory, security and operational risk requirements
  • Using data, SLIs and metrics to drive continuous improvement and proactive reliability enhancements
The skills you'll need

We’re looking for a highly experienced SRE with a strong background in operating large-scale, business-critical platforms and a passion for reliability engineering.

We’re also looking for:

  • Deep expertise managing production systems on AWS and Kubernetes (EKS)
  • Strong experience in 24/7 support models, incident management and on-call leadership
  • Advanced knowledge of SRE principles (SLIs, SLOs, error budgets, toil reduction)
  • Proficiency in Terraform, GitOps, and cloud automation practices
  • Hands-on experience with GitLab CI/CD and Argo CD
  • Strong understanding of Kubernetes networking, security and service mesh technologies, ideally Cilium
  • Experience scaling infrastructure using Karpenter and auto-scaling strategies
  • Expertise in observability tooling (Grafana, Prometheus, Loki, Tempo)
  • Proven ability to troubleshoot and resolve complex, cross-system production issues
  • Experience operating in regulated or high-security environments
  • Strong leadership, mentoring, and stakeholder engagement capabilities
  • Ability to balance reliability, risk, and delivery in a fast-paced environment

Hours: 45

Job Posting Closing Date: 16/06/2026

NatWest Group Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about NatWest Group and has not been reviewed or approved by NatWest Group.

  • Flexible Benefits: A flexible ValueAccount structure with pension and benefit funding allows tailoring of health, protection, lifestyle, and savings options, with unused amounts typically paid as cash. This flexibility supports personalisation of coverage, particularly in Great Britain where the framework is most detailed.
  • Retirement Support: Employer-funded pension contributions are provided on top of salary in Great Britain, alongside automatic retirement enrollment and share/save programs. This creates structured long‑term wealth support as part of total reward.
  • Parental & Family Support: UK policies outline extended maternity, adoption and equal partner leave on full pay with a phased return, plus paid neonatal care leave. These provisions are positioned as market‑leading and complement broader flexibility resources.

The Company
HQ: Bengaluru, Karnataka
40,000 Employees
Year Founded: 1970

What We Do

We’re a business that understands that when our customers and people succeed, our communities succeed, and our economy thrives. As part of our purpose, we’re looking at how we can drive change for our communities in enterprise, learning and climate.

As one of the leading supporters of UK business, we’re prioritising enterprise as a force for change. We’re focusing on the people and communities who have traditionally faced the highest barriers to entry and figuring out ways to remove these.

Learning is also key to our continued growth as a company in an ever-changing and increasingly digital world. A dynamic, leading learning culture helps our people prosper and gives our customers the tools to continue to improve their financial capability and confidence.

One of the biggest challenges we all face in our future is climate change. That’s why we’ve put it right at the core of our purpose. We want to champion climate solutions with financing and entrepreneurial support, fully embed climate into our culture and decision-making, and be climate positive by 2025.

We’re committed to using our purpose to break down barriers, drive change and ultimately create a great place to work.
