NatWest Group

Senior Site Reliability Engineer

2 Locations

In-Office

Senior level

Fintech • Payments • Financial Services

The Role

Join us as a Senior Site Reliability Engineer

In this key role, you’ll improve and drive the availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services
You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way
This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
You’ll need to have the flexibility to support the team by working shifts and weekends on rotation

What you'll do

As a Senior Site Reliability Engineer, you’ll act as a hands-on expert responsible for ensuring the reliability, availability, and performance of critical production platforms. You’ll lead the adoption of Site Reliability Engineering (SRE) practices, embedding resilience, observability, and operational excellence into distributed systems running on AWS and Kubernetes. You’ll also take ownership of 24/7 production support models, ensuring systems are highly available and that incidents are effectively managed and learned from.

We’ll also expect you to design and operate highly resilient AWS-based Kubernetes platforms (EKS) aligned with enterprise standards, while owning and continuously improving production reliability, availability, and Service Level Agreement and Service Level Objective (SLA/SLO) frameworks. You’ll lead incident management, escalation, and 24/7 on-call practices, including post-incident reviews, and embed SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams. Furthermore, you’ll implement infrastructure and platform automation using Terraform and GitOps methodologies, and drive self-healing, auto-scaling, and failure recovery mechanisms using tools such as Karpenter.

In addition to this, you’ll be:

Building secure and scalable networking and service communication using tools such as Cilium
Defining and operating observability platforms using Grafana, Prometheus, Loki, and Tempo
Partnering with DevOps and engineering teams to ensure production readiness and operational excellence
Leading complex troubleshooting across distributed systems and cloud-native environments
Developing reusable “golden paths,” operational runbooks, and reliability patterns
Ensuring platforms meet regulatory, security, and operational risk requirements
Using data, Service Level Indicators (SLIs), and metrics to drive continuous improvement and proactive reliability enhancements

The skills you'll need

We’re looking for a highly experienced Site Reliability Engineer with a strong background in operating large-scale, business-critical platforms and a passion for reliability engineering. You must also have deep expertise in managing production systems on AWS and Kubernetes (EKS), along with strong experience in 24/7 support models, incident management, and on-call leadership.

Moreover, you’ll need to demonstrate advanced knowledge of SRE principles such as SLIs, SLOs, error budgets, and toil reduction, as well as proficiency in Terraform, GitOps, and cloud automation practices. Hands-on experience with GitLab continuous integration and continuous delivery (CI/CD) pipelines and with Argo CD is also essential.

In addition, you’ll have to bring:

A strong understanding of Kubernetes networking, security, and service mesh technologies, ideally using Cilium
Experience scaling infrastructure using Karpenter and auto-scaling strategies
Expertise in observability tooling, including Grafana, Prometheus, Loki, and Tempo
A proven ability to troubleshoot and resolve complex, cross-system production issues
Experience operating in regulated or high-security environments
Strong leadership, mentoring, and stakeholder engagement capabilities
The ability to balance reliability, risk, and delivery in a fast-paced environment

Hours

Job Posting Closing Date: 03/06/2026

Ways of Working: Remote First

Skills Required

Deep expertise in managing production systems on AWS and Kubernetes
Experience with Terraform and GitOps
Advanced knowledge of SRE principles
Hands-on experience with GitLab CI/CD and Argo CD
Strong understanding of Kubernetes networking and security
Ability to troubleshoot complex production issues

The Company

HQ: Bengaluru, Karnataka

40,000 Employees

Year Founded: 1970

What We Do

We’re a business that understands that when our customers and people succeed, our communities succeed and our economy thrives. As part of our purpose, we’re looking at how we can drive change for our communities in enterprise, learning and climate. As one of the leading supporters of UK business, we’re prioritising enterprise as a force for change. We’re focusing on the people and communities who have traditionally faced the highest barriers to entry and finding ways to remove those barriers. Learning is also key to our continued growth as a company in an ever-changing and increasingly digital world. By building a dynamic, leading learning culture, we help our people prosper and give our customers the tools to keep improving their financial capability and confidence. One of the biggest challenges we all face for the future is climate change. That’s why we’ve put it right at the core of our purpose. We want to champion climate solutions with financing and entrepreneurial support, fully embed climate into our culture and decision making, and be climate positive by 2025. We’re committed to using our purpose to break down barriers, drive change and ultimately create a great place to work.