Senior Site Reliability Engineer

London, Greater London, England, GBR
In-Office
Senior level
HR Tech
The Role
The Principal Site Reliability Engineer will lead SRE transformations, ensure system reliability and scalability, mentor engineers, and drive Infrastructure as Code practices.

Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable and better-performing organizations. HR, finance, and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world.

Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.

Role

In this role you will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale.

This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams.

Responsibilities

  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement a cloud infrastructure and tooling strategy
  • Work across the organization to level up SRE practices
  • Help implement robust observability (metrics, logs, and traces) using our observability tool
  • Guide the team in building automated, self-healing systems
  • Own and evolve our incident response processes, including on-call practices and post-mortem culture
  • Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
  • Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
  • Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Requirements
  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
  • Strong background in observability: metrics, visualization, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
  • Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits
  • Hybrid working - 1+ days a week in the London office
  • Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day
  • Subsidised Gym Membership
  • Private Medical Insurance (including Dental and Vision) and Life Assurance
  • 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
  • Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
  • Season Ticket Loan
  • Cycle to Work Scheme
  • Annual Discretionary Bonus

'Here at Orgvue we promote individualism and a diverse workforce to build on our future success'


The Company
HQ: Philadelphia, PA
196 Employees

What We Do

Orgvue delivers an altogether richer, more visual organizational design and workforce planning experience. Our SaaS platform empowers large enterprises to continuously plan for the future from ‘strategy to people’, so they can make faster workforce decisions in a constantly changing world. With Orgvue, organizations can confidently build the businesses they want tomorrow, today.
