Site Reliability Engineer

Cambridge, Cambridgeshire, England, GBR
In-Office
Senior level
Security • Cybersecurity
The Role
Lead a core SRE domain by defining standards, building automation and tooling, investigating systemic incidents, embedding reliability into the internal platform, contributing to on-call rotations, and collaborating with Platform Engineering and DevSecOps to improve resilience and operational maturity.
Summary Generated by Built In

Darktrace is a global leader in AI for cybersecurity that keeps organizations ahead of the changing threat landscape every day. Founded in 2013, Darktrace provides the essential cybersecurity platform protecting nearly 10,000 organizations from unknown threats using its proprietary AI.

The Darktrace Active AI Security Platform™ delivers a proactive approach to cyber resilience to secure the business across the entire digital estate – from network to cloud to email. Breakthrough innovations from our R&D teams have resulted in over 200 patent applications filed. Darktrace’s platform and services are supported by over 2,400 employees around the world. To learn more, visit http://www.darktrace.com.  

Job Description:

About the Role

We’re looking for a Site Reliability Engineer (SRE) to bring deep expertise in a key reliability domain and help shape the future of our platform reliability strategy.

SRE sits at the heart of our operational trifecta alongside Platform Engineering and DevSecOps. In this role, you’ll act as the go-to authority in your area of specialism, working across teams to embed best practices, solve complex reliability challenges, and improve system resilience at scale.

Unlike a generalist SRE, this role focuses on a core domain of expertise—such as observability, performance engineering, data infrastructure reliability, security-focused SRE, or network reliability—while influencing reliability standards across the wider engineering organisation.

Key Responsibilities
Domain Expertise & Strategy
  • Act as the subject matter expert in your chosen reliability domain
  • Define and implement standards, frameworks, and best practices across SRE, Platform Engineering, and DevSecOps
  • Stay current with industry trends and bring innovative ideas into the organisation
Engineering & Delivery
  • Design and implement solutions to complex, cross-cutting reliability challenges
  • Build tooling, automation, and frameworks to improve system resilience and scalability
  • Lead deep-dive investigations into systemic issues and drive long-term fixes
Collaboration & Platform Integration
  • Partner with Platform Engineering to ensure your domain is embedded within the internal developer platform
  • Collaborate with DevSecOps to integrate security, compliance, and resilience practices
  • Contribute to cross-team initiatives that improve reliability across the stack
Incident & Operational Excellence
  • Play a key role in incident response, particularly within your specialism
  • Contribute to on-call rotations and continuous improvement of operational processes
  • Develop runbooks, documentation, and training materials to support teams
What You’ll Bring
Essential
  • Proven experience in Site Reliability Engineering, DevOps, or infrastructure engineering
  • Deep expertise in at least one of the following areas:
    • Observability & monitoring (metrics, logging, distributed tracing)
    • Performance engineering & capacity planning
    • Data infrastructure reliability (databases, streaming, pipelines)
    • Security-focused SRE (hardening, compliance automation, secrets management)
    • Network reliability & traffic management
  • Strong programming skills (e.g. Go, Python, or similar)
  • Experience with cloud platforms (AWS, GCP, Azure) and Kubernetes
  • Strong communication skills, with the ability to explain complex technical concepts clearly
  • Self-driven with the ability to identify and prioritise high-impact work independently
Desirable
  • Experience building internal developer platforms or tooling
  • Contributions to open-source, technical blogs, or public speaking
  • Experience working in regulated environments
  • Familiarity with SLO frameworks and error budget management
  • Relevant certifications in your specialist domain
Success Measures
  • Improved reliability and performance within your domain of specialism
  • Adoption of best practices across SRE, Platform Engineering, and DevSecOps
  • Reduction in incidents and faster resolution times
  • Scalable, well-integrated solutions within the internal platform
  • Strong collaboration across teams and measurable improvements in operational maturity
Why Join Us?
  • Shape reliability strategy in a modern, cloud-native engineering environment
  • Work on complex, high-impact systems at scale
  • Collaborate with expert teams across Platform Engineering and DevSecOps
  • Take ownership of a domain and drive meaningful, organisation-wide impact

Benefits:

  • 23 days’ holiday + all public holidays, rising to 25 days after 2 years of service
  • Additional day off for your birthday
  • Private medical insurance which covers you, your cohabiting partner and children
  • Life insurance of 4 times your base salary
  • Salary sacrifice pension scheme
  • Enhanced family leave
  • Confidential Employee Assistance Program
  • Cycle to work scheme

Darktrace is an Equal Opportunity Employer. We consider all qualified applicants for employment without regard to race, color, religion, sex (including pregnancy, childbirth, and related medical conditions), sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, veteran or military status, or any other characteristic protected by applicable federal, state, or local law.

Darktrace is committed to providing reasonable accommodations to qualified individuals with disabilities in accordance with applicable laws. If you require a reasonable accommodation to participate in the application or interview process, please contact your Talent Partner.

The Company
Atlanta, GA
1,763 Employees
Year Founded: 2013

What We Do

Darktrace, a global leader in cyber security AI, delivers world-class technology that protects over 5,500 customers worldwide from advanced threats, including ransomware and cloud and SaaS attacks. The company’s fundamentally different approach applies Self-Learning AI to enable machines to understand the business in order to autonomously defend it. Headquartered in Cambridge, UK, the company has 1,500 employees and over 30 offices worldwide. Darktrace was named one of TIME magazine’s ‘Most Influential Companies’ for 2021.
