Site Reliability Engineer

Posted 20 Days Ago
Prague
In-Office
Mid level
Machine Learning • Software • Cybersecurity
TruU eliminates passwords and badges through a unified, mobile-first digital identity.
The Role
The Site Reliability Engineer will own the availability, resilience, and operational readiness of a cloud-native platform on AWS while embedding security practices throughout the software delivery lifecycle. Responsibilities include high availability and disaster recovery, incident management, building and securing CI/CD pipelines, and collaborating with development teams to ensure safe application deployment.

About the Role 

We are seeking a Site Reliability Engineer (SRE) to own the availability, resilience, and operational readiness of our cloud-native platform running on AWS. This role is responsible for ensuring our systems are designed to tolerate failure, recover quickly, and support safe, continuous delivery. 

As an SRE, you will apply software engineering principles to infrastructure, operations, and incident response. You will partner closely with application engineers to balance delivery velocity with reliability, and you will have clear ownership of high availability (HA), disaster recovery (DR), and incident management preparedness.

Core Responsibilities:

Reliability, Availability & Disaster Recovery 

  • Own and continuously evolve the High Availability (HA) and Disaster Recovery (DR) strategy across all production systems 
  • Define, document, and enforce service reliability targets, including availability objectives and recovery expectations 
  • Design and maintain resilient architectures using AWS managed services 
  • Establish and validate RTO and RPO targets for applications and data stores 
  • Design, document, and execute disaster recovery simulations, game days, and real failover testing 
  • Implement backup, restore, replication, and failover strategies for managed databases, Kafka, and OpenSearch 
  • Identify and eliminate single points of failure across infrastructure, pipelines, and operational processes 
  • Ensure DR plans are tested, trusted, and continuously improved, not just documented 

Incident Management & Operational Readiness 

  • Own incident management preparedness, including tooling, runbooks, escalation paths, and communication practices 
  • Participate in and lead incident response for availability-impacting events 
  • Conduct blameless post-incident reviews and drive corrective actions to completion 
  • Improve systems and processes based on incident learnings 
  • Ensure on-call rotations, alerts, and monitoring are actionable and sustainable 
  • Design systems so that failures are expected, detected quickly, and recoverable 

Release Engineering & Continuous Delivery 

  • Own the reliability aspects of continuous delivery and release management 
  • Design, build, and improve CI/CD pipelines using AWS CodePipeline and related services 
  • Define and implement safe deployment strategies (e.g., rolling, blue/green, canary) 
  • Build automated validation, rollback, and deployment safety mechanisms 
  • Partner with engineering teams to reduce deployment risk, downtime, and mean time to recovery 
  • Balance release velocity against reliability using data-driven decision-making 

Platform Engineering & DevSecOps 

  • Build and maintain infrastructure using CloudFormation as infrastructure as code 
  • Deploy and operate Docker-based workloads on ECS Fargate 
  • Embed security controls into build, deploy, and runtime stages (DevSecOps) 
  • Secure dependencies and artifacts using AWS CodeArtifact 
  • Collaborate with development teams using Bitbucket-based workflows 
  • Implement observability best practices, including metrics, logs, tracing, and alerts 
  • Apply AWS best practices for IAM, networking, encryption, and secrets management 

Required Qualifications

  • Strong production experience operating systems on AWS 
  • Hands-on experience with containerized workloads on ECS Fargate 
  • Proven experience owning system reliability, availability, and recovery 
  • Experience designing and executing disaster recovery tests and failover simulations 
  • Experience participating in or leading incident response 
  • Strong understanding of CI/CD, release engineering, and deployment strategies 
  • Hands-on experience with CloudFormation or equivalent infrastructure-as-code tools 
  • Experience working with Bitbucket or similar source control systems 
  • Familiarity with managed databases, Kafka, and OpenSearch 
  • Strong scripting and automation skills (e.g., Python, Bash)  

Preferred Qualifications 

  • Experience defining and operating with SLOs, SLIs, and error budgets 
  • Experience running DR game days in production environments 
  • Familiarity with SRE or production-readiness review practices 
  • Experience integrating security scanning and controls into CI/CD pipelines 
  • AWS certifications (DevOps Engineer, Solutions Architect, or Security Specialty) 
  • Experience supporting compliance or audit-driven environments 

What Success Looks Like

  • Systems consistently meet defined availability and recovery objectives 
  • Disaster recovery plans are regularly tested and trusted 
  • Incidents are handled calmly, efficiently, and lead to lasting improvements 
  • Deployments are routine, low-risk, and automated 
  • Engineering teams ship faster with confidence in platform reliability 

What We Offer 

  • Clear ownership of platform reliability and operational excellence 
  • Modern AWS-native architecture using managed services 
  • A culture that values engineering rigor, resilience, and learning from failure 
  • Competitive compensation and benefits 
  • Flexible work environment 

Important Note for Candidates 

This role includes shared on-call responsibilities and active participation in incident response. We believe reliable systems are built by engineers who are empowered to improve them. 

Benefits
  • Competitive salary and stock option plan (subject to approval).
  • 5 weeks of PTO.
  • 5 sick leave days.
  • Multisport card.
  • Flexible work hours and a hybrid work setup.
  • Professional growth and development opportunities.
  • Global, collaborative, and inclusive company culture.

Top Skills

Amazon OpenSearch
AWS
AWS Cloud Development Kit (CDK)
AWS CodeArtifact
AWS CodePipeline
AWS ECS Fargate
AWS Managed Services
Bash
CI/CD
CloudFormation
Docker
Git
Kafka
Python

The Company
Denver, CO
35 Employees
Year Founded: 2017

What We Do

TruU changes the way employees and partners experience a workplace. We are creating the next-gen workplace, especially after COVID, built on contactless, smartphone-based identity. Forget the badges, and eliminate passwords and data breach risk. Interactions with buildings, doors, and all IT systems become fully frictionless, yet still secure. Real magic!

Why Work With Us

Our company is unique because of the way we meld modern cloud and container-based architecture with cutting-edge data science and machine learning. Lots of companies exist to make a profit; we have the potential to change the way people experience day-to-day life and the world. Join TruU to be around smart and passionate technologists.
