About the Role
We are seeking a Site Reliability Engineer (SRE) to own the availability, resilience, and operational readiness of our cloud-native platform running on AWS. This role is responsible for ensuring our systems are designed to tolerate failure, recover quickly, and support safe, continuous delivery.
As an SRE, you will apply software engineering principles to infrastructure, operations, and incident response. You will partner closely with application engineers to balance delivery velocity with reliability, and you will have clear ownership of high availability (HA), disaster recovery (DR), and incident management preparedness.
Core Responsibilities
Reliability, Availability & Disaster Recovery
- Own and continuously evolve the High Availability (HA) and Disaster Recovery (DR) strategy across all production systems
- Define, document, and enforce service reliability targets, including availability objectives and recovery expectations
- Design and maintain resilient architectures using AWS managed services
- Establish and validate RTO and RPO targets for applications and data stores
- Design, document, and execute disaster recovery simulations, game days, and real failover testing
- Implement backup, restore, replication, and failover strategies for managed databases, Kafka, and OpenSearch
- Identify and eliminate single points of failure across infrastructure, pipelines, and operational processes
- Ensure DR plans are tested, trusted, and continuously improved, not just documented
Incident Management & Operational Readiness
- Own incident management preparedness, including tooling, runbooks, escalation paths, and communication practices
- Participate in and lead incident response for availability-impacting events
- Conduct blameless post-incident reviews and drive corrective actions to completion
- Improve systems and processes based on incident learnings
- Ensure on-call rotations, alerts, and monitoring are actionable and sustainable
- Design systems so that failures are expected, detected quickly, and recoverable
Release Engineering & Continuous Delivery
- Own the reliability aspects of continuous delivery and release management
- Design, build, and improve CI/CD pipelines using AWS CodePipeline and related services
- Define and implement safe deployment strategies (e.g., rolling, blue/green, canary)
- Build automated validation, rollback, and deployment safety mechanisms
- Partner with engineering teams to reduce deployment risk, downtime, and mean time to recovery
- Balance release velocity against reliability using data-driven decision-making
Platform Engineering & DevSecOps
- Build and maintain infrastructure as code using AWS CloudFormation
- Deploy and operate Docker-based workloads on ECS Fargate
- Embed security controls into build, deploy, and runtime stages (DevSecOps)
- Secure dependencies and artifacts using AWS CodeArtifact
- Collaborate with development teams using Bitbucket-based workflows
- Implement observability best practices, including metrics, logs, tracing, and alerts
- Apply AWS best practices for IAM, networking, encryption, and secrets management
Required Qualifications
- Strong production experience operating systems on AWS
- Hands-on experience with containerized workloads on ECS Fargate
- Proven experience owning system reliability, availability, and recovery
- Experience designing and executing disaster recovery tests and failover simulations
- Experience participating in or leading incident response
- Strong understanding of CI/CD, release engineering, and deployment strategies
- Hands-on experience with CloudFormation or equivalent infrastructure-as-code tools
- Experience working with Bitbucket or similar source control systems
- Familiarity with managed databases, Kafka, and OpenSearch
- Strong scripting and automation skills (e.g., Python, Bash)
Preferred Qualifications
- Experience defining and operating with SLOs, SLIs, and error budgets
- Experience running DR game days in production environments
- Familiarity with SRE or production-readiness review practices
- Experience integrating security scanning and controls into CI/CD pipelines
- AWS certifications (DevOps Engineer, Solutions Architect, or Security Specialty)
- Experience supporting compliance or audit-driven environments
What Success Looks Like
- Systems consistently meet defined availability and recovery objectives
- Disaster recovery plans are regularly tested and trusted
- Incidents are handled calmly and efficiently, and they lead to lasting improvements
- Deployments are routine, low-risk, and automated
- Engineering teams ship faster with confidence in platform reliability
What We Offer
- Clear ownership of platform reliability and operational excellence
- Modern AWS-native architecture using managed services
- A culture that values engineering rigor, resilience, and learning from failure
- Competitive compensation and benefits
- Flexible work environment
Important Note for Candidates
This role includes shared on-call responsibilities and active participation in incident response. We believe reliable systems are built by engineers who are empowered to improve them.
Benefits
- Competitive salary and stock option plan (subject to approval).
- 5 weeks of PTO.
- 5 sick leave days.
- Multisport card.
- Flexible work hours and a hybrid work setup.
- Professional growth and development opportunities.
- Global, collaborative, and inclusive company culture.
What We Do
TruU changes the way employees and partners experience the workplace. We are creating the next-generation workplace, especially for the post-COVID era: contactless, smartphone-based identity. Forget badges, and eliminate passwords and data-breach risk. Interactions with buildings, doors, and all IT systems become fully frictionless, yet remain secure. Real magic!
Why Work With Us
Our company is unique because of the way we meld modern cloud and container-based architecture with cutting-edge data science and machine learning. Many companies exist to make a profit, but we have the potential to change the way people experience day-to-day life and the world. Join TruU to work alongside smart and passionate technologists.