Lead Service Reliability Engineer

Posted 3 Days Ago
Singapore
In-Office
Senior level
Software
Why does Thoughtworks exist? To create an extraordinary impact on the world through our culture & technology excellence.
The Role
As a Lead Service Reliability Engineer, you will ensure technical excellence in SRE, improve system reliability, manage incidents, and mentor teams.
Summary Generated by Built In

Due to project requirements, candidates must be Singaporean citizens or already hold Singaporean Permanent Residency (PR) at the time of application.

As a Service Reliability Engineer (SRE) in the DAMO service line, you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you cultivate a collaborative culture, enabling organizations to meet and exceed their reliability and business objectives.

Job responsibilities
  • You will be responsible for understanding requirements or SRE goals in depth from both tech and business perspectives.
  • You will provide solutions to improve reliability, including identifying and implementing mechanisms and architectures that enable fault tolerance and reduce the median time to detect and the median time to respond.
  • You will be responsible for enhancing the incident management process, including the development of an incident prioritization matrix, triage, communication, mitigation, post-mortem analysis and implementation of corrective actions.
  • You will manage client stakeholder expectations and queries during production incidents, providing detailed technical analysis of issues and remediation plans for mitigation and future prevention, and act as the interface for C-level executives when needed.
  • You will liaise with client engineering teams and build trust and productive relationships with senior client stakeholders and team leads, influencing them to make better decisions.
  • You will be responsible for identifying opportunities to enhance system performance and reliability in alignment with business SLAs, SLOs, KPIs and objectives, and for providing guidance and assistance to SRE teams in implementing the identified improvements.
  • As an SRE expert, you will collaborate with Thoughtworks application development leads and solution architects, recommending changes in system design and adopting best practices for improved reliability from day one.
  • You will oversee and mentor other SREs on the team, contributing to their growth and development.
Job qualifications
Technical Skills
  • You can program with one or more high-level languages such as Python, Golang, Shell scripting, Ruby or Java.
  • You are familiar with DevOps and GitOps practices, including driving the integration of observability automation into CI/CD pipelines built on tools such as GitLab, Jenkins, CircleCI or equivalent.
  • You have in-depth knowledge of configuration management and Infrastructure as Code (IaC) tools such as Terraform, Ansible, ARM and CloudFormation for provisioning and managing infrastructure.
  • You have expertise in observability, logging, tracing and monitoring tools such as Grafana (Loki and Tempo), Prometheus, Graylog, Jaeger, Zipkin, ELK stack or equivalent.
  • You have a strong understanding of container-based architecture and hands-on experience with orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
  • You have in-depth experience in application and infrastructure performance tuning and scaling to handle heavy loads under different scenarios, such as periodic traffic load and tsunami patterns.
  • You have a good understanding of essential concepts such as quality gates encompassing SLI/SLO/SLA, chaos engineering, golden signals, blameless postmortem methodologies, synthetic monitoring, distributed tracing, end-user monitoring and performance testing.
  • You have experience with network load balancing, security tech stacks, Transport Layer Security (TLS) and certificate management, and an understanding of standard networking protocols and configurations.
Professional Skills
  • You have strong communication and articulation skills, and are proficient in English.
  • You are able to convey resolutions to audiences with varying degrees of technical/business proficiency and bring them to consensus. 
  • You have excellent problem-solving and analytical skills, with a focus on continuous improvement.
  • You have good listening and presentation skills.
  • You solve challenging problems and difficult-to-debug issues with a never-give-up attitude.
  • You can collaborate with cross-functional engineering teams to conduct capacity planning and scalability assessments, and design solutions for handling current and future growth.
  • You have the ability to work under pressure, with composure, during production incidents.
  • You understand requirements provided by the client on both technical and business aspects, and can break them down for successful implementation.
  • You’re willing to be part of a team that is available 24x7 on a rotation and as-needed basis.
Other things to know
Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we’re pushing boundaries through our purposeful and impactful work. For 30+ years, we’ve delivered extraordinary impact together with our clients by helping them solve complex business problems with technology as the differentiator. Bring your brilliant expertise and commitment to continuous learning to Thoughtworks. Together, let’s be extraordinary.

About DAMO

At DAMO™ Managed Services, we go beyond routine maintenance and focus on continuous evolution to help organizations achieve extraordinary impact. Here, you’ll work on proactive improvements rather than reactive fixes. We’re at the forefront of cost optimization, automation, and scalable solutions. Your expertise will play a key role in streamlining operations, boosting efficiency, and ensuring our systems grow with our clients’ needs. Join us and be part of a team that thrives on curiosity, innovation, and purpose.


Top Skills

Ansible
ARM
AWS EKS
CircleCI
CloudFormation
Docker Swarm
ELK Stack
GitLab
Go
Grafana
Graylog
Jaeger
Java
Jenkins
Kubernetes
Prometheus
Python
Ruby
Shell Scripting
Terraform
Zipkin
The Company
HQ: Chicago, IL
7,674 Employees
Year Founded: 1993

What We Do

We are a leading global technology consultancy that integrates strategy, design and software engineering to enable enterprises and technology disruptors across the globe to thrive as modern digital businesses.

Why Work With Us

As technologists, we have a unique role to play in how technology should benefit all of society, pursuing a more equitable future. Part of that role is to continuously educate ourselves on the issues that matter to the causes we believe in. We recognize our privilege and strive to see the world from the perspective of the most vulnerable.
