Senior Service Reliability Engineer

Job Posted 16 Days Ago
Iași
Senior level
Software
Why does Thoughtworks exist? To create an extraordinary impact on the world through our culture & technology excellence.
The Role
As a Senior Service Reliability Engineer, you will enhance site reliability, manage incidents, improve observability, and collaborate with teams to ensure operational efficiency.
Summary Generated by Built In

As a Service Reliability Engineer (SRE), you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you cultivate a collaborative culture, enabling organizations to meet and exceed their reliability and business objectives.

Job responsibilities

  • You will improve site reliability by building mechanisms/architectures that enable fault tolerance and reduce mean time to detect and mean time to respond
  • You will drive the integration of observability automation into the CI/CD pipeline
  • You will handle production incidents, manage incident communication with clients and draft root cause analysis documents
  • You will monitor performance of production systems and improve their scaling to ensure business goals are met within expected SLA and SLO metrics
  • You will work closely with application development teams as an advisor on improving system reliability and assist in implementing reliability improvements
  • You will improve system observability across multiple facets such as logging and metrics, reducing false alarms to eliminate unnecessary toil and improve process efficiency
  • You will implement chaos engineering practices as necessary to test system reliability, setting up processes for such testing to be done regularly 
  • You will have a clear understanding of client goals and business needs and set the direction for site reliability accordingly, e.g. achieving application availability with minimal or no disruption (99.999%) where the business requires it

Job qualifications

Technical Skills
  • You have hands-on experience in programming and scripting languages such as Python, Go or Bash
  • You have a good understanding of at least one Public Cloud, e.g.: AWS, Azure or GCP 
  • You have had exposure to observability tools such as Grafana, Datadog, NewRelic, ELK Stack, Dynatrace or equivalent and you are proficient in using data from these tools to dissect and identify root causes of system and infrastructure issues 
  • You are familiar with DevOps and GitOps practices 
  • You have a good knowledge of container-based architecture and orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
  • You understand technical architecture and modern design patterns, including microservices, serverless functions, NoSQL and RESTful APIs, with experience in fixing bugs, analyzing logs, building metrics and operational dashboards
  • You are familiar with creating infrastructure resources that improve system reliability and follow the principles of the cloud provider's Well-Architected Framework: reliability, security, cost optimization, performance efficiency and operational excellence

Professional Skills

  • You have strong communication and articulation skills, and are proficient in English
  • You have good people skills with an emphasis on negotiation and close collaboration with multiple cross-functional teams from the client side and/or Thoughtworks
  • You solve challenging problems and difficult-to-debug issues with a never-give-up attitude
  • You have the ability to work under pressure and with composure during production incidents
  • You can confidently recommend improvements backed by strong technical arguments to client stakeholders or application development teams
  • You are able to understand requirements provided by the client on both technical and business aspects and break them down for successful implementation
  • You have a strong drive and ownership mentality, with a willingness to sign up for and deliver work when called upon, without being too concerned about role boundaries
  • You’re willing to be part of a team that is available 24x7 on a rotation and as-needed basis

Other things to know

Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a global technology consultancy that integrates strategy, design and engineering to drive digital innovation. For 30+ years, our clients have trusted our autonomous teams to build solutions that look past the obvious. Here, computer science grads come together with seasoned technologists, self-taught developers, midlife career changers and more to learn from and challenge each other. Career journeys flourish with the strength of our cultivation culture, which has won numerous awards around the world.
Join Thoughtworks and thrive. Together, our extra curiosity, innovation, passion and dedication overcomes ordinary.

Top Skills

AWS
AWS EKS
Azure
Bash
Datadog
Docker Swarm
Dynatrace
ELK Stack
GCP
Go
Grafana
Kubernetes
NewRelic
Nomad
Python

The Company
HQ: Chicago, IL
7,674 Employees
Hybrid Workplace
Year Founded: 1993

What We Do

We are a leading global technology consultancy that integrates strategy, design and software engineering to enable enterprises and technology disruptors across the globe to thrive as modern digital businesses.

Why Work With Us

As technologists, we have a unique role to play in how technology should benefit all of society, pursuing a more equitable future. Part of that role is to continuously educate ourselves on the issues that matter to the causes we believe in. We recognize our privilege and strive to see the world from the perspective of the most vulnerable.
