Senior Site Reliability Engineer

Sant Cugat del Vallès, Barcelona, Cataluña, ESP
In-Office
Senior level
Healthtech • Biotech • Pharmaceutical
The Role
Design, build, and scale reliable cloud-native systems; define SLIs/SLOs; automate operations with scripting and IaC; lead incident response, root cause analysis, and on-call rotation; improve observability, CI/CD, and platform resilience (Kubernetes, AWS/Azure).

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

The Position

We are building a global Site Reliability Engineering (SRE) team to support critical commercial and internal platforms and applications. As an SRE, you will help design, build, and scale reliable distributed systems that power healthcare innovation worldwide.
This role is focused on reliability, scalability, automation and operational excellence. You will influence system design, define reliability standards and reduce operational toil through engineering solutions.
This role includes participation in a structured on-call rotation.

Who We Are
At Roche, we are passionate about transforming patients’ lives, and we are bold in both decision and action - we believe that good business means a better world. That is why we come to work every single day. We commit ourselves to scientific rigor, unassailable ethics and access to medical innovations for all. We do this today to build a better tomorrow.
Roche is strongly committed to a diverse and inclusive workplace. We strive to build teams that represent a range of backgrounds, perspectives and skills. Embracing diversity enables us to create a great place to work and to innovate for patients.

Step into the Future of IT with Roche!
As a seasoned Site Reliability Engineer (SRE) at Roche, you will leverage your deep software engineering expertise to propel our products to new heights of robustness, scalability and reliability. This isn't just a role; it's an invitation to shape the backbone of our technological innovation.

Your Mission
Design and maintain cutting-edge tools, scripts and frameworks that automate repetitive tasks, streamline software deployment and manage expansive systems with unparalleled efficiency. Partner closely with forward-thinking development teams to architect and implement high-performance solutions that elevate system efficiency, optimize resource utilization and enhance deployment processes for superior uptime and user satisfaction.

Your Impact
Lead the charge in incident management and response. Detect system anomalies, troubleshoot swiftly and conduct thorough root cause analyses to prevent recurring issues.
Champion continuous improvement by refining monitoring and alerting mechanisms, conducting insightful post-incident reviews and embedding best practices in software lifecycle management. Your strategic foresight and meticulous planning will ensure our systems are not only reliable but also highly performant.
By joining our elite team, you will play a pivotal role in delivering seamless experiences to our end-users, exceeding business and customer demands, and solidifying Roche's reputation as a leader in IT innovation.

Your Core Responsibilities
Reliability Engineering & Architecture

  • Define and implement SLIs, SLOs, and error budgets with product and engineering teams

  • Conduct reliability reviews for new and existing services

  • Design scalable, fault-tolerant architectures in AWS and Azure environments

  • Lead capacity planning, performance and cost optimization initiatives

  • Improve system resilience through automation and self-healing patterns

  • Drive organizational observability maturity (metrics, logs, traces, alert quality)

Incident Management & Continuous Improvement

  • Perform complex root cause analysis and drive rapid mitigation

  • Participate in blameless postmortems and follow-through

  • Improve MTTR, reduce incident frequency, and elevate production standards

  • Collaborate seamlessly with engineering teams to enable timely and effective resolutions

  • Handle requests and incidents, create and maintain runbooks

  • Participate in a structured 24x7 on-call rotation

Automation & Platform Engineering

  • Reduce operational toil through tooling and automation (Python or similar)

  • Improve CI/CD reliability and deployment safety mechanisms

  • Build and maintain infrastructure-as-code (Terraform or equivalent)

  • Enhance Kubernetes platform reliability (EKS, AKS, or similar)

Cross-Functional Leadership

  • Partner with business, engineering, security, and cloud teams to embed reliability early in the software development life cycle

  • Mentor mid-level engineers and help shape SRE best practices

  • Champion a culture of ownership, accountability, and continuous improvement

Who You Are

  • Minimum bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

  • Experience in site reliability engineering, software engineering, or a related field, including production on-call experience.

  • Solid experience with AWS and/or Azure, including setting up, monitoring, and maintaining cloud resources, plus knowledge of Kubernetes (EKS, AKS, GKE, etc.).

  • Proficiency with observability tools

  • Hands-on experience with incident management tools

  • Proficiency in scripting languages for automation purposes

  • Demonstrated proficiency in troubleshooting, especially in cloud and distributed system environments

  • Excellent communication, teamwork and documentation skills, with a proactive and self-motivated approach to improving system reliability and operational efficiencies.

  • We value and encourage candidates from diverse backgrounds and experiences, believing that diverse perspectives drive innovation and success.

  • Excellent spoken and written English communication.

 

 

Who we are

A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.


Let’s build a healthier future, together.

Roche is an Equal Opportunity Employer.

Skills Required

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience
  • Experience in site reliability engineering, software engineering, or related fields with production on-call experience
  • Experience with AWS and/or Azure including setup, monitoring, and maintaining cloud resources
  • Hands-on experience with Kubernetes (EKS, AKS, GKE) or similar
  • Proficiency with observability tools (metrics, logs, traces) and improving monitoring/alerting
  • Hands-on experience with incident management tools and conducting root cause analysis
  • Proficiency in scripting languages for automation (e.g., Python)
  • Experience with infrastructure-as-code (Terraform or equivalent)
  • Experience improving CI/CD reliability and deployment safety mechanisms
  • Excellent communication, teamwork, documentation skills, and strong English proficiency
  • Willingness to participate in a structured 24x7 on-call rotation

Roche Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Roche and has not been reviewed or approved by Roche.

  • Retirement Support: U.S. materials describe a 401(k) with both matching and an additional company contribution, supported by formal plan documents and true‑up features. This structure is positioned as a standout element of the total package, particularly at Genentech.
  • Leave & Time Off Breadth: Time‑off provisions include substantial vacation, a year‑end shutdown, and a paid six‑week sabbatical after six years. These elements indicate a recharge‑oriented approach within the U.S. offering.
  • Healthcare Strength: Company materials emphasize comprehensive medical, dental, vision, and mental‑health resources alongside well‑being programs. Benefits pages consistently highlight breadth across core health coverage elements.


The Company
Provincia de Buenos Aires
93,797 Employees
Year Founded: 1896

What We Do

Roche is a global pioneer in pharmaceuticals and diagnostics focused on advancing science to improve people’s lives. The combined strengths of pharmaceuticals and diagnostics under one roof have made Roche the leader in personalised healthcare – a strategy that aims to fit the right treatment to each patient in the best way possible. Roche is the world’s largest biotech company, with truly differentiated medicines in oncology, immunology, infectious diseases, ophthalmology and diseases of the central nervous system. Roche is also the world leader in in vitro diagnostics and tissue-based cancer diagnostics, and a frontrunner in diabetes management. Founded in 1896, Roche continues to search for better ways to prevent, diagnose and treat diseases and make a sustainable contribution to society. The company also aims to improve patient access to medical innovations by working with all relevant stakeholders. Thirty medicines developed by Roche are included in the World Health Organization Model Lists of Essential Medicines, among them life-saving antibiotics, antimalarials and cancer medicines. Roche has been recognised as the Group Leader in sustainability within the Pharmaceuticals, Biotechnology & Life Sciences Industry ten years in a row by the Dow Jones Sustainability Indices (DJSI).

