Reliability Engineer

Posted 2 Days Ago
Cambridge, Cambridgeshire, England
In-Office
Mid level
Biotech
The Role
The Reliability Engineer will ensure production systems' stability, performance, and reliability, manage incidents, automate tasks, and improve system resilience.

For over 25 years, Abcam has been providing tools the scientific community needs to enable faster breakthroughs in critical areas like cancer, neurological disorders, infectious diseases, and metabolic disorders.

We believe that to continue making progress, we need to work together, each bringing our own unique perspectives to make an impact on the world. This community needs people like you: dedicated, agile and above all audacious so we can truly drive science forward.

Role Summary

We are seeking a highly motivated Reliability Engineer to join our team. As a Reliability Engineer, you will play a crucial role in ensuring the stability, performance, and reliability of our production systems. Your responsibilities will include proactively identifying and resolving technical issues, leading major incident responses, and implementing best practices for system reliability. You will work closely with cross-functional teams to develop and maintain robust monitoring and automation solutions. This position reports directly to the Global Reliability Manager.

In this role, you will have the opportunity to:

•  Shape system reliability at scale by monitoring performance, spotting trends, and preventing issues before they impact users.

•  Take charge during critical moments, leading major incident responses and driving rapid service restoration.

•  Solve complex problems for the long term, collaborating across teams to implement robust, sustainable solutions.

•  Automate and innovate, building tools and processes that streamline operations and reduce manual work.

•  Drive continuous improvement, using data insights and post-incident learnings to make systems more resilient every day.

The essential requirements of the job include:

•  Automation & Scripting: Ability to code repeatable tasks using PowerShell, Bash, or Python, and familiarity with infrastructure-as-code tools such as Terraform and configuration management tools such as Puppet.

•  Cloud & Infrastructure: Strong knowledge of AWS Cloud services, networking, security, and storage solutions, both on-premises and in the cloud.

•  Reliability & Scalability: High-level understanding of High Availability, Disaster Recovery, scalability solutions, and web infrastructure troubleshooting using logs.

•  Monitoring & Incident Management: Proficiency with monitoring dashboards (Grafana, Humio, CloudWatch) and incident management tools like ServiceNow and PagerDuty.

•  Database & Pipelines: Good understanding of SQL Server, Oracle, PostgreSQL (including DML), and familiarity with CI/CD pipelines such as GitLab CI.

It would be a plus if you also possess previous experience in:

•  EKS troubleshooting

•  Application support

•  Linux OS troubleshooting

•  Oracle Cloud Infrastructure (OCI)

You will also participate in an on-call rotation to provide 24/7 support for critical systems and respond to incidents as needed.

Join our winning team today. Together, we’ll accelerate the real-life impact of tomorrow’s science and technology. We partner with customers across the globe to help them solve their most complex challenges, architecting solutions that bring the power of science to life.

For more information, visit www.danaher.com.

Top Skills

AWS
Bash
CloudWatch
GitLab CI
Grafana
Humio
Oracle
PagerDuty
PostgreSQL
PowerShell
Puppet
Python
ServiceNow
SQL Server
Terraform

The Company
HQ: Sunnyvale, CA
4,883 Employees
Year Founded: 1996

What We Do

Cepheid is dedicated to improving healthcare by pioneering molecular diagnostics that combine speed, accuracy, and flexibility. The company's GeneXpert® systems and Xpert® tests automate highly complex and time-consuming manual procedures, providing A Better Way for institutions of any size to perform world-class PCR testing. Cepheid’s broad test portfolio spans respiratory infections, blood virology, women’s and sexual health, TB and emerging infectious diseases, healthcare-associated infectious diseases, oncology and human genetics. The company’s solutions deliver actionable results where they are needed most – from central laboratories and hospitals to near-patient settings. For more information, visit http://www.cepheid.com.

Similar Jobs

Smiths Group plc

Reliability Engineer

Aerospace • Security • Energy • Defense
In-Office
Slough, Berkshire, England, GBR
9,512 Employees

Point72

Reliability Engineer

Financial Services
In-Office
London, Greater London, England, GBR
1,691 Employees

JLL Technologies

Reliability Engineer

Information Technology • Software
In-Office
London, Greater London, England, GBR
2,038 Employees

Coralogix

Reliability Engineer

Machine Learning • Software
In-Office or Remote
London, Greater London, England, GBR
198 Employees

Similar Companies Hiring

Formation Bio
Pharmaceutical • Healthtech • Biotech • Big Data • Artificial Intelligence
New York, NY
140 Employees
SOPHiA GENETICS
Software • Healthtech • Biotech • Big Data • Artificial Intelligence
Boston, MA
450 Employees
Pfizer
Pharmaceutical • Natural Language Processing • Machine Learning • Healthtech • Biotech • Artificial Intelligence
New York, NY
121,990 Employees
