Senior Site Reliability Engineer / Cloud Operations Engineer (m/f/d)

Berlin, DEU
In-Office
Senior level
Artificial Intelligence • Big Data • Information Technology • Security • Software
The Role
Operate and maintain highly available sovereign cloud services (99.99%+). Monitor SLIs/SLOs, troubleshoot complex incidents, participate in 24/7 on-call rotation, drive automation, document runbooks, perform post-incident reviews, and ensure compliance for secure cloud environments leveraging Google Cloud technologies.
Summary Generated by Built In
Location: Berlin, Germany

We Say HI* 

Site Reliability Engineer / Cloud Operations Engineer (f/m/d)  
 

Companies and public administrations in Germany are ready to accelerate their digital transformation and their use of AI, but they will never compromise on the security of their most sensitive data. This is where Thales in Germany comes into play, in partnership with Google Cloud and our new company, which is currently being established. With a new, 100% German business unit, we are providing a concrete response to the strict requirements of the BSI. We are creating a “Trusted Cloud” that is operated locally and fully autonomously. It provides access to the broadest service portfolio on the market, while everything remains strictly under European jurisdiction. By combining German and French standards such as SecNumCloud, C5 and C3-A, we offer our customers unequaled resilience and business continuity. This is a turning point for our industry and a decisive step towards a strong, sovereign digital Europe.

Your mission as a Site Reliability Engineer:

  • Operate and maintain mission-critical sovereign cloud services with availability targets of 99.99% and above.

  • Monitor service health, reliability, scalability, latency, and performance using Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

  • Investigate, troubleshoot, and resolve complex production incidents across large-scale distributed cloud environments.

  • Participate in a structured 24/7 on-call rotation (approximately one week every six weeks) to ensure continuous service availability.

  • Collaborate with Site Reliability Engineers, Cloud Infrastructure Specialists, and Product Experts across international teams to mitigate incidents and drive long-term solutions.

  • Build a deep understanding of Google's cloud technologies and distributed systems through an intensive training program covering technologies such as Borg, Colossus, Spanner, and other core GCP components.

  • Drive operational excellence by creating and maintaining technical documentation, standardizing incident response procedures, and continuously improving operational playbooks.

  • Lead and contribute to post-incident reviews, root cause analyses, and the implementation of preventive measures to improve platform reliability.

  • Identify opportunities for automation and contribute to improving operational efficiency, scalability, compliance, and service reliability.

  • Support the operation of highly secure cloud environments designed to meet stringent regulatory and sovereignty requirements.

What we are looking for:

  • Several years of experience in Site Reliability Engineering, Cloud Operations, DevOps, Platform Engineering, Infrastructure Engineering, Production Support, Network Operations (NOC), Technical Operations, or a comparable role.

  • Experience operating and supporting business-critical production systems with demanding uptime and availability requirements.

  • Strong troubleshooting and incident management skills in complex technical environments.

  • Experience monitoring, operating, and maintaining distributed systems, cloud platforms, infrastructure services, or large-scale applications.

  • Familiarity with reliability engineering concepts, observability, monitoring, alerting, incident response, and root cause analysis.

  • Experience working with automation, scripting, operational tooling, or Infrastructure-as-Code approaches.

  • Strong analytical and problem-solving skills with a structured and methodical approach.

  • Professional proficiency in both German and English.

  • Willingness to participate in a regular on-call rotation.

  • Curiosity, adaptability, and a strong desire to learn and work with hyperscale cloud technologies.

 
The Group invests more than €4.5 billion per year in Research & Development in key areas, particularly for critical environments, such as Artificial Intelligence, cybersecurity, quantum and cloud technologies.

In 2025, the Group generated sales of €22.1 billion. 
 
For our more than 85,000 employees in 65 countries, we open up visionary perspectives, support individual career paths and enable creative freedom. We do this with courage, versatility and the firm intention to meet the demanding challenges of our time in a safer and more inclusive way. Through our sustainable, value-focused management, we actively support diversity.

Say HI* – Your journey to us 

In times of change, our international teams are ready to meet the complexity of today with the industry-leading technologies of tomorrow. Will you be part of it? Your Talent Acquisition contact, Andre Fuhrmann, is looking forward to your online application.

Andre Fuhrmann – Talent Acquisition Partner  

+49 7156 / 302-22002 

*Human Intelligence 


Skills Required

  • Several years of experience in Site Reliability Engineering, Cloud Operations, DevOps, Platform or Infrastructure Engineering
  • Experience operating and supporting business-critical production systems with demanding uptime and availability requirements
  • Strong troubleshooting and incident management skills in complex technical environments
  • Experience monitoring, operating, and maintaining distributed systems, cloud platforms, infrastructure services, or large-scale applications
  • Familiarity with reliability engineering concepts, observability, monitoring, alerting, incident response, and root cause analysis
  • Experience with automation, scripting, operational tooling, or Infrastructure-as-Code approaches
  • Professional proficiency in German and English
  • Willingness to participate in a regular on-call rotation (approx. one week every six weeks)
  • Curiosity, adaptability, and strong desire to learn hyperscale cloud technologies

Thales Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Thales and has not been reviewed or approved by Thales.

  • Retirement Support: Retirement plans with employer contributions and matches, profit sharing, and share purchase opportunities are emphasized across multiple regions. These elements are positioned as competitive components of total rewards.
  • Leave & Time Off Breadth: Generous PTO that increases with tenure, paid holidays, and paid military, maternity, and paternity leave are described. This breadth supports work–life balance across locations.
  • Flexible Benefits: Hybrid work options, flexible schedules, and parental supports such as childcare benefits and leave for sick children are available in several markets. Flexibility is presented as a core part of the employee experience.

The Company
Arlington, VA
63,258 Employees

What We Do

Thales is a global high technology leader investing in digital and “deep tech” innovations – connectivity, big data, artificial intelligence, cybersecurity and quantum technology – to build a future we can all trust, which is vital to the development of our societies. The company provides solutions, services and products that help its customers – businesses, organisations and states – in the defence, aeronautics, space, transportation and digital identity and security markets to fulfil their critical missions, by placing humans at the heart of the decision-making process.
