Thales

Senior Site Reliability Engineer / Cloud Operations Engineer (m/f/d)

Berlin, DEU

In-Office

Senior level

Artificial Intelligence • Big Data • Information Technology • Security • Software

The Role

Operate and maintain highly available sovereign cloud services (99.99%+). Monitor SLIs/SLOs, troubleshoot complex incidents, participate in 24/7 on-call rotation, drive automation, document runbooks, perform post-incident reviews, and ensure compliance for secure cloud environments leveraging Google Cloud technologies.

Summary Generated by Built In

Location: Berlin, Germany

We Say HI*

Site Reliability Engineer / Cloud Operations Engineer (f/m/d)

German companies and public administrations are ready to accelerate their digital transformation and their use of AI, but they will never compromise on the security of their most sensitive data. This is where Thales in Germany, in partnership with Google Cloud and our new company currently being established, comes into play. With a new, 100% German business unit, we are providing a concrete response to the strict requirements of the BSI. We are creating a locally and fully autonomously operated “Trusted Cloud”. It provides access to the broadest service portfolio on the market, while everything remains strictly under European jurisdiction. By combining German and French standards such as SecNumCloud, C5 and C3-A, we offer our customers unequaled resilience and business continuity. This is a turning point for our industry and a decisive step towards a strong, sovereign digital Europe.

Your mission as Site Reliability Engineer:

Operate and maintain mission-critical sovereign cloud services with availability targets of 99.99% and above.
Monitor service health, reliability, scalability, latency, and performance using Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Investigate, troubleshoot, and resolve complex production incidents across large-scale distributed cloud environments.
Participate in a structured 24/7 on-call rotation (approximately one week every six weeks) to ensure continuous service availability.
Collaborate with Site Reliability Engineers, Cloud Infrastructure Specialists, and Product Experts across international teams to mitigate incidents and drive long-term solutions.
Build a deep understanding of Google's cloud technologies and distributed systems through an intensive training program covering technologies such as Borg, Colossus, Spanner, and other core GCP components.
Drive operational excellence by creating and maintaining technical documentation, standardizing incident response procedures, and continuously improving operational playbooks.
Lead and contribute to post-incident reviews, root cause analyses, and the implementation of preventive measures to improve platform reliability.
Identify opportunities for automation and contribute to improving operational efficiency, scalability, compliance, and service reliability.
Support the operation of highly secure cloud environments designed to meet stringent regulatory and sovereignty requirements.

What we are looking for:

Several years of experience in Site Reliability Engineering, Cloud Operations, DevOps, Platform Engineering, Infrastructure Engineering, Production Support, Network Operations (NOC), Technical Operations, or a comparable role.
Experience operating and supporting business-critical production systems with demanding uptime and availability requirements.
Strong troubleshooting and incident management skills in complex technical environments.
Experience monitoring, operating, and maintaining distributed systems, cloud platforms, infrastructure services, or large-scale applications.
Familiarity with reliability engineering concepts, observability, monitoring, alerting, incident response, and root cause analysis.
Experience working with automation, scripting, operational tooling, or Infrastructure-as-Code approaches.
Strong analytical and problem-solving skills with a structured and methodical approach.
Professional proficiency in both German and English.
Willingness to participate in a regular on-call rotation.
Curiosity, adaptability, and a strong desire to learn and work with hyperscale cloud technologies.

The Group invests more than €4.5 billion per year in Research & Development in key areas, particularly for critical environments, such as Artificial Intelligence, cybersecurity, quantum and cloud technologies.

In 2025, the Group generated sales of €22.1 billion.

For our more than 85,000 employees in 65 countries, we open up visionary perspectives, realise individual career paths and enable creative freedom. We do this with courage, versatility and the firm intention to make the demanding challenges of our time safer and more inclusive. We actively support diversity through our sustainable, value-focused management.

Say HI* – Your journey to us

In times of change, our international teams are ready to meet the complexity of today with the industry-leading technologies of tomorrow. Will you be part of it? Your Talent Acquisition contact, Andre Fuhrmann, is looking forward to your online application.

Andre Fuhrmann – Talent Acquisition Partner

+49 7156 / 302-22002

*Human Intelligence

The Company

HQ: Paris

63,258 Employees

What We Do

Thales is a global high technology leader investing in digital and “deep tech” innovations – connectivity, big data, artificial intelligence, cybersecurity and quantum technology – to build a future we can all trust, which is vital to the development of our societies. The company provides solutions, services and products that help its customers – businesses, organisations and states – in the defence, aeronautics, space, transportation and digital identity and security markets to fulfil their critical missions, by placing humans at the heart of the decision-making process.