Senior Site Reliability Engineer (80–100%), Germany or Switzerland

Hiring Remotely in Düsseldorf, Nordrhein-Westfalen, DEU
In-Office or Remote
Senior level
Software • Cybersecurity
The Role
Lead design and implementation of automation, self-service APIs, and reliability tooling (primarily in Go) for the Mission Control platform. Define and operate SLIs/SLOs, participate in incident response and on-call, own reliability projects end-to-end, and collaborate with engineering and AI teams to reduce operational toil and improve scalability and observability.

Your Mission

We are seeking a highly motivated and talented Senior Site Reliability Engineer to join our growing team. As an SRE, you will be a key driver in building automation, reducing operational toil, and continuously improving how our services are operated, while ensuring reliability, scalability, and accurate SLA measurement.

You will combine strong software engineering and system architecture skills with an operations mindset to improve how our services are built and run at scale. Much of your work centers on the Mission Control (MC) platform — the operational backbone our engineers rely on to run customer services in production, and our point of interaction with customers, running 24/7 around the globe. To improve Mission Control, we build services such as an automation framework, a self-service platform, and a service-level monitoring system that eliminate toil and repetitive tasks, enable customers, and make operations more reliable and efficient. Your responsibilities will include:

  • Building Operational Automation: Design, build, and evolve the automation framework and tooling that powers the MC platform — primarily in Golang — with a strong focus on maintainability, scalability, and reliability. Beyond the framework itself, you will build automation workflows on top of the platform that reduce operational toil and improve day-to-day efficiency for the engineers and operators who run our services.
  • Developing Self-Service APIs: Build and maintain the service APIs and self-service operational tooling of the MC platform that enable customers and teams to safely and efficiently operate services in production without manual intervention.
  • Applying Site Reliability Engineering (SRE) Principles: Define, implement, and continuously improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and SLA measurements so that reliability is measurable and actionable across the MC platform and the services it operates.
  • Owning Reliability and Operations Initiatives: Take ownership of reliability, automation, and Mission Control projects, driving them independently from problem identification through implementation and long-term operation.
  • Collaborating with AI Engineering: Work closely with our AI team and tooling, integrating AI-assisted capabilities into our automation and operational workflows.
  • Incident Response and Learning: Participate in incident response and the on-call rotation, leading root cause analysis and driving sustainable corrective and preventive actions.

This position will give you the opportunity to lead midsize to large automation and reliability initiatives, from early concept and design through production deployment and ongoing operations. You will collaborate closely with software engineers, platform teams, and product owners to identify and implement improvements across our services and operational practices. As part of the role, you will spend at least one day per week working within Mission Control, staying close to the day-to-day reality of how our services are run.

Your Qualifications

You are a senior engineer with a strong foundation in software development and system architecture, combined with a clear interest in operations and reliability. You are comfortable taking ownership, learning quickly, and driving change without needing detailed direction.

Required skills and qualifications:

  • Strong software engineering background, ideally with production-grade Go (Golang) experience — proficiency in another modern language is fine if you are a quick learner and willing to adapt, as Go is our primary language.
  • Solid understanding of distributed systems and scalable architecture.
  • Proven experience designing, building, and operating services and their APIs (e.g. REST, gRPC) in production.
  • Experience operating production systems, including incident response, on-call, and root cause analysis.
  • Experience with SRE, DevOps, platform engineering, or reliability-focused roles.
  • Hands-on experience with infrastructure and operations tooling, such as:
    • Kubernetes
    • Terraform / Infrastructure as Code
    • GitOps principles and CI/CD tooling
    • Prometheus, Loki, Tempo, and modern observability stacks
    • Major cloud platforms (AWS, Azure, GCP)
  • Knowledge of Linux system administration, networking concepts, and major Internet protocols (TCP/IP, IPsec, SSL/TLS, SSH, SMTP, HTTPS, DNS).
  • Ability to think in terms of systems, failure modes, and trade-offs.
  • Strong communication skills and the ability to build trust across engineering and operations teams.
  • A proactive mindset: you see problems, propose solutions, and take responsibility for delivering them.
  • University degree in Computer Science or similar educational level.

Bonus skills and qualifications:

  • Experience building internal developer platforms or automations
  • Interest in or experience with integrating AI/LLM-assisted capabilities into automation or operational tooling
  • Background in networking or security operations
  • Experience defining and operating SLO-based reliability programs at scale

What We Offer

Want to join a team that enjoys making secure connectivity simple for our customers? You’ll be among people who believe in:

  • Caring PASSIONATELY about keeping our customers safe – We’re dedicated to solving problems. Whatever it takes.
  • Thinking UNCONVENTIONALLY to stay ahead – The world never fails to surprise us. So let’s surprise it first.
  • Doing the hard work to make things SIMPLE – Craft and hone something that delights in its simplicity.
  • Working COLLABORATIVELY to build success – The power of the team will always make us faster and better.

As a testament to this, Open Systems has been recognized as an outstanding place to work. You’ll be surrounded by smart teams who enrich your experience and provide the opportunities you need to develop your skills and advance your career.

We look forward to receiving your online application (please note that you need to compress your application into two attachments). Only direct applications will be considered.

Come as you are! We are looking for amazing people of diverse backgrounds, experiences, abilities, and perspectives. Open Systems welcomes and encourages diversity in the workplace regardless of race, gender, religion, age, sexual orientation, disability, or veteran status.

About Open Systems

Open Systems is about secure connectivity made easy, combining SD-WAN, Firewall, SWG, CASB, and ZTNA into a framework that supports secure connectivity across cloud and hybrid environments and locations. Open Systems Managed SASE provides a comprehensive SASE solution through an easy-to-use customer portal, underpinned with a unified data platform to drive future innovation, all delivered as a 24x7 managed service.



The Company
HQ: Zürich
421 Employees
Year Founded: 1990

What We Do

Backed by the Service Experience Promise, Open Systems connects and secures hybrid environments simply and cost-effectively, helping your organization meet its business objectives. Open Systems focuses on a superior user experience when helping organizations reduce risk, improve efficiency, and accelerate innovation. The Open Systems SASE Experience delivers on the promise of ZTNA with a comprehensive, unified SASE platform that is easy to implement and use, combining SD-WAN and Security Service Edge delivered as a service. We provide 24x7 operational management and engineering support from assigned engineering teams and ensure affordable and predictable costs.
