Open Systems

Senior Site Reliability Engineer (80–100%) Switzerland or Germany

2 Locations

In-Office or Remote

Senior level

Software • Cybersecurity

The Role

Lead design and implementation of automation, self-service APIs, and reliability tooling (primarily in Go) for the Mission Control platform. Define and operate SLIs/SLOs, participate in incident response and on-call, own reliability projects end-to-end, and collaborate with engineering and AI teams to reduce operational toil and improve scalability and observability.

Summary Generated by Built In

Your Mission

We are seeking a highly motivated and talented Senior Site Reliability Engineer to join our growing team. As an SRE, you will be a key driver in building automation, reducing operational toil, and continuously improving how our services are operated, while ensuring reliability, scalability, and accurate SLA measurement.

You will combine strong software engineering and system architecture skills with an operations mindset to improve how our services are built and run at scale. Much of your work centers on the Mission Control (MC) platform — the operational backbone our engineers rely on to run customer services in production, and our point of interaction with customers, running 24/7 around the globe. To improve Mission Control, we build services such as an automation framework, a self-service platform, and a service-level monitoring system that eliminate toil and repetitive tasks, enable customers, and make operations more reliable and efficient. Your responsibilities will include:

- Building Operational Automation: Design, build, and evolve the automation framework and tooling that powers the MC platform, primarily in Go (Golang), with a strong focus on maintainability, scalability, and reliability. Beyond the framework itself, you will build automation workflows on top of the platform that reduce operational toil and improve day-to-day efficiency for the engineers and operators who run our services.
- Developing Self-Service APIs: Build and maintain the service APIs and self-service operational tooling of the MC platform that enable customers and teams to safely and efficiently operate services in production without manual intervention.
- Applying Site Reliability Engineering (SRE) Principles: Define, implement, and continuously improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and SLA measurements so that reliability is measurable and actionable across the MC platform and the services it operates.
- Owning Reliability and Operations Initiatives: Take ownership of reliability, automation, and Mission Control projects, driving them independently from problem identification through implementation and long-term operation.
- Collaborating with AI Engineering: Work closely with our AI team and tooling, integrating AI-assisted capabilities into our automation and operational workflows.
- Incident Response and Learning: Participate in incident response and the on-call rotation, leading root cause analysis and driving sustainable corrective and preventive actions.

This position will give you the opportunity to lead midsize to large automation and reliability initiatives, from early concept and design through production deployment and ongoing operations. You will collaborate closely with software engineers, platform teams, and product owners to identify and implement improvements across our services and operational practices. As part of the role, you will spend at least one day per week working within Mission Control, staying close to the day-to-day reality of how our services are run.

Your Qualifications

You are a senior engineer with a strong foundation in software development and system architecture, combined with a clear interest in operations and reliability. You are comfortable taking ownership, learning quickly, and driving change without needing detailed direction.

Required skills and qualifications:

- Strong software engineering background, ideally with production-grade Go (Golang) experience. Proficiency in another modern language is fine if you are a quick learner and willing to adapt, as Go is our primary language.
- Solid understanding of distributed systems and scalable architecture.
- Proven experience designing, building, and operating services and their APIs (e.g. REST, gRPC) in production.
- Experience operating production systems, including incident response, on-call, and root cause analysis.
- Experience with SRE, DevOps, platform engineering, or reliability-focused roles.
- Hands-on experience with infrastructure and operations tooling, such as:
  - Kubernetes
  - Terraform / Infrastructure as Code
  - GitOps principles and CI/CD tooling
  - Prometheus, Loki, Tempo, and modern observability stacks
  - Major cloud platforms (AWS, Azure, GCP)
- Knowledge of Linux system administration, networking concepts, and major Internet protocols (TCP/IP, IPsec, SSL/TLS, SSH, SMTP, HTTPS, DNS).
- Ability to think in terms of systems, failure modes, and trade-offs.
- Strong communication skills and the ability to build trust across engineering and operations teams.
- A proactive mindset: you see problems, propose solutions, and take responsibility for delivering them.
- University degree in Computer Science or similar educational level.

Bonus skills and qualifications:

- Experience building internal developer platforms or automations
- Interest in or experience with integrating AI/LLM-assisted capabilities into automation or operational tooling
- Background in networking or security operations
- Experience defining and operating SLO-based reliability programs at scale

What We Offer

Want to join a team that enjoys making secure connectivity simple for our customers? You’ll be among people who believe in:

- Caring PASSIONATELY about keeping our customers safe – We’re dedicated to solving problems. Whatever it takes.
- Thinking UNCONVENTIONALLY to stay ahead – The world never fails to surprise us. So let’s surprise it first.
- Doing the hard work to make things SIMPLE – Craft and hone something that delights in its simplicity.
- Working COLLABORATIVELY to build success – The power of the team will always make us faster and better.

As a testament to this, Open Systems has been recognized as an outstanding place to work. You’ll be surrounded by smart teams who enrich your experience and provide the opportunities you need to develop your skills and advance your career.

We look forward to receiving your online application (please note that you need to compress your application into two attachments). Only direct applications will be considered.

Come as you are! We are looking for amazing people of diverse backgrounds, experiences, abilities, and perspectives. Open Systems welcomes and encourages diversity in the workplace regardless of race, gender, religion, age, sexual orientation, disability, or veteran status.

About Open Systems

Open Systems is about secure connectivity made easy, combining SD-WAN, Firewall, SWG, CASB, and ZTNA into a framework that supports secure connectivity across cloud and hybrid environments and locations. Open Systems Managed SASE provides a comprehensive SASE solution through an easy-to-use customer portal, underpinned by a unified data platform to drive future innovation, all delivered as a 24x7 managed service.

The Company

HQ: Zürich

421 Employees

Year Founded: 1990

What We Do

Backed by the Service Experience Promise, Open Systems simply and cost-effectively connects and secures hybrid environments, helping your organization meet its business objectives. Open Systems focuses on a superior user experience when helping organizations reduce risk, improve efficiency, and accelerate innovation. The Open Systems SASE Experience delivers on the promise of ZTNA with a comprehensive, unified SASE platform that is easy to implement and use, combining SD-WAN and Security Service Edge delivered as a service. We provide 24x7 operational management and engineering support from assigned engineering teams and ensure affordable and predictable costs.