Senior Site Reliability Engineer (m/f/d)

28 Locations
In-Office or Remote
Senior level
Software
The Role
Lead architecture and reliability for cloud infrastructure and Kubernetes, improve observability and IaC, drive resilience and incident response, mentor engineers, and shape the platform roadmap.
Summary Generated by Built In
Empower every employee. Our mission is to be the world's most used AI employee experience platform by changing the way frontline employees work.

Flip is the leading AI-powered employee experience platform for frontline workers. We're transforming how the people who keep the world running — in retail, manufacturing, and logistics — do their jobs. One app. One touch. Everything they need.

Our mission: Connect every employee to everything they need in one touch.


Job Teaser

As a Senior Site Reliability Engineer in our Platform Squad, you'll own critical reliability domains end-to-end and drive the technical direction within the squad - leading architectural decisions on our platform, mentoring teammates, and continuously raising the reliability bar inside the team.

This role is for an engineer with a proven track record of building and operating high-throughput, highly available systems, who wants senior-level technical ownership and real impact through deep engineering work inside a tight, well-scoped team.

What awaits you with us
  • Co-own the architecture: Help drive the architecture and evolution of our cloud infrastructure on Azure and our Kubernetes clusters - designed for high throughput and highest availability - to support Flip's rapid growth across the globe.
  • Drive the resilience strategy: Define how we approach global scaling, zero-downtime deployments, rollback mechanisms and disaster recovery, and make sure the platform stays available around the clock.
  • Evolve our observability stack: Develop our LGTM stack (Loki, Grafana, Tempo, Mimir) into a foundation our engineers can trust.
  • Improve our IaC Platform: Eliminate toil at the source, and make our infrastructure truly self-service for engineering teams.
  • Lead in incidents: Take a leading role in platform-related major incidents, drive blameless post-mortems for the squad, and translate findings into systemic improvements.
  • Mentor within the squad: Coach teammates, run RFCs and design reviews inside the team, and help engineers grow into stronger SREs.
  • Shape our roadmap: Partner with your squad to define the platform's direction.
What you bring to the table

We're looking for a hands-on, SaaS-minded senior Site Reliability Engineer who treats scalability and reliability as a first-class product concern.

Must-Have Qualifications
  • 5+ years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
  • Proven track record building and operating high-throughput, highly available systems in production.
  • Deep, production-level experience with Kubernetes on any hyperscaler.
  • Strong experience with modern observability stacks (e.g. Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, ELK) and a clear point of view on SLIs, SLOs and error budgets.
  • Solid software development skills in Go (strongly preferred, since our IaC runs on Pulumi in Go) or Python.
  • Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform) and GitOps (e.g. ArgoCD), plus CI/CD pipeline design.
  • Demonstrated ability to lead complex infrastructure initiatives from design to production - including writing RFCs and driving architecture decisions within your team.
  • Experience mentoring engineers and raising the technical bar within a team.
  • Comfortable owning major incidents end-to-end and turning learnings into systemic change.
  • Strong communication skills and business-fluent English.
  • Willingness to participate in on-call rotations to ensure the reliability of our platform.
Nice-to-Have Qualifications
  • Rolled out production-ready API gateways with Gateway API (e.g. Envoy Gateway).
  • Operated multi-cluster service meshes (e.g. Cilium, Linkerd, Istio).
  • Deployed and maintained Kubernetes Operators (e.g. Strimzi, CNPG).
  • Operated highly available PostgreSQL in production.

What we offer you
  • Work mode: We’re remote-first, giving you flexibility to work from home. At the same time, we deeply value the power of in-person collaboration. Depending on the role, you’ll join occasional team events, workshops, or meetings in our Berlin or Stuttgart offices - always with plenty of notice. The exact balance will be discussed during your interview.
  • Work-life balance: We don't want you to take root in your desk chair. That's why we cover the costs of your E-Gym-Wellpass membership and offer job bike leasing.
  • Celebrating success: Expect highly motivated and committed people in a relaxed working atmosphere.
  • Be part of something bigger: You actively shape Flip in your role. Along the way, you help drive the rapid growth of a young tech company and grow towards your own goals. Fun is guaranteed.
  • Happy to be a Flipster: Stay tuned for regular team events and culture days that bring us together as Flipsters.
  • Working abroad: At Flip you can also work abroad in the European Union. Let's talk about remote work in the interview.

At Flip, everyone is welcome - no matter what gender you identify as or how old you are. Sexual identity, origin, religion, world view and disabilities do not influence your potential job at Flip. The most important thing is that YOU fit in!


Skills Required

  • 5+ years hands-on experience as an SRE, Platform, DevOps, Infrastructure, Cloud, or Backend engineer with strong infrastructure focus
  • Proven track record building and operating high-throughput, highly available systems in production
  • Deep, production-level experience with Kubernetes on any hyperscaler
  • Strong experience with modern observability stacks (Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, ELK) and SLIs/SLOs/error budgets
  • Solid software development skills in Go or Python (Go strongly preferred)
  • Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform) and GitOps (e.g., ArgoCD) plus CI/CD pipeline design
  • Demonstrated ability to lead complex infrastructure initiatives from design to production, including writing RFCs and driving architecture decisions
  • Experience mentoring engineers and raising the technical bar within a team
  • Comfortable owning major incidents end-to-end and translating learnings into systemic improvements
  • Strong communication skills and business-fluent English
  • Willingness to participate in on-call rotations
  • Experience with Azure cloud infrastructure
  • Rolled out production-ready API-Gateways with Gateway API (e.g., Envoy Gateway)
  • Operated multi-cluster service meshes (e.g., Cilium, Linkerd, Istio)
  • Deployed and maintained Kubernetes Operators (e.g., Strimzi, CNPG)
  • Operated highly available PostgreSQL in production

The Company
Stuttgart
164 Employees
Year Founded: 2018

What We Do

Flip is the frontline employee app that brilliant businesses use to reach and rally their people — from the field to the floor, and door-to-door. It instantly connects every employee with relevant news and knowledge, and makes everyday tasks like shift planning and time tracking a breeze. Whether they’re 16 or 60, a burger flipper or a warehouse warrior, in Manchester or in Mumbai, Flip brings everyone together in one space, via an intuitive little app that top brands like Bosch, EDEKA, and MAHLE “can’t live without.”
