Senior Observability Engineer m/f/d

ATOSS Software SE

2 Locations

In-Office

Senior level

Software

About us

ATOSS Software SE is one of Germany’s most successful tech growth stories. As the market leader in Workforce Management Software, we help companies work more intelligently, creatively, and humanely, optimizing the balance between profitability and people.

We’re a rare company: according to Handelsblatt (10/24), just 309 public companies worldwide achieved over 20% return on sales for ten consecutive years. Only two are based in Germany, and ATOSS is one of them.

With 19 years of record-breaking growth, a market capitalization of over €2 billion, and listings in the SDAX and TecDAX, we’re growing and scaling globally.

If you’re ready to drive impact in a high-performing B2B SaaS environment, this is your chance to elevate your career.

The Person You Are

At ATOSS, we hire for both character and skill, seeking individuals who embody resilience, a pioneering spirit, and the passion to grow.

We value those who:
Think like entrepreneurs – taking ownership, pushing boundaries, and driving impact.
Challenge the status quo – bringing fresh ideas and bold execution to the table.
Thrive in change – seeing growth as a lifelong journey, both professionally and personally.

The Role

As a Senior Observability Platform Engineer, you are a key member of our observability team.
You design, build, and operate our observability platform (Grafana, Loki, Tempo, Prometheus/Mimir on Kubernetes) and enable product, platform, and AI teams to monitor and improve the services they own.
This is a hands-on engineering role focused on platform reliability, standards, automation, and cross-team collaboration. You will work closely with other platform functions and product lines and help shape our observability strategy and roadmap.

Key Responsibilities

Product ownership & roadmap

Analyze and capture requirements across product lines and support stakeholders.
Translate requirements into clear priorities, user stories, and acceptance criteria.
Communicate progress and upcoming roadmap items to stakeholders through demos, updates, and KPI reporting.
Collaborate with R&D leads and architecture to align the observability roadmap with broader platform and product goals.

Platform engineering & operations

Operate, scale, and upgrade the central observability stack (Grafana, Loki, Tempo, Prometheus/Mimir on Kubernetes) across multiple environments and cloud providers.
Automate routine operations (provisioning, configuration, housekeeping, capacity checks) to reduce manual work and improve reliability.
Evaluate and adopt modern observability technologies (OpenTelemetry, distributed tracing, anomaly detection, AI-assisted insights) that fit the overall platform architecture.

Standards, instrumentation & enablement

Define and evolve standards for metrics, logs, traces, events, dashboards, alerts, retention, and labeling.
Provide reusable templates, reference dashboards, and alert patterns so teams can efficiently build and maintain their own observability content.
Build and improve self-service capabilities (data sources, folder structures, onboarding flows, RBAC patterns) so teams can use the platform without heavy manual support.
Enable and advise teams on instrumentation and observability design.

Reliability, KPIs & incident support

Ensure the observability platform itself is reliable, performant, and cost-efficient.
Define and track KPIs/SLIs for the observability platform (availability, performance, cost, adoption) and continuously improve them.
Support incident detection and response by ensuring the right signals (metrics, logs, traces) and views are available to teams.
Collaborate after major incidents to identify observability gaps and feed improvements back into standards, templates, and the roadmap.
Support KPI measurement capabilities across teams (incident detection, response efficiency, observability coverage).

Security, compliance & cost management

Ensure privacy, compliance, and security requirements are met for observability data (RBAC, tenant isolation, least-privilege access, data minimization).
Define guardrails for data volume, cardinality, and retention to keep the platform performant and cost-effective.
Work with Security, Compliance, and Data Protection to align telemetry practices with regulatory and contractual requirements.

AI observability

Operate and evolve AI observability components as part of the central observability platform.
Define integration patterns, standards, and example configurations so AI teams can instrument their models, prompts, and pipelines in a consistent way.
Ensure AI observability tooling is reliable, secure, and cost-efficient, and integrates well with the rest of the observability stack.
Enable AI/R&D teams through guidance and templates, while they remain responsible for defining their own metrics, dashboards, alerts, and evaluations.

Use of AI to increase efficiency

Apply AI features on top of observability data (where appropriate and technically feasible) to reduce manual work.
Evaluate and introduce AI-assisted diagnostics as optional helpers for incident and operations teams.
Collaborate with Security, Compliance, and Data Protection when using AI on observability data, ensuring governance, access control, and data protection requirements are met.

Key Requirements

Background as an Observability / SRE / Platform / Infrastructure Engineer in cloud-native environments.
Deep understanding of metrics, logs, traces, and alerting for distributed systems.
Familiarity with Kubernetes, microservices, and modern instrumentation (including OpenTelemetry).
Strong experience with Prometheus and the Grafana stack (Grafana, Loki, Tempo, Prometheus/Mimir) in production, at scale; exposure to Langfuse or ClickHouse is a strong plus.
Experience with observability in at least one major cloud hyperscaler and its native services.
Proven experience designing and rolling out observability solutions used by multiple teams, including standards, templates, and best practices.
Ability to work closely with development/engineering teams, understand their needs, and turn them into platform features, standards, and guidance.
Strong stakeholder communication skills across engineering and operations, with a pragmatic, results-focused mindset.
Experience operating a multi-tenant observability platform in a SaaS context.
Knowledge of regulatory and compliance requirements affecting telemetry and logging.

Our Benefits

Competitive Rewards: Including profit-sharing and an employee stock program.
Structured Onboarding & Continuous Leadership Development: Structured onboarding, clear career paths through Expert & Leadership Tracks, plus access to ATOSS Academy.
Flexible Work Culture: Hybrid options (remote within the EU), 30 days of vacation, and a strong commitment to diversity & inclusion.
Engaging Team Environment: Seasonal company events, team retreats, and an in-house barista.
Health & Wellbeing: Including regular check-ups, corporate wellness programs, and Wellhub membership.
Stability & Growth: Company listed on SDAX & TecDAX, with 19+ years of record-breaking revenue and a 30%+ EBIT margin. Certified Top Employer© for the 5th year in a row.

At ATOSS, great talent knows no limits. We welcome professionals from all backgrounds and empower their growth through an inclusive, skill-focused environment.

Join us and be part of a high-growth, future-focused company!

The Company

HQ: Munich

769 Employees

What We Do

At ATOSS, we are shaping working environments to the benefit of companies, employees, and society. We are paving the way to working environments that are more creative, more intelligent, and more humane. At the same time, we are revolutionizing the interplay between cost efficiency and humanity. This vision of a human economy drives and motivates us. For 35 years, ATOSS has been a trendsetter and key player in the workforce management market. Every day, ATOSS solutions make significant contributions to higher value creation and greater competitiveness for more than 15,000 customers. We enable the implementation of employee-oriented working time concepts, thereby ensuring greater job satisfaction, now in over 50 countries worldwide. Our customers include companies such as Deutsche Bahn, Douglas, EDEKA, HORNBACH, Lufthansa, Sixt SE, thyssenkrupp Packaging Steel, and W. L. Gore & Associates. #HumanEconomy

Visit our website: www.atoss.com

Imprint and Data Privacy Policy: https://www.atoss.com/en/imprint https://www.atoss.com/en/data-protection-agreement