Senior Observability Engineer m/f/d

2 Locations
In-Office
Senior level
Software

About us 

ATOSS Software SE is one of Germany’s most successful tech growth stories. As the market leader in Workforce Management Software, we help companies work more intelligently, creatively, and humanely, optimizing the balance between profitability and people.

We’re a rare company: according to Handelsblatt (10/24), just 309 public companies worldwide achieved over 20% return on sales for ten consecutive years. Only two are based in Germany, and ATOSS is one of them.

With 19 years of record-breaking growth, a market cap of over €2 billion, and listings in the SDAX and TecDAX, we’re scaling globally.

If you’re ready to drive impact in a high-performing B2B SaaS environment, this is your chance to elevate your career.

The Person You Are

At ATOSS, we hire for both character and skill, seeking individuals who embody resilience, a pioneering spirit, and the passion to grow.  

We value those who:

  • Think like entrepreneurs – taking ownership, pushing boundaries, and driving impact.
  • Challenge the status quo – bringing fresh ideas and bold execution to the table.
  • Thrive in change – seeing growth as a lifelong journey, both professionally and personally.


The Role

As a Senior Observability Platform Engineer, you are a key member of our observability team.
You design, build, and operate our observability platform (Grafana, Loki, Tempo, Prometheus/Mimir on Kubernetes) and enable product, platform, and AI teams to monitor and improve the services they own.
This is a hands-on engineering role focused on platform reliability, standards, automation, and cross-team collaboration. You will work closely with other platform functions and product lines and help shape our observability strategy and roadmap.

Key Responsibilities

Product ownership & roadmap

  • Analyze and capture requirements across product lines and support stakeholders.
  • Translate requirements into clear priorities, user stories, and acceptance criteria.
  • Communicate progress and upcoming roadmap items to stakeholders through demos, updates, and KPI reporting.
  • Collaborate with R&D leads and architecture to align the observability roadmap with broader platform and product goals.

Platform engineering & operations

  • Operate, scale, and upgrade the central observability stack (Grafana, Loki, Tempo, Prometheus/Mimir on Kubernetes) across multiple environments and cloud providers.
  • Automate routine operations (provisioning, configuration, housekeeping, capacity checks) to reduce manual work and improve reliability.
  • Evaluate and adopt modern observability technologies (OpenTelemetry, distributed tracing, anomaly detection, AI-assisted insights) that fit the overall platform architecture.

Standards, instrumentation & enablement

  • Define and evolve standards for metrics, logs, traces, events, dashboards, alerts, retention, and labeling.
  • Provide reusable templates, reference dashboards, and alert patterns so teams can efficiently build and maintain their own observability content.
  • Build and improve self-service capabilities (data sources, folder structures, onboarding flows, RBAC patterns) so teams can use the platform without heavy manual support.
  • Enable and advise teams on instrumentation and observability design.

Reliability, KPIs & incident support

  • Ensure the observability platform itself is reliable, performant, and cost-efficient.
  • Define and track KPIs/SLIs for the observability platform (availability, performance, cost, adoption) and continuously improve them.
  • Support incident detection and response by ensuring the right signals (metrics, logs, traces) and views are available to teams.
  • Collaborate after major incidents to identify observability gaps and feed improvements back into standards, templates, and the roadmap.
  • Support KPI measurement capabilities across teams (incident detection, response efficiency, observability coverage).

Security, compliance & cost management

  • Ensure privacy, compliance, and security requirements are met for observability data (RBAC, tenant isolation, least-privilege access, data minimization).
  • Define guardrails for data volume, cardinality, and retention to keep the platform performant and cost-effective.
  • Work with Security, Compliance, and Data Protection to align telemetry practices with regulatory and contractual requirements.

AI observability

  • Operate and evolve AI observability components as part of the central observability platform.
  • Define integration patterns, standards, and example configurations so AI teams can instrument their models, prompts, and pipelines in a consistent way.
  • Ensure AI observability tooling is reliable, secure, and cost-efficient, and integrates well with the rest of the observability stack.
  • Enable AI/R&D teams through guidance and templates, while they remain responsible for defining their own metrics, dashboards, alerts, and evaluations.

Use of AI to increase efficiency

  • Apply AI features on top of observability data (where appropriate and technically feasible) to reduce manual work.
  • Evaluate and introduce AI-assisted diagnostics as optional helpers for incident and operations teams.
  • Collaborate with Security, Compliance, and Data Protection when using AI on observability data, ensuring governance, access control, and data protection requirements are met.


Key Requirements

  • Background as an Observability / SRE / Platform / Infrastructure Engineer in cloud-native environments.
  • Deep understanding of metrics, logs, traces, and alerting for distributed systems.
  • Familiarity with Kubernetes, microservices, and modern instrumentation (including OpenTelemetry).
  • Strong experience with Prometheus and the Grafana stack (Grafana, Loki, Tempo, Prometheus/Mimir) in production, at scale; exposure to Langfuse or ClickHouse is a strong plus.
  • Experience with observability in at least one major cloud hyperscaler and its native services.
  • Proven experience designing and rolling out observability solutions used by multiple teams, including standards, templates, and best practices.
  • Ability to work closely with development/engineering teams, understand their needs, and turn them into platform features, standards, and guidance.
  • Strong stakeholder communication skills across engineering and operations, with a pragmatic, results-focused mindset.
  • Experience operating a multi-tenant observability platform in a SaaS context.
  • Knowledge of regulatory and compliance requirements affecting telemetry and logging.

Our Benefits   

  • Competitive Rewards: Including profit-sharing and employee stock program.  
  • Structured Onboarding & Continuous Leadership Development: Clear career paths through Expert & Leadership Tracks, plus access to ATOSS Academy.
  • Flexible Work Culture: Hybrid options (remote within the EU), 30 days of vacation, and a strong commitment to diversity & inclusion.  
  • Engaging Team Environment: Seasonal company events, team retreats, and an in-house barista.  
  • Health & Wellbeing: Including regular check-ups, corporate wellness programs, and Wellhub membership.  
  • Stability & Growth: Company listed on SDAX & TecDAX, with 19+ years of record-breaking revenue and a 30%+ EBIT margin. Certified Top Employer© for the 5th year in a row.  


At ATOSS, great talent knows no limits. We welcome professionals from all backgrounds and empower their growth through an inclusive, skill-focused environment.

Join us and be part of a high-growth, future-focused company! 


The Company
HQ: Munich
769 Employees

What We Do

At ATOSS, we are shaping working environments to the benefit of companies, employees, and society. We are paving the way to working environments that are more creative, more intelligent, and more humane. At the same time, we are revolutionizing the interaction of cost efficiency and humanity. This vision of a human economy drives and motivates us. For 35 years, ATOSS has ranked as a trendsetter and key player in the workforce management market. Every day, ATOSS solutions make significant contributions towards higher value creation and greater competitive strength for more than 15,000 customers. We enable the implementation of employee-oriented working time concepts, thereby ensuring greater job satisfaction, now in over 50 countries worldwide. Our customers include companies such as Deutsche Bahn, Douglas, EDEKA, HORNBACH, Lufthansa, Sixt SE, thyssenkrupp Packaging Steel, and W. L. Gore & Associates. #HumanEconomy

Visit our website: www.atoss.com
Imprint: https://www.atoss.com/en/imprint
Data Privacy Policy: https://www.atoss.com/en/data-protection-agreement
