Senior Site Reliability Engineer & Incident-Manager (m/f/d)

Berlin
In-Office
Senior level
Cloud • Information Technology • Internet of Things
The Role
Drive incident management, improve observability, and support platform engineering. Collaborate with teams to ensure responsive and resilient services on AWS.
Summary Generated by Built In

Your Role

Are you passionate about observability and resiliency? Is ensuring we know about issues before our customers do second nature to you? Does being at the front and orchestrating processes sound fun to you? emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's incident management routines, be the authority on everything observability and resiliency, and guide internal stakeholders with best practices.

As part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving the developer experience to increase development efficiency and scale productivity. You will join a team of 3 engineers, fostering empathy and a collaborative mindset to ensure continuous improvement of the developer experience at emnify. The ideal candidate will have extensive experience with AWS cloud infrastructure, microservices, and modern observability practices, as well as strong communication and organizational skills.

The position is 35% incident management operations, 35% observability and monitoring work, and 30% platform engineering and developer support.

Emnify technology radar

The position is based in emnify’s office in Berlin.


Your Impact:

  • Incident management operations:

Lead and optimize the incident management process end-to-end, ensuring timely detection, resolution, and documentation of incidents. This includes coordinating cross-functional teams, conducting post-mortems and root cause analyses, and driving continuous improvements to workflows.

  • Observability and monitoring:

Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.

  • Collaboration and support:

Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.

  • Platform engineering:

Leverage AWS to design, build, and maintain a resilient cloud infrastructure, implementing best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.


Your Skills:

• Proven experience as a (Site) Reliability Engineer or in a similar role at a SaaS and/or telecom company.

• Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.

• Experience in establishing and managing incident management processes.

• Understanding of incident management frameworks and best practices.

• Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).

• Expert skills with modern infrastructure tooling and principles (Kubernetes, IaC with Terraform, CI/CD with GitHub Actions and Jenkins).

• Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker).

• Advanced documentation skills for effective knowledge sharing and collaboration.

• Exceptional problem-solving and critical thinking with a passion for enhancing development experiences in fast-paced tech environments.

• Ability to work independently and as part of a team.


Nice to have:

• Knowledge of networking protocols and telecom systems.

• Knowledge of secure software development.

• Familiarity with programming languages such as Python, Go, or Java.

• AWS certification (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).

Top Skills

AWS
CloudWatch
Docker
GitHub Actions
Go
Grafana
Java
Jenkins
Kubernetes
Loki
Mimir
Prometheus
Python
Terraform

The Company
HQ: Berlin
188 Employees
Year Founded: 2014

What We Do

emnify is the leading cloud building block for cellular communications in the IoT stack, connecting millions of IoT devices globally – from electric vehicles to energy meters, alarm systems to GPS trackers, thermometers to health wearables.

The emnify API and SIM technology connect and secure any kind of IoT deployment to its application back-end. emnify’s cloud-native integrations and no-code workflows ensure seamless lifecycle scalability for deployments of all sizes – from local start-up to global enterprise.

The emnify IoT SuperNetwork is the largest globally distributed mobile cloud core network of its kind, supporting local network access (2G – 5G, LTE-M, NB-IoT) in over 180 countries from more than 25 cloud regions – and counting. emnify’s solution is built on partnerships with the leading hyperscaler cloud service providers, system integrators and hundreds of radio network operators worldwide.

Founded in 2014, emnify was the first to transform cellular IoT connectivity into an easy-to-consume cloud resource – trusted today by thousands of the world’s most innovative companies. To learn more about emnify, please visit www.emnify.com
