Senior SRE Engineer

Reposted 9 Days Ago
2 Locations
Remote
Senior level
Marketing Tech • Software
The Role
The Senior SRE Engineer will design and maintain observability platforms, optimize monitoring processes, automate deployments, and improve system reliability by collaborating with cross-functional teams.
Summary Generated by Built In

As part of our continued growth, Neo Group is recruiting on behalf of one of our local partners, leveraging our network of 1,400 talented professionals across 10+ countries. Together, we are committed to delivering innovative, data-driven solutions that empower our clients and foster professional growth within a dynamic and collaborative workplace.

We are on the lookout for a Senior SRE Engineer to join our Engineering Department.

Responsibilities:

  • Design, deploy, and maintain observability platforms including Zabbix, Grafana, and the OpenSearch Stack (OpenSearch, Logstash, Kibana).
  • Implement and maintain metrics, logs, traces, and synthetic monitoring across infrastructure and applications.
  • Integrate Prometheus, Alertmanager, and OpenTelemetry where applicable to achieve unified observability.
  • Maintain monitoring coverage for Linux, network devices, applications, and cloud services.
  • Maintain and enhance the overall monitoring and logging infrastructure, including capacity, performance, and reliability.
  • Develop meaningful dashboards and alerting logic to ensure timely and actionable incident notifications.
  • Optimize alerting systems: reduce noise, tune thresholds, and focus on critical business and technical metrics.
  • Improve observability processes and implement predictive failure analysis and early-warning signals.
  • Analyze incidents, identify patterns, and drive proactive monitoring improvements.
  • Define and maintain KPIs, SLIs, SLOs, and SLA measurement processes in coordination with service owners.
  • Enhance reliability through structured incident management and post-mortem analysis.
  • Automate deployment and configuration of monitoring components using Ansible and Terraform, following IaC principles.
  • Manage configuration templates and Zabbix host provisioning through automation tools (Ansible and Terraform, following IaC principles).
  • Leverage APIs and scripting (e.g., Python, Go) for data collection, integrations, and automation.
  • Collaborate closely with Developers, System Engineers, DevOps, and IT Operations teams to improve system reliability and reduce MTTR.
  • Establish and evolve the Monitoring & Diagnostics foundation for the in-house 24/7 App Support team, including tooling, processes, knowledge base, training, runbooks, and troubleshooting guides.
  • Create intelligent, step-by-step troubleshooting instructions to speed up incident resolution.

Requirements:
  • 4+ years of experience as an SRE, Monitoring Engineer, or in a similar role in production environments.
  • Advanced Linux user with strong command-line and diagnostic skills.
  • Strong understanding of monitoring, logging, and observability concepts (metrics, logs, traces, SLIs/SLOs, alerting).
  • Hands-on experience with several of the following: Zabbix, Prometheus, Grafana, Elastic Stack (ELK), Alertmanager, OpenTelemetry.
  • Experience managing both cloud-based and on-premises environments.
  • Automation skills using Python or Go.
  • Proficiency with configuration management / IaC tools (Ansible, Terraform or similar).
  • Solid grasp of networking principles and protocols (TCP/IP, HTTP, DNS, load balancing, etc.).
  • Experience with CI/CD pipelines (GitLab, Jenkins or similar).
  • Familiarity with container orchestration (Kubernetes, Rancher).
  • Experience documenting workflows and training support teams.
  • Proven skills in incident analysis, pattern recognition, and driving preventive improvements.
  • Good communication skills and ability to work with cross-functional teams.

Nice to Have:

  • Experience with synthetic monitoring tools and user-experience monitoring.
  • Background in capacity planning and performance tuning.
  • Advanced knowledge of ML-driven monitoring and predictive analysis.
  • Experience with automated incident response (self-healing systems).

Soft Skills:

  • Responsibility, initiative, and strong analytical thinking.
  • Ability to collaborate effectively within a team.
  • Focus on automation and process improvement.
  • Strong documentation and knowledge-sharing skills.
  • Capability to diagnose complex incidents and provide actionable insights.

Benefits
  • Enjoy 3 health days to focus on your well-being.
  • Take advantage of 25 paid calendar vacation days to explore, relax, and unwind.
  • Get $30 net per month in sports compensation to stay active and healthy.
  • Benefit from top-notch medical insurance for peace of mind.
  • Indulge in a variety of snacks available in the office.
  • Join us for exciting corporate events that foster team spirit and fun!

Top Skills

Alertmanager
Ansible
CI/CD
Elastic Stack
GitLab
Go
Grafana
Jenkins
Kubernetes
OpenSearch
OpenTelemetry
Prometheus
Python
Terraform
Zabbix

The Company
HQ: Ta'Xbiex
308 Employees

What We Do

Neo Group encompasses a portfolio of companies offering B2B services in marketing, technology, data analysis, customer support, HR, and compliance. Headquartered in Malta, our teams are strategically located across Europe, Southeast Asia, and Africa. Our mission at Neo Group is straightforward: to drive profitability and expansion in every market we enter. Yet, beyond financial goals, we prioritize creating an environment where individuals thrive. We aim to expand our presence globally while empowering our team members to reach their fullest potential.
