Zeal Holdings

Senior Site Reliability Engineer (SRE)

3 Locations

Senior level

Fintech • Payments • Financial Services

The Role

As a Senior Site Reliability Engineer, you will ensure the reliability and performance of systems by designing scalable architectures, optimizing CI/CD pipelines, automating tasks, leading incident management, and mentoring team members. Your expertise will enhance development velocity and customer satisfaction.

Summary Generated by Built In

Description

About Zeal Group

Zeal Group is an award-winning FinTech organisation offering a variety of products. Founded in 2017, we have grown to a team of 700+ employees across the globe 🌎

Our offices and presence are spread across Europe, Asia, North & South Africa, the Middle East and South America, with our Technology hub located in Cyprus 🚀

We are a product- and people-focused company that is passionate about growth, innovative technology, and collaboration 🙌🏼

About the Role

We are looking for a Senior Site Reliability Engineer (SRE) to join our engineering team and help drive the reliability, scalability, and performance of our infrastructure. As a Senior SRE, you will play a key role in architecting and maintaining highly available systems, optimizing our CI/CD pipelines, automating repetitive tasks, and ensuring seamless deployment and observability for our services. Your contributions will have a direct impact on our development velocity, service uptime, and overall customer satisfaction. Our SRE team is fully responsible for the cloud infrastructure, including its fault tolerance and performance. A separate DevOps team supports the development teams and their pipelines.

Responsibilities:

System Design & Architecture: Collaborate with software engineers and DevOps to design and implement resilient and scalable systems, focusing on high availability, fault tolerance, and disaster recovery.
Automation & Infrastructure as Code: Develop and maintain infrastructure automation scripts and tools using Terraform, Ansible, or similar technologies, ensuring reproducibility and consistency across environments.
CI/CD Pipeline Optimization: Build and enhance CI/CD pipelines to accelerate deployment speed and reduce time to market, including implementing blue-green or canary deployments where applicable.
Monitoring, Alerting & Logging: Create, manage, and refine monitoring dashboards and alerting systems using tools such as Prometheus, Grafana, and Elasticsearch to proactively detect and address potential issues before they impact customers.
Incident Management & Troubleshooting: Lead incident response efforts, perform root cause analysis, and implement long-term fixes to prevent recurrence, ensuring a fast, reliable response to production issues.
Performance Tuning: Conduct regular performance testing and tuning, identifying bottlenecks in infrastructure performance and system resources.
Mentorship & Leadership: Guide and mentor other team members, sharing best practices and helping to build a culture of reliability and performance within the engineering organization.

Requirements

5+ years of experience in SRE, DevOps, or a similar role, with a proven track record of managing large-scale, distributed systems.
Strong knowledge of Linux/Unix systems and networking fundamentals.
Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
Experience with containerization and orchestration (Docker, Kubernetes).
Hands-on experience with infrastructure as code (IaC) tools such as Terraform and Ansible.
Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK).
Knowledge of technologies for storing and delivering secrets to microservices (HashiCorp Vault).
Cloud Expertise: Experience with a cloud platform (GCP) and an understanding of cloud-native architectures.
Problem-Solving Skills: Strong analytical and problem-solving skills, with a focus on building automated, scalable solutions to complex challenges.
Collaboration: Ability to work cross-functionally with engineering, product, and support teams, with excellent communication and collaboration skills.

Technology Stack:

CDN providers: Akamai, EdgeNext
Cloud Platform: GCP
Orchestration: Kubernetes
CI/CD: GitLab, ArgoCD
IaC: Terraform, Ansible
Event streams: Kafka, RabbitMQ
Logging: Elasticsearch, Kibana, Filebeat, Logstash
Monitoring: Prometheus/VictoriaMetrics, Grafana, Alertmanager, PagerDuty
Secret Management: HashiCorp Vault, External Secrets Operator
Artifact repository: Sonatype Nexus
Object storage: GCS, MinIO

Top Skills

Bash

Python

The Company

Amsterdam

348 Employees

On-site Workplace

Year Founded: 2017

What We Do

Zeal Group is an award-winning FinTech organisation offering a variety of products. Founded in 2017, we have grown to a team of 700+ employees across the globe
Our offices and presence are spread across Europe, Asia, North & South Africa, the Middle East and South America, with our Technology hubs located in Cyprus and the Netherlands
We are a product- and people-focused company that is passionate about growth, innovative technology, and collaboration