About Zeal Group
Zeal Group is an award-winning FinTech organisation offering a variety of products. Founded in 2017, we have grown to a team of 700+ employees across the globe 🌎
Our offices and presence are spread across Europe, Asia, North & South Africa, the Middle East, and South America, with our Technology hub located in Cyprus 🚀
We are a product- and people-focused company that is passionate about growth, innovative technology, and collaboration 🙌🏼
About the Role
We are looking for a Senior Site Reliability Engineer (SRE) to join our engineering team and help drive the reliability, scalability, and performance of our infrastructure. As a Senior SRE, you will play a key role in architecting and maintaining highly available systems, optimizing our CI/CD pipelines, automating repetitive tasks, and ensuring seamless deployment and observability for our services. Your contributions will have a direct impact on our development velocity, service uptime, and overall customer satisfaction. Our SRE team is fully responsible for our cloud infrastructure, including its fault tolerance and performance. A separate DevOps team supports the development teams and their pipelines.
Responsibilities:
- System Design & Architecture: Collaborate with software engineers and DevOps to design and implement resilient and scalable systems, focusing on high availability, fault tolerance, and disaster recovery.
- Automation & Infrastructure as Code: Develop and maintain infrastructure automation scripts and tools using Terraform, Ansible, or similar technologies, ensuring reproducibility and consistency across environments.
- CI/CD Pipeline Optimization: Build and enhance CI/CD pipelines to accelerate deployment speed and reduce time to market, including implementing blue-green or canary deployments where applicable.
- Monitoring, Alerting & Logging: Create, manage, and refine monitoring dashboards and alerting systems using tools like Prometheus, Grafana, and Elasticsearch to detect and address potential issues before they impact customers.
- Incident Management & Troubleshooting: Lead incident response efforts, perform root cause analysis, and implement long-term fixes to prevent recurrence, ensuring a fast, reliable response to production issues.
- Performance Tuning: Conduct regular performance testing and tuning, identifying bottlenecks in infrastructure performance and system resources.
- Mentorship & Leadership: Guide and mentor other team members, sharing best practices and helping to build a culture of reliability and performance within the engineering organization.
Requirements:
- 5+ years of experience in SRE, DevOps, or a similar role, with a proven track record of managing large-scale, distributed systems.
- Strong knowledge of Linux/Unix systems and networking fundamentals.
- Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
- Experience with containerization and orchestration (Docker, Kubernetes).
- Hands-on experience with infrastructure as code (IaC) tools such as Terraform and Ansible.
- Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK).
- Knowledge of tooling for storing and delivering secrets to microservices (HashiCorp Vault).
- Cloud Expertise: Experience with a cloud platform (GCP) and an understanding of cloud-native architectures.
- Problem-Solving Skills: Strong analytical and problem-solving skills, with a focus on building automated, scalable solutions to complex challenges.
- Collaboration: Ability to work cross-functionally with engineering, product, and support teams, with excellent communication and collaboration skills.
Technology Stack:
- CDN providers: Akamai, EdgeNext
- Cloud Platform: GCP
- Orchestration: Kubernetes
- CI/CD: GitLab, ArgoCD
- IaC: Terraform, Ansible
- Event streams: Kafka, RabbitMQ
- Logging: Elasticsearch, Kibana, Filebeat, Logstash
- Monitoring: Prometheus/VictoriaMetrics, Grafana, Alertmanager, PagerDuty
- Secret Management: HashiCorp Vault, External Secrets Operator
- Artifact repository: Sonatype Nexus
- Object storage: GCS, MinIO
What We Do
Zeal Group is an award-winning FinTech organisation offering a variety of products. Founded in 2017, we have grown to a team of 700+ employees across the globe
Our offices and presence are spread across Europe, Asia, North & South Africa, the Middle East, and South America, with our Technology hubs located in Cyprus and the Netherlands
We are a product- and people-focused company that is passionate about growth, innovative technology, and collaboration