We are looking for a Lead Site Reliability Engineer (SRE) who will drive reliability, performance, resiliency, and automation across our platform. This is a hands-on technical leadership role responsible for building scalable infrastructure, improving observability, leading incident response, and establishing SRE best practices in a high-volume, high-availability fintech environment.
You will partner closely with engineering teams and influence architecture decisions to ensure our systems are reliable, efficient, and production-ready.
Responsibilities
Reliability & Architecture
- Own the reliability, availability, and performance of core production services.
- Define and manage SLIs, SLOs, and error budgets, ensuring reliability is baked into every release.
- Drive production readiness reviews, capacity planning, and performance optimization.
- Lead disaster recovery strategies, failover testing, and resilience engineering.
Observability & Incident Management
- Improve and scale observability across logs, metrics, and traces using modern tools.
- Strengthen root-cause analysis and lead blameless postmortems.
- Refine on-call processes and automate repetitive operational tasks.
Automation & Infrastructure
- Lead automation initiatives to eliminate toil and reduce manual operations.
- Build and maintain cloud-native infrastructure (AWS / GCP / Azure).
- Develop internal tools and reliability solutions using Go, Python, or Java.
- Work with Kubernetes, Terraform, CI/CD pipelines, and infrastructure-as-code standards.
Leadership & Collaboration
- Mentor SRE engineers and champion SRE practices across engineering.
- Collaborate with backend, platform, and product teams to design scalable, resilient distributed systems.
- Provide technical guidance and influence architectural decisions that impact system reliability.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in cloud, infrastructure, DevOps, or reliability engineering.
- 2+ years leading SRE, platform, or reliability-focused initiatives.
- Expertise in Linux systems, distributed systems fundamentals, networking (TCP/IP, HTTP), and cloud architecture.
- Strong hands-on experience with:
  - Kubernetes / K8s
  - AWS, GCP, or Azure
  - Terraform and IaC frameworks
  - Prometheus, Grafana, ELK, OpenTelemetry, or similar tools
- Proficiency in Go, Python, or Java for automation and tooling.
Nice to Have
- Experience working in fintech or regulated financial environments.
- Exposure to high-throughput, low-latency systems.
What We Do
This is Clip, where the extraordinary happens! Clip is Mexico’s leading integrated ecosystem promoting the financial inclusion of people and businesses through innovative, trusted technology solutions that make finance easy, accessible, and transparent for all.
Every solution is backed by top talent who make the extraordinary happen.
In June 2021, Clip became the first payments unicorn in the 12th largest economy in the world. This milestone reinforces our commitment to building Mexico’s operating platform for commerce and fulfilling our vision of having Clip in every business in Mexico.
Have you ever worked at a unicorn fintech company? If not, this is your chance to join a company as unique and extraordinary as unicorns are!
