Clip

SRE Lead

Hiring Remotely in Mexico

Remote

Senior level

Fintech

The Role

Lead Site Reliability Engineer responsible for ensuring system reliability, performance, and resiliency through architecture, observability, and automation processes.

Summary Generated by Built In

We are looking for a Lead Site Reliability Engineer (SRE) who will drive reliability, performance, resiliency, and automation across our platform. This is a hands-on technical leadership role responsible for building scalable infrastructure, improving observability, leading incident response, and establishing SRE best practices in a high-volume, high-availability fintech environment.

You will partner closely with engineering teams and influence architecture decisions to ensure our systems are reliable, efficient, and production-ready.

Responsibilities

Reliability & Architecture

Own the reliability, availability, and performance of core production services.
Define and manage SLIs, SLOs, and error budgets, ensuring reliability is baked into every release.
Drive production-readiness reviews, capacity planning, and performance optimization.
Lead disaster recovery strategies, failover testing, and resilience engineering.

Observability & Incident Management

Improve and scale observability across logs, metrics, and traces using modern tools.
Strengthen root-cause analysis and lead blameless postmortems.
Refine on-call processes and automate repetitive operational tasks.

Automation & Infrastructure

Lead automation initiatives to eliminate toil and reduce manual operations.
Build and maintain cloud-native infrastructure (AWS / GCP / Azure).
Develop internal tools and reliability solutions using Go, Python, or Java.
Work with Kubernetes, Terraform, CI/CD pipelines, and infrastructure-as-code standards.

Leadership & Collaboration

Mentor SRE engineers and champion SRE practices across engineering.
Collaborate with backend, platform, and product teams to design scalable, resilient distributed systems.
Provide technical guidance and influence architectural decisions that impact system reliability.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
8+ years in cloud, infrastructure, DevOps, or reliability engineering.
2+ years leading SRE, platform, or reliability-focused initiatives.
Expertise in Linux systems, distributed system fundamentals, networking (TCP/IP, HTTP), and cloud architecture.
Strong hands-on experience with:
- Kubernetes / K8s
- AWS, GCP, or Azure
- Terraform and IaC frameworks
- Prometheus, Grafana, ELK, OpenTelemetry, or similar tools
Proficiency in Go, Python, or Java for automation and tooling.

Nice to Have

Experience working in fintech or regulated financial environments.
Exposure to high-throughput, low-latency systems.

Top Skills

AWS

Azure

CI/CD

ELK

GCP

Grafana

Java

Kubernetes

OpenTelemetry

Prometheus

Python

Terraform

The Company

HQ: Mexico City, Mexico

1,000 Employees

Year Founded: 2012

What We Do

This is Clip, where the extraordinary happens! Clip is Mexico’s leading integrated ecosystem promoting the financial inclusion of people and businesses through innovative, trusted technology solutions that make financial services easy, accessible, and transparent for all.

Every solution is backed by the best talent, who make the extraordinary happen.

In June 2021 Clip became the first payments Unicorn in the 12th largest economy in the world. This step only reinforces our commitment to continue building Mexico’s operating platform for commerce, fulfilling our vision to have Clip in every business in Mexico.

Have you ever worked at a unicorn fintech company? If the answer is no, this is your chance to join a company as unique and extraordinary as unicorns are!