Lead Site Reliability Engineer
We're building a Site Reliability Engineering center in Mexico City, and we're hiring a Manager-level Backend Engineer to own the reliability and operational maturity of our settlement platforms. These are batch-critical systems that process every credit and debit transaction across the network.
This is a foundational role. You'll be one of the first engineers in CDMX responsible for ensuring settlement cycles complete accurately, on time, and in compliance with SOX and PCI-DSS requirements. You'll work across hybrid infrastructure (on-prem data centers and AWS), partner closely with UK-based engineers, and build the automation and observability that allow Mexico City to operate settlement.
What You'll Do
- Own reliability for batch settlement systems - ensure cycle completion windows are met, data integrity is maintained, and failures are detected before they reach downstream consumers
- Build and improve observability for settlement pipelines - dashboards, alerts, and anomaly detection that make system health legible and reduce reliance on tribal knowledge
- Drive automation of operational toil - certificate rotation, environment provisioning, compliance artifact generation, and manual validation steps that currently require human intervention
- Partner with UK-based settlement engineers - acquire domain expertise on Durbin compliance windows, cross-border DCI routing, and acquirer/issuer SLA adherence
- Participate in incident management - respond to settlement failures, drive root cause analysis, and implement durable fixes that prevent recurrence
- Contribute to regulatory readiness - ensure SRE practices produce audit-ready artifacts for SOX and PCI-DSS exams without manual toil
What Success Looks Like
- Independently validate and troubleshoot settlement cycle failures
- At least two manual settlement operations processes fully automated
- Settlement observability coverage sufficient to detect anomalies before cycle deadlines
- Documented runbooks and severity criteria for all critical settlement failure modes
The Environment
You'll work with batch processing systems that handle financial transactions across multiple on-prem data centers with active/active and active/passive configurations. The stack includes Java, Python, shell scripting, SQL, AWS, Kubernetes, OpenShift containers, Datadog, Observe, and legacy payment platforms. CI/CD pipelines, API automation, and secret management via HashiCorp Vault are part of daily operations. You'll leverage agentic AI automation (Claude Code or others) to accelerate development and build automation solutions. You'll need strong troubleshooting and debugging skills, and you should be comfortable with both modern cloud-native tooling and traditional enterprise batch systems.
Basic Qualifications
- Professional English fluency
- Bachelor's degree
- At least 6 years of experience in SRE, production operations, or reliability engineering
- Experience in DevOps Engineering (internship experience does not apply)
- 5+ years of experience in at least one of the following: Java, Python, Go
- At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
- 3+ years of experience with container orchestration services including Docker or Kubernetes
- Experience with Shell or Bash scripting
- At least 3 years of Unix or Linux system administration experience
Preferred Qualifications
- Experience developing automation solutions using agentic AI tools (Claude Code, Copilot CLI)
- Troubleshooting and debugging skills across distributed systems
- Familiarity with payments, financial services, or other regulated high-availability domains
- Knowledge of or experience with networking concepts (TCP/DNS/TLS)
At Capital One, we respect individual differences in culture, religion, and ethnicity. Likewise, we promote equal opportunities and development for all personnel. In the hiring process, we seek to provide equal employment opportunities to candidates, regardless of race, color, religion, gender, sexual orientation, marital or civil status, national origin, disability, or any other situation protected by federal, state, or local laws.
For technical support or questions about Capital One's recruiting process, please send an email to [email protected]
Capital One does not provide, endorse, or guarantee, and is not liable for, third-party products, services, educational tools, or other information available through this site.
Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe, any position posted in the Philippines is for Capital One Service Corp (COPSSC), and any position posted in Mexico is for Capital One Technology Labs Mexico.
What We Do
At Capital One, we think and work like a tech company, using our digital fluency to transform everything about the customer experience. We’re bending data to our will, and turning a stodgy industry on its head. That’s reflected in our ranking as the number one business technology innovator in the U.S. in the 2016 InformationWeek Elite 100.
Why Work With Us
Here’s another question: What are you looking for? A place where curiosity is the starting point? Where data leads to human insights? Where humanity drives product development? We’re bringing breakthrough products and services to consumers, small businesses, and commercial clients. And each new idea makes life better for millions of people.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.