Lead Site Reliability Engineer

Capital One

Hiring Remotely in Mexico City, Ciudad De México, MEX

Remote or Hybrid

Senior level

Fintech • Machine Learning • Payments • Software • Financial Services

Change everything. Starting with your career.

The Role

Own reliability of batch settlement systems processing financial transactions across on-prem and AWS. Build observability, automate operational toil, participate in incident management, ensure SOX and PCI-DSS audit readiness, partner with UK engineers, and deliver runbooks and durable fixes.

Summary Generated by Built In

WeWork Reforma Latino (97001), Mexico, Ciudad de Mexico, Ciudad de Mexico
We're building a Site Reliability Engineering center in Mexico City, and we're hiring a Manager-level Backend Engineer to own the reliability and operational maturity of our settlement platforms. These are batch-critical systems that process every credit and debit transaction across the network.
This is a foundational role. You'll be one of the first engineers in CDMX responsible for ensuring settlement cycles complete accurately, on time, and in compliance with SOX and PCI-DSS requirements. You'll work across hybrid infrastructure (on-prem data centers and AWS), partner closely with UK-based engineers, and build the automation and observability that allow Mexico City to operate settlement.
What You'll Do

Own reliability for batch settlement systems - ensure cycle completion windows are met, data integrity is maintained, and failures are detected before they reach downstream consumers
Build and improve observability for settlement pipelines - dashboards, alerts, and anomaly detection that make system health legible and reduce reliance on tribal knowledge
Drive automation of operational toil - certificate rotation, environment provisioning, compliance artifact generation, and manual validation steps that currently require human intervention
Partner with UK-based settlement engineers - acquire domain expertise on Durbin compliance windows, cross-border DCI routing, and acquirer/issuer SLA adherence
Participate in incident management - respond to settlement failures, drive root cause analysis, and implement durable fixes that prevent recurrence
Contribute to regulatory readiness - ensure SRE practices produce audit-ready artifacts for SOX and PCI-DSS exams without manual toil

What Success Looks Like

Independently validate and troubleshoot settlement cycle failures
At least two manual settlement operations processes fully automated
Settlement observability coverage sufficient to detect anomalies before cycle deadlines
Documented runbooks and severity criteria for all critical settlement failure modes

The Environment
You'll work with batch processing systems that handle financial transactions across multiple on-prem data centers with active/active and active/passive configurations. The stack includes Java, Python, shell scripting, SQL, AWS, Kubernetes, OpenShift containers, Datadog, Observe, and legacy payment platforms. CI/CD pipelines, API automation, and secret management via HashiCorp Vault are part of daily operations. You'll leverage agentic AI automation (Claude Code or others) to accelerate development and build automation solutions. You'll need strong troubleshooting and debugging skills and be comfortable with both modern cloud-native tooling and traditional enterprise batch systems.
Basic Qualifications

Professional English fluency
Bachelor's degree
At least 6 years of experience in SRE, production operations, or reliability engineering
Experience in DevOps Engineering (internship experience does not apply)
5+ years of experience in at least one of the following: Java, Python, Go
At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
3+ years of experience with container orchestration services including Docker or Kubernetes
Experience with Shell or Bash scripting
At least 3 years of Unix or Linux system administration experience

Preferred Qualifications

Experience developing automation solutions using agentic AI tools (Claude Code, Copilot CLI)
Troubleshooting and debugging skills across distributed systems
Familiarity with payments, financial services, or other regulated high-availability domains
Knowledge of or experience with networking concepts (TCP, DNS, TLS)

At Capital One, we respect individual differences in culture, religion, and ethnicity. Likewise, we promote equal opportunities and development for all personnel. In the hiring process, we seek to provide equal employment opportunities to candidates, regardless of race, color, religion, gender, sexual orientation, marital or civil status, national origin, disability, or any other situation protected by federal, state, or local laws.
For technical support or questions about Capital One's recruiting process, please send an email to [email protected]
Capital One does not provide, endorse, or guarantee, and is not liable for, third-party products, services, educational tools, or other information available through this site.
Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe, any position posted in the Philippines is for Capital One Service Corp (COPSSC), and any position posted in Mexico is for Capital One Technology Labs Mexico.

The Company

HQ: McLean, VA

55,000 Employees

Year Founded: 1994

What We Do

At Capital One, we think and work like a tech company, using our digital fluency to transform everything about the customer experience. We’re bending data to our will, and turning a stodgy industry on its head. That’s reflected in our ranking as the number one business technology innovator in the U.S. in the 2016 InformationWeek Elite 100.

Why Work With Us

Here’s another question: What are you looking for? A place where curiosity is the starting point? Where data leads to human insights? Where humanity drives product development? We’re bringing breakthrough products and services to consumers, small businesses, and commercial clients. And each new idea makes life better for millions of people.