Position Overview
We are seeking a Site Reliability Engineer with 10+ years of experience to ensure the reliability, scalability, and performance of our cloud-based infrastructure and services.
Key Responsibilities
* Design, implement, and maintain highly available, scalable cloud infrastructure across AWS/Azure/GCP
* Develop and maintain automation tools for deployment, monitoring, and incident response
* Implement and improve CI/CD pipelines for seamless application delivery
* Monitor system performance, conduct capacity planning, and optimize resource utilization
* Lead incident response, conduct root cause analysis, and implement preventive measures
* Establish and track SLIs, SLOs, and error budgets
* Mentor junior engineers and promote SRE best practices across teams
* Collaborate with development teams to improve system reliability and operational excellence
* Implement infrastructure as code using Terraform, CloudFormation, or similar tools
* Design and maintain disaster recovery and business continuity solutions
Required Qualifications
* 10+ years of experience in Site Reliability Engineering, DevOps, or Systems Engineering
* 5+ years of hands-on experience with cloud platforms (AWS, Azure, or GCP)
* Strong proficiency in scripting languages (Python, Bash, Go, or similar)
* Expert knowledge of containerization (Docker, Kubernetes)
* Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, ELK stack)
* Deep understanding of Linux/Unix system administration
* Proven experience with infrastructure as code and configuration management tools
* Experience with incident management and on-call rotations
* Excellent troubleshooting and problem-solving skills
Preferred Qualifications
* Bachelor's degree in Computer Science or a related field
* Cloud certifications (AWS Solutions Architect, Azure Administrator, GCP Professional)
* Experience with service mesh technologies (Istio, Linkerd)
* Strong understanding of distributed systems and microservices architecture
About MetLife
Recognized on Fortune magazine's list of the "World's Most Admired Companies" and Fortune World's 25 Best Workplaces™, MetLife, through its subsidiaries and affiliates, is one of the world's leading financial services companies, providing insurance, annuities, employee benefits and asset management to individual and institutional customers. With operations in more than 40 markets, we hold leading positions in the United States, Latin America, Asia, Europe, and the Middle East.
Our purpose is simple: to help our colleagues, customers, communities, and the world at large create a more confident future. United by purpose and guided by our core values (Win Together, Do the Right Thing, Deliver Impact Over Activity, and Think Ahead), we're inspired to transform the next century in financial services. At MetLife, it's #AllTogetherPossible. Join us!
MetLife Compensation & Benefits Highlights
- Retirement Support: Retirement benefits combine a 401(k) with company match and a company‑funded cash‑balance pension. This uncommon pairing strengthens long‑term security.
- Parental & Family Support: Family supports include paid parental leave, adoption assistance, caregiving resources, fertility coverage, and practical travel supports for nursing parents. These provisions go beyond core insurance to address real‑world family needs.
- Leave & Time Off Breadth: Time away includes a minimum of 22 days of PTO annually and a paid volunteer day, alongside bereavement and other leave options. This breadth provides meaningful flexibility for rest and life events.
MetLife Insights
What We Do
We're honored to be No. 10 on Great Place to Work's World's Best Workplaces and recognized in the Fortune 100 Best Companies to Work For® list in 2025. At MetLife, we're leading the global transformation of an industry we’ve defined for over 157 years. At MetLife, every innovation and line of code is a lifeline for our customers and their families—from victims of natural disasters to people living with disabilities and beyond. With operations in more than 40 markets and leading positions across the globe, MetLife fosters an inclusive culture where our people are energized and inspired to deliver for our customers and communities. Join our remarkable journey—one in which you help write the next century of innovation in financial services—because with MetLife, making the world a better place is All Together Possible.
Why Work With Us
At MetLife, you’ll be working for a company whose purpose is to help customers throughout their life’s journey, and often in their most critical time of need. You’ll be a part of developing leading-edge platforms that will have a lasting impact on the lives and well-being of tens of millions of customers.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
MetLife's current workplace policies classify roles as Office, Hybrid, or Virtual based on the nature of the work, encouraging new ways of working together.


