Overview:
MetLife is seeking a highly skilled Site Reliability Engineer (SRE) to drive the reliability, scalability, and performance of our mission-critical systems across hybrid cloud environments, with a strong emphasis on Azure Cloud and Azure DevOps. You will play a key role in automation, observability, incident management, and continuous improvement, collaborating closely with engineering and operations teams.
Key Responsibilities:
- System Reliability & Performance:
  - Ensure high availability and optimal performance of services across on-premises and Azure cloud platforms.
  - Proactively identify, troubleshoot, and resolve system issues, minimizing downtime and impact.
- Automation:
  - Design, develop, and maintain automation scripts and tools (Python, PowerShell, Bash) to streamline operations and deployments.
- Monitoring, Observability & Incident Management:
  - Architect and maintain robust monitoring, logging, and alerting solutions using Grafana, Splunk, and Azure Monitor/Application Insights.
  - Lead incident response, root cause analysis, and post-mortem processes, driving corrective and preventive actions.
- Cloud & Containerization:
  - Deploy, manage, and optimize workloads on Azure, leveraging services such as AKS (Kubernetes), Azure Functions, and App Services.
  - Build and maintain containerized environments using Docker and Kubernetes.
- Collaboration & Best Practices:
  - Partner with engineering teams to align system architecture and performance with business objectives.
  - Champion SRE and DevOps best practices, fostering a culture of reliability, automation, and continuous improvement.
- Documentation & Knowledge Sharing:
  - Maintain comprehensive system documentation and runbooks.
  - Share knowledge and mentor team members to elevate operational excellence.
Qualifications & Skills:
- 3+ years of SRE or DevOps experience supporting hybrid cloud environments (on-premises and Azure).
- Advanced proficiency in Azure Cloud services and Azure DevOps (Pipelines, Repos).
- Mandatory: Hands-on experience with Docker, AKS (Azure Kubernetes Service), and CI/CD automation.
- Deep experience with monitoring and observability tools: ELK Stack, Grafana, Splunk, Azure Monitor/Application Insights.
- Strong scripting skills: Python, PowerShell, Bash.
- Good to have: Experience with configuration management tools such as Puppet or Ansible.
- Solid SQL and database troubleshooting skills.
- Familiarity with ITSM tools (e.g., ServiceNow).
- Business proficiency in English; Japanese language skills are a plus.
- Relevant certifications (e.g., Azure Administrator (AZ-104), Azure DevOps Engineer (AZ-400), CKA) are highly desirable.
About MetLife
Recognized on Fortune magazine's list of the "World's Most Admired Companies" and Fortune World's 25 Best Workplaces™, MetLife, through its subsidiaries and affiliates, is one of the world's leading financial services companies, providing insurance, annuities, employee benefits, and asset management to individual and institutional customers. With operations in more than 40 markets, we hold leading positions in the United States, Latin America, Asia, Europe, and the Middle East.
Our purpose is simple: to help our colleagues, customers, communities, and the world at large create a more confident future. United by purpose and guided by our core values (Win Together, Do the Right Thing, Deliver Impact Over Activity, and Think Ahead), we're inspired to transform the next century in financial services. At MetLife, it's #AllTogetherPossible. Join us!
What We Do
We're honored to be No. 10 on Great Place to Work's World's Best Workplaces and recognized in the Fortune 100 Best Companies to Work For® list in 2025. At MetLife, we're leading the global transformation of an industry we’ve defined for over 157 years.
At MetLife, every innovation and line of code is a lifeline for our customers and their families—from victims of natural disasters to people living with disabilities and beyond. With operations in more than 40 markets and leading positions across the globe, MetLife fosters an inclusive culture where our people are energized and inspired to deliver for our customers and communities.
Join our remarkable journey—one in which you help write the next century of innovation in financial services—because with MetLife, making the world a better place is All Together Possible.
Why Work With Us
At MetLife, you’ll be working for a company whose purpose is to help customers throughout their life’s journey, and often in their most critical time of need. You’ll be a part of developing leading-edge platforms that will have a lasting impact on the lives and well-being of tens of millions of customers.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
MetLife's current workplace policies classify roles as Office, Hybrid, or Virtual based on the nature of the work, encouraging new ways of working together.