Site Reliability Engineering Manager
About Us
Pangea offers a smarter way for people to move money around the world to friends and family. Founded in 2012, Pangea started with a mission to revolutionize the remittance industry by providing customers with a powerful, easy-to-use mobile application to send money internationally. Since then, we have grown to offer additional products and services to help underbanked immigrants gain financial independence.
Pangea operates as an independent subsidiary under Enova International (NYSE: ENVA). Also headquartered in Chicago, Enova is a leading financial technology company offering accessible credit to millions of customers. Together, Pangea and Enova are on a mission to help hardworking people get access to fair financial services.
As Pangeans, we value introspection, accountability, empowerment, excellence, and above all, kindness. We believe in a fierce dedication to customer experience. We know that diversity is the key to innovation and creativity. If you have a growth mindset and you thrive under pressure, you’re a great fit for our team!
About the Role
We are looking for a Site Reliability Engineering Manager to oversee all technical operational aspects of the Pangea platform and ensure its reliability. This role sits at the intersection of business and technology: you will manage all production operational metrics, lead a team of first responders, and oversee the resolution of production incidents. Our platform includes a set of mobile apps and API services hosted on AWS. If you are motivated by supporting successful products and pushing them toward new levels of excellence in platform reliability, processes, and observability tools, then we want to talk to you!
As Site Reliability Engineering Manager, you'll get to:
- Manage the reliability and availability of our production platform.
- Manage teams that provide 24x7 application and AWS infrastructure support and incident response.
- Define and maintain standardized runbooks and tools to prepare for future incidents.
- Manage team support workflows, schedules and response metrics via PagerDuty.
- Manage platform metrics, alert thresholds and response procedures for existing and new platform features.
- Troubleshoot production issues and contribute to improving team workflows and response effectiveness.
- Communicate with internal support and engineering teams and external partners to resolve production issues.
- Develop regular reporting on platform reliability and support team effectiveness.
- Drive incident postmortem reviews and follow up on identified improvements.
The ideal candidate will have:
- BS or MS degree in computer science or a related field.
- 10+ years of experience in roles relevant to managing a production environment.
- Experience managing dedicated support teams with well-established processes.
- Understanding of common site reliability engineering concepts.
- Strong problem-solving and analytical skills.
- Good technical communication skills.
Bonus points if you have:
- Experience managing platforms based on AWS.
- Experience using observability tools such as New Relic or Datadog.
Pangea is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran, or disability status.