Site Reliability Engineering Technical Program Manager
The Opportunity
Flexport’s Security, IT, Infrastructure and SRE organization is looking for a passionate, results-oriented Site Reliability Technical Program Manager with a solid understanding of site reliability engineering and incident management. In this role you will manage the successful delivery of incident triage as well as resilience projects that improve the observability and reliability of all Flexport internal and external technologies globally, in a diverse and fast-paced environment with multiple stakeholders and multiple resource pools.
You must be able to initiate and manage programs and dependencies through a solid understanding of technology resilience objectives and how they relate to security objectives and the related technologies. You communicate clearly with stakeholders and team members and have a consistent track record of delivery. You are proactive in responding to incidents, creating program vision and roadmaps, developing and following through on project plans, identifying program/project headwinds, and facilitating resolutions, and you can handle multiple competing priorities in a fast-paced environment. You are able to curate and present relevant data and articulate highly complex technical dependencies to support informed decision making during escalations.
You will:
- Manage multiple complex resilience improvement and scalability projects in a diverse environment with multiple stakeholders and multiple resource pools, under the guidance of technology and security program leadership. Program scope can span site reliability engineering, infrastructure, platform, integrations, engineering collaborations, security compliance, and more.
- Work closely with Site Reliability Engineering, Security, Product Engineering, Infrastructure, IT and cross-functional business leaders to align resilience efforts across Flexport's multiple technology stacks and domains.
- Drive partnership efforts with stakeholders to develop program and project plans, including roadmaps, dependency identification, resourcing plans and other program/project collateral as needed. Drive projects and programs to completion.
- Analyze business and technical processes and procedures, and continuously raise the resilience of Flexport technologies so they scale and strengthen customer trust.
- Manage project operational aspects, scope and change management, and track schedule progress through appropriate metrics to meet changing needs and requirements.
- Own clear, timely and effective communication to management on plans, status and critical issues. Escalate urgent issues appropriately and drive them to closure. Ensure a clean operational hand-off at project go-live and be accountable for operational stability.
- Leverage existing program tools and best practices. Partner with fellow Technical Program Managers to cross train, build trust and amplify program outcomes.
- Proactively remove obstacles by managing issue escalations and scope changes.
- Be the internal liaison for incidents: follow the Flexport Incident Response procedure and facilitate the response team’s adherence to it so incidents are resolved as quickly as possible. Assist incident commanders with incident triage and cross-functional communication to ensure incident updates are shared promptly and appropriately. Ensure incident data is captured, and build incident metrics and dashboards to measure the success of the SRE function.
- Assist in incident retrospectives and remediations. Analyze incident trend data to identify and formulate resilience projects that eliminate root causes and fragile technology components. Initiate actions to fix potential technology interruptions under the guidance of site reliability engineering leadership.
- Act as a delegate for the incident commander as needed. Perform other incident or technical program duties as assigned.
- Have a bias for action. Be driven, organized and resourceful, with a “can-do” attitude.
What you’ll need:
- Strong interpersonal and communications skills.
- 3-10 years of proven technical program management and incident management experience in Site Reliability Engineering, Incident Management, Security, Cloud Infrastructure, IT and/or Engineering.
- Current knowledge in Cloud Computing, Security, Site Reliability Engineering, Incident Response, Product Engineering, Infrastructure and/or IT.
- Demonstrated ability to apply end-to-end thinking to the complex technical dependencies inherent in modern cloud technologies (AWS, GCP, Azure or equivalent), SRE tools (PagerDuty, StatusPage, etc.), and SecDevOps and/or DevOps technologies (CI/CD, Infrastructure as Code, Incident Automation, etc.).
- Solid understanding of the relationships between the SRE function and the rest of the company, and the ability to apply that understanding.
- Willingness to learn new technologies and to continually apply existing and new technical knowledge to program management.
- Demonstrated ability to navigate ambiguity and derive concrete requirements, alignment, and next steps.
About Flexport:
We believe global trade can move the human race forward. That’s why it’s our mission to make global trade easier for everyone. We aim to do this by building the Operating System for Global Trade, a strategic model combining advanced technology and data analytics, logistics infrastructure, and supply chain expertise. Flexport today connects almost 10,000 clients and suppliers across 109 countries, including established global brands like Georgia-Pacific as well as emerging innovators like Sonos. Founded in 2013, we've raised over $1.3B in funding from SoftBank Vision Fund, Founders Fund, GV, First Round Capital and Y Combinator. We’re excited about the three big ways we’re moving forward after our recent $1B investment from SoftBank Vision Fund in February 2019.
Worried about not having any freight forwarding experience?
- Don’t be! We’re building the first Operating System for Global Trade. That’s why it’s incredibly important for us to bring people from diverse backgrounds and experiences together with our industry veterans to help move the freight forwarding industry forward.
- What’s freight forwarding and why does it matter? Freight forwarding is the coordination and shipment of goods from one place to another and it’s what makes global trade possible. Flexport is on a mission to make global trade easier for everyone because we believe it can help connect the world and break down economic barriers.
- We know this industry is complex. That’s why we invest in education starting on day one with Flexport Academy, a one-week intensive onboarding program designed specifically to set every new Flexport employee up for success.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.