Senior Incident Problem Manager
WHAT IS BOX?
Box is the market leader in Cloud Content Management. Our mission is to power how the world works together. Box partners with enterprise organizations to accelerate their digital transformation by providing a single platform for secure content management, collaboration and workflow. We have an amazing opportunity to further establish ourselves as leaders in the space, and we need strong advocates to help us achieve that goal.
By joining Box, you will have the unique opportunity to help capture a majority of this developing market and define what content management looks like for the digital enterprise. Today, Box powers more than 98,000 businesses, including 70% of the Fortune 500, that trust Box to manage their content in the cloud.
WHY BOX NEEDS YOU
Box is looking for a dynamic Problem Management Analyst to drive our Problem Management process in support of our industry-leading platform. The Problem Manager analyzes data and trends across multiple incidents and identifies the themes that drive the prioritization of post-mortem action items across the organization. This is a critical role that supports our ever-growing customer base, which includes companies like GE, Pandora, Apple and Gap. It is an integral function within the Global Technical Operations Center, which ensures overall production site health and the performance of core customer-facing journeys. That's where you come in!
WHAT YOU'LL DO
- Own the Problem Management process end to end, including defining and tracking critical success metrics.
- Participate actively in Daily Incident Review meetings to understand the impact of recent incidents and identify patterns related to previous incidents.
- Lead a daily Problem Management review meeting, ensuring the team meets key objectives for post-mortem SLAs, repair item SLAs and availability metrics.
- Advise and guide the Incident Owner during the post-mortem process to ensure a thorough analysis is conducted and appropriate repair items are identified.
- Track and report on the progress and completion of repair items across the Engineering organization, working with Engineering leaders across Box to prioritize the work that comes out of the post-mortem process and drive improvements to availability, performance and reliability.
- Analyze incident, event and change data to identify and prioritize problem trends, and drive resolution of problem tickets with the appropriate service owners.
- Lead a cross-organizational group of post-mortem facilitators, including training new team members and distributing the workload across the team.
- Lead efforts to mature the Problem Management process at Box, including process, tooling and reporting improvements.
WHO YOU ARE
- You have 5+ years of Problem Management experience and 7+ years of experience in a large SaaS environment.
- You have a solid understanding of ITIL processes (Incident Management, Problem Management and Change Management) in a large-scale, high-uptime environment.
- You enjoy working collaboratively across all teams within an organization and are able to influence people and drive a team towards a shared objective.
- You are confident and comfortable communicating at every level, from individual contributors up through C-level staff.
- You are a natural problem solver, able to understand technical solutions and drive them with engineering teams.
- You are data-driven, with a forward-looking, collaborative approach to continual service improvement.
- Bachelor's degree in Computer Science, Information Systems or an equivalent technical field, or similar work experience in a large-scale, 24/7 production environment supporting critical, real-time applications.
- ITIL Certified
- Remote-friendly
PREFERRED SKILLS
- Understanding of Linux systems in support of a SaaS product
- Understanding of virtualization and container platforms: OpenStack and Kubernetes
- Experience with cloud implementations (GCP preferred; AWS)
- Understanding of networking technologies: TCP/IP, DNS, routing and HTTP
- Experience supporting open-source environments: Kafka, MQS, RabbitMQ, MySQL and HBase
- Understanding of observability tools and how to use their capabilities effectively in a large-scale environment (Splunk, Datadog, Wavefront, Catchpoint, ThousandEyes, Sensu, distributed tracing, RUM)
- Understanding of CI/CD pipelines
- Outstanding interpersonal and communication skills.
BENEFITS
- Visit this webpage to check out all of our exciting healthcare benefits: https://join.collectivehealth.com/box
- For all other benefits, please check out: Box Benefits + Perks
EQUAL OPPORTUNITY
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
For details on how we protect your information when you apply, please see our Personnel Privacy Notice.