The Role
Own and operate the ITIL-aligned Problem Management lifecycle: detect and log problems, investigate root causes, manage the Known Error Database, run RCA sessions, and drive preventative actions. Partner with L2/L3, platform, and engineering teams, and report trends and metrics to leadership to reduce incident recurrence and systemic risk.
Reward Gateway, part of Edenred, is a global leader in benefits and employee engagement. We help businesses attract, engage, and retain top talent through strategic reward, recognition, and well-being solutions.
Guided by our shared missions - ‘Making the World a Better Place to Work’ and ‘Enriching Connections, For Good’ - we’re committed to transforming workplaces and improving people’s daily lives.
Our team embodies entrepreneurial spirit, innovation, and respect. We push boundaries, speak up, and stay human, fostering a culture where imagination thrives.
Your Role in our Mission:
As Problem Manager, you will own and drive the Problem Management function within the Platform Engineering & Technical Operations (PETO) organisation, reporting directly to the Director of Application Operations. You will play a critical role in reducing the frequency and impact of incidents across our platforms by identifying root causes, managing known errors, and delivering preventative actions that lead to measurable, systemic improvements.
This is a hands-on, high-impact role. You will work at the intersection of operational excellence and engineering quality, partnering closely with L2/L3 support teams and with Platform, Infrastructure, and product-aligned Engineering squads to ensure that problems are properly identified, investigated, and resolved and, most importantly, do not recur.
What’s In It For Me?
A chance to be part of a well-established, stable, high-growth ‘Unicorn’ SaaS company, with an employee benefits package that includes:
- Annual Wellness Bonus
- Monthly Edenred Electronic Food Voucher
- Udemy: Access for your professional development
- Flexible Holiday plan & other leave benefits
- Book Benefit: Professional development books and an additional annual budget for fiction books of your choice
- Subsidised sports card and many other benefits!
Flexible Hybrid Working: This is a hybrid role that requires you to be in the office at least twice per week, as agreed.
What You’ll be Doing:
Problem Management Process Ownership
- Own the end-to-end Problem Management lifecycle in line with ITIL best practice: problem detection, logging, categorisation, prioritisation, investigation, resolution, and closure
- Maintain and govern the Problem Record backlog in Jira Service Management, ensuring all records are accurate, prioritised, and progressing toward resolution
- Define and enforce the standards for problem identification, including criteria for reactive problem management (post-incident) and proactive problem management (trend analysis and risk identification)
- Manage the Known Error Database (KEDB), ensuring it is current, accurate, and actively used by L1/L2 support teams to improve first-contact resolution
Root Cause Analysis (RCA)
- Lead and facilitate structured RCA sessions following major and recurring incidents, using recognised methodologies (e.g. 5 Whys, Fishbone/Ishikawa, fault tree analysis)
- Produce high-quality Problem Records and RCA reports that clearly articulate the root cause, contributing factors, timeline, and recommended corrective/preventative actions
- Ensure RCA outputs translate into tracked, accountable action plans with clear owners, timelines, and success criteria
- Challenge superficial root cause findings and push for systemic, durable fixes rather than symptomatic workarounds
Proactive Problem Management
- Analyse incident, change, and event data to proactively identify trends, recurring issues, and systemic risks before they become major incidents
- Collaborate with Observability and Platform teams to use monitoring signals, error budgets, and SLO breach data as early-warning inputs to the problem management process
- Contribute to the shift-left support agenda by feeding problem findings into runbooks, playbooks, and operability improvements
Stakeholder Engagement & Reporting
- Communicate problem status, known errors, and risk exposure clearly to technical and non-technical stakeholders, including engineering leads and senior management
- Produce regular problem management reporting, including metrics such as: number of open problems by age/severity, incident recurrence rate, time to root cause, and percentage of problems with preventative actions closed on time
- Present insights and trends to the Director of Application Operations and wider PETO leadership to inform prioritisation decisions and continuous improvement initiatives
Collaboration & Integration
- Work closely with Incident Management to ensure seamless handoff from major incidents into the problem management process
- Partner with L2.5/L3 engineering teams to coordinate investigation effort, agree timelines, and remove blockers to root cause resolution
- Integrate problem management activity into the Service Catalogue and Jira Service Management workflows, ensuring service ownership and escalation paths are respected
- Contribute to Change Management processes by ensuring known problems and risks are visible to change approvers, reducing the risk of change-induced incidents
Continuous Improvement
- Continuously assess and improve the Problem Management process itself, maturing capability over time and aligning with evolving ITIL and organisational standards
- Build and maintain problem management documentation, templates, and guidance to enable consistent, high-quality practice across the PETO organisation
- Support the development of L2 team capability in recognising and logging potential problems, contributing to the team's progression toward greater autonomy
Experience and Skills You Need in this Role:
Essential
- Solid, demonstrable experience in an ITIL-aligned Problem Management role, ideally within a fast-paced, product-led technology organisation
- Strong working knowledge of ITIL Problem Management practices (ITIL 4 Foundation certification or above preferred), including the distinction between reactive and proactive problem management and the role of the KEDB
- Hands-on experience facilitating RCA sessions using structured methodologies (5 Whys, Fishbone, fault tree analysis, etc.) and translating findings into actionable improvement plans
- Experience working with Jira Service Management or a comparable ITSM platform to manage problem records, workflows, and reporting
- Ability to analyse incident and operational data to identify trends and systemic issues, with experience using dashboards or reporting tools to communicate findings
- Strong written and verbal communication skills, with the ability to produce clear RCA reports and updates for both technical audiences and senior non-technical stakeholders
- Collaborative working style with experience engaging engineering, infrastructure, and operations teams in problem investigation and resolution
- Familiarity with Agile ways of working and the ability to integrate ITIL practices within a modern, product-centric engineering environment
Desirable
- Experience with observability and monitoring tooling (e.g. Datadog, Grafana, PagerDuty) as inputs to proactive problem management
- Understanding of SLOs, error budgets, and their relationship to operational risk and problem prioritisation
- Experience contributing to or maintaining a knowledge base (e.g. Confluence), including runbooks and known error documentation
- Exposure to cloud-native application architectures and API-first platforms
- ITIL 4 Specialist or Practitioner certification in relevant practices (e.g. Problem Management, Incident Management)
- Experience with operational metrics and reporting frameworks, including DORA metrics or similar
The Interview Process:
- Screening call with Talent Acquisition Partner
- First Stage Interview with the Director of Application Operations & the VP Platform Engineering
At Reward Gateway | Edenred, we are committed to ensuring an inclusive and accessible recruitment process for all candidates. If you have any specific requirements or need reasonable adjustments at any stage of the recruitment journey, please let your Talent Acquisition Partner know. Your needs are important to us, and we want to ensure an equitable experience for every candidate.
Be comfortable. Be you.
We want every employee to feel comfortable bringing their passion, creativity and individuality to work. We value all cultures, backgrounds and experiences, because we believe diversity drives innovation and makes us stronger. Our approach to hiring and building teams is about more than filling roles - it’s about creating an environment where everyone can thrive, feel supported, and contribute to our mission of making the world a better place to work!
About
Reward Gateway is culture and client driven. We’re obsessed with putting the “Human” in HR and are proud to have been 100% dedicated to HR for over a decade. Since 2007, we’ve been right by the side of the world’s most innovative HR people, giving them beautiful products and tools they can use to attract, engage and retain their people. The world’s most successful companies treat their people differently. They generate stock market returns twice those of their peers and they have half the employee turnover. 76% of CEOs recognize that employee engagement is vital to their success but only 24% say they have a highly engaged company. Bridging that engagement gap is what drives us.