Incident and Problem Manager

Posted Yesterday
Dallas, TX, USA
In-Office
Senior level
Artificial Intelligence • Cloud • Machine Learning • Infrastructure as a Service (IaaS)
The Role
Lead and operate incident and problem management practices: own major incident response, coordinate cross-functional teams, perform root cause analysis, maintain known error records, analyze trends and KPIs (MTTR, SLA), and drive long-term remediation and continuous improvement using Jira Service Management.
Summary Generated by Built In

The Company

NorthMark Compute & Cloud (NMC²) is backed by dedicated leadership and investment, with a clear mission as it operates at the bleeding edge of technology. Its goal is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.
 

The Position

The Incident & Problem Manager is accountable for establishing and operating the Incident Management and Problem Management practices within NMC², ensuring that service disruptions are resolved quickly, root causes are identified and eliminated, and lessons learned drive continuous improvement across the ITSM ecosystem. This combined role owns the full lifecycle of reactive and proactive service restoration, from initial detection and triage through resolution, root cause analysis, and known error documentation, to ensure minimal business impact and sustained service reliability.

The ITSM team is responsible for ensuring the reliability and stability of services across NMC²’s infrastructure and operations. The Incident & Problem Manager owns the end-to-end lifecycle of service disruptions, ensuring rapid restoration, effective escalation, and long-term resolution of underlying issues.

Working alongside Service Desk, Engineering, Data Center Operations, and vendors, you will lead major incident response, drive root cause analysis, and implement continuous improvement across the ITSM ecosystem. This role plays a critical part in maintaining service availability and improving operational maturity at scale.

Responsibilities:

  • Own and manage the end-to-end major incident process, acting as the primary escalation point for high-severity incidents
  • Lead incident response efforts, coordinating cross-functional teams to restore service as quickly as possible
  • Define and improve incident and problem management processes, ensuring consistent execution and high-quality data in Jira Service Management
  • Drive root cause analysis and problem management activities, ensuring recurring issues are identified and permanently resolved
  • Maintain and leverage a Known Error Database to document workarounds and solutions
  • Analyze incident trends and performance metrics to identify systemic issues and improvement opportunities
  • Partner with engineering, service owners, and change management to implement fixes and prevent recurrence
  • Produce regular reporting on KPIs such as MTTR, SLA performance, and incident trends
     

Requirements:

  • Bachelor’s Degree or equivalent experience
  • 5+ years of experience in IT Service Management, with ownership of Incident and/or Problem Management
  • Proven experience managing major incidents in high-availability or mission-critical environments
  • Hands-on experience with Jira Service Management or similar ITSM tooling
  • Strong understanding of incident lifecycle management, escalation, and service restoration
  • Experience conducting root cause analysis and driving long-term remediation
  • Strong analytical and problem-solving skills, with the ability to identify trends in operational data
  • Excellent communication skills with the ability to coordinate across technical and non-technical teams
  • ITIL certification or equivalent experience preferred

It is impossible to list every requirement for, or responsibility of, any position.  Similarly, we cannot identify all the skills a position may require since job responsibilities and the Company’s needs may change over time.  Therefore, the above job description is not comprehensive or exhaustive.  The Company reserves the right to adjust, add to or eliminate any aspect of the above description.  The Company also retains the right to require all employees to undertake additional or different job responsibilities when necessary to meet business needs.

Must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future.

Benefits & Perks:

  • Company-Paid Lunch Stipend: Lunch is provided via GrubHub

  • Company-Paid Benefits: 100% Employer-Paid Medical in our High Deductible Health Plan, Dental and Vision benefits for employees and their families, 16 weeks of Paid Parental Leave, Employee Assistance Program, Life insurance, Short-Term Disability and Long-Term Disability

  • 401(k): Company will match 100% of your contributions up to 6%

  • Optional Employee-Paid Benefits: Medical insurance in our PPO plan and a variety of other benefits such as Health Savings Accounts (with Company Contribution!), Flexible Spending Accounts, Supplemental Life Insurance, Wellhub and more.

  • Time Off:  25 days of Paid Time Off plus 12 company holidays

EQUAL OPPORTUNITY EMPLOYER

NORTHMARK STRATEGIES LLC IS AN EQUAL EMPLOYMENT OPPORTUNITY EMPLOYER. THE COMPANY'S POLICY IS NOT TO DISCRIMINATE AGAINST ANY APPLICANT OR EMPLOYEE BASED ON RACE, COLOR, RELIGION, NATIONAL ORIGIN, GENDER, AGE, SEXUAL ORIENTATION, GENDER IDENTITY OR EXPRESSION, MARITAL STATUS, MENTAL OR PHYSICAL DISABILITY, GENETIC INFORMATION, OR ANY OTHER BASIS PROTECTED BY APPLICABLE LAW. THE FIRM ALSO PROHIBITS HARASSMENT OF APPLICANTS OR EMPLOYEES BASED ON ANY OF THESE PROTECTED CATEGORIES.

Skills Required

  • Bachelor's degree or equivalent experience
  • 5+ years of experience in IT Service Management with ownership of Incident and/or Problem Management
  • Proven experience managing major incidents in high-availability or mission-critical environments
  • Hands-on experience with Jira Service Management or similar ITSM tooling
  • Strong understanding of incident lifecycle management, escalation, and service restoration
  • Experience conducting root cause analysis and driving long-term remediation
  • Ability to analyze incident trends and operational data
  • Excellent communication and cross-functional coordination skills
  • ITIL certification or equivalent experience
  • Legally authorized to work in the United States without employer sponsorship

The Company
157 Employees

What We Do

NorthMark Strategies is a strategic capital firm that combines investment capital with engineering and technology to build enduring businesses. The firm operates a High-Performance Computing platform and supports simulation, AI/ML-enabled engineering and data-driven design to accelerate portfolio companies. NorthMark deploys capital, operates complex businesses, and builds infrastructure (including compute and cloud services) to drive long‑term innovation and operational outcomes.
