Director of Incident Response

Dallas, TX, USA
In-Office
Expert/Leader
Artificial Intelligence • Cloud • Machine Learning • Infrastructure as a Service (IaaS)
The Role
Build and lead a 24/7 incident response function for HPC and multi-tenant cloud environments. Own major incident command, forensic readiness, and detection-to-containment runbooks mapped to MITRE ATT&CK, and drive engineering remediation with enforceable SLAs. Instrument metrics (MTTD/MTTA/MTTC/MTTR), operate Jira Service Management and paging integrations, and run tabletop and red-team exercises in partnership with Security and Platform Engineering.
Summary Generated by Built In

The Company 

NorthMark Compute & Cloud (NMC²) operates at the leading edge of technology, backed by dedicated leadership and investment. Its mission is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients’ research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure that removes friction from scientific research, simulation, analysis, and decision-making, accelerating discovery and innovation.

The Position 

The Director of Incident Response owns the full incident lifecycle across NMC²'s HPC and multi-tenant cloud environments, reporting to the CISO. This is a builder role. You will stand up the IR function from the ground up: playbooks, on-call rotations, tooling integration, forensic capability, and the team itself. You will run major incidents in environments where detection-to-containment is measured in minutes, where forensic preservation must survive tenant-specific legal hold requirements, and where every post-incident finding feeds directly into engineering backlogs with enforceable SLAs. 

You will operate as the senior IR authority across Security Engineering, Platform Engineering, Data Center Operations, and customer-facing technical teams. 

Responsibilities: 

  • Build the IR function end to end: staffing model, 24/7 coverage plan, severity matrix, escalation tree, retainer relationships, and tooling stack aligned to NIST SP 800-61r2 phase structure 

  • Own major incident command for all Sev-0 and Sev-1 events, security and operational, including customer-facing communications and regulatory notification decisions 

  • Develop detection-to-containment runbooks mapped to MITRE ATT&CK techniques relevant to HPC and cloud tenancy threats: credential abuse (T1078), lateral movement via Kubernetes and scheduler primitives (T1610, T1613), data exfiltration over research network egress (T1041, T1567), and supply chain compromise in scientific software pipelines (T1195) 

  • Establish forensic readiness across bare-metal HPC nodes, Kubernetes workloads, and hypervisor layers: memory capture, disk imaging, container runtime evidence, and audit log chain-of-custody standards 

  • Drive root cause analysis to engineering remediation with measurable closeout SLAs, not written reports that sit on a Confluence page 

  • Build and maintain the Known Error Database, runbook library, and tabletop exercise program with scheduled red team, customer-triggered, and infrastructure failure scenarios 

  • Instrument the IR function with hard metrics: MTTD, MTTA, MTTC, MTTR by severity and incident class, recurrence rate, playbook coverage percentage, and on-call load distribution 

  • Operate Jira Service Management as the authoritative incident system of record, with defined integrations to detection tooling, paging (PagerDuty or equivalent), and engineering backlog systems 

  • Partner with Security Engineering on detection engineering feedback loops: every incident either validates an existing detection, triggers a new one, or exposes a detection gap that becomes a tracked engineering item 

  • Own executive and board-level incident reporting, including quarterly trend analysis, regulatory and contractual incident disclosures, and customer trust reporting for enterprise accounts 

  • Co-own business continuity and disaster recovery testing with Platform and DC Operations, ensuring IR plans integrate cleanly with BCP/DR runbooks 

Requirements: 

  • 10+ years in security operations or incident response, with at least 5 years running major incident response in high-availability, multi-tenant, or mission-critical infrastructure environments 

  • 5+ years leading IR or SOC teams, including direct accountability for hiring, performance management, and 24/7 operational coverage 

  • Demonstrated incident command experience on Sev-0 events with customer, regulatory, or board-level exposure 

  • Deep technical fluency in at least two of: HPC environments (Slurm, InfiniBand, GPU clusters), Kubernetes and container security, hypervisor and bare-metal forensics, or public cloud incident response (AWS, Azure, GCP) 

  • Working command of NIST SP 800-61r2, MITRE ATT&CK, and CIS Controls v8 Incident Response domain (Controls 17.1 through 17.9) 

  • Hands-on experience with Jira Service Management, PagerDuty or equivalent, and at least one enterprise SIEM or XDR platform in a production IR context 

  • Experience building detection-to-response feedback loops with a detection engineering or SOC counterpart, not operating IR as a downstream consumer of alerts 

  • Track record of RCA work that produced engineering remediation with measurable defect reduction, not documentation for its own sake 

  • Comfort operating in a pre-scale organization where tooling, process, and team do not yet exist and must be designed before they can be run 

Preferred: 

  • GCIH, GCFA, GCFR, or equivalent hands-on IR certification 

  • ITIL 4 Foundation or Practitioner certification 

  • Experience with regulated or contractually constrained environments: financial services customers, export-controlled workloads, or sovereign cloud requirements 

  • Prior experience with a cloud service provider (CSP) independence or infrastructure repatriation program

It is impossible to list every requirement for, or responsibility of, any position.  Similarly, we cannot identify all the skills a position may require since job responsibilities and the Company’s needs may change over time.  Therefore, the above job description is not comprehensive or exhaustive.  The Company reserves the right to adjust, add to or eliminate any aspect of the above description.  The Company also retains the right to require all employees to undertake additional or different job responsibilities when necessary to meet business needs.

Must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future.

Benefits & Perks:

  • Company-Paid Lunch Stipend: Lunch is provided via GrubHub

  • Company-Paid Benefits: 100% Employer-Paid Medical in our High Deductible Health Plan, Dental and Vision benefits for employees and their families, 16 weeks of Paid Parental Leave, Employee Assistance Program, Life insurance, Short-Term Disability and Long-Term Disability

  • 401(k): Company will match 100% of your contributions up to 6%

  • Optional Employee-Paid Benefits: Medical insurance in our PPO plan and a variety of other benefits such as Health Savings Accounts (with Company Contribution!), Flexible Spending Accounts, Supplemental Life Insurance, Wellhub and more.

  • Time Off:  25 days of Paid Time Off plus 12 company holidays

EQUAL OPPORTUNITY EMPLOYER

NORTHMARK STRATEGIES LLC IS AN EQUAL EMPLOYMENT OPPORTUNITY EMPLOYER. THE COMPANY'S POLICY IS NOT TO DISCRIMINATE AGAINST ANY APPLICANT OR EMPLOYEE BASED ON RACE, COLOR, RELIGION, NATIONAL ORIGIN, GENDER, AGE, SEXUAL ORIENTATION, GENDER IDENTITY OR EXPRESSION, MARITAL STATUS, MENTAL OR PHYSICAL DISABILITY, GENETIC INFORMATION, OR ANY OTHER BASIS PROTECTED BY APPLICABLE LAW. THE FIRM ALSO PROHIBITS HARASSMENT OF APPLICANTS OR EMPLOYEES BASED ON ANY OF THESE PROTECTED CATEGORIES.

The Company
157 Employees

What We Do

NorthMark Strategies is a strategic capital firm that combines investment capital with engineering and technology to build enduring businesses. The firm operates a High-Performance Computing platform and supports simulation, AI/ML-enabled engineering and data-driven design to accelerate portfolio companies. NorthMark deploys capital, operates complex businesses, and builds infrastructure (including compute and cloud services) to drive long‑term innovation and operational outcomes.
