Workplace Platforms - Site Reliability Engineer (SRE) Lead - Dallas

Dallas, TX, USA
In-Office
Senior level
Fintech • Financial Services
The Role
Lead reliability engineering for endpoint compute platforms (physical, virtual, cloud desktops) and supporting services. Define SLOs/SLIs, observability, failure models, runbooks, and automation. Drive incident response, post-incident remediation, and resilience improvements. Partner with security, identity, and platform teams to align risk and governance. Mentor engineers and communicate reliability posture to leadership, improving operability and reducing incident frequency and impact.
Summary Generated by Built In

Team Overview

The Workplace Engineering organization is responsible for the reliability, resilience, and operational integrity of the firm’s endpoint compute platforms and services, including:

  • Corporate‑owned physical devices
  • Virtual and cloud‑hosted desktops
  • Core endpoint services such as device lifecycle management, access and identity integration, profile and session services, and application delivery frameworks

The Endpoint Compute SRE function applies Site Reliability Engineering (SRE) principles to ensure these platforms and services are highly available, observable, scalable, and recoverable, while meeting operational and regulatory expectations.

Role Summary

We are seeking an Endpoint Compute SRE Lead to own reliability engineering and operational excellence across endpoint compute platforms and their foundational services.

This role is focused on systems and services, not applications, and covers the reliability of:

  • Endpoint compute platforms (physical, virtual, cloud desktops)
  • Device and desktop lifecycle services
  • Access and sign‑in dependency platforms
  • Profile, policy, and session services
  • Application delivery and execution frameworks (packaging, deployment, availability—not app functionality)

The successful candidate will define service-level objectives, observability strategies, failure models, and operational practices that ensure a predictable and resilient end‑user compute experience at enterprise scale.

Job Responsibilities 

Reliability Engineering Across Endpoint Services

  • Own end-to-end reliability of endpoint compute platforms and supporting services
  • Define service boundaries, dependencies, and critical paths from user sign‑in through productive desktop use
  • Model failure modes and blast radius across lifecycle, access, and delivery services
  • Drive designs that support graceful degradation and fast recovery

Observability & Telemetry

  • Establish observability standards across endpoint compute services, including:
    • Enrollment and provisioning success rates
    • Access and session establishment health
    • Policy and profile delivery latency/failures
    • Application delivery availability
  • Ensure telemetry enables:
    • Fast incident detection
    • Root cause analysis
    • Proactive trend identification

SLOs, SLIs & Error Budgets

  • Define SLOs and SLIs for key endpoint services (e.g., sign‑in success, provisioning time, policy convergence)
  • Implement error budget frameworks to guide change, security control rollout, and platform evolution
  • Use reliability signals to influence platform design and operational priorities

Incident, Problem & Resilience Management

  • Lead reliability aspects of incident response involving endpoint compute or services
  • Drive post‑incident reviews focused on systemic corrections
  • Identify recurring failure patterns in:
    • Lifecycle flows
    • Access paths
    • Policy or profile delivery
  • Sponsor and track permanent fixes, not workarounds

Operational Excellence & Automation

  • Define and maintain runbooks, playbooks, and escalation models for endpoint services
  • Drive automation to reduce:
    • Manual remediation
    • Repeat incidents
    • Operational toil
  • Influence engineering designs to improve operability and debuggability

Risk & Governance Alignment

  • Partner with Technology Risk and Security teams to:
    • Demonstrate reliability and recoverability controls
    • Support operational risk and resilience assessments
    • Provide audit‑ready evidence for availability and incident management
  • Ensure reliability metrics support control effectiveness narratives

Leadership & Collaboration

  • Act as the reliability authority for endpoint compute and services
  • Partner closely with:
    • Endpoint platform engineers
    • Device management teams
    • Security engineering and identity teams
  • Mentor engineers in applying SRE principles to workplace platforms
  • Communicate reliability posture clearly to leadership

Basic Qualifications

  • 8+ years in SRE, platform operations, reliability engineering, or workplace infrastructure roles
  • Strong experience operating endpoint compute platforms and core supporting services at enterprise scale
  • Proven ability to define and implement:
    • Observability frameworks
    • SLOs / SLIs
    • Incident and problem management models
  • Strong systems thinking across lifecycle, access, and service dependencies
  • Excellent documentation and communication skills

Preferred Qualifications

  • Experience applying SRE concepts to end‑user computing or digital workplace platforms
  • Deep understanding of:
    • Device lifecycle and provisioning services
    • Identity and access dependencies (availability-focused)
    • Profile, policy, and session orchestration
  • Experience in regulated or high‑assurance environments
  • Strong ability to influence architecture using data‑driven reliability insights

What Success Looks Like

  • Endpoint compute and services have clear reliability targets
  • Lifecycle, access, and delivery failures are predictable, observable, and fast to remediate
  • Incidents are less frequent, shorter, and less impactful
  • Platforms are designed with operability and resilience built in
  • Leadership has confidence in desktop stability as a service

Goldman Sachs Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Goldman Sachs and has not been reviewed or approved by Goldman Sachs.

  • Healthcare Strength: Coverage includes medical, dental, vision, disability, life, and accident insurance, with multiple plan options and most premiums subsidized; coverage often starts on day one. Wellness resources, on-site health centers in some locations, and EAP access reinforce the depth of health support.
  • Parental & Family Support: Family care includes on-site childcare in some offices, expectant parent resources, and transitional programs for returning parents. Feedback suggests parental leave is very generous, with reports of around 20 weeks of paid leave and stipends for adoption, surrogacy, and fertility-related services.
  • Retirement Support: The firm provides a 401(k) plan with employer matching contributions and broad financial education to help employees plan for retirement. Resources also support saving for education and preparing for unexpected events.

The Company
HQ: New York, NY
67,118 Employees

What We Do

At Goldman Sachs, we believe progress is everyone’s business. That’s why we commit our people, capital and ideas to help our clients, shareholders and the communities we serve to grow. Founded in 1869, Goldman Sachs is a leading global investment banking, securities and investment management firm. Headquartered in New York, we maintain offices in all major financial centers around the world. More about our company can be found at www.goldmansachs.com
