Sr Site Reliability Engineer - Azure

Posted 2 Days Ago
Dallas, TX
In-Office
Senior level
Healthtech • Travel
The Role
The Senior Site Reliability Engineer leads reliability engineering for Azure, focusing on scripting, automation, observability, and incident response, ensuring service quality and uptime.

Job Description:

Position Overview

The primary responsibility of the Senior Site Reliability Engineer (SRE) is to lead reliability engineering initiatives across our Azure estate and Command Center operations. This role focuses on scripting, automation, and observability to ensure uptime, performance, and rapid incident response. The Senior SRE will design and implement monitoring-as-code, optimize alerting, and build self-healing automation that reduces toil and accelerates recovery.

As part of our journey from traditional operations toward a mature SRE model, the Senior SRE will partner with product engineering, platform teams, and the Command Center, including Service Desk and Major Incident Command (MIC), to deliver measurable improvements in service reliability.

All duties are to be performed in accordance with departmental and Las Vegas Sands Corp.’s policies, practices, and procedures. All Las Vegas Sands Corp. Team Members are expected to conduct and carry themselves in a professional manner at all times. Team Members are required to observe the company’s standards, work requirements and rules of conduct.

Essential Duties & Responsibilities

  • Observability & Monitoring

    • Architect end-to-end monitoring using Azure Monitor, Log Analytics, Application Insights, and ITRS Geneos.

    • Implement monitoring-as-code with Terraform/Bicep, including alerts, dashboards, and diagnostic settings.

    • Create actionable dashboards (Azure Workbooks, Grafana) for SLIs/SLOs and real-time service health.

  • Alerting & Incident Response

    • Design alert taxonomies with severity mapping (P0–P4), dynamic thresholds, and escalation policies.

    • Reduce alert noise and ensure 100% alert-to-runbook mapping.

    • Support Major Incident Command (MIC) during P0/P1 bridges with technical expertise and rapid remediation.

  • Automation & Tooling

    • Build automation using PowerShell, Python, and Azure Functions for alert lifecycle, runbooks, and self-healing workflows.

    • Integrate with ITSM (ServiceNow/Jira) for automated ticket enrichment and routing.

    • Eliminate repetitive operational tasks and reduce toil through automation-first practices.

  • Reliability Engineering

    • Define and enforce SLIs/SLOs, error budgets, and resilience patterns (bulkheads, retries, timeouts).

    • Conduct production readiness reviews, chaos drills, and failover rehearsals.

    • Partner with app teams to embed instrumentation and structured logging.

  • Governance & Compliance

    • Enforce desired state with Azure Policy, DSC/Guest Configuration, and drift detection.

    • Harden networking (VNet, NSGs, Private Link, Firewall), identity (Entra ID), and secrets (Key Vault).

    • Ensure auditability and compliance across environments.

  • Perform job duties in a safe manner.

  • Attend work as scheduled on a consistent and regular basis.

  • Perform other related duties as assigned.

Minimum Qualifications

  • At least 21 years of age.

  • Proof of authorization to work in the United States.

  • Bachelor’s degree in Computer Science or an IT field, or equivalent experience.

  • Must be able to obtain and maintain any certification or license, as required by law or policy. 

  • 7+ years of experience in SRE/DevOps/Platform roles, with 4+ years focused on Azure in production at scale.

  • Expert knowledge in Infrastructure as Code (Terraform or Bicep) and Git-based workflows (GitHub Actions/Azure DevOps).

  • Proficiency in CI/CD, deployment strategies (canary, blue-green), and automated rollbacks.

  • Proficiency in PowerShell and Python for automation; experience building reusable modules.

  • Demonstrated experience with AKS, App Services, Functions, VM Scale Sets, and Azure networking/security.

  • Deep knowledge of:  

    • Azure: AKS, App Services, Functions, VMSS, Storage, Front Door, API Management, Load Balancers, Monitor, Log Analytics, App Insights, Key Vault, Policy, Defender

    • Automation & IaC: Terraform/Bicep, PowerShell, Python, GitHub Actions/Azure DevOps

    • Observability: Azure Monitor, Log Analytics, App Insights, Prometheus/OpenTelemetry; experience with ITRS Geneos.

    • Service Management: ServiceNow, Jira

  • Proficiency in SRE fundamentals: SLIs/SLOs, error budgets, capacity planning, chaos testing, and toil reduction.

  • Demonstrated experience leading incidents and collaborating across teams.

  • Strong interpersonal skills with the ability to communicate effectively and interact appropriately with management, other Team Members and outside contacts of different backgrounds and levels of experience.

  • Must be available to work varied shifts including nights, weekends, and holidays, to ensure 24/7 coverage.

  • Provide off-hours support on an infrequent but as-needed basis during critical incidents. (Potential shifts may run 24/7 due to the needs of the business.)

  • Team Members are required to be on site within the IT Command Center. 

Preferred Qualifications

  • Certifications & Training

    • AZ-400: Azure DevOps Engineer Expert

    • AZ-305: Azure Solutions Architect Expert or AZ-104: Azure Administrator Associate

    • AZ-500: Azure Security Engineer Associate

    • ITIL v4 for operational rigor

    • SRE Foundation/Practitioner Certification (DevOps Institute or equivalent)

Physical Requirements

Must be able to:

  • Lift or carry 50 pounds, unassisted, in the performance of specific tasks, as assigned.

  • Physically access assigned workspace areas with or without reasonable accommodation.

  • Work indoors and be exposed to various environmental factors such as, but not limited to, CRT, noise, and dust.

  • Utilize laptop and standard keyboard to perform essential functions of the job.

Top Skills

AKS
App Services
Application Insights
Azure
Azure DevOps
Azure Monitor
Bicep
Functions
GitHub Actions
Grafana
ITRS Geneos
Jira
Log Analytics
PowerShell
Python
ServiceNow
Terraform
VM Scale Sets

The Company
HQ: Las Vegas, Nevada
947 Employees

What We Do

Founded in 1990, Las Vegas Sands is the preeminent developer and operator of world-class integrated resorts that drive valuable business and leisure tourism in the regions where we operate. Featuring an array of richly diverse and compelling offerings under one roof, our integrated resorts blend luxury hotels and state-of-the-art meeting and convention facilities with a variety of amenities such as gaming, celebrity chef restaurants, high-end shopping and an action-packed schedule of concerts, shows, exhibits and other attractions.

Sands has a 30-year track record of successfully developing and operating some of the largest and most complex business and leisure properties in the world, generating significant economic benefits for our host regions and enhancing their stature as global tourism and business capitals. Our integrated resorts propel continuous positive impact through tourism, jobs and community investments that make our regions great places to live, work and visit.

Sands is dedicated to being a good corporate citizen, anchored by the core tenets of serving people, planet and communities. We deliver a great working environment for our team members worldwide, drive social impact through the Sands Cares community engagement and charitable giving program and lead in environmental performance through the award-winning Sands ECO360 global sustainability program.

Sands is not just a developer. We are developers of positive impact.
