Top Site Reliability Engineer Jobs

Chamberlain Group

Staff Site Reliability Engineer

5 Hours Ago

Hybrid

Oak Brook, IL, USA

103K-193K Annually

Senior level

Automotive • Hardware • Internet of Things • Mobile • Software • App development • PropTech

Design, implement, and optimize global cloud infrastructure and platforms for an IoT service. Lead platform improvement initiatives, automate infrastructure (IaC/GitOps), ensure observability and security, troubleshoot incidents, mentor SRE team members, and collaborate with executives, architects, and security stakeholders to execute the infrastructure roadmap.

Top Skills: Active Directory, Argo CD, AWS, Bash, Datadog, Edge Firewalls, GitOps, Go, Grafana, IaC, Kubernetes, Linux, New Relic, PowerShell, Prometheus, Python, SIEM, Terraform, VPC, Windows

TransUnion

Staff Site Reliability Engineer

Reposted 20 Hours Ago

Hybrid

6 Locations

113K-188K Annually

Senior level

Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics

The Staff Site Reliability Engineer will lead reliability strategies, manage high-risk initiatives, and enhance engineering standards while ensuring system reliability and operational excellence within a hybrid work environment.

Top Skills: Bash, CI/CD, Database Architecture, Go, Google Cloud Platform, Infrastructure-as-Code, Kubernetes, Monitoring Platforms, Pulumi, Python, Terraform

Tapestry - Coach and Kate Spade

Lead Site Reliability Engineer

Reposted 20 Hours Ago

Hybrid

North Bergen, NJ, USA

103K-185K Annually

Senior level

eCommerce • Fashion • Retail • Sales • Wearables • Design

The Lead Site Reliability Engineer is responsible for ensuring system reliability, uptime, and performance across Tapestry brands. They will develop tools and automation, oversee monitoring solutions, and implement SRE practices to promote reliability. Additionally, the role involves production support and collaboration across engineering teams to manage the Salesforce Commerce Cloud platform effectively.

Top Skills: AppDynamics, AWS, Azure, Blue Triangle, Confluence, GCP, Java, Jira, Node.js, Python, Quantum Metric, Splunk

General Motors

Staff Engineer, Staff Engineer, Hybrid Services & Reliability (SRE)

Reposted 20 Hours Ago

Hybrid

2 Locations

184K-275K Annually

Senior level

Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing

The Staff Engineer will define reliability architecture, automate foundational utilities, develop observability tools, ensure environment integrity, and mentor colleagues.

Top Skills: Ansible, Chef, DHCP, Kubernetes, Linux, NTP, PXE

Deepgram

Site Reliability Engineer - AI & ML Infrastructure (Kubernetes, AWS & Terraform)

Reposted 20 Hours Ago

Remote

USA

150K-220K Annually

Senior level

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI

The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.

Top Skills: AWS, Bash, Go, Kubernetes, Python, Slurm, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 20 Hours Ago

Easy Apply

Remote or Hybrid

10 Locations

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

GRAIL

Staff Site Reliability Engineer (SRE) | Dev Ops Engineer #4770

Reposted 20 Hours Ago

Hybrid

Menlo Park, CA, USA

169K-224K Annually

Senior level

Artificial Intelligence • Big Data • Healthtech • Machine Learning • Software • Biotech

Lead the design and operation of a fault-tolerant cloud infrastructure, implement infrastructure-as-code, manage Kubernetes reliability, and mentor engineers.

Top Skills: Ansible, AWS, Azure, Bash, CloudFormation, Datadog, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, OpenTelemetry, PowerShell, Prometheus, Python, Terraform

Tulip

Senior Site Reliability Engineer (SRE)

Reposted Yesterday

Easy Apply

Hybrid

Somerville, MA, USA

150K-185K Annually

Senior level

Enterprise Web • Hardware • Internet of Things • Software

The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.

Top Skills: Go, Kubernetes, LGTM Stack, OpenTelemetry, Prometheus, TypeScript

BAE Systems, Inc.

Site Reliability Engineer

Reposted Yesterday

Hybrid

Fort Meade, MD, USA

150K-254K Annually

Expert/Leader

Aerospace • Hardware • Information Technology • Security • Software • Cybersecurity • Defense

Design, build, and maintain scalable, reliable infrastructure by applying software engineering to operations. Improve automation, observability, CI/CD, and security hardening; troubleshoot Linux/Unix systems and networking; reduce operational toil while supporting mission-critical defense systems requiring TS/SCI clearance and polygraph.

Top Skills: DNS, Git, GitHub Actions, GitLab CI, Go, Java, Jenkins, Linux, Python, TCP/IP, Unix

Kalshi

Site Reliability Engineer

Reposted 2 Days Ago

In-Office

New York, NY, USA

100K-250K Annually

Mid level

Fintech • Payments • Financial Services

Design, build, and maintain highly reliable, observable production services. Automate operations, performance-tune cloud deployments, own incident response/on-call, mentor engineers, and improve system reliability and scalability.

Top Skills: AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kubernetes, Rust, Terraform

Tempus AI

Site Reliability Engineer

2 Days Ago

Hybrid

Chicago, IL, USA

85K-130K Annually

Mid level

Artificial Intelligence • Big Data • Healthtech • Machine Learning • Analytics • Biotech • Generative AI

Join the SRE team to design, deploy, and operate resilient cloud infrastructure. Recommend solutions, automate workflows, configure Terraform and CI, implement monitoring and alerts, and support developers and users.

Top Skills: Ansible, Aurora MySQL, AWS, Azure, Bash, Chef, CloudFormation, Composer, Concourse, Dataproc, Docker, GCP, Go, HIPAA, HITRUST, ISO, Kubernetes, Packer, Postgres, Puppet, Python, Ruby, Salt, Slack, Terraform

Legora

Staff Site Reliability Engineer

Reposted 2 Days Ago

In-Office

New York City, NY, USA

332K-449K Annually

Senior level

Artificial Intelligence • Legal Tech • Software

The Staff Site Reliability Engineer will architect reliability strategies, manage observability, and drive operational excellence across teams, particularly in large-scale production systems.

Top Skills: Cloud Infrastructure, Distributed Systems

New York Life Insurance Company

Senior Associate - Site Reliability Engineer

Reposted 2 Days Ago

Hybrid

Lebanon, NJ, USA

100K-143K Annually

Senior level

Artificial Intelligence • Cloud • Fintech • Information Technology • Insurance • Financial Services • Big Data Analytics

Design, automate, and maintain scalable, secure AWS infrastructure and CI/CD pipelines. Lead observability, incident response, and reliability improvements while modernizing mainframe workloads to AWS and collaborating with engineering, security, and data teams.

Top Skills: AI Tools, AWS, CI/CD, CloudFormation, Data Analytics, Databases, Event-Driven Architecture, Mainframe Development, Messaging Services, Observability and Alerting Tools, Terraform

Domino Data Lab

Staff Site Reliability Engineer

Reposted 2 Days Ago

Easy Apply

Remote or Hybrid

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

Dropbox

Staff Site Reliability Engineer, Production Engineering

Reposted 2 Days Ago

Remote

United States

223K-302K Annually

Expert/Leader

Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy

The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.

Top Skills: AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs

Optum

Lead Site Reliability Engineer - Richardson, TX - 2373627

3 Days Ago

In-Office

Richardson, TX, USA

157K-210K Annually

Senior level

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

Lead SRE responsible for reliability, security, and efficiency of cloud environments for an Enterprise Imaging SaaS platform. Design, develop, automate, and operate cloud infrastructure (GCP/AWS/Azure), implement CI/CD and IaC (Terraform), manage Kubernetes, perform 24x7 on-call support, and automate IAM, monitoring, and security reporting.

Top Skills: Ansible, AWS, Azure, Bash, CI/CD, Cloud Logging, GCP, GCP Pub/Sub, Git, GitHub Actions, GitLab, Jenkins, Kubernetes, Node.js, PingOne, PowerShell, Python, Splunk, Terraform

Optum

Lead Site Reliability Engineer - 2373616

3 Days Ago

In-Office

Eden Prairie, MN, USA

131K-195K Annually

Mid level

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

Design, develop, and maintain proprietary Optum software and systems with emphasis on Salesforce (Aura, Lightning, Apex). Ensure security, compliance, performance, scalability, and accessibility. Implement data models, custom objects, triggers, batch automation, and REST integrations. Use Git workflows, troubleshoot incidents, support post-deployment, and collaborate with users and QA to meet SLAs and project protocols.

Top Skills: ALM, Apex, Apex REST, Aura Components, Git, Jira, REST APIs, Salesforce, Salesforce Lightning Experience, ServiceNow

DraftKings

Principal Site Reliability Engineer

3 Days Ago

Remote or Hybrid

United States

200K-250K Annually

Senior level

Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics

Lead long-term strategy and architecture for cloud and on‑prem platform infrastructure, driving Kubernetes and multi‑cloud reliability, IaC/GitOps automation, observability, SLO/SLI/error‑budget practices, incident leadership, AI‑augmented tooling adoption, and mentorship of senior engineers to improve platform resilience and developer experience.

Top Skills: Amazon Elastic Kubernetes Service (EKS), Autoscaling, AWS, Capacity Planning, CI/CD, GitOps, Go, Google Cloud Platform, Google Kubernetes Engine (GKE), Identity and Access Management, Infrastructure as Code, Kubernetes, Linux, Networking, Observability, Operators, Pulumi, Python, RKE2, Storage, Terraform

Datadog

Senior Software Engineer - Bits AI SRE

Reposted 3 Days Ago

Easy Apply

Hybrid

New York, NY, USA

187K-240K Annually

Senior level

Artificial Intelligence • Cloud • Security • Software • Cybersecurity

The role involves developing AI-assisted product experiences for Datadog by building systems for chat, remediations, and codefixes, alongside collaboration with cross-functional teams to enhance user outcomes.

Top Skills: AI Coding Tools, Go, Kubernetes, LLM-Based Systems

Citadel Securities

Site Reliability Engineer

Reposted 4 Days Ago

In-Office

6 Locations

125K-350K Annually

Mid level

Information Technology • Software • Financial Services • Quantitative Trading

The Site Reliability Engineer will provide support and diagnose issues within a real-time, distributed environment, focusing on large-scale application and infrastructure management, with basic required skills in UNIX/Linux, networking, SQL, and scripting languages.

Top Skills: Bash, Python, SQL, TCP/IP, UDP, Unix/Linux

Corporate Tools LLC

Site Reliability Engineer

Reposted 4 Days Ago

Remote or Hybrid

4 Locations

175K Annually

Senior level

eCommerce • Legal Tech • Professional Services • Software • Data Privacy

The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.

Top Skills: AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform

Comcast

Site Reliability Engineer, Streaming HUB - FreeWheel

5 Days Ago

Hybrid

Reston, VA, USA

Mid level

Digital Media • Information Technology • News + Entertainment

Own and support production infrastructure for FreeWheel's Streaming Hub. Design and implement cloud and Kubernetes infrastructure, automate with Terraform/Ansible and Python/Go, troubleshoot incidents, improve observability and CI/CD, and ensure platform reliability and scalability during high-traffic streaming events.

Top Skills: Amazon EKS, Ansible, AWS, Docker, Go (Golang), IAM, Jenkins, Kubernetes, Load Balancer, Oracle Cloud Infrastructure (OCI), Python, Route 53, Security Groups, Terraform, VPC

Comcast

Lead Site Reliability Engineer, Data- FreeWheel

5 Days Ago

Hybrid

Reston, VA, USA

Expert/Leader

Digital Media • Information Technology • News + Entertainment

Lead Data SRE responsible for ensuring reliability, scalability, and performance of data platforms. Design monitoring, automate operations, optimize performance, respond to incidents, plan capacity, enforce security/compliance, document systems, and collaborate with engineering, data science, and product teams.

Top Skills: Aerospike, Ansible, Apache Kafka, AWS, AWS S3, Azure, Cassandra, Containerization, Docker, ELK Stack, GCP, Go, Grafana, Hadoop, HDFS, Java, Kubernetes, Microservices, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Snowflake, Spark, Terraform

Comcast

Site Reliability Engineer, Streaming HUB - FreeWheel

5 Days Ago

Hybrid

Reston, VA, USA

Mid level

Digital Media • Information Technology • News + Entertainment

Own and support production infrastructure for a high-scale streaming advertising platform. Design and implement cloud and Kubernetes infrastructure, automate with IaC and scripting, troubleshoot incidents, ensure reliability and performance, collaborate with engineering and offshore operations, and support live events and traffic spikes.

Top Skills: Amazon EKS, Ansible, AWS, Docker, Go, IAM, Jenkins, Kubernetes, Load Balancer, Networking, Oracle Cloud Infrastructure, Python, Route 53, Security Groups, Terraform, VPC

Mastercard

Lead Site Reliability Engineer

Reposted 5 Days Ago

Hybrid

O'Fallon, MO, USA

Mid level

Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing

The Lead Site Reliability Engineer will ensure reliability, scalability, and performance of Mastercard's applications, enhancing operational practices and developer collaboration in a proactive environment.

Top Skills: CI/CD, DevOps, Go, Java, Python, Spring Framework