Top Site Reliability Engineer Jobs

tvScientific

Site Reliability Engineer II

16 Days Ago

In-Office or Remote

San Francisco, CA, USA

114K-235K Annually

Mid level

Social Media

Operate, scale, and improve a cloud-native platform on AWS and Kubernetes. Manage GitOps deployments with ArgoCD and Helm, provision infra with Terraform/Terragrunt, build CI/CD automation, enhance observability, respond to incidents, reduce operational toil through scripting, and collaborate with security and application teams to improve reliability and platform guardrails.

Top Skills: ArgoCD, AWS, Bash, Containers, EKS, GitHub Actions, GitOps, Helm, IAM, Kubernetes, Linux, Python, Terraform, Terragrunt

AYR Global IT Solutions Inc

Cloud Site Reliability Engineer

16 Days Ago

In-Office

New York, NY, USA

Senior level

Cloud • Information Technology • Consulting • Cybersecurity

Design, templatize and deploy scalable infrastructure in public clouds (AWS, GCP) using IaC (CloudFormation). Support architects, troubleshoot developer escalations, ensure compliance, and build stable platform services; work within agile teams to create configuration templates and automated deployments.

Top Skills: AWS, AWS CloudFormation, AWS EFS, EC2, GCP, Python, RDS, Ruby

The Depository Trust & Clearing Corporation (DTCC)

Senior Application Support Engineer (SRE)

16 Days Ago

In-Office

2 Locations

Senior level

Financial Services

Drive reliability, scalability, and performance for mission-critical applications using SRE principles. Implement monitoring, SLIs/SLOs, automation, and fault-tolerance strategies. Lead incident response, RCA, and embed reliability practices into the SDLC while collaborating across development, infrastructure, network, and security teams.

Top Skills: AutoSys, AWS, DevOps Tools, Dynatrace, Jira, Mainframe, Monitoring/Observability, PL/SQL, Python, ServiceNow, Shell, Splunk, SQL, Unix/Linux, Windows

VERISIGN

Site Reliability Engineer - IBM AIX

16 Days Ago

In-Office

Reston, VA, USA

136K-184K Annually

Senior level

Information Technology • Software

Operate, provision, and secure IBM POWER and AIX infrastructure for mission-critical services. Install, configure, and maintain physical hosts, PowerVM/PowerVC environments, and AIX images to meet security controls. Troubleshoot production issues, document procedures, coordinate deployments with engineering teams, and participate in a 24x7 on-call rotation.

Top Skills: AIX, CIS, HMC, IBM Power, Linux, NIM, PowerVC, PowerVM, VIOS

Diligent

Staff Site Reliability Engineer

16 Days Ago

In-Office

New York, NY, USA

131K-164K Annually

Expert/Leader

Software

Design, deploy, and automate VMware-based private cloud infrastructure across global datacenters. Administer Linux and Windows Server platforms, integrate Active Directory, manage storage, networking, ADCs (F5/AVI), and ensure availability, security, and compliance. Build automation (PowerCLI/Ansible/Python), participate in on-call rotations, document systems, and mentor junior engineers while driving infrastructure modernization and reliability improvements.

Top Skills: Active Directory, Ansible, Avi (NSX Advanced Load Balancer), CentOS, CI/CD, DNS, F5 BIG-IP, Git, NAS, PowerCLI, PowerShell, Python, RHEL, SAN, TCP/IP, Ubuntu, VMware vSphere (ESXi, vCenter), VPN, Windows Server

Optum

Senior Site Reliability Engineer - Remote

Reposted 21 Days Ago

In-Office or Remote

Eden Prairie, MN, USA

92K-164K Annually

Senior level

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

The Senior Site Reliability Engineer will architect and maintain cloud infrastructure, collaborating with software and DevOps engineers while ensuring security and performance.

Top Skills: ArgoCD, AWS, Azure, Azure Monitor, Dynatrace, Flux, Grafana, Helm, Kubernetes, Prometheus, Pulumi, RESTful Services, Splunk, Terraform

Thinking Machines Lab

Site Reliability Engineer (SRE)

Reposted 16 Days Ago

In-Office

San Francisco, CA, USA

350K-475K Annually

Mid level

Artificial Intelligence • Information Technology

The Site Reliability Engineer will drive reliability for the Tinker platform, focusing on incident response, monitoring, and ensuring system resilience while collaborating across teams.

Top Skills: Cloud Infrastructure, Kubernetes

SpaceX

Sr. IT Linux Site Reliability Engineer

Reposted 16 Days Ago

In-Office

Cape Canaveral, FL, USA

Senior level

Aerospace • Other

The Sr. IT Linux Site Reliability Engineer will manage and optimize Kubernetes clusters, automate systems, and foster collaboration to support SpaceX's engineering teams and infrastructure needs.

Top Skills: Ansible, Docker, Git, Go, Grafana, Helm, JSON, Kubernetes, Linux, Prometheus, Python, Terraform, YAML

InfStones

Blockchain Site Reliability Engineer

Reposted 16 Days Ago

Remote

Texas, USA

Mid level

Blockchain

The Blockchain Site Reliability Engineer is responsible for maintaining blockchain nodes' reliability, monitoring, incident response, and building automation tools to enhance operations.

Top Skills: Docker, ELK, Go, Grafana, JavaScript, Kubernetes, Linux, Prometheus, Python, Rust, Shell

Deutsche Bank

IB CTO Team - Lead Site Reliability Engineer (SRE) - Vice President

Reposted 16 Days Ago

In-Office

Centre, Green, OH, USA

125K-185K Annually

Senior level

Fintech • Financial Services

Lead the Site Reliability Engineering efforts for the Investment Banking CTO team, focusing on resilience, architectural guidance, and SRE adoption across applications and platforms.

Top Skills: ArgoCD, FluxCD, GCP, Go, Java, Kubernetes, Python, Terraform

Okta

Staff Site Reliability Engineer - Observability GCP

Reposted 17 Days Ago

In-Office

5 Locations

194K-267K Annually

Senior level

Cloud

The role involves building and managing observability infrastructure in GCP, automating deployments, and optimizing data processes for high reliability.

Top Skills: GKE, Go, GCP, Grafana, Kubernetes, OpenTelemetry, Python, Ruby, Splunk, Terraform

SpaceX

Site Reliability Engineer, Kubernetes Platform (Starshield)

17 Days Ago

In-Office

Hawthorne, CA, USA

125K-175K Annually

Junior

Aerospace • Other

Design, operate, and scale on-premise infrastructure for the Starshield satellite constellation. Build automation for Kubernetes cluster deployment and management, operate core infrastructure (databases, monitoring, distributed storage), collaborate with software teams, troubleshoot across the stack, improve service lifecycle, and ensure high availability through monitoring and performance improvements.

Top Skills: Ansible, Bash, C++, Go, Kubernetes, Linux, OCI Containers, Python, Terraform

SpaceX

Site Reliability Engineer, Kubernetes Platform (Starshield)

17 Days Ago

In-Office

Redmond, WA, USA

125K-175K Annually

Junior

Aerospace • Other

Design, deploy, and operate on-premises Kubernetes clusters and core infrastructure (databases, monitoring, distributed storage). Build automation, troubleshoot across the Starshield stack, collaborate with software teams to ensure scalable, highly available services, and improve lifecycle processes.

Top Skills: Ansible, Bash, Bazel, C++, Databases, Distributed Storage, Go, Kubernetes, Linux, Makefiles, Monitoring, OCI Containers, Python, TCP/IP, Terraform

Tradeweb

Senior Site Reliability Engineer (SRE)

17 Days Ago

Remote

United States

170K-210K Annually

Senior level

eCommerce

Ensure reliability and availability of Tradeweb's global AWS platform through IaC automation, observability and SLO definition, incident triage and resolution, on-call duties, collaboration with development teams, and security-focused platform improvements.

Top Skills: ArgoCD, AWS, AWS Lambda, EKS, Gitsecops, Infrastructure as Code (IaC), Kubernetes (K8s), Kustomize, LGTM, Linux/Unix, Pulumi, Python, SMS, SNS

Guidehouse

Site Reliability Engineer

17 Days Ago

In-Office or Remote

2 Locations

80K-133K Annually

Mid level

Consulting

Maintain and improve reliability of cloud-based enterprise systems by implementing SRE practices. Participate in design and code reviews, incident management, automation (IaC/CI-CD), monitoring, documentation, and collaboration with cross-functional teams to reduce downtime and improve scalability and security.

Top Skills: Ansible Automation Platform, Artifactory, AWS, Azure, Bash, CI/CD, GitLab, IaC, Linux, Packer, PowerShell, Python, Terraform, Windows

Xbox

Associate Site Reliability Engineer

17 Days Ago

In-Office

Santa Monica, CA, USA

31-56 Hourly

Junior

Gaming • Hardware

Entry-level Site Reliability Engineer responsible for monitoring service health, incident response, troubleshooting Kubernetes, networking, DNS, and application issues, building observability (dashboards, alerts, runbooks), automating repetitive tasks, and supporting release reliability and post-incident remediation.

Top Skills: Bash, Cloud, Containers, Dashboards, DNS, Git, HTTP, Kubernetes, Linux, Logging, Metrics, Monitoring, Python

E2B

SRE/Infrastructure Engineer

Reposted 17 Days Ago

In-Office

San Francisco, CA, USA

200K-350K Annually

Senior level

Artificial Intelligence

The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.

Top Skills: AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform

The Walt Disney Company

Principal Site Reliability Engineer

Reposted 17 Days Ago

In-Office

Bay Lake, FL, USA

Expert/Leader

Digital Media • Gaming • News + Entertainment • Sports

The Principal Site Reliability Engineer will lead DevOps culture, architect security solutions, and monitor emerging technologies for Disney Experiences Technology.

Top Skills: AI, Akamai, Mobile Technologies, Security Tools, Web Technologies

The Walt Disney Company

Site Reliability Engineer II

Reposted 17 Days Ago

In-Office

New York, NY, USA

123K-165K Annually

Mid level

Digital Media • Gaming • News + Entertainment • Sports

The Site Reliability Engineer II contributes to system stability and scalability by implementing automation, enhancing observability, and participating in incident response and root cause analysis.

Top Skills: Argo CD, AWS, Azure, Bash, CI/CD, CloudFormation, Datadog, Docker, EFK, ELK, Flux, GCP, GitHub Actions, GitLab CI, Go, Grafana, JavaScript, Jenkins, Kubernetes, Linux, New Relic, Prometheus, Python, Splunk, Terraform

The Walt Disney Company

Principal Site Reliability Engineer

Reposted 17 Days Ago

In-Office

Bay Lake, FL, USA

Expert/Leader

Digital Media • Gaming • News + Entertainment • Sports

Lead SRE culture, mentor teams, manage observability and reliability. Design and support cloud-agnostic systems, and automate infrastructure using advanced tools while enhancing organizational performance.

Top Skills: AI, Ansible, AWS, Azure, Chef, CloudFormation, DevOps Tools, GCP, Linux, LLMs, Terraform, Windows

C3 AI

Senior/Lead Site Reliability Engineer – Federal

Reposted 17 Days Ago

In-Office

Tyson's Corner, VA, USA

159K-230K Annually

Senior level

Artificial Intelligence • Big Data • Machine Learning • Software

The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.

Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform

GM Financial

Lead Site Reliability Engineer

Reposted 17 Days Ago

Hybrid

Arlington, TX, USA

Senior level

Fintech • Financial Services

The Lead Site Reliability Engineer will manage and optimize Kubernetes and Spark environments, supporting data processing and machine learning platforms while collaborating with diverse teams to enhance system reliability and performance.

Top Skills: Ansible, Spark, Azure, Azure DevOps, Chef, Docker, Go, Java, Jenkins, Kubernetes, Object Storage, Puppet, Python, Ruby, Terraform

Quindar

Site Reliability Engineer (SRE)

Reposted 17 Days Ago

Hybrid

2 Locations

160K-200K Annually

Mid level

Aerospace • Cloud • Software • Defense • Automation

The Site Reliability Engineer will design, automate, and operate cloud systems, focusing on DevSecOps and operational stability, while improving reliability and collaborating with engineers across the platform.

Top Skills: AWS, AWS IAM, Datadog, GitLab, Grafana, Kubernetes, Linux/Unix, Python, Rancher, Terraform

Versana

SRE/DevOps Engineer

Reposted 17 Days Ago

Hybrid

New York, NY, USA

Senior level

Fintech

The SRE/DevOps Engineer will enhance observability and monitoring tools, improve system reliability, conduct post-incident reviews, and collaborate with developers to optimize workflows and CI/CD processes.

Top Skills: AWS, Azure, Azure Bicep, Azure DevOps, Chaos Mesh, CloudFormation, Datadog, Docker, Elasticsearch, GCP, GitHub Actions, GitLab CI/CD, Grafana, Gremlin, Jenkins, Kafka, Kubernetes, Terraform

Mimecast

Site Reliability Engineer II

Reposted 17 Days Ago

In-Office

Columbus, OH, USA

124K-186K Annually

Senior level

Information Technology • Security • Software • Consulting

As a Site Reliability Engineer, you will design and support AWS infrastructure, build CI/CD pipelines, debug systems, and promote self-service for product teams. Collaborate with teams for continuous deployment and automation in a cloud environment, leveraging AI tools for efficiency.

Top Skills: AI Tooling, AWS, CI/CD, GitHub Actions, Kubernetes, Postgres, Terraform