Top Site Reliability Engineer Jobs

General Dynamics Information Technology

Site Reliability Engineer

8 Days Ago

Remote

Location, WV, USA

164K-215K Annually

Expert/Leader

Aerospace • Information Technology • Professional Services • Security • Software

Design, build, and maintain highly available cloud and on‑prem systems. Automate operations, implement monitoring/alerting, tune performance, and drive incident response and root cause fixes. Collaborate on reliable architectures and CI/CD pipelines, champion SRE best practices (SLIs/SLOs, error budgets), and support proposal technical content.

Top Skills: AWS, Azure, Bash, CI/CD Pipelines, Container Orchestration, Datadog, ELK, Grafana, Kubernetes, Linux, Networking, PowerShell, Prometheus, Python, Splunk

WESCO International

Site Reliability Engineer (REMOTE)

8 Days Ago

In-Office or Remote

2 Locations

Mid level

Hardware

Lead technical services engineer guiding and training engineers, designing IT architecture, troubleshooting network security and third-party control integrations, coordinating projects, providing customer training and field support, and managing personnel and resources.

Top Skills: 802.1X, AMX, Crestron, Excel, Microsoft Outlook, Microsoft PowerPoint, Microsoft Word, RADIUS, Security Certificate Management

SpaceX

Sr. IT Linux Site Reliability Engineer

8 Days Ago

In-Office

Bastrop, TX, USA

Senior level

Aerospace • Other

Design, build, operate, scale, and optimize Kubernetes and RKE clusters and Linux infrastructure using automation (Ansible, Terraform). Collaborate with engineers to deploy resilient, high-performance systems, drive automation, define standards, upskill the team, and participate in on-call rotation.

Top Skills: Ansible, ArgoCD, AWX/Tower, Ceph, Cgroups, CI/CD, Cilium, Cloud-Init, CNI, CRI, CRI-O, CSI, Docker, Git, GitOps, Go, Grafana, Helm, InfluxDB, Iptables, Istio, Jenkins, Jinja, Jsonnet, Kernel Modules, Kubernetes, Linux, MetalLB, PKI, Prometheus, Puppet, Python, Redfish, RKE, Rook-Ceph, Shell, Subversion, Terraform, Vagrant, VMware, YAML

Daxko

Site Reliability Engineer

Reposted 8 Days Ago

Remote

United States

90K-159K Annually

Mid level

Fitness • Healthtech • Information Technology • Payments • Software

The Site Reliability Engineer will enhance system reliability, manage cloud infrastructure, automate processes, support CI/CD pipelines, and troubleshoot production issues.

Top Skills: Ansible, AWS, Bash, Chef, Docker, Git, GitLab, Jenkins, Kubernetes, MySQL, Postgres, Python, SQL Server, Terraform, VMware

Waystar

Senior SRE / DevOps Engineer

Reposted 13 Days Ago

In-Office

Louisville, KY, USA

Senior level

Healthtech • Payments • Software

The Senior SRE I will design and maintain automation for infrastructure provisioning, monitor system health, resolve production incidents, and mentor junior SREs, ensuring reliability and operational efficiency across cloud platforms.

Top Skills: Ansible, AWS, Azure, Bash, CloudFormation, Datadog, Docker, GCP, GitHub Actions, GitLab CI, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Ruby, Splunk, Terraform

Mattermost

Lead Site Reliability Engineer

Reposted 8 Days Ago

Remote

United States

170K-200K Annually

Senior level

Software

Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.

Top Skills: AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform

Quindar

DevSecOps - Site Reliability Engineer (SRE) / US Gov

Reposted 8 Days Ago

Hybrid

Denver, CO, USA

160K-200K Annually

Mid level

Aerospace • Cloud • Software • Defense • Automation

Design and automate cloud systems for U.S. Government, focusing on DevSecOps, reliability, deployment automation, and observability. Participate in on-call rotations, supporting production environments and improving system resilience.

Top Skills: AWS EKS, Datadog, GitLab, Grafana, Kubernetes, Linux/Unix, Python, Terraform

Assured

Staff Site Reliability Engineer

Reposted 8 Days Ago

Remote

USA

180K-210K Annually

Senior level

Artificial Intelligence • Insurance • Software • Automation

The Staff Site Reliability Engineer will build and scale infrastructure for Assured's platform, automate delivery, enhance observability, and lead mentoring initiatives.

Top Skills: AWS, Kubernetes, Postgres, Terraform

WellSky

Sr. Software Engineer - SRE

Reposted 8 Days Ago

In-Office

Overland Park, KS, USA

Senior level

Healthtech • Professional Services • Software

The Sr Software Engineer leads complex software development, ensuring solution scalability, collaborating with teams, solving technical problems, and advocating for high-quality software solutions.

Top Skills: Angular, Argo CD, Azure DevOps, CI/CD, Google Cloud Platform, Kubernetes, New Relic, OpenTelemetry, Ruby on Rails, Terraform

Quest Diagnostics

Epic Principal Site Reliability Engineer

Reposted 8 Days Ago

In-Office

Secaucus, NJ, USA

150K-170K Annually

Expert/Leader

Healthtech • Database

Seeking a Principal Site Reliability Engineer to build an SRE practice, enhance reliability, mentor teams, and drive performance engineering to optimize Quest products and services.

Top Skills: Ansible, Aurora, AWS, Azure, Bigtable, Cassandra, CI/CD, Cloud Pub/Sub, Cloud Spanner, Cloud SQL, Docker, DynamoDB, Dynatrace, GitLab, Go, GCP, Java, JMS, Kafka, Kinesis, Kubernetes, MQ, Perl, Python, RDS, Ruby, Shell Scripting, Terraform

PlayStation

Site Reliability Engineer II

Reposted 8 Days Ago

In-Office

Aliso Viejo, CA, USA

146K-219K Annually

Senior level

Gaming

The role involves ensuring production quality, owning system reliability, and participating in decision-making. Responsibilities include incident response and lifecycle management in cloud gaming technologies.

Top Skills: Bash, C++, Elasticsearch, Go, Istio, Java, Kafka, Kong API Gateway, Kubernetes, Kuma, Linkerd, MongoDB, MySQL, Postgres, Python, Redis, Rust

Cresta

Senior Infrastructure Engineer/SRE

Reposted 8 Days Ago

Remote

United States

205K-270K Annually

Senior level

Artificial Intelligence • Other • Sales • Software

The role involves designing and advancing infrastructure for the engineering team, ensuring the reliability of Kubernetes clusters, automating operations, and building machine learning infrastructure.

Top Skills: Argo, AWS, Azure, CloudFormation, Flux, GitHub Actions, Go, GCP, Kubernetes, Postgres, Python, Terraform

WEX Inc.

Senior Staff Site Reliability Engineer

Reposted 8 Days Ago

In-Office or Remote

11 Locations

160K-179K Annually

Senior level

Fintech • Payments

The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.

Top Skills: AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk

TP-Link USA Corporation

Site Reliability Engineer

9 Days Ago

In-Office

Irvine, CA, USA

100K-140K Annually

Junior

Hardware • Manufacturing

Operate and harden a multi-cloud microservices platform: deploy on Kubernetes, run load/chaos tests, build observability, automate with scripts, define SLO/SLA, ensure security/compliance, participate in incident response, disaster recovery, on-call rotation, and mentor junior team members.

Top Skills: AWS, Azure, Bash, GCP, Go, HPA, Java, JVM, Kubernetes, Microservices, OCI, PowerShell, Python

General Dynamics Information Technology

Site Reliability Engineer - TS/SCI with Poly

9 Days Ago

In-Office

5 Locations

128K-173K Annually

Senior level

Aerospace • Information Technology • Professional Services • Security • Software

Maintain and improve reliability, scalability, and performance of enterprise infrastructure across global sites. Implement automation and infrastructure-as-code, build monitoring and observability, perform RCA and incident response, support patching and RMF changes, integrate new capabilities, and maintain operational documentation and ITIL/ITSM processes to ensure mission-ready, high-availability environments.

Top Skills: Ansible, ELK, Nagios, PowerShell, Python, SCOM, SolarWinds, Splunk, Terraform

Goldman Sachs

Engineering - SRE Platforms - Site Reliability Engineer - Vice President - Dallas

9 Days Ago

In-Office

Dallas, TX, USA

Senior level

Fintech • Financial Services

Lead SRE technical strategy and architecture for highly available, scalable enterprise platforms. Build automation, observability, and incident response practices; mentor senior engineers; drive capacity planning, production reliability, and adoption of SRE best practices across cloud and on-prem environments.

Top Skills: Ansible, AWS, BigQuery, Chef, CloudFormation, Datadog, Docker, Elasticsearch, ELK Stack, GCP, GitLab, Go, Grafana, Java, Jenkins, Kafka, Kubernetes, Linux, Maven, PagerDuty, Prometheus, Prompt Engineering, Puppet, Python, Retrieval-Augmented Generation (RAG), Terraform

Goldman Sachs

Compliance Engineering, Site Reliability Engineer SRE, Associate, Dallas

9 Days Ago

In-Office

Dallas, TX, USA

Mid level

Fintech • Financial Services

Site Reliability Engineer on the Compliance Engineering team responsible for ensuring production service health, capacity planning, monitoring, incident management, SLIs/SLOs, automation to reduce toil, and collaborating with engineers to improve scalability, reliability, and observability across distributed, cloud-native and big-data systems.

Top Skills: Automated Testing, AWS, Azure, Distributed Tracing, ELK, GCP, Grafana, Hadoop, Java, Linux, Logging, Metrics, Observability, OpenTelemetry, Perl, Prometheus, Python, Relational Databases

Deutsche Bank

Dev Ops SRE VP, JN - VP

9 Days Ago

In-Office

Park, MI, USA

Expert/Leader

Fintech • Financial Services

VP-level SRE/DevOps leader responsible for global strategy and delivery of CI/CD, IaC, cloud-native platforms, observability, reliability engineering (SRE), security/compliance, automation, incident management, and mentoring teams to enable migration to microservices and optimize costs and resilience.

Top Skills: AKS, App Insights, ARM, AWS, Azure, Azure DevOps, Bash, CloudFormation, Docker, Dynatrace, EKS, ELK, GCP, GitHub Actions, GKE, Grafana, Jenkins, Kubernetes, New Relic, Oracle, Prometheus, Python, Splunk, SQL, Terraform

Northern Trust

Sr Implementation Lead, SRE (CoP)

9 Days Ago

Hybrid

Chicago, IL, USA

165K-288K Annually

Senior level

Artificial Intelligence • Cloud • Fintech • Information Technology • Analytics • Financial Services • Cybersecurity

Lead adoption and standardization of SRE practices across the enterprise. Establish SRE governance, define reliability metrics (SLOs/SLIs), build a Community of Practice, run training/forums, enable automation and tooling, partner with platform teams on observability, chaos engineering, and self-healing, and drive cross-functional alignment for resilience and incident management.

Top Skills: Automation, Azure Monitor, Chaos Engineering, CI/CD, Cloud-Native, DevOps, Dynatrace, Hybrid Architectures, Incident Management, Observability, Platform Engineering, Prometheus, Release Engineering, Self-Healing, Splunk, SRE

Todyl

Site Reliability Engineer II

9 Days Ago

Hybrid

2 Locations

130K-160K Annually

Mid level

Cloud • Security

Build and operate the production platform (Kubernetes, AWS, IaC, CI/CD, observability), automate self-service deployment, embed security and secrets management, run and modernize on-call, drive cost efficiency, mentor teammates, and maintain runbooks and post-incident reviews.

Top Skills: AWS, Bash, CI/CD, Claude, Git, Grafana, Kubernetes, Linux, Prometheus, Python, Salt, Terraform

Photon

SRE Architect | Onsite

9 Days Ago

Remote

United States

Senior level

Agency • Information Technology

Lead SRE role designing and maintaining CI/CD pipelines (GitHub Actions), containerized deployments (Docker, Kubernetes, AKS, Helm), web/mobile app releases, observability, automated testing, and DevOps best practices across cloud environments with cross-functional collaboration and regulatory compliance.

Top Skills: AKS, Android, Azure Application Insights, Azure Log Analytics, Azure Monitor, Bash, Branching, Docker, Docker Compose, Git, Git Hooks, GitHub Actions, Google Play, Helm, Heroku, iOS, iOS App Store, Java, Kubernetes, npm, PowerShell, Pull Requests, Python, SonarQube, Veracode, Vercel

Vertiv

Platform Operations Engineer (Site Reliability Engineer)

9 Days Ago

In-Office

Westerville, OH, USA

Senior level

Hardware • Software • Analytics

Owner of cross-platform observability and incident management for Vertiv Digital platforms. Design and operate monitoring, SLOs/SLIs, incident response, SLA governance, capacity planning, automation to reduce toil, CI/CD reliability, and enforce DevSecOps and operational governance across cloud and containerized environments.

Top Skills: Ansible, AWS, Azure DevOps, Azure Monitor, C#, CI/CD, Compass AI, Cursor, DAST, Datadog, Docker, Feature Flags, GitHub Actions, GitLab, Grafana, Java, JavaScript, Jenkins, Kubernetes, Power Automate, PowerShell, Prometheus, Python, Ruby, SAST, Secrets Management, Site Scope, Splunk, Terraform, UiPath, Workato, Writer AI

NOV

Site Reliability Engineer

9 Days Ago

Hybrid

Houston, TX, USA

Senior level

Hardware • Other • Energy

Maintain and monitor production systems for availability and performance; lead incident response and postmortems; implement observability, alerting, and automated remediation; optimize distributed systems (AKKA.NET) and PostgreSQL; build CI/CD pipelines and infrastructure-as-code.

Top Skills: Akka.NET, AWS, Azure, Azure DevOps, Azure Pipelines, Bash, C#, Datadog, Docker, ELK, GCP, Git, GitHub Actions, GitLab, GitLab CI, Grafana, Kubernetes, OpenTelemetry, Phobos, Postgres, PowerShell, Prometheus, Python, Terraform

Archer Aviation

Sr Staff Site Reliability Engineer

Reposted 9 Days Ago

In-Office

San Jose, CA, USA

207K-259K Annually

Senior level

Aerospace

Responsible for the reliability, scalability, performance, and security of core systems, implementing infrastructure, maintaining cloud-native services, and developing automation solutions.

Top Skills: Airflow, Amazon EKS, ArgoCD, AWS, Bash, Docker, ELK Stack, GitLab CI, Grafana, Jenkins, Kafka, PowerShell, Prometheus, Python, Spark

DigiCert

Principal Site Reliability Engineer

10 Days Ago

In-Office

Lehi, UT, USA

160K-190K Annually

Senior level

Security • Software • Cybersecurity

Lead platform reliability and cloud modernization across multi-cloud (AWS/Azure/GCP). Define SLIs/SLOs, run incident response, build observability and IaC (Terraform), champion Kubernetes and GitOps, automate operational workflows, and mentor engineers to reduce toil and improve platform reliability and developer velocity.

Top Skills: AKS, AWS, Azure, Bash, CI/CD, CNI, Containers, DNS, EKS, GCP, GitHub Actions, GitOps, GKE, Go, Grafana, Kubernetes, Load Balancing, OpenTelemetry, PKI, Pod Security, Prometheus, Python, RBAC, Serverless, Service Mesh, Splunk, Terraform, TLS, Zero-Trust Networking