Top Senior Site Reliability Engineer Jobs

Gradial

Principal SRE

20 Days Ago

Easy Apply

In-Office

Seattle, WA, USA

180K-240K Annually

Senior level

Artificial Intelligence • Software • Generative AI

As a Principal SRE, you will lead reliability, scalability, and operational health of Gradial's platform, driving improvements and collaborating with engineering.

Top Skills: CI/CD, Infrastructure as Code, Kubernetes, Observability, Python, TypeScript

CargoSprint

Director of DevOps and Site Reliability Engineering (SRE)

20 Days Ago

Remote

United States

Senior level

Logistics • Software • Transportation

Lead and mentor teams in DevOps and SRE, architect scalable Azure Cloud infrastructure, implement CI/CD and IaC, ensure database reliability, and drive cross-functional collaboration.

Top Skills: Azure Cloud, Azure DevOps, CI/CD, Cosmos DB, Docker, ELK, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Redis, SQL Server, Terraform

Cooley

Senior Technology Site Reliability Engineer

Reposted 20 Days Ago

In-Office or Remote

12 Locations

140K-205K Annually

Senior level

Information Technology • Legal Tech

The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.

Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform

Cority

Site Reliability Engineer II

Reposted 20 Days Ago

Remote

United States

Mid level

Healthtech • Software

Maintain reliability, performance, and scalability of cloud-hosted services and databases. Implement SRE best practices, define SLIs/SLOs, respond to incidents, build monitoring and automation, perform DBA tasks (backups, restores, tuning), support CI/CD and DB migrations, and document runbooks and procedures.

Top Skills: Amazon RDS, Azure SQL Database, Bash, ECS Fargate, Flyway, GitLab, Jenkins, Kubernetes, Liquibase, Octopus Deploy, Oracle, Postgres, PowerShell, Python, Redis, SolarWinds DPA, SQL Server

xLabs

Senior / Staff Site Reliability Engineer (Blockchain Infra)

Reposted 20 Days Ago

In-Office or Remote

2 Locations

Senior level

Software

The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.

Top Skills: Blockchain, GitOps, Infrastructure-as-Code, Kubernetes, Programming Languages

Senior Manager, SRE & Networking

Reposted 20 Days Ago

In-Office

2 Locations

197K-295K Annually

Senior level

Cloud • Information Technology • Security • Software

Lead multi-team SRE, Virtualization, Networking, and AI/GPU infrastructure to deliver reliable, scalable hybrid platforms. Own roadmap, operational excellence, SLO/SLI programs, automation/GitOps, Kubernetes and OpenStack operations, AI compute reliability, and cross-functional alignment and staffing.

Top Skills: AI/GPU Compute, Ansible, Ceph, CI/CD, Cinder, CSI, Firewalls, GitOps, GPU Scheduling, Ingress Controllers, Keystone, Kubeflow, Kubernetes, Kubernetes CNI, KVM, L4, L7, MLflow, Network Policy, Neutron, Nova, Observability, OpenShift, OpenStack, Proxmox, Pulumi, Ray, Robin, Routing, SDN, Service Mesh, SLO/SLI, Switching, Terraform, Titan-K8S, Triton Inference Server, Vanilla Kubernetes, VMware, vSAN, XCP-ng, ZFS

Clarity

Principal Platform Engineer (Site Reliability Engineer)

Reposted 21 Days Ago

Easy Apply

In-Office

Fort Meade, MD, USA

Senior level

Information Technology • Security • Software

Manage daily operations of a classified NOC, focusing on Kubernetes services, incident response, system monitoring, and ensuring security and availability.

Top Skills: AWS GovCloud, Azure Government, C2E, C2S, Docker, Elastic Stack, Fluentd, Flux, Grafana, Helm, Jira, JWCC, Kubernetes, osTicket, Prometheus, Terraform

ACI Worldwide

Principal Site Reliability Engineer

21 Days Ago

Hybrid

2 Locations

15-15 Annually

Expert/Leader

Software

The Principal Site Reliability Engineer will design and improve systems for reliability in payments software, guiding development cycles and incident response, while ensuring service health and organizational efficiency.

Top Skills: Cassandra, Go, Java, Kafka, Oracle, Postgres, Python, RabbitMQ, Shell

ACI Worldwide

Principal Site Reliability Engineer

21 Days Ago

Hybrid

2 Locations

Expert/Leader

Software

The Principal Site Reliability Engineer will enhance system reliability, promote SRE practices, lead organizational improvements, and ensure efficient software development and incident response processes.

Top Skills: Cassandra, Go, Java, Kafka, Oracle, Postgres, Python, RabbitMQ, Shell

Axon

Site Reliability Engineer II

21 Days Ago

Easy Apply

In-Office

Seattle, WA, USA

125K-150K Annually

Senior level

Artificial Intelligence • Cloud • Social Impact • Software • Wearables

As a Site Reliability Engineer II, you'll develop automation workflows, manage cloud operations, and enhance service reliability while participating in incident response and code reviews.

Top Skills: APM, AWS, AWS CloudFormation, Azure, C#, CI/CD, Go, Java, Kubernetes, Observability Tools, Python, Temporal, Terraform

Patterson-UTI Energy

Site Reliability Engineer Lead

21 Days Ago

In-Office

Houston, TX, USA

Senior level

Other • Energy

Lead SRE practices for GCP-based data platforms, automate workflows, design reliable architectures, mentor engineers, and improve operational processes.

Top Skills: BigQuery, CI/CD, Cloud Logging, Cloud Monitoring, Cloud Storage, Compute Engine, Dataflow, Datastream, GitHub Actions, GitLab CI, GKE, Google Cloud Platform, IAM, Kubernetes, Pub/Sub, Python, Terraform

WEX Inc.

Senior Staff Site Reliability Engineer

Reposted 21 Days Ago

In-Office or Remote

11 Locations

160K-179K Annually

Senior level

Fintech • Payments

The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.

Top Skills: AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk

PUMA Group

Global Director D2C Platform Operations & SRE

Reposted 21 Days Ago

In-Office

Headquarters, AZ, USA

Expert/Leader

Retail • Sports

Lead global D2C Site Reliability and Platform Operations to ensure availability, performance, and scalability of eCommerce and omnichannel systems. Define SRE strategy, SLIs/SLOs, incident management, observability, cloud operations, FinOps, vendor management, and global on-call models while building and developing high-performing teams and operational playbooks.

Top Skills: Alerting, CI/CD, Cloud Infrastructure, Error Budgets, FinOps, Incident Management, Monitoring, Observability, Site Reliability Engineering (SRE), SLAs, SLIs, SLOs

Upshop

SRE / DevOps Manager

Reposted 21 Days Ago

Easy Apply

Remote

USA

Senior level

Artificial Intelligence • eCommerce • Retail

Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.

Top Skills: Azure, Bash, CI/CD, Datadog, Docker, ELK Stack, Go, Grafana, Kubernetes, PowerShell, Prometheus, Python, Terraform

Planet

Site Reliability Engineer

Reposted 21 Days Ago

Easy Apply

Remote

United States

172K-215K Annually

Senior level

Aerospace • Big Data • Greentech • Hardware • Social Impact

Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.

Top Skills: Alloy, Ansible, Bash, CUDA, GitOps, Grafana, Helm, Jira, K3s, Kubernetes, Kustomize, OpenTelemetry, Prometheus, Proxmox, Python, RKE2, Talos, Terraform

Focused

SRE - Observability

Reposted 21 Days Ago

Easy Apply

In-Office

Denver, CO, USA

130K-170K Annually

Mid level

Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting

The role involves designing and implementing observability solutions using OpenTelemetry, managing platform engineering tasks, and ensuring site reliability through various engineering practices.

Top Skills: AWS, Azure, CI/CD, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Node.js, OpenTelemetry, Pulumi, Python, Rust, Terraform

Focused

Staff SRE - Observability

Reposted 21 Days Ago

Easy Apply

In-Office

Chicago, IL, USA

130K-170K Annually

Senior level

Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting

The role involves designing and implementing OpenTelemetry solutions, optimizing telemetry infrastructure, establishing SRE practices, and managing observability across cloud platforms.

Top Skills: Argo CD, AWS, Azure, Bash, CloudFormation, Docker, GCP, GitHub Actions, GitLab CI, Go, Java, Jenkins, Node.js, OpenTelemetry, PowerShell, Pulumi, Python, Rust, Terraform

Fireblocks

Site Reliability Engineer

Reposted 21 Days Ago

Easy Apply

Remote

United States

150K-185K Annually

Mid level

Software

Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.

Top Skills: Ansible, Argo CD, AWS, AWS Lambda, Azure, Bash, Bitbucket, C++, Chef, Coralogix, Datadog, Docker, Gerrit, Git, GitLab, GCP, Helm, JavaScript, Kubernetes, Linux, MySQL, New Relic, Nginx, Node.js, Phabricator, Prometheus, Puppet, Python, Shell, Splunk

Baseten

Forward Deployed SRE

Reposted 21 Days Ago

Hybrid

2 Locations

135K-285K Annually

Mid level

Software

As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.

Top Skills: TensorRT

SitusAMC

Site Reliability Engineer - Remote US

Reposted 21 Days Ago

Remote

USA

110K-130K Annually

Senior level

Real Estate • Financial Services • PropTech

As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.

Top Skills: Argo CD, AWS, Azure DevOps, Bash, CI/CD, CloudWatch, Docker, EKS, FluxCD, Git, Kubernetes, PowerShell, Python, SQL, Terraform

Harvey

Staff Software Engineer, Site Reliability Engineer (SRE)

Reposted 21 Days Ago

In-Office

San Francisco, CA, USA

238K-290K Annually

Expert/Leader

Artificial Intelligence • Legal Tech • Professional Services • Software

As a Staff Software Engineer in Site Reliability, you'll manage infrastructure for reliability and scalability, lead incident management, and automate operational tasks.

Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, incident.io, PagerDuty, Pulumi, Python, Sentry, Terraform

Harvey

Senior Software Engineer, Site Reliability Engineer (SRE)

Reposted 21 Days Ago

In-Office

San Francisco, CA, USA

200K-260K Annually

Mid level

Artificial Intelligence • Legal Tech • Professional Services • Software

As a Software Engineer in Site Reliability, you will ensure the reliability and performance of our AI platform through automation and strategic infrastructure management.

Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Kubernetes, PagerDuty, Python, Sentry, Terraform

Redwood Materials

Site Reliability Engineer

Reposted 22 Days Ago

Easy Apply

In-Office

San Francisco, CA, USA

130K-175K Annually

Junior

Energy

The Site Reliability Engineer will design and implement systems, drive automation, coordinate between teams, support deployed systems, and ensure scalability for rapid growth.

Top Skills: Active Directory, Ansible, AWS, Azure, Chef, JSON, Linux, Puppet, Python, REST, VMware, Windows Server, YAML

Mattermost

Lead Site Reliability Engineer

Reposted 22 Days Ago

Easy Apply

Remote

United States

170K-200K Annually

Senior level

Software

Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.

Top Skills: AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform

Akamai Technologies

Senior Site Reliability Engineer

Reposted 4 Hours Ago

In-Office or Remote

2 Locations

107K-221K Annually

Senior level

Cloud • Security • Software • Cybersecurity

The Senior Site Reliability Engineer will enhance performance and reliability of distributed systems, define KPIs, and collaborate cross-functionally to improve infrastructure and operational efficiency.

Top Skills: ADBMS, Bash, Datadog, Grafana, Internet Protocols, JavaScript, Oracle SQL, Prometheus, Python