Top Site Reliability Engineer Jobs

Ad Hoc

Senior Site Reliability Engineer

3 Days Ago

In-Office

McLean, VA, USA

135K-150K Annually

Senior level

Software

Lead reliability for a large federal cloud platform: define SLOs, build observability, run incident response and postmortems, automate toil, design AWS/EKS infrastructure, mentor engineers, and present reliability designs to stakeholders.

Top Skills: Amazon EKS, AWS, AWS Certification, CKA, CKS, FedRAMP, Kubernetes, NIST 800-53, Observability (Metrics/Logging/Tracing/Alerting), SLOs, Terraform, Zero-Trust

FreedomPay

Sr. Site Reliability Engineer

3 Days Ago

Hybrid

Philadelphia, PA, USA

Senior level

Payments

Senior SRE responsible for ensuring high availability and resiliency of a global payments platform by building observability, automations, AI-driven remediation, incident response, and self-healing workflows. Participates in an on-call rotation; the role is hybrid and based in Philadelphia.

Top Skills: AIOps, AKS, Anthropic (Claude), APM, Azure, Azure AI (Foundry), Azure SRE Agent, CI/CD, Datadog, DNS, Dynatrace, HTTP, HTTPS, IIS, Kubernetes, Load Balancing, New Relic, OpenAI (Codex), PagerDuty Process Automation, PowerShell, Python, Rundeck, SQL, T-SQL, TCP/IP, VMware, Windows Server

Bank of America

Senior Site Reliability Engineer

3 Days Ago

In-Office

3 Locations

153K-192K Annually

Senior level

Big Data • Fintech • Mobile • Payments • Financial Services • Data Privacy

Senior SRE responsible for designing and maturing cloud reliability on GCP/Azure: build observability, define SLIs/SLOs, create Terraform modules and CI/CD automation, lead incident/root-cause investigations, partner with security/governance, mentor engineers, and drive platform resiliency and production readiness.

Top Skills: Azure Log Analytics, Azure Resource Graph, CI/CD, DevSecOps, DNS, Dynatrace, Firewalls, GenAI, Google Cloud Platform (GCP), IAM, Load Balancing, Azure, Policy-as-Code, Terraform, Terraform Enterprise, VPC

Employer Direct Healthcare

Senior Site Reliability Engineer

Reposted 3 Days Ago

Hybrid

2 Locations

Senior level

Healthtech

As a Senior Site Reliability Engineer, you will ensure the reliability and performance of our Azure-based healthcare platform, implementing SRE practices, driving incident management, and automating operational tasks.

Top Skills: Azure, Azure Monitor, Bash, Datadog, PowerShell, Python, Terraform

Andromeda (andromeda.ai)

Senior Site Reliability Engineer - AI Infrastructure

Reposted 3 Days Ago

In-Office or Remote

8 Locations

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.

Top Skills: Ansible, CUDA, DeepSpeed, FSDP, GPU, Helm, InfiniBand, Kubernetes, Linux, Megatron, NCCL, NVIDIA A100, NVIDIA B200, NVIDIA H100, NVLink, PyTorch, RoCE, Terraform

Early Warning

Sr. Site Reliability Engineer - Paze

4 Days Ago

In-Office

3 Locations

106K-156K Annually

Senior level

Fintech

Design, build, and maintain scalable, reliable application infrastructure. Automate deployments and configuration, implement observability and monitoring, troubleshoot performance, advise development teams on SDLC and microservice best practices, create runbooks, participate in 24x7 on-call rotation, and ensure security and disaster recovery readiness.

Top Skills: AWS, CI/CD, Docker, Git, Go, IP, Java, JavaScript, Kubernetes, Linux, Monitoring, Observability, Python, Ruby, Scripting Languages, Security Encryption Protocols, Swarm, TCP, UDP

Airbyte

Senior Site Reliability Engineer - Hiring Sprint

4 Days Ago

Hybrid

San Francisco, CA, USA

196K-255K Annually

Senior level

Artificial Intelligence • Big Data • Software

Own and improve infrastructure for the Data Replication platform: Kubernetes, CI/CD, secrets, networking, cloud (AWS/GCP). Drive reliability, observability, AI-augmented tooling, canary rollouts, incident reduction, and runbooks, and partner with product engineers.

Top Skills: Agentic Frameworks, Airbyte, AWS, CDKs, CI/CD, Connector-Based Architectures, Datadog, GCP, Grafana, Helm, Java, Kubernetes, LLMs, Prometheus, Python, Secrets Management, Terraform

Akamai Technologies

Senior Site Reliability Engineer

4 Days Ago

In-Office or Remote

2 Locations

121K-219K Annually

Senior level

Cloud • Security • Software • Cybersecurity

Lead reliability, automation, and observability for high-density AI hardware infrastructure. Build Python-based IaC tooling, telemetry pipelines, Prometheus/Grafana dashboards, and AI-assisted tooling. Run 24x7 incident response, coordinate vendors and field technicians, define operational readiness, and drive post-mortems to improve uptime and performance.

Top Skills: Bare-Metal, BGP, Grafana, IPv4, IPv6, LLMs, Loki, OpenTelemetry, PagerDuty, Private Cloud, Prometheus, Python, Slack, Timeseries Engines, Virtualized Environments

Akamai Technologies

Senior Site Reliability Engineer

4 Days Ago

In-Office or Remote

2 Locations

121K-219K Annually

Senior level

Cloud • Security • Software • Cybersecurity

Design, build, and operate scalable infrastructure and CI/CD/IaC systems. Implement observability (monitoring, logging, alerting), automate reliability improvements, mentor engineers, collaborate on incident response, and participate in on-call rotations to maintain Akamai Cloud services.

Top Skills: Alerting, Ansible, Bash, Chef, CI/CD, GitHub Actions, GitLab CI/CD, Go, Infrastructure as Code, Jenkins, Logging, Monitoring, Puppet, Python, SaltStack, Telemetry, Terraform

Plaud.ai

Senior SRE Engineer - San Francisco

4 Days Ago

Hybrid

San Francisco, CA, USA

Senior level

Artificial Intelligence • Software • Generative AI

Ensure reliability and performance of Plaud.ai's AI products at scale by designing and operating cloud-native systems, owning production reliability and incident response, building observability and automation, defining SLOs/SLIs, driving postmortems, and partnering with product and engineering teams to improve operational maturity.

Top Skills: AWS, Azure, GCP, Go, Java, Kubernetes, Python

i4DM

Senior Site Reliability Engineer

4 Days Ago

Remote

USA

Senior level

Software

Drive SRE practices for VA enterprise healthcare platforms: automate infrastructure and CI/CD, define SLIs/SLOs, improve observability and reliability, support incident response, and ensure cloud-native, secure, compliant operations in AWS and containerized environments.

Top Skills: Ansible, AWS, Bash, CI/CD, CloudWatch, Docker, ECS, EKS, ELK, Go, Grafana, Infrastructure as Code, Kubernetes, Linux, OpenTelemetry, PowerShell, Prometheus, Python, Splunk, Terraform

AuthZed

Sr. Site Reliability Engineer

Reposted 4 Days Ago

Remote

2 Locations

Senior level

Artificial Intelligence • Information Technology • Software • Database

As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.

Top Skills: Docker, ELK Stack, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Ruby, Terraform

OutSystems

Senior Site Reliability Engineer

Reposted 5 Days Ago

In-Office or Remote

7 Locations

Senior level

Software

The Senior Site Reliability Engineer will lead service onboarding, maintain SLAs/SLOs, design secure infrastructure, automate operational tasks, and respond to incidents while ensuring system reliability and performance.

Top Skills: AWS, CloudFormation, ELK Stack, Go, Grafana, Hadoop, Kubernetes, Python, Terraform

CCC Intelligent Solutions

Senior SRE

Reposted 5 Days Ago

In-Office

Chicago, IL, USA

106K-145K Annually

Senior level

Artificial Intelligence • Automotive • Internet of Things • Software

The Senior Site Reliability Engineer will manage system health, automate solutions, resolve incidents, and collaborate across teams to enhance performance and reliability.

Top Skills: APIs, ARM, Azure, Azure CLI, Bicep, Cloud Infrastructure, DevOps, Git, PowerShell, Terraform, Virtualization

SpaceX

Sr. Site Reliability Engineer (Application Software)

6 Days Ago

In-Office

Hawthorne, CA, USA

165K-230K Annually

Senior level

Aerospace • Other

Build, operate, and scale mission-critical application platforms to accelerate vehicle software delivery. Manage infrastructure as code, improve observability, collaborate with developers, run on-call rotations, conduct blameless postmortems, and reduce performance bottlenecks to support Falcon, Starship, Dragon, and Starlink software lifecycles.

Top Skills: Ansible, Bazel, Buck, C#, C++, ClickHouse, Docker, JavaScript, Kubernetes, KVM, Linux, Make, MySQL, Postgres, Puppet, Python, QEMU, Terraform, vSphere

Wikimedia Foundation

Senior Site Reliability Engineer, Wikimedia Enterprise

Reposted 6 Days Ago

Remote

USA

117K-181K Annually

Senior level

Other • Social Impact

As a Senior Site Reliability Engineer, you will design, develop, and maintain reliable infrastructure for Wikimedia's API services, ensuring performance and availability while driving reliability engineering practices and improving developer experience.

Top Skills: Ansible, Argo CD, AWS, Azure, GCP, GitLab, Go, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform

Wikimedia Foundation

Senior Site Reliability Engineer, Data Persistence

Reposted 6 Days Ago

Remote

USA

113K-176K Annually

Senior level

Other • Social Impact

The Senior Site Reliability Engineer is responsible for maintaining Wikimedia's infrastructure, improving reliability, automating processes, and collaborating with teams. The role involves troubleshooting, managing deployments, and leading incident responses while working remotely.

Top Skills: Ansible, Bash, Cassandra, Debian, Go, Grafana, HHVM, Kubernetes, MariaDB, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby, Shell

Alkami

Sr Site Reliability Engineer - Release

7 Days Ago

Remote

110K-137K Annually

Senior level

Financial Services

Prototype, write, test, document, and deploy release automation across environments. Build and maintain pipelines, collaborate with engineers and product teams, troubleshoot issues, participate in on-call rotation, and improve software delivery, configuration, monitoring, and operations.

Top Skills: Ansible, Bash, Docker, GitLab, Jenkins, Kubernetes, MSSQL, Postgres, PowerShell, Python, Redis, TeamCity

IXL Learning

Senior Site Reliability Engineer

7 Days Ago

In-Office

Raleigh, NC, USA

Senior level

Edtech

Maintain and improve site performance, uptime, and scalability. Build monitoring, alerting, runbooks, deployment tooling, and scalable architecture. Troubleshoot across the stack and partner with application teams to deliver reliable production systems.

Top Skills: AWS, Bash, C, C++, Docker, GCP, Java, Kubernetes, Perl, Python

Practice by Numbers

Sr. Site Reliability Engineer

7 Days Ago

Remote or Hybrid

Redmond, WA, USA

120K-150K Annually

Senior level

Healthtech • Software • Analytics • Business Intelligence

Lead and own reliability for critical backend and distributed systems: design, launch, on-call, incident leadership, SLO/SLI/error budget definition, automation to remove toil, observability improvement, resilience testing, mentoring, and cross-team reliability initiatives for production healthcare workflows.

Top Skills: AWS, Azure, Docker, GCP, GitHub Actions, Go, Grafana, Java, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform, TypeScript

TP-Link USA Corporation

Senior Site Reliability Engineer

8 Days Ago

In-Office

Irvine, CA, USA

140K-180K Annually

Senior level

Hardware • Manufacturing

Lead implementation and operation of microservices on Kubernetes across multi-cloud environments. Build observability, run load/chaos tests, define SLOs/SLAs/SLIs, automate with scripts, ensure security/compliance, lead incident response, perform DR planning, mentor teammates, and participate in on-call rotation.

Top Skills: Application Security, AWS, Azure, Bash, Data Protection, GCP, GDPR, Go, HPA, Identity and Access Management (IAM), ISO 27001, Java, JVM, Kubernetes, Microservices, Network Security, Observability, OCI, PowerShell, Python, SOC 2

Varda Space Industries

Senior Site Reliability Engineer

8 Days Ago

In-Office

El Segundo, CA, USA

153K-185K Annually

Senior level

Aerospace • Hardware • Software • Biotech • Pharmaceutical • Manufacturing

Lead the design, build, and operation of mission-critical infrastructure across cloud, on-prem, and spacecraft contexts. Implement IaC, CI/CD, observability, and scalable Kubernetes-based systems; respond to incidents, perform root cause analysis, optimize performance, and collaborate with software and hardware teams. Participate in on-call rotations and occasional travel.

Top Skills: Ansible, Argo CD, Azure, Bash, CI/CD, containerd, Databases, Docker, Firewalls, GitOps, GPU Workloads, Grafana, HPC, InfluxDB, Kubernetes, Linux, PowerShell, Prometheus, Python, Salt, Slurm, Subnets, Terraform, VPC, VPNs

CertifyOS

Senior Site Reliability Engineer

8 Days Ago

Remote

Senior level

Healthtech • Social Impact • Software

Own the operational lifecycle of cloud-native data infrastructure: design and automate reliable deployments, observability, incident response, SLIs/SLOs, autoscaling and IaC, and improve platform efficiency and data freshness across GKE and Cloud Run.

Top Skills: Bash, BigQuery, Cloud Build, Cloud Monitoring, Cloud Run, Datadog, Docker, GCP, GitHub Actions, GKE, Go, Grafana, Jira, Kubernetes, Prometheus, Pulumi, Python, Sentry, Slack, Snyk, SonarQube, Terraform

RELX

Senior Site Reliability Engineer II

8 Days Ago

In-Office or Remote

15 Locations

100K-210K Annually

Senior level

Information Technology • Legal Tech • Analytics

Design, build, and operate highly available AWS systems. Write and maintain Terraform, improve observability (Grafana, Pingdom, Uptrends), run on-call incident response, define SLOs/SLIs, build CI/CD with Azure DevOps/GitHub, automate operational work, document in Confluence, and mentor engineers.

Top Skills: AWS, Azure DevOps, CI/CD, Confluence, Docker, Git, Grafana, Jira, Kubernetes, Linux, Pingdom, ServiceNow, Terraform, Uptrends

RELX

Senior Site Reliability Engineer

8 Days Ago

In-Office or Remote

9 Locations

105K-198K Annually

Senior level

Information Technology • Legal Tech • Analytics

Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS; manage and optimize cloud infrastructure; develop IaC and automation; implement CI/CD (GitHub Actions); monitor multi-region systems, troubleshoot incidents, perform root cause analysis; document best practices; and mentor junior engineers.

Top Skills: AWS, AWS EKS, CI/CD, Containers, GitHub Actions, Infrastructure as Code, Kubernetes, New Relic, Python, RBAC