Top Remote Site Reliability Engineer Jobs

PlayOn Sports

Senior Site Reliability Engineer

Reposted Yesterday

Remote

USA

Senior level

Digital Media • Software • Sports

Seeking a Senior Site Reliability Engineer to enhance system reliability, performance, and scalability. Focus on automation, observability, and improving CI/CD practices while collaborating with engineering teams for better incident response and metrics improvement.

Top Skills: AWS, Azure, C++, CI/CD, Datadog, Docker, ELK, GCP, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform

Arctiq

Site Reliability Engineer

Reposted 24 Days Ago

In-Office or Remote

San Diego, CA, USA

Mid level

Information Technology

The Site Reliability Engineer will implement reliability engineering practices, develop automation, maintain CI/CD pipelines, and ensure system health through monitoring.

Top Skills: Ansible, AWS, Azure, Bash, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Python, Terraform

Oscilar

Sr./Staff - Infrastructure/Site Reliability Engineer (SRE)

Reposted 24 Days Ago

Remote

2 Locations

Senior level

Artificial Intelligence • Fintech • Software • Financial Services

The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.

Top Skills: AWS, ClickHouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform

Bedrock Ocean Exploration

Senior Site Reliability Engineer, Robotics & Cloud Infrastructure

2 Days Ago

Remote

USA

164K-220K Annually

Senior level

Robotics • Software

Own reliability across vehicle and cloud stacks for AUV operations: onboard Jetson/ROS2 compute, topside systems, cloud ingestion/processing and customer platform. Build automation, observability, runbooks, and self-recovery to reduce on-call toil; manage AWS infrastructure, IaC, container orchestration, and reliability targets. Participate in shared 12-hour on-call shifts and field deployments, mentor team on operational excellence.

Top Skills: AWS, Bash, Containerization, Docker, Go, Grafana, IAM, Jetson, Kubernetes, Linux, Prometheus, Python, ROS, ROS 2, Terraform

Andromeda (andromeda.ai)

Senior Site Reliability Engineer - AI Infrastructure

Reposted 2 Days Ago

In-Office or Remote

8 Locations

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.

Top Skills: Ansible, CUDA, DeepSpeed, FSDP, GPU, Helm, InfiniBand, Kubernetes, Linux, Megatron, NCCL, NVIDIA A100, NVIDIA B200, NVIDIA H100, NVLink, PyTorch, RoCE, Terraform

Aalyria

Site Reliability Engineer

Reposted 25 Days Ago

Remote

United States

115K-135K Annually

Mid level

Aerospace • Manufacturing

As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.

Top Skills: Argo CD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform

Unify (unifygtm.com)

Staff Site Reliability Engineer, Tech Lead

Reposted 25 Days Ago

Remote or Hybrid

2 Locations

250K-295K Annually

Senior level

Artificial Intelligence • Software

As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructures, and optimize systems while implementing automation and observability practices.

Top Skills: ClickHouse, Go, Postgres, Python, TypeScript

Akamai Technologies

Senior Site Reliability Engineer

3 Days Ago

In-Office or Remote

2 Locations

121K-219K Annually

Senior level

Cloud • Security • Software • Cybersecurity

Lead reliability, automation, and observability for high-density AI hardware infrastructure. Build Python-based IaC tooling, telemetry pipelines, Prometheus/Grafana dashboards, and AI-assisted tooling. Run 24x7 incident response, coordinate vendors and field technicians, define operational readiness, and drive post-mortems to improve uptime and performance.

Top Skills: Bare-Metal, BGP, Grafana, IPv4, IPv6, LLMs, Loki, OpenTelemetry, PagerDuty, Private Cloud, Prometheus, Python, Slack, Timeseries Engines, Virtualized Environments

Akamai Technologies

Senior Site Reliability Engineer

3 Days Ago

In-Office or Remote

2 Locations

121K-219K Annually

Senior level

Cloud • Security • Software • Cybersecurity

Design, build, and operate scalable infrastructure and CI/CD/IaC systems. Implement observability (monitoring, logging, alerting), automate reliability improvements, mentor engineers, collaborate on incident response, and participate in on-call rotations to maintain Akamai Cloud services.

Top Skills: Alerting, Ansible, Bash, Chef, CI/CD, GitHub Actions, GitLab CI/CD, Go, Infrastructure as Code, Jenkins, Logging, Monitoring, Puppet, Python, SaltStack, Telemetry, Terraform

i4DM

Senior Site Reliability Engineer

3 Days Ago

Remote

USA

Senior level

Software

Drive SRE practices for VA enterprise healthcare platforms: automate infrastructure and CI/CD, define SLIs/SLOs, improve observability and reliability, support incident response, and ensure cloud-native, secure, compliant operations in AWS and containerized environments.

Top Skills: Ansible, AWS, Bash, CI/CD, CloudWatch, Docker, ECS, EKS, ELK, Go, Grafana, Infrastructure as Code, Kubernetes, Linux, OpenTelemetry, PowerShell, Prometheus, Python, Splunk, Terraform

AuthZed

Sr. Site Reliability Engineer

Reposted 3 Days Ago

Remote

2 Locations

Senior level

Artificial Intelligence • Information Technology • Software • Database

As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.

Top Skills: Docker, ELK Stack, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Ruby, Terraform

OutSystems

Senior Site Reliability Engineer

Reposted 4 Days Ago

In-Office or Remote

7 Locations

Senior level

Software

The Senior Site Reliability Engineer will lead service onboarding, maintain SLAs/SLOs, design secure infrastructure, automate operational tasks, and respond to incidents while ensuring system reliability and performance.

Top Skills: AWS, CloudFormation, ELK Stack, Go, Grafana, Hadoop, Kubernetes, Python, Terraform

Wikimedia Foundation

Senior Site Reliability Engineer, Wikimedia Enterprise

Reposted 5 Days Ago

Remote

USA

117K-181K Annually

Senior level

Other • Social Impact

As a Senior Site Reliability Engineer, you will design, develop, and maintain reliable infrastructure for Wikimedia's API services, ensuring performance and availability while driving reliability engineering practices and improving developer experience.

Top Skills: Ansible, Argo CD, AWS, Azure, GCP, GitLab, Go, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform

Wikimedia Foundation

Senior Site Reliability Engineer, Data Persistence

Reposted 5 Days Ago

Remote

USA

113K-176K Annually

Senior level

Other • Social Impact

The Senior Site Reliability Engineer is responsible for maintaining Wikimedia's infrastructure, improving reliability, automating processes, and collaborating with teams. The role involves troubleshooting, managing deployments, and leading incident responses while working remotely.

Top Skills: Ansible, Bash, Cassandra, Debian, Go, Grafana, HHVM, Kubernetes, MariaDB, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby, Shell

Alkami

Sr Site Reliability Engineer - Release

6 Days Ago

Remote

110K-137K Annually

Senior level

Financial Services

Prototype, write, test, document, and deploy release automation across environments. Build and maintain pipelines, collaborate with engineers and product teams, troubleshoot issues, participate in on-call rotation, and improve software delivery, configuration, monitoring, and operations.

Top Skills: Ansible, Bash, Docker, GitLab, Jenkins, Kubernetes, MSSQL, Postgres, PowerShell, Python, Redis, TeamCity

Practice by Numbers

Sr. Site Reliability Engineer

6 Days Ago

Remote or Hybrid

Redmond, WA, USA

120K-150K Annually

Senior level

Healthtech • Software • Analytics • Business Intelligence

Lead and own reliability for critical backend and distributed systems: design, launch, on-call, incident leadership, SLO/SLI/error budget definition, automation to remove toil, observability improvement, resilience testing, mentoring, and cross-team reliability initiatives for production healthcare workflows.

Top Skills: AWS, Azure, Docker, GCP, GitHub Actions, Go, Grafana, Java, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform, TypeScript

CertifyOS

Senior Site Reliability Engineer

7 Days Ago

Remote

Senior level

Healthtech • Social Impact • Software

Own the operational lifecycle of cloud-native data infrastructure: design and automate reliable deployments, observability, incident response, SLIs/SLOs, autoscaling and IaC, and improve platform efficiency and data freshness across GKE and Cloud Run.

Top Skills: Bash, BigQuery, Cloud Build, Cloud Monitoring, Cloud Run, Datadog, Docker, GCP, GitHub Actions, GKE, Go, Grafana, JIRA, Kubernetes, Prometheus, Pulumi, Python, Sentry, Slack, Snyk, SonarQube, Terraform

RELX

Senior Site Reliability Engineer II

7 Days Ago

In-Office or Remote

15 Locations

100K-210K Annually

Senior level

Information Technology • Legal Tech • Analytics

Design, build, and operate highly available AWS systems. Write and maintain Terraform, improve observability (Grafana, Pingdom, Uptrends), run on-call incident response, define SLOs/SLIs, build CI/CD with Azure DevOps/GitHub, automate operational work, document in Confluence, and mentor engineers.

Top Skills: AWS, Azure DevOps, CI/CD, Confluence, Docker, Git, Grafana, JIRA, Kubernetes, Linux, Pingdom, ServiceNow, Terraform, Uptrends

RELX

Senior Site Reliability Engineer

7 Days Ago

In-Office or Remote

9 Locations

105K-198K Annually

Senior level

Information Technology • Legal Tech • Analytics

Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS; manage and optimize cloud infrastructure; develop IaC and automation; implement CI/CD (GitHub Actions); monitor multi-region systems, troubleshoot incidents, perform root cause analysis; document best practices; and mentor junior engineers.

Top Skills: AWS, AWS EKS, CI/CD, Containers, GitHub Actions, Infrastructure as Code, Kubernetes, New Relic, Python, RBAC

Loft Orbital

Senior Site Reliability Engineer

7 Days Ago

Remote or Hybrid

180K-240K Annually

Senior level

Aerospace • Defense

Lead design, implementation, and operation of scalable, secure hybrid-cloud infrastructure for satellite ground systems. Improve developer experience, automate CI/CD and IaC, own observability, troubleshoot reliability issues, and collaborate with developers and satellite operators to advance SatDevOps practices.

Top Skills: C/C++, CI/CD, GCP, Go, Grafana, Infrastructure as Code (IaC), Java, Kubernetes, Loki, Prometheus, Python, Rust, Software Defined Networking (SDN)

Arkestro

Senior Site Reliability Engineer

7 Days Ago

Remote

United States

160K-180K Annually

Senior level

Software

Own and improve platform performance, reliability, and deployment automation. Manage cloud infrastructure, implement IaC, monitor systems with observability tools, provide operational support for distributed applications, and integrate production learnings into development workflows.

Top Skills: AIOps Tooling, AWS Elastic Containers, AWS RDS, AWS S3, Claude Code, Claude Cowork, Datadog, Harness Engineering, Infrastructure as Code, Kubernetes, LLMs, Prompt Engineering, Rigor, Splunk

OfficeSpace Software

Senior Site Reliability Engineer

Reposted 7 Days Ago

Remote

United States

Senior level

Real Estate • Software

As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.

Top Skills: Ansible, Datadog, ELK, Grafana, Kubernetes, Linux, MariaDB, MySQL, Postgres, Prometheus, Puppet, Python, Ruby on Rails, Ruby, Terraform, Terragrunt

ClickHouse

Senior Site Reliability Engineer - Remote

Reposted 8 Days Ago

Remote

United States

141K-208K Annually

Senior level

Database • Analytics

This role involves ensuring the reliability and performance of ClickHouse's cloud infrastructure, collaborating with engineering teams, incident management, and driving continuous improvement in service availability.

Top Skills: Ansible, AWS, Azure, ClickHouse, Docker Swarm, Go, Google Cloud Platform, Kubernetes, Puppet, Python, Terraform

Cato Networks

Senior SRE - Government Cloud Operations

9 Days Ago

Remote

United States

Senior level

Information Technology • Security • Cybersecurity

Operate and harden regulated cloud platforms (FedRAMP/DoD IL) by owning production reliability, designing resilient infrastructure, leading incident response and postmortems, automating compliance (NIST 800-53/STIG), supporting ATO and continuous monitoring, building secure IaC and CI/CD pipelines, and improving observability and operational tooling.

Top Skills: AWS GovCloud, Bash, CI/CD, Container Hardening, DoD IL4, DoD IL5, FedRAMP High, GitOps, Go, Grafana, Image Security, Kubernetes, Linux/Unix, NIST 800-53, Prometheus, Python, STIG, Terraform

Climavision

Senior Site Reliability Engineer (C#, .NET)

10 Days Ago

Remote

United States

135K-170K Annually

Senior level

Big Data • Analytics

Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.

Top Skills: .NET, Ansible, C#, Datadog, GPU-Enabled Workloads, Grafana, Helm, Istio, Kubernetes, Loki, Longhorn, Azure, NATS, Octopus Deploy, OpenTelemetry, PostGIS, Postgres, Prometheus, RabbitMQ, Rancher, RKE2, Terraform