Top Remote Site Reliability Engineer Jobs

Reposted Yesterday
Remote
USA
Senior level
Digital Media • Software • Sports
Seeking a Senior Site Reliability Engineer to enhance system reliability, performance, and scalability. The role focuses on automation, observability, and improving CI/CD practices, collaborating with engineering teams to strengthen incident response and improve metrics.
Top Skills: AWS, Azure, C++, CI/CD, Datadog, Docker, ELK, GCP, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform

Reposted 24 Days Ago
In-Office or Remote
San Diego, CA, USA
Mid level
Information Technology
The Site Reliability Engineer will implement reliability engineering practices, develop automation, maintain CI/CD pipelines, and ensure system health through monitoring.
Top Skills: Ansible, AWS, Azure, Bash, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Python, Terraform

Reposted 24 Days Ago
Remote
2 Locations
Senior level
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills: AWS, ClickHouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform

2 Days Ago
Remote
USA
164K-220K Annually
Senior level
Robotics • Software
Own reliability across vehicle and cloud stacks for AUV operations: onboard Jetson/ROS 2 compute, topside systems, cloud ingestion and processing, and the customer platform. Build automation, observability, runbooks, and self-recovery to reduce on-call toil; manage AWS infrastructure, IaC, container orchestration, and reliability targets. Participate in shared 12-hour on-call shifts and field deployments, and mentor the team on operational excellence.
Top Skills: AWS, Bash, Containerization, Docker, Go, Grafana, IAM, Jetson, Kubernetes, Linux, Prometheus, Python, ROS, ROS 2, Terraform

Reposted 2 Days Ago
In-Office or Remote
8 Locations
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.
Top Skills: Ansible, CUDA, DeepSpeed, FSDP, GPU, Helm, InfiniBand, Kubernetes, Linux, Megatron, NCCL, NVIDIA A100, NVIDIA B200, NVIDIA H100, NVLink, PyTorch, RoCE, Terraform

Reposted 25 Days Ago
Remote
United States
115K-135K Annually
Mid level
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills: Argo CD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform

Reposted 25 Days Ago
Remote or Hybrid
2 Locations
250K-295K Annually
Senior level
Artificial Intelligence • Software
As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructure, and optimize systems while implementing automation and observability practices.
Top Skills: ClickHouse, Go, Postgres, Python, TypeScript

3 Days Ago
In-Office or Remote
2 Locations
121K-219K Annually
Senior level
Cloud • Security • Software • Cybersecurity
Lead reliability, automation, and observability for high-density AI hardware infrastructure. Build Python-based IaC tooling, telemetry pipelines, Prometheus/Grafana dashboards, and AI-assisted tooling. Run 24x7 incident response, coordinate vendors and field technicians, define operational readiness, and drive post-mortems to improve uptime and performance.
Top Skills: Bare-Metal, BGP, Grafana, IPv4, IPv6, LLMs, Loki, OpenTelemetry, PagerDuty, Private Cloud, Prometheus, Python, Slack, Timeseries Engines, Virtualized Environments

3 Days Ago
In-Office or Remote
2 Locations
121K-219K Annually
Senior level
Cloud • Security • Software • Cybersecurity
Design, build, and operate scalable infrastructure and CI/CD/IaC systems. Implement observability (monitoring, logging, alerting), automate reliability improvements, mentor engineers, collaborate on incident response, and participate in on-call rotations to maintain Akamai Cloud services.
Top Skills: Alerting, Ansible, Bash, Chef, CI/CD, GitHub Actions, GitLab CI/CD, Go, Infrastructure as Code, Jenkins, Logging, Monitoring, Puppet, Python, SaltStack, Telemetry, Terraform

3 Days Ago
Remote
USA
Senior level
Software
Drive SRE practices for VA enterprise healthcare platforms: automate infrastructure and CI/CD, define SLIs/SLOs, improve observability and reliability, support incident response, and ensure cloud-native, secure, compliant operations in AWS and containerized environments.
Top Skills: Ansible, AWS, Bash, CI/CD, CloudWatch, Docker, ECS, EKS, ELK, Go, Grafana, Infrastructure as Code, Kubernetes, Linux, OpenTelemetry, PowerShell, Prometheus, Python, Splunk, Terraform

Reposted 3 Days Ago
Remote
2 Locations
Senior level
Artificial Intelligence • Information Technology • Software • Database
As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.
Top Skills: Docker, ELK Stack, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Ruby, Terraform

Reposted 4 Days Ago
In-Office or Remote
7 Locations
Senior level
Software
The Senior Site Reliability Engineer will lead service onboarding, maintain SLAs/SLOs, design secure infrastructure, automate operational tasks, and respond to incidents while ensuring system reliability and performance.
Top Skills: AWS, CloudFormation, ELK Stack, Go, Grafana, Hadoop, Kubernetes, Python, Terraform

Reposted 5 Days Ago
Remote
USA
117K-181K Annually
Senior level
Other • Social Impact
As a Senior Site Reliability Engineer, you will design, develop, and maintain reliable infrastructure for Wikimedia's API services, ensuring performance and availability while driving reliability engineering practices and improving developer experience.
Top Skills: Ansible, Argo CD, AWS, Azure, GCP, GitLab, Go, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform

Reposted 5 Days Ago
Remote
USA
113K-176K Annually
Senior level
Other • Social Impact
The Senior Site Reliability Engineer is responsible for maintaining Wikimedia's infrastructure, improving reliability, automating processes, and collaborating with teams. The role is remote and involves troubleshooting, managing deployments, and leading incident response.
Top Skills: Ansible, Bash, Cassandra, Debian, Go, Grafana, HHVM, Kubernetes, MariaDB, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby, Shell

6 Days Ago
Remote
US
110K-137K Annually
Senior level
Financial Services
Prototype, write, test, document, and deploy release automation across environments. Build and maintain pipelines, collaborate with engineers and product teams, troubleshoot issues, participate in on-call rotation, and improve software delivery, configuration, monitoring, and operations.
Top Skills: Ansible, Bash, Docker, GitLab, Jenkins, Kubernetes, MSSQL, Postgres, PowerShell, Python, Redis, TeamCity

6 Days Ago
Remote or Hybrid
Redmond, WA, USA
120K-150K Annually
Senior level
Healthtech • Software • Analytics • Business Intelligence
Lead and own reliability for critical backend and distributed systems: design, launch, on-call, incident leadership, SLO/SLI/error budget definition, automation to remove toil, observability improvement, resilience testing, mentoring, and cross-team reliability initiatives for production healthcare workflows.
Top Skills: AWS, Azure, Docker, GCP, GitHub Actions, Go, Grafana, Java, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform, TypeScript

7 Days Ago
Remote
US
Senior level
Healthtech • Social Impact • Software
Own the operational lifecycle of cloud-native data infrastructure: design and automate reliable deployments, observability, incident response, SLIs/SLOs, autoscaling and IaC, and improve platform efficiency and data freshness across GKE and Cloud Run.
Top Skills: Bash, BigQuery, Cloud Build, Cloud Monitoring, Cloud Run, Datadog, Docker, GCP, GitHub Actions, GKE, Go, Grafana, Jira, Kubernetes, Prometheus, Pulumi, Python, Sentry, Slack, Snyk, SonarQube, Terraform

7 Days Ago
In-Office or Remote
15 Locations
100K-210K Annually
Senior level
Information Technology • Legal Tech • Analytics
Design, build, and operate highly available AWS systems. Write and maintain Terraform, improve observability (Grafana, Pingdom, Uptrends), run on-call incident response, define SLOs/SLIs, build CI/CD with Azure DevOps/GitHub, automate operational work, document in Confluence, and mentor engineers.
Top Skills: AWS, Azure DevOps, CI/CD, Confluence, Docker, Git, Grafana, Jira, Kubernetes, Linux, Pingdom, ServiceNow, Terraform, Uptrends

7 Days Ago
In-Office or Remote
9 Locations
105K-198K Annually
Senior level
Information Technology • Legal Tech • Analytics
Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS; manage and optimize cloud infrastructure; develop IaC and automation; implement CI/CD (GitHub Actions); monitor multi-region systems, troubleshoot incidents, perform root cause analysis; document best practices; and mentor junior engineers.
Top Skills: AWS, AWS EKS, CI/CD, Containers, GitHub Actions, Infrastructure as Code, Kubernetes, New Relic, Python, RBAC

7 Days Ago
Remote or Hybrid
US
180K-240K Annually
Senior level
Aerospace • Defense
Lead design, implementation, and operation of scalable, secure hybrid-cloud infrastructure for satellite ground systems. Improve developer experience, automate CI/CD and IaC, own observability, troubleshoot reliability issues, and collaborate with developers and satellite operators to advance SatDevOps practices.
Top Skills: C/C++, CI/CD, GCP, Go, Grafana, Infrastructure as Code (IaC), Java, Kubernetes, Loki, Prometheus, Python, Rust, Software-Defined Networking (SDN)

7 Days Ago
Remote
United States
160K-180K Annually
Senior level
Software
Own and improve platform performance, reliability, and deployment automation. Manage cloud infrastructure, implement IaC, monitor systems with observability tools, provide operational support for distributed applications, and integrate production learnings into development workflows.
Top Skills: AIOps Tooling, AWS Elastic Containers, AWS RDS, AWS S3, Claude Code, Claude Cowork, Datadog, Harness Engineering, Infrastructure as Code, Kubernetes, LLMs, Prompt Engineering, Rigor, Splunk

Reposted 7 Days Ago
Remote
United States
Senior level
Real Estate • Software
As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.
Top Skills: Ansible, Datadog, ELK, Grafana, Kubernetes, Linux, MariaDB, MySQL, Postgres, Prometheus, Puppet, Python, Ruby on Rails, Ruby, Terraform, Terragrunt

Reposted 8 Days Ago
Remote
United States
141K-208K Annually
Senior level
Database • Analytics
This role involves ensuring the reliability and performance of ClickHouse's cloud infrastructure, collaborating with engineering teams, managing incidents, and driving continuous improvement in service availability.
Top Skills: Ansible, AWS, Azure, ClickHouse, Docker Swarm, Go, Google Cloud Platform, Kubernetes, Puppet, Python, Terraform

9 Days Ago
Remote
United States
Senior level
Information Technology • Security • Cybersecurity
Operate and harden regulated cloud platforms (FedRAMP/DoD IL) by owning production reliability, designing resilient infrastructure, leading incident response and postmortems, automating compliance (NIST 800-53/STIG), supporting ATO and continuous monitoring, building secure IaC and CI/CD pipelines, and improving observability and operational tooling.
Top Skills: AWS GovCloud, Bash, CI/CD, Container Hardening, DoD IL4, DoD IL5, FedRAMP High, GitOps, Go, Grafana, Image Security, Kubernetes, Linux/Unix, NIST 800-53, Prometheus, Python, STIG, Terraform

10 Days Ago
Remote
United States
135K-170K Annually
Senior level
Big Data • Analytics
Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.
Top Skills: .NET, Ansible, C#, Datadog, GPU-Enabled Workloads, Grafana, Helm, Istio, Kubernetes, Loki, Longhorn, Azure, NATS, Octopus Deploy, OpenTelemetry, PostGIS, Postgres, Prometheus, RabbitMQ, Rancher, RKE2, Terraform