Maximum of 25 job preferences reached.
Top Remote Site Reliability Engineer Jobs
Digital Media • Software • Sports
Seeking a Senior Site Reliability Engineer to enhance system reliability, performance, and scalability. Focus on automation, observability, and improving CI/CD practices while collaborating with engineering teams for better incident response and metrics improvement.
Top Skills:
AWSAzureC++Ci/CdDatadogDockerElkGCPGoGrafanaJavaKubernetesLinuxPrometheusPythonTerraform
Information Technology
The Site Reliability Engineer will implement reliability engineering practices, develop automation, maintain CI/CD pipelines, and ensure system health through monitoring.
Top Skills:
AnsibleAWSAzureBashDockerElk StackGCPGoGrafanaKubernetesPrometheusPythonTerraform
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills:
AWSClickhouseGoKafkaKubernetesPulumiPythonTerraform
2 Days AgoSaved
Robotics • Software
Own reliability across vehicle and cloud stacks for AUV operations: onboard Jetson/ROS2 compute, topside systems, cloud ingestion/processing and customer platform. Build automation, observability, runbooks, and self-recovery to reduce on-call toil; manage AWS infrastructure, IaC, container orchestration, and reliability targets. Participate in shared 12-hour on-call shifts and field deployments, mentor team on operational excellence.
Top Skills:
AWSBashContainerizationDockerGoGrafanaIamJetsonKubernetesLinuxPrometheusPythonRosRos 2Terraform
Reposted 2 Days AgoSaved
Artificial Intelligence • Cloud • Information Technology • Software
Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.
Top Skills:
AnsibleCudaDeepspeedFsdpGpuHelmInfinibandKubernetesLinuxMegatronNcclNvidia A100Nvidia B200Nvidia H100NvlinkPyTorchRoceTerraform
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills:
ArgocdAWSElkGCPGoGrafanaIstioJaegerKubernetesLinkerdLokiOpentelemetryPrometheusPythonTempoTerraform
Artificial Intelligence • Software
As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructures, and optimize systems while implementing automation and observability practices.
Top Skills:
ClickhouseGoPostgresPythonTypescript
Cloud • Security • Software • Cybersecurity
Lead reliability, automation, and observability for high-density AI hardware infrastructure. Build Python-based IaC tooling, telemetry pipelines, Prometheus/Grafana dashboards, and AI-assisted tooling. Run 24x7 incident response, coordinate vendors and field technicians, define operational readiness, and drive post-mortems to improve uptime and performance.
Top Skills:
Bare-MetalBgpGrafanaIpv4Ipv6LlmsLokiOpentelemetryPagerdutyPrivate CloudPrometheusPythonSlackTimeseries EnginesVirtualized Environments
Cloud • Security • Software • Cybersecurity
Design, build, and operate scalable infrastructure and CI/CD/IaC systems. Implement observability (monitoring, logging, alerting), automate reliability improvements, mentor engineers, collaborate on incident response, and participate in on-call rotations to maintain Akamai Cloud services.
Top Skills:
AlertingAnsibleBashChefCi/CdGithub ActionsGitlab Ci/CdGoInfrastructure As CodeJenkinsLoggingMonitoringPuppetPythonSaltstackTelemetryTerraform
Software
Drive SRE practices for VA enterprise healthcare platforms: automate infrastructure and CI/CD, define SLIs/SLOs, improve observability and reliability, support incident response, and ensure cloud-native, secure, compliant operations in AWS and containerized environments.
Top Skills:
AnsibleAWSBashCi/CdCloudwatchDockerEcsEksElkGoGrafanaInfrastructure As CodeKubernetesLinuxOpentelemetryPowershellPrometheusPythonSplunkTerraform
Artificial Intelligence • Information Technology • Software • Database
As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.
Top Skills:
DockerElk StackGoGrafanaJavaKubernetesNode.jsPrometheusPulumiPythonRubyTerraform
Software
The Senior Site Reliability Engineer will lead service onboarding, maintain SLAs/SLOs, design secure infrastructure, automate operational tasks, and respond to incidents while ensuring system reliability and performance.
Top Skills:
AWSCloudFormationElk StackGoGrafanaHadoopKubernetesPythonTerraform
New
Cut your apply time in half.
Use ourAI Assistantto automatically fill your job applications.
Use For Free
Reposted 5 Days AgoSaved
Other • Social Impact
As a Senior Site Reliability Engineer, you will design, develop, and maintain reliable infrastructure for Wikimedia's API services, ensuring performance and availability while driving reliability engineering practices and improving developer experience.
Top Skills:
AnsibleArgocdAWSAzureGCPGitlabGoKubernetesOpentelemetryPrometheusPythonTerraform
Other • Social Impact
The Senior Site Reliability Engineer is responsible for maintaining Wikimedia's infrastructure, improving reliability, automating processes, and collaborating with teams. The role involves troubleshooting, managing deployments, and leading incident responses while working remotely.
Top Skills:
AnsibleBashCassandraDebianGoGrafanaHhvmKubernetesMariadbMemcachedPHPPrometheusPuppetPythonRedisRubyShell
Financial Services
Prototype, write, test, document, and deploy release automation across environments. Build and maintain pipelines, collaborate with engineers and product teams, troubleshoot issues, participate in on-call rotation, and improve software delivery, configuration, monitoring, and operations.
Top Skills:
AnsibleBashDockerGitlabJenkinsKubernetesMssqlPostgresPowershellPythonRedisTeamcity
Healthtech • Software • Analytics • Business Intelligence
Lead and own reliability for critical backend and distributed systems: design, launch, on-call, incident leadership, SLO/SLI/error budget definition, automation to remove toil, observability improvement, resilience testing, mentoring, and cross-team reliability initiatives for production healthcare workflows.
Top Skills:
AWSAzureDockerGCPGithub ActionsGoGrafanaJavaKubernetesOpentelemetryPrometheusPythonTerraformTypescript
Healthtech • Social Impact • Software
Own the operational lifecycle of cloud-native data infrastructure: design and automate reliable deployments, observability, incident response, SLIs/SLOs, autoscaling and IaC, and improve platform efficiency and data freshness across GKE and Cloud Run.
Top Skills:
BashBigQueryCloud BuildCloud MonitoringCloud RunDatadogDockerGCPGithub ActionsGkeGoGrafanaJIRAKubernetesPrometheusPulumiPythonSentrySlackSnykSonarqubeTerraform
Information Technology • Legal Tech • Analytics
Design, build, and operate highly available AWS systems. Write and maintain Terraform, improve observability (Grafana, Pingdom, Uptrends), run on-call incident response, define SLOs/SLIs, build CI/CD with Azure DevOps/GitHub, automate operational work, document in Confluence, and mentor engineers.
Top Skills:
AWSAzure DevopsCi/CdConfluenceDockerGitGitGrafanaJIRAKubernetesLinuxPingdomServicenowTerraformUptrends
Information Technology • Legal Tech • Analytics
Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS; manage and optimize cloud infrastructure; develop IaC and automation; implement CI/CD (GitHub Actions); monitor multi-region systems, troubleshoot incidents, perform root cause analysis; document best practices; and mentor junior engineers.
Top Skills:
AWSAws EksCi/CdContainersGithub ActionsInfrastructure As CodeKubernetesNewrelicPythonRbac
Aerospace • Defense
Lead design, implementation, and operation of scalable, secure hybrid-cloud infrastructure for satellite ground systems. Improve developer experience, automate CI/CD and IaC, own observability, troubleshoot reliability issues, and collaborate with developers and satellite operators to advance SatDevOps practices.
Top Skills:
C/C++Ci/CdGCPGoGrafanaInfrastructure As Code (Iac)JavaKubernetesLokiPrometheusPythonRustSoftware Defined Networking (Sdn)
Software
Own and improve platform performance, reliability, and deployment automation. Manage cloud infrastructure, implement IaC, monitor systems with observability tools, provide operational support for distributed applications, and integrate production learnings into development workflows.
Top Skills:
Aiops ToolingAws Elastic ContainersAws RdsAws S3Claude CodeClaude CoworkDatadogHarness EngineeringInfrastructure As CodeKubernetesLlmsPrompt EngineeringRigorSplunk
Real Estate • Software
As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.
Top Skills:
AnsibleDatadogElkGrafanaKubernetesLinuxMariadbMySQLPostgresPrometheusPuppetPythonRuby on RailsRubyTerraformTerragrunt
Database • Analytics
This role involves ensuring the reliability and performance of ClickHouse's cloud infrastructure, collaborating with engineering teams, incident management, and driving continuous improvement in service availability.
Top Skills:
AnsibleAWSAzureClickhouseDocker SwarmGoGoogle Cloud PlatformKubernetesPuppetPythonTerraform
Information Technology • Security • Cybersecurity
Operate and harden regulated cloud platforms (FedRAMP/DoD IL) by owning production reliability, designing resilient infrastructure, leading incident response and postmortems, automating compliance (NIST 800-53/STIG), supporting ATO and continuous monitoring, building secure IaC and CI/CD pipelines, and improving observability and operational tooling.
Top Skills:
Aws GovcloudBashCi/CdContainer HardeningDod Il4Dod Il5Fedramp HighGitopsGoGrafanaImage SecurityKubernetesLinux/UnixNist 800-53PrometheusPythonStigTerraform
Big Data • Analytics
Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.
Top Skills:
.NetAnsibleC#DatadogGpu-Enabled WorkloadsGrafanaHelmIstioKubernetesLokiLonghornAzureNatsOctopus DeployOpentelemetryPostgisPostgresPrometheusRabbitMQRancherRke2Terraform
Let Your Resume Do The Work
Upload your resume to be matched with jobs you're a great fit for.
Success! We'll use this to further personalize your experience.
Top Companies Hiring Remote Site Reliability Engineers
See AllPopular Job Searches
All Remote Software Engineer Jobs
Remote .NET Developer Jobs
Remote AI Engineer Jobs
Remote Android Developer Jobs
Remote Android Engineer Jobs
Remote Automation Engineer Jobs
Remote AWS Jobs
Remote Backend Engineer Jobs
Remote C# Jobs
Remote C++ Jobs
Remote Cloud Architect Jobs
Remote Cloud Engineer Jobs
Remote Design Engineer Jobs
Remote DevOps Engineer Jobs
Remote DevOps Jobs
Remote Embedded Software Engineer Jobs
Remote Engineering Director Jobs
Remote Engineering Manager Jobs
Remote Enterprise Architect Jobs
Remote Field Engineer Jobs
Remote Front-End Developer Jobs
Remote Front-End Engineer Jobs
Remote Full-Stack Engineer Jobs
Remote Game Developer Jobs
Remote Golang Jobs
Remote Hardware Engineer Jobs
Remote Infrastructure Engineer Jobs
Remote Integration Engineer Jobs
Remote iOS Developer Jobs
Remote iOS Engineer Jobs
Remote IT Engineer Jobs
Remote Java Developer Jobs
Remote Javascript Jobs
Remote Lead Software Engineer Jobs
Remote Linux Engineer Jobs
Remote Linux Jobs
Remote Network Engineer Jobs
Remote Perl Jobs
Remote PHP Developer Jobs
Remote Platform Engineer Jobs
Remote Principal Software Engineer Jobs
Remote Project Engineer Jobs
Remote Python Developer + Engineer Jobs
Remote Python Jobs
Remote QA Analyst Jobs
Remote QA Automation Engineer Jobs
Remote QA Engineer Jobs
Remote Ruby Jobs
Remote Sales Engineer Jobs
Remote Salesforce Administrator Jobs
Remote Salesforce Developer Jobs
Remote Salesforce Developer Jobs
Remote Scala Jobs
Remote Senior DevOps Engineer Jobs
Remote Software Architect Jobs
Remote Software Development Manager Jobs
Remote Software Engineering Manager Jobs
Remote Solutions Architect Jobs
Remote Solutions Engineer Jobs
Remote SRE Jobs
Remote Staff Software Engineer Jobs
Remote Systems Engineer Jobs
Remote Tech Lead Jobs
Remote Test Engineer Jobs
Remote VP of Engineering Jobs
Remote Web Developer Jobs
All Filters
Total selected ()
No Results
No Results









.png)




















