Top Remote Site Reliability Engineer Jobs

Zscaler

Sr. Staff Site Reliability Engineer-Federal, Security Clearance

Reposted 2 Hours Ago

Easy Apply

Remote or Hybrid

Crystal City, VA, USA

140K-200K Annually

Senior level

Cloud • Information Technology • Security • Software • Cybersecurity

Responsible for managing operations within classified environments, overseeing cloud infrastructure, automating tasks, and ensuring system stability in a high-security setting.

Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform

NBCUniversal

Site Reliability Engineer

Reposted 2 Hours Ago

Remote or Hybrid

Centennial, CO, USA

110K-145K Annually

Mid level

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development

Build and maintain automation and reliability for live video distribution across on-prem and cloud. Deploy and manage systems, develop monitoring and automated recovery, troubleshoot complex incidents, coordinate with vendors, document SOPs, support live broadcast components, and participate in L2 on-call rotation.

Top Skills: AAC, AC3, Ansible, ATSC, AVC, AWS, Bash, Chef, CloudFormation, CMAF, Docker, EKS, Git, HEVC, HLS, JavaScript, JSON, Kubernetes, Linux, Microsoft Graph API, MPEG Transport Streams, Python, RIST, SCTE 104, SCTE 224, SCTE 35, SRT, SSAI, ST 2022-7, ST 2110, Statmux, Terraform, Unix, XML, YAML, Zixi

NBCUniversal

Staff Software Engineer (SAP BTP SRE Lead)

Reposted 2 Hours Ago

Remote or Hybrid

New York, NY, USA

130K-170K Annually

Senior level

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development

Oversee operational support of SAP BTP CPI applications, manage incidents, lead support specialists, and collaborate on architecture and governance for finance processes.

Top Skills: ABAP Proxies, AEM, CAPM, Cloud Connector, Cloud Foundry, Edge Integration Cell, IDoc, JSON, Message Queues, OAuth, OData, REST, SAML, SAP BTP, SFAPI, SFTP, SOAP, XML

JPMorganChase

Lead Site Reliability Engineer

Yesterday

Remote or Hybrid

2 Locations

Senior level

Financial Services

Lead SRE responsible for resiliency design reviews, mentoring, SRE best-practice adoption, building IaC and CI/CD pipelines, operating containerized services, observability and SLO-driven incident prevention, 24x7 production support, and driving AI-assisted reliability workflows with governance and auditability.

Top Skills: .NET, AWS, CI/CD, Datadog, DNS, Docker, Dynatrace, ECS, GitLab, Grafana, Java, Jenkins, Kafka, Kubernetes, Linux, Load Balancing, Prometheus, Python, Splunk, Spring Boot, TCP/IP, Terraform, TLS

NBCUniversal

Staff Site Reliability Engineer (Collaboration Engineering)

Reposted 2 Days Ago

Remote or Hybrid

Orlando, FL, USA

Expert/Leader

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development

The Staff Site Reliability Engineer is responsible for ensuring the reliability, performance, and security of workplace collaboration services, focusing on automation, incident management, and operational excellence while providing technical leadership and mentoring to engineers.

Top Skills: AI Engineering, Azure Virtual Desktop, Defender for Office 365, Exchange Online, Graph API, Intune, Jamf Pro, Microsoft 365, Microsoft Entra ID, Microsoft Purview, OneDrive, PowerShell, SharePoint Online, Teams

Citadel

Site Reliability Engineer

Reposted 3 Days Ago

In-Office or Remote

4 Locations

105K-300K Annually

Entry level

Information Technology • Software • Financial Services • Big Data Analytics

SREs at Citadel focus on optimizing and maintaining system reliability, performance, and automation for investment applications, collaborating closely with teams.

Top Skills: CI/CD, CSS, JavaScript, Python, React, SQL

Zscaler

Staff Site Reliability Engineer (Production Engineer)

Reposted 3 Days Ago

Easy Apply

Remote or Hybrid

9 Locations

119K-170K Annually

Senior level

Cloud • Information Technology • Security • Software • Cybersecurity

As a Staff Site Reliability Engineer, you'll oversee Zscaler production data center services, optimize code, and ensure cloud service availability and performance. Collaborate with cross-functional teams to improve processes and resolve escalated issues.

Top Skills: Bash, DNS, Firewalls, Grafana, HTTP, ICMP, Load Balancing, Nagios, OSI Model, Prometheus, Python, TCP/IP

MongoDB

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Reposted 4 Days Ago

Easy Apply

Remote or Hybrid

6 Locations

126K-248K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.

Top Skills: AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS

MongoDB

Staff Site Reliability Engineer, Fabric

Reposted 4 Days Ago

Easy Apply

Remote or Hybrid

United States

127K-249K Annually

Expert/Leader

Big Data • Cloud • Software • Database

Seeking a Site Reliability Engineer with expertise in networking and distributed systems for building secure multi-cloud infrastructure. Responsibilities include maintaining network architecture and ensuring reliable service-to-service communication, involving a 24/7 on-call rotation.

Top Skills: AWS, Azure, BGP, DNS, GCP, IPv6, Kubernetes, Load Balancing, mTLS, Service Mesh, TCP/IP, TLS, VPCs, VPNs

Radar

Senior / Staff Site Reliability Engineer

Reposted 4 Days Ago

In-Office or Remote

New York, NY, USA

150K-250K Annually

Mid level

Mobile • Software

Site Reliability Engineers will work on production infrastructure, focusing on AWS and Kubernetes while ensuring high availability and customer satisfaction.

Top Skills: Airflow, AWS, CircleCI, CloudWatch, EKS, Grafana, MongoDB, PagerDuty, Pingdom, Rust, Scala Spark, Terraform, TypeScript

Cohere Health

Site Reliability Engineer II

6 Days Ago

Easy Apply

Remote

United States

100K-110K Annually

Mid level

Healthtech • Software

Operate and maintain AWS-hosted MERN applications and large-scale data workflows. Manage serverless and Spark-based pipelines, perform incident response and on-call duties, engineer automation to eliminate operational toil, ensure HIPAA/SOC2/HITRUST compliance, build observability and lead blameless post-mortems.

Top Skills: Amazon ECS, Amazon EKS, Amazon EMR, Athena, AWS Glue, AWS Lambda, AWS SNS, AWS SQS, CloudWatch, EC2, IAM, JavaScript, MERN, MySQL, Node.js, OpenTofu, PySpark, Python, RabbitMQ, Terraform, TypeScript, VPC

Openly

DevOps/SRE II (Remote, US)

Reposted 7 Days Ago

Easy Apply

Remote

United States

115K-130K Annually

Junior

Insurance

As a Site Reliability Engineer II, you will build, test, and maintain the technology infrastructure for Openly's insurance platform, focusing on automation, monitoring, incident response, and operational decisions.

Top Skills: Aiven Debezium, ArcGIS, BigQuery, CircleCI, Cloud Functions, Cloud Run, Cloud SQL, Composer/Airflow, Datadog, Fivetran, GCP GCS, Git, Go, GCP, Jupyter Notebooks, Kafka, Kubernetes, Nuxt, Postgres, Pub/Sub, Python, R, SQL, Tailwind, Terraform, Vue.js, Webpack

Movable Ink

Lead Site Reliability Engineer

Reposted 7 Days Ago

Easy Apply

Remote or Hybrid

Ontario, CA, USA

Senior level

Artificial Intelligence • Marketing Tech • Software

Lead technical reliability initiatives across a multi-cloud, multi-region active-active content platform. Architect and evolve core services, observability and logging, automation and capacity planning. Mentor engineers, drive cross-team reliability projects, define standards (IaC, SLOs, on-call) and proactively improve platform scalability and incident outcomes.

Top Skills: Apache Kafka, Apache Pulsar, AWS, Cassandra, Chef, EKS, GCP, GKE, Go, Grafana Alloy, Grafana Loki, Kubernetes, Linux, Node.js, Prometheus, Python, Ruby, ScyllaDB, Shell Scripting, Tempo, Terraform, Thanos

Domino Data Lab

Staff Site Reliability Engineer

Reposted 8 Days Ago

Easy Apply

Remote or Hybrid

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

Dropbox

Staff Site Reliability Engineer, Production Engineering

Reposted 8 Days Ago

Remote

United States

223K-302K Annually

Expert/Leader

Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy

The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.

Top Skills: AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs

DraftKings

Principal Site Reliability Engineer

Reposted 8 Days Ago

Remote or Hybrid

United States

200K-250K Annually

Senior level

Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics

Lead long-term strategy and architecture for cloud and on‑prem platform infrastructure, driving Kubernetes and multi‑cloud reliability, IaC/GitOps automation, observability, SLO/SLI/error‑budget practices, incident leadership, AI‑augmented tooling adoption, and mentorship of senior engineers to improve platform resilience and developer experience.

Top Skills: Amazon Elastic Kubernetes Service (EKS), Autoscaling, AWS, Capacity Planning, CI/CD, GitOps, Go, Google Cloud Platform, Google Kubernetes Engine (GKE), Identity and Access Management, Infrastructure as Code, Kubernetes, Linux, Networking, Observability, Operators, Pulumi, Python, RKE2, Storage, Terraform

Optum

Principal Site Reliability Engineer - Remote

Reposted 9 Days Ago

In-Office or Remote

Minnetonka, MN, USA

Expert/Leader

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

Define and scale SRE standards across teams, implement SLOs/SLIs/error budgets, build observability and resiliency patterns, drive automation and AIOps, improve reliability for large-scale Azure cloud systems, and influence engineering and platform teams.

Top Skills: AI/ML, AIOps, Automation, Azure, Error Budgets, Incident Management, Observability (Metrics, Logs, Tracing), OpenTelemetry, SLIs, SLOs

MongoDB

Site Reliability Engineer (Senior or Staff), Deployments

Reposted 10 Days Ago

Easy Apply

Remote or Hybrid

7 Locations

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.

Top Skills: Alerting, Argo Workflows, ArgoCD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS

Runpod

Site Reliability Engineer

13 Days Ago

Remote

USA

150K-200K Annually

Senior level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

Ensure stability and resilience of Runpod's distributed AI platform by defining SLIs/SLOs, leading incident response, building observability and reliability tooling, automating operational workflows, and partnering with engineering teams to reduce toil and improve production readiness.

Top Skills: Bash, CI/CD, Containerized Production Systems, Go, GPU Observability Tooling, Grafana, Infrastructure as Code, Linux, Prometheus, Python

Zscaler

Site Reliability Engineer-SkillBridge Intern

Reposted 13 Days Ago

Easy Apply

Remote or Hybrid

USA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.

Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform

Zscaler

Site Reliability Engineer Federal- SkillBridge Intern

Reposted 13 Days Ago

Easy Apply

Remote or Hybrid

Virginia, USA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

As an intern, manage operational tasks in classified environments, develop automation tools, create documentation, and enhance services for Zscaler's cloud security platform.

Top Skills: AWS ECS, Kubernetes, Python

GitLab

Site Reliability Engineer, Cloud Cost Utilization

Reposted 14 Days Ago

Easy Apply

Remote

Mid level

Cloud • Security • Software • Cybersecurity • Automation

As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.

Top Skills: Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Infrastructure Security

Reposted 16 Days Ago

Easy Apply

Remote or Hybrid

5 Locations

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.

Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 20 Days Ago

Easy Apply

Remote or Hybrid

10 Locations

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

DFIN

Sr. Site Reliability Engineer

A Minute Ago

Remote or Hybrid

United States

Senior level

Fintech • Software

Lead SRE efforts for DFIN SaaS: ensure availability, performance, scalability, and automation. Implement monitoring, CI/CD, IaC, container orchestration, AI-enhanced observability, incident response, RCA, and runbook automation while collaborating across engineering teams.

Top Skills: .NET, AIOps, AKS, Ansible, AppDynamics, AWS, Azure, Azure DevOps, Bash, C#, CI/CD, Cloud AI Services, Containers, Cosmos, Datadog, Dynatrace, EKS, Firewall, Harness, Idera SQL Diagnostic Manager, Infrastructure as Code (IaC), Java, Jenkins, Kubernetes, Linux, Load Balancing, New Relic, PowerShell, Python, Redgate SQL Monitor, SolarWinds Database Performance Analyzer, SQL, Terraform, Windows