Top Site Reliability Engineer Jobs

Radar

Senior / Staff Site Reliability Engineer

Reposted 9 Days Ago

In-Office or Remote

New York, NY, USA

150K-250K Annually

Mid level

Mobile • Software

Site Reliability Engineers will work on production infrastructure, focusing on AWS and Kubernetes while ensuring high availability and customer satisfaction.

Top Skills: Airflow, AWS, CircleCI, CloudWatch, EKS, Grafana, MongoDB, PagerDuty, Pingdom, Rust, Scala Spark, Terraform, TypeScript

Sage

Senior/Staff Site Reliability Engineer

Reposted 9 Days Ago

Hybrid

New York, NY, USA

175K-230K Annually

Senior level

Hardware • Healthtech • Software • Analytics

The Site Reliability Engineer will ensure high availability of Sage's platform, lead incident response, design reliable systems, and improve operational workflows.

Top Skills: Amazon Web Services, Datadog, Go, Google Cloud Platform, Grafana, Java, Kubernetes, MySQL, Postgres, Prometheus, Pulumi, Python, Terraform

CrowdStrike

Manager, Engineering - Dev Ops/SRE (Hybrid)

Reposted 11 Days Ago

Hybrid

Sunnyvale, CA, USA

140K-215K Annually

Expert/Leader

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity

Lead and manage an SRE/Platform engineering team to ensure reliability, scalability, and performance of CrowdStrike's cloud-native security platform. Provide technical leadership, incident command, SLO-driven reliability, capacity planning, automation, and mentorship while collaborating with cross-functional teams.

Top Skills: Apache Flink, Apache Kafka, AWS, Azure, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Splunk

CrowdStrike

SRE/Dev Ops Engineer (Hybrid, Sunnyvale)

Reposted 11 Days Ago

Hybrid

Sunnyvale, CA, USA

120K-180K Annually

Senior level

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity

The role involves managing production infrastructure across multiple cloud providers and Kubernetes, building CI/CD pipelines, ensuring system reliability, and implementing security practices.

Top Skills: ArgoCD, Flux, GitOps, Go, Grafana, Jaeger, Kubernetes, OpenTelemetry, Prometheus, Pulumi, Terraform

PNC Bank

Technology Engineer - SRC (Run the Bank/SRE)

Reposted 11 Days Ago

Hybrid

Pittsburgh, PA, USA

55K-152K Annually

Mid level

Machine Learning • Payments • Security • Software • Financial Services

The Technology Engineer - Mainframe Systems at PNC supports and enhances mainframe environments, ensuring system stability and performance, collaborating with various teams, and managing batch processes and file transfers.

Top Skills: COBOL, Db2, File-AID, IBM Mainframe Technologies, TSO, VSAM

Zscaler

Staff Site Reliability Engineer

Reposted 11 Days Ago

Easy Apply

Hybrid

San Jose, CA, USA

123K-175K Annually

Senior level

Cloud • Information Technology • Security • Software • Cybersecurity

The role involves creating scalable solutions using Linux and Kubernetes, troubleshooting performance issues, maintaining security, and writing automation tools.

Top Skills: Ansible, Bash, Docker, Firewall Technologies, Go, Kubernetes, KVM, Linux, Multi-Factor Authentication, OpenStack, PGP, PKI, Python, SSH, Unix

Mastercard

Lead Site Reliability Engineer-2

Reposted 11 Days Ago

Hybrid

O'Fallon, MO, USA

122K-207K Annually

Senior level

Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing

The Lead Site Reliability Engineer will ensure the reliability and performance of Mastercard's applications, mentor junior engineers, and improve service lifecycle through automation and DevOps practices.

Top Skills: Go, Java, Python, Spring Framework

Sprinter Health

Staff, Site Reliability Engineer (SRE)

Reposted 11 Days Ago

Remote or Hybrid

2 Locations

160K-255K Annually

Senior level

Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth

The Staff Site Reliability Engineer at Sprinter Health will enhance the reliability and security of cloud infrastructure, automate processes, and improve system observability across healthcare delivery operations.

Top Skills: Access Management, AWS, Bash, CI/CD Systems, Cloud Networking, Container Systems, GCP, Identity Management, Logging Platforms, Monitoring Platforms, Observability Platforms, Python, Secrets Management, Terraform, TypeScript

EchoStar

Lead DevOps Site Reliability Engineer

Reposted 11 Days Ago

In-Office

Englewood, CO, USA

110K-157K Annually

Senior level

Aerospace • Cloud • Digital Media • Information Technology • Mobile • News + Entertainment • Generative AI

The Lead DevOps Site Reliability Engineer drives automation in software development, manages cloud stacks, supports containerization, and leads response to outages.

Top Skills: Ansible, AWS, Bash, Docker, Dynatrace, GitLab CI, Go, GCP, Helm, Java, Jenkins, Kubernetes, Linux, OCI, PHP, Python, Rancher, Spring, Terraform

Movable Ink

Lead Site Reliability Engineer

Reposted 12 Days Ago

Easy Apply

Remote or Hybrid

Ontario, CA, USA

Senior level

Artificial Intelligence • Marketing Tech • Software

Lead technical reliability initiatives across a multi-cloud, multi-region active-active content platform. Architect and evolve core services, observability and logging, automation and capacity planning. Mentor engineers, drive cross-team reliability projects, define standards (IaC, SLOs, on-call) and proactively improve platform scalability and incident outcomes.

Top Skills: Apache Kafka, Apache Pulsar, AWS, Cassandra, Chef, EKS, GCP, GKE, Go, Grafana Alloy, Grafana Loki, Kubernetes, Linux, Node.js, Prometheus, Python, Ruby, ScyllaDB, Shell Scripting, Tempo, Terraform, Thanos

Mastercard

Site Reliability Engineer II

15 Days Ago

Hybrid

O'Fallon, MO, USA

76K-127K Annually

Mid level

Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing

Ensure reliability, scalability, and performance of Mastercard applications by implementing observability, automation, CI/CD, and cloud infrastructure best practices. Support production readiness, triage incidents, perform root-cause analysis and blameless post-mortems, mentor developers, and drive operational standards, capacity planning, and risk/compliance activities to maximize service availability and customer experience.

Top Skills: AWS, Azure, Bash, Bitbucket, CI/CD, Containerization, Dynatrace, GCP, Go, Jenkins, Linux/Unix, Orchestration, PCF, Python, Splunk, XLR

Optum

Principal Site Reliability Engineer - Remote

15 Days Ago

In-Office or Remote

Minnetonka, MN, USA

Expert/Leader

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

Define and scale SRE standards across teams, implement SLOs/SLIs/error budgets, build observability and resiliency patterns, drive automation and AIOps, improve reliability for large-scale Azure cloud systems, and influence engineering and platform teams.

Top Skills: AI/ML, AIOps, Automation, Azure, Error Budgets, Incident Management, Observability (Metrics, Logs, Tracing), OpenTelemetry, SLIs, SLOs

Vertafore

Principal Site Reliability Engineer

Reposted 15 Days Ago

Hybrid

Denver, CO, USA

160K-180K Annually

Expert/Leader

Information Technology • Insurance • Software

The Principal Site Reliability Engineer will lead the enterprise's reliability, scalability, and performance efforts, influencing architecture, managing incidents, and fostering a proactive engineering culture across teams.

Top Skills: .NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Relational Databases, Windows

Vertafore

Site Reliability Engineer II

Reposted 15 Days Ago

Hybrid

Denver, CO, USA

75K-120K Annually

Mid level

Information Technology • Insurance • Software

The Site Reliability Engineer II ensures system reliability, participates in incident responses, and automates tasks to enhance operational health in production environments.

Top Skills: .NET, AWS, C#, CI/CD Pipelines, GitLab, Java, Jenkins, Python, React

MongoDB

Site Reliability Engineer (Senior or Staff)

16 Days Ago

Easy Apply

Remote or Hybrid

7 Locations

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.

Top Skills: Alerting, Argo Workflows, ArgoCD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS

Capital One

Manager, SRE Risk Advisory and Oversight

16 Days Ago

Hybrid

2 Locations

197K-246K Annually

Mid level

Fintech • Machine Learning • Payments • Software • Financial Services

Lead technical, second-line oversight of SRE and cloud engineering practices. Perform deep-dive risk analyses of cloud architectures, resiliency, CI/CD, observability, and Gen AI integrations. Produce data-driven risk findings, mitigation recommendations, and executive-facing reports while partnering with first-line engineers and leadership to ensure robust controls and operational reliability.

Top Skills: AWS, Azure, CI/CD, Cloud-Native, Containerization, Datadog, ELK, GCP, Generative AI, Kubernetes, PagerDuty, Prometheus, Splunk

Zscaler

Site Reliability Engineer-SkillBridge Intern

Reposted 19 Days Ago

Easy Apply

Remote or Hybrid

USA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.

Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform

Zscaler

Site Reliability Engineer Federal- SkillBridge Intern

Reposted 19 Days Ago

Easy Apply

Remote or Hybrid

Virginia, USA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

As an intern, manage operational tasks in classified environments, develop automation tools, create documentation, and enhance services for Zscaler's cloud security platform.

Top Skills: AWS ECS, Kubernetes, Python

GitLab

Site Reliability Engineer, Cloud Cost Utilization

Reposted 19 Days Ago

Easy Apply

Remote

Mid level

Cloud • Security • Software • Cybersecurity • Automation

As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.

Top Skills: Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform

Capital One

Senior Manager, SRE Risk Advisory and Oversight

20 Days Ago

Hybrid

3 Locations

209K-286K Annually

Senior level

Fintech • Machine Learning • Payments • Software • Financial Services

Lead technical risk advisory for SRE and cloud-native engineering, assess resiliency, SLIs/SLOs, CI/CD, and observability, perform independent risk reviews, drive AI/automation adoption, and deliver executive-facing risk reporting and remediation guidance.

Top Skills: Automation, AWS, Azure, CI/CD, Cloud-Native Architectures, Containerization, Datadog, ELK, GCP, Gen AI, Observability, PagerDuty, Prometheus, Splunk

Optum

Site Reliability Engineer - Remote

Reposted 20 Days Ago

In-Office or Remote

Eden Prairie, MN, USA

73K-130K Annually

Mid level

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

The Site Reliability Engineer will design, develop, and support a secure cloud infrastructure while collaborating with development and DevOps teams, ensuring high performance and reliability of systems.

Top Skills: AWS, Azure, Dynatrace, Grafana, Kubernetes, Prometheus, Pulumi, Splunk, Terraform

MongoDB

Site Reliability Engineer 3

Reposted 21 Days Ago

Easy Apply

Hybrid

New York City, NY, USA

111K-218K Annually

Mid level

Big Data • Cloud • Software • Database

The Site Reliability Engineer designs and builds infrastructure for a global cloud service, implements automation, and optimizes system performance while managing on-call operations.

Top Skills: AWS, DNS, GCP, HTTP, Kubernetes, Linux, Azure, Programming Languages, TLS

MongoDB

Site Reliability Engineer (Senior or Staff), Infrastructure Security

Reposted 22 Days Ago

Easy Apply

Remote or Hybrid

5 Locations

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.

Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform

Zello

Senior Site Reliability Engineer

Reposted 21 Hours Ago

Hybrid

Austin, TX, USA

Senior level

Logistics • Mobile • Productivity • Software • Transportation

The Senior Site Reliability Engineer will manage the reliability of Zello's data tier and contribute to monitoring and incident response while improving cloud infrastructure and database performance.

Top Skills: Bash, Docker, Elasticsearch, Go, Kubernetes, Loki, MongoDB, MySQL, Prometheus, Python, Redis, ScyllaDB, Tempo

Superblocks

Infrastructure Engineer & SRE

Reposted 23 Days Ago

In-Office

New York City, NY, USA

175K-275K Annually

Expert/Leader

Artificial Intelligence • Cloud • Enterprise Web • Natural Language Processing • Software • App development • Automation

Design and implement large-scale distributed systems that integrate AI safely and reliably, focusing on infrastructure, observability, and security.

Top Skills: Cloud Networking, Containers, Distributed Systems, Event Driven Runtimes, KEDA, Knative, Kubernetes, Multi Cloud Architecture, Operating Systems, Scalability