Top Site Reliability Engineer Jobs

Reposted 9 Days Ago
In-Office or Remote
New York, NY, USA
150K-250K Annually
Mid level
Mobile • Software
Site Reliability Engineers will work on production infrastructure, focusing on AWS and Kubernetes while ensuring high availability and customer satisfaction.
Top Skills: Airflow, AWS, CircleCI, Cloudwatch, Eks, Grafana, MongoDB, Pagerduty, Pingdom, Rust, Scala Spark, Terraform, Typescript
Reposted 9 Days Ago
Hybrid
New York, NY, USA
175K-230K Annually
Senior level
Hardware • Healthtech • Software • Analytics
The Site Reliability Engineer will ensure high availability of Sage's platform, lead incident response, design reliable systems, and improve operational workflows.
Top Skills: Amazon Web Services, Datadog, Go, Google Cloud Platform, Grafana, Java, Kubernetes, MySQL, Postgres, Prometheus, Pulumi, Python, Terraform
Reposted 11 Days Ago
Hybrid
Sunnyvale, CA, USA
140K-215K Annually
Expert/Leader
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Lead and manage an SRE/Platform engineering team to ensure reliability, scalability, and performance of CrowdStrike's cloud-native security platform. Provide technical leadership, incident command, SLO-driven reliability, capacity planning, automation, and mentorship while collaborating with cross-functional teams.
Top Skills: Apache Flink, Apache Kafka, AWS, Azure, Elk, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Opentelemetry, Prometheus, Splunk
Reposted 11 Days Ago
Hybrid
Sunnyvale, CA, USA
120K-180K Annually
Senior level
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
The role involves managing production infrastructure across multiple cloud providers and Kubernetes, building CI/CD pipelines, ensuring system reliability, and implementing security practices.
Top Skills: Argocd, Flux, Gitops, Go, Grafana, Jaeger, Kubernetes, Opentelemetry, Prometheus, Pulumi, Terraform
Reposted 11 Days Ago
Hybrid
Pittsburgh, PA, USA
55K-152K Annually
Mid level
Machine Learning • Payments • Security • Software • Financial Services
The Technology Engineer - Mainframe Systems at PNC supports and enhances mainframe environments, ensuring system stability and performance, collaborating with various teams, and managing batch processes and file transfers.
Top Skills: Cobol, Db2, File-Aid, Ibm Mainframe Technologies, Tso, Vsam
Reposted 11 Days Ago
Easy Apply
Hybrid
San Jose, CA, USA
123K-175K Annually
Senior level
Cloud • Information Technology • Security • Software • Cybersecurity
The role involves creating scalable solutions using Linux and Kubernetes, troubleshooting performance issues, maintaining security, and writing automation tools.
Top Skills: Ansible, Bash, Docker, Firewall Technologies, Go, Kubernetes, Kvm, Linux, Multi-Factor Authentication, Openstack, Pgp, Pki, Python, Ssh, Unix
Reposted 11 Days Ago
Hybrid
O'Fallon, MO, USA
122K-207K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Lead Site Reliability Engineer will ensure the reliability and performance of Mastercard's applications, mentor junior engineers, and improve service lifecycle through automation and DevOps practices.
Top Skills: Go, Java, Python, Spring Framework
Reposted 11 Days Ago
Remote or Hybrid
2 Locations
160K-255K Annually
Senior level
Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth
The Staff Site Reliability Engineer at Sprinter Health will enhance the reliability and security of cloud infrastructure, automate processes, and improve system observability across healthcare delivery operations.
Top Skills: Access Management, AWS, Bash, Ci/Cd Systems, Cloud Networking, Container Systems, GCP, Identity Management, Logging Platforms, Monitoring Platforms, Observability Platforms, Python, Secrets Management, Terraform, Typescript
Reposted 11 Days Ago
In-Office
Englewood, CO, USA
110K-157K Annually
Senior level
Aerospace • Cloud • Digital Media • Information Technology • Mobile • News + Entertainment • Generative AI
The Lead DevOps Site Reliability Engineer drives automation in software development, manages cloud stacks, supports containerization, and leads response to outages.
Top Skills: Ansible, AWS, Bash, Docker, Dynatrace, Gitlab Ci, Go, GCP, Helm, Java, Jenkins, Kubernetes, Linux, Oci, PHP, Python, Rancher, Spring, Terraform
Reposted 12 Days Ago
Easy Apply
Remote or Hybrid
Ontario, CA, USA
Senior level
Artificial Intelligence • Marketing Tech • Software
Lead technical reliability initiatives across a multi-cloud, multi-region active-active content platform. Architect and evolve core services, observability and logging, automation and capacity planning. Mentor engineers, drive cross-team reliability projects, define standards (IaC, SLOs, on-call) and proactively improve platform scalability and incident outcomes.
Top Skills: Apache Kafka, Apache Pulsar, AWS, Cassandra, Chef, Eks, GCP, Gke, Go, Grafana Alloy, Grafana Loki, Kubernetes, Linux, Node.js, Prometheus, Python, Ruby, Scylladb, Shell Scripting, Tempo, Terraform, Thanos
15 Days Ago
Hybrid
O'Fallon, MO, USA
76K-127K Annually
Mid level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
Ensure reliability, scalability, and performance of Mastercard applications by implementing observability, automation, CI/CD, and cloud infrastructure best practices. Support production readiness, triage incidents, perform root-cause analysis and blameless post-mortems, mentor developers, and drive operational standards, capacity planning, and risk/compliance activities to maximize service availability and customer experience.
Top Skills: AWS, Azure, Bash, Bitbucket, Ci/Cd, Containerization, Dynatrace, GCP, Go, Jenkins, Linux/Unix, Orchestration, Pcf, Python, Splunk, Xlr
15 Days Ago
In-Office or Remote
Minnetonka, MN, USA
Expert/Leader
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
Define and scale SRE standards across teams, implement SLOs/SLIs/error budgets, build observability and resiliency patterns, drive automation and AIOps, improve reliability for large-scale Azure cloud systems, and influence engineering and platform teams.
Top Skills: Ai/Ml, Aiops, Automation, Azure, Error Budgets, Incident Management, Observability (Metrics, Logs, Tracing), Opentelemetry, Slis, Slos
Reposted 15 Days Ago
Hybrid
Denver, CO, USA
160K-180K Annually
Expert/Leader
Information Technology • Insurance • Software
The Principal Site Reliability Engineer will lead the enterprise's reliability, scalability, and performance efforts, influencing architecture, managing incidents, and fostering a proactive engineering culture across teams.
Top Skills: .Net, AWS, C#, Ci/Cd, Java, Kubernetes, Linux, Python, React, Relational Databases, Windows
Reposted 15 Days Ago
Hybrid
Denver, CO, USA
75K-120K Annually
Mid level
Information Technology • Insurance • Software
The Site Reliability Engineer II ensures system reliability, participates in incident responses, and automates tasks to enhance operational health in production environments.
Top Skills: .Net, AWS, C#, Ci/Cd Pipelines, Gitlab, Java, Jenkins, Python, React
16 Days Ago
Easy Apply
Remote or Hybrid
7 Locations
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.
Top Skills: Alerting, Argo Workflows, Argocd, AWS, Azure, Ci/Cd, Containers, Dns, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, Tcp/Ip, Tls
16 Days Ago
Hybrid
2 Locations
197K-246K Annually
Mid level
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical, second-line oversight of SRE and cloud engineering practices. Perform deep-dive risk analyses of cloud architectures, resiliency, CI/CD, observability, and Gen AI integrations. Produce data-driven risk findings, mitigation recommendations, and executive-facing reports while partnering with first-line engineers and leadership to ensure robust controls and operational reliability.
Top Skills: AWS, Azure, Ci/Cd, Cloud-Native, Containerization, Datadog, Elk, GCP, Generative Ai, Kubernetes, Pagerduty, Prometheus, Splunk
Reposted 19 Days Ago
Easy Apply
Remote or Hybrid
USA
Internship
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills: Ansible, Aws Ecs, Kubernetes, Linux, Python, Terraform
Reposted 19 Days Ago
Easy Apply
Remote or Hybrid
Virginia, USA
Internship
Cloud • Information Technology • Security • Software • Cybersecurity
As an intern, manage operational tasks in classified environments, develop automation tools, create documentation, and enhance services for Zscaler's cloud security platform.
Top Skills: Aws Ecs, Kubernetes, Python
Reposted 19 Days Ago
Easy Apply
Remote
US
Mid level
Cloud • Security • Software • Cybersecurity • Automation
As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.
Top Skills: Ansible, AWS, Elk, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform
20 Days Ago
Hybrid
3 Locations
209K-286K Annually
Senior level
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical risk advisory for SRE and cloud-native engineering, assess resiliency, SLIs/SLOs, CI/CD, and observability, perform independent risk reviews, drive AI/automation adoption, and deliver executive-facing risk reporting and remediation guidance.
Top Skills: Automation, AWS, Azure, Ci/Cd, Cloud-Native Architectures, Containerization, Datadog, Elk, GCP, Gen Ai, Observability, Pagerduty, Prometheus, Splunk
Reposted 20 Days Ago
In-Office or Remote
Eden Prairie, MN, USA
73K-130K Annually
Mid level
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Site Reliability Engineer will design, develop, and support a secure cloud infrastructure while collaborating with development and DevOps teams, ensuring high performance and reliability of systems.
Top Skills: AWS, Azure, Dynatrace, Grafana, Kubernetes, Prometheus, Pulumi, Splunk, Terraform
Reposted 21 Days Ago
Easy Apply
Hybrid
New York City, NY, USA
111K-218K Annually
Mid level
Big Data • Cloud • Software • Database
The Site Reliability Engineer designs and builds infrastructure for a global cloud service, implements automation, and optimizes system performance while managing on-call operations.
Top Skills: AWS, Dns, GCP, HTTP, Kubernetes, Linux, Azure, Programming Languages, Tls
Reposted 22 Days Ago
Easy Apply
Remote or Hybrid
5 Locations
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
Reposted 21 Hours Ago
Hybrid
Austin, TX, USA
Senior level
Logistics • Mobile • Productivity • Software • Transportation
The Senior Site Reliability Engineer will manage the reliability of Zello's data tier, contribute to monitoring and incident response while improving cloud infrastructure and database performance.
Top Skills: Bash, Docker, Elasticsearch, Go, Kubernetes, Loki, MongoDB, MySQL, Prometheus, Python, Redis, Scylladb, Tempo
Reposted 23 Days Ago
In-Office
New York City, NY, USA
175K-275K Annually
Expert/Leader
Artificial Intelligence • Cloud • Enterprise Web • Natural Language Processing • Software • App development • Automation
Design and implement large-scale distributed systems that integrate AI safely and reliably, focusing on infrastructure, observability, and security.
Top Skills: Cloud Networking, Containers, Distributed Systems, Event Driven Runtimes, Keda, Knative, Kubernetes, Multi Cloud Architecture, Operating Systems, Scalability