Top Site Reliability Engineer Jobs

Blackstone

Site Reliability Engineer - Data, Cloud & Developer Experience

Reposted 12 Days Ago

In-Office

New York, NY, USA

140K-225K Annually

Senior level

Fintech

Lead adoption of SRE practices to improve reliability, observability, automation, and incident response. Implement and maintain observability tooling, instrumentation, CI/CD, and infrastructure-as-code. Partner with developers, participate in on-call rotations, drive postmortems, and reduce operational overhead through automation.

Top Skills: Anthropic, AWS, AWS ECS, AWS EKS, Azure, C#, Docker, GitLab CI, Grafana, Linux, OpenAI, Prometheus, Puppet, Python, Splunk, Terraform, TypeScript, Windows

Hadrian

Site Reliability Engineer, Robotics

Reposted 12 Days Ago

In-Office

Los Angeles, CA, USA

164K-270K Annually

Mid level

Aerospace • Hardware • Software • Defense • Manufacturing

As a Site Reliability Engineer, you'll ensure robotics system reliability, build telemetry integration, and develop tools for diagnostics and automation, collaborating with engineering teams for enhanced production reliability.

Top Skills: C++, Datadog, Go, Kubernetes, OpenTelemetry, Prometheus, Python, ROS 2, Telegraf, TypeScript

WorkOS

Site Reliability Engineer

Reposted 12 Days Ago

Remote

2 Locations

175K-275K Annually

Mid level

Software

As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.

Top Skills: AWS, Datadog, Grafana, Kubernetes, OpenTelemetry, Prometheus, TypeScript

GM Financial

Site Reliability Engineer I

Reposted 13 Days Ago

Hybrid

2 Locations

Mid level

Fintech • Financial Services

The Site Reliability Engineer will support cloud infrastructure, automate deployments, and ensure operational efficiency and governance across public cloud platforms.

Top Skills: Ansible, AWS, Azure, Azure CLI, Azure Functions, Azure Kubernetes Service, Cosmos DB, GCP, Git, Jenkins, Kubernetes, Linux, PowerShell, Terraform, Windows

Waabi

Senior / Staff Software Engineer (Observability / SRE)

Reposted 13 Days Ago

Remote or Hybrid

4 Locations

148K-249K Annually

Senior level

Transportation

Design and develop Waabi's observability stack, optimize performance, build automation tooling, and support application requirements while leading projects and mentoring teams.

Top Skills: AWS, C/C++, Docker, Go, Grafana, Java, Kubernetes, OpenTelemetry, Python, Rust

MUFG

Site Reliability Engineer - Web Applications, AVP

Reposted 13 Days Ago

In-Office

2 Locations

112K-137K Annually

Senior level

Fintech

The Site Reliability Engineer will manage AWS infrastructure, oversee application deployments, and ensure system reliability and security while collaborating with other teams.

Top Skills: AWS, Bash, CodeBuild, CodeDeploy, CodePipeline, EC2, IAM, Python, RDS, Route 53, S3, Terraform, VPC

Plenful

Site Reliability Engineer

Reposted 13 Days Ago

In-Office

San Francisco, CA, USA

Senior level

Artificial Intelligence • Healthtech

The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.

Top Skills: AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute

Optum

Senior Site Reliability Engineer - Observability - Remote

Reposted 19 Days Ago

In-Office or Remote

La Crosse, WI, USA

92K-164K Annually

Mid level

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

The Senior Observability Engineer maintains monitoring systems, designs log aggregation solutions, automates tasks with scripts, and ensures platform performance.

Top Skills: Ansible, Bash, Dynatrace, Elasticsearch, ELK, Filebeat, Fluent Bit, Fluentd, Grafana, Linux, Logstash, OTel, PowerShell, Prometheus, Python, Terraform

Taxwell

Site Reliability Engineer

Reposted 14 Days Ago

In-Office or Remote

Washington, DC, USA

Entry level

Fintech • Information Technology • Professional Services • Software

The Site Reliability Engineer serves as a consultant for Taxwell, focusing on ensuring the reliability and performance of their tax preparation software.

Blaxel

Site Reliability Engineer

Reposted 14 Days Ago

In-Office

San Francisco, CA, USA

175K-250K Annually

Mid level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

The Site Reliability Engineer will ensure the reliability and performance of AI infrastructure, build core systems, handle incident response, and develop automation tools.

Top Skills: AWS, Datadog, ELK, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Pulumi, Python, Rust, Terraform

SpaceX

Site Reliability Engineer, GNC

Reposted 14 Days Ago

In-Office

Hawthorne, CA, USA

125K-175K Annually

Mid level

Aerospace • Other

The Site Reliability Engineer, GNC at SpaceX oversees mission-critical GNC products, operates servers, maintains HPC clusters, and enhances services and infrastructure to support space operations.

Top Skills: Ansible, Bazel, Docker, Gradle, Kubernetes, Linux, Make, npm, pip, Puppet, Python, Terraform, Vagrant

SambaNova Systems

Cloud Site Reliability Engineer

Reposted 14 Days Ago

In-Office

San Jose, CA, USA

Mid level

Artificial Intelligence • Hardware • Machine Learning • Natural Language Processing • Software • Generative AI

As a Cloud Site Reliability Engineer, you will ensure the reliability, performance, and scalability of AI inferencing services, participate in on-call rotations, manage cloud infrastructure, and automate CI/CD processes while collaborating on incident management and capacity planning.

Top Skills: Argo CD, CloudFormation, Datadog, Docker, ELK Stack, GitHub Actions, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Terraform

Trimble

Site Reliability Engineer

Reposted 15 Days Ago

In-Office

Westminster, CO, USA

106K-145K Annually

Mid level

Hardware • Information Technology • Other • Software • Analytics

Architect and operate ML/agent pipelines and infrastructure, deploy and monitor models at scale, pioneer MLOps/Agent Ops best practices, collaborate with domain experts, and test/optimize ML systems for production reliability and cost efficiency.

Top Skills: Bash Scripting, Containerization (e.g., Docker), Git/GitHub, Linux, Model Versioning, Monitoring, NumPy, pandas, Python, PyTorch, scikit-learn

Pod Network

Site Reliability Engineer (APAC)

15 Days Ago

In-Office or Remote

17 Locations

Mid level

Information Technology • Software • Web3 • Infrastructure as a Service (IaaS)

Operate and improve the Pod platform: respond to incidents, investigate root causes, build automation and observability, design monitoring/alerting, reduce alert fatigue, and drive reliability improvements across production systems.

Top Skills: Bash, CI/CD, Cloud, Docker, Grafana, Linux, PagerDuty, Prometheus, Python, Rust

Gradial

Principal SRE

15 Days Ago

In-Office

Seattle, WA, USA

180K-240K Annually

Senior level

Artificial Intelligence • Software • Generative AI

Lead reliability, scalability, and operational health of a production platform. Evolve Kubernetes, CI/CD, IaC, and observability. Build tooling and automation, improve monitoring/incident response, partner with engineering to identify and mitigate scaling risks, and influence platform direction across reliability, security, performance, and cost.

Top Skills: CI/CD, Cloud-Native Architecture, Container Orchestration, GitOps, GPU Provisioning, Incident Response, Infrastructure as Code, Kubernetes, Logging, Metrics, Multi-Cloud, Observability, Python, Tracing, TypeScript

BNY

Director, Splunk Platform Engineering & SRE

Reposted 15 Days Ago

In-Office

New York, NY, USA

147K-310K Annually

Expert/Leader

Fintech • Financial Services

The Director of Splunk Platform Engineering & SRE owns the enterprise Splunk platform, drives incident resolution, optimizes systems, and mentors engineers, focusing on automation and performance.

Top Skills: Ansible, Git, Go, Java, Kubernetes, Linux/Unix, Moog, Prometheus, Python, Splunk

HHAeXchange

SRE Technical Project Manager

Reposted 15 Days Ago

Remote

United States

100K-110K Annually

Mid level

Healthtech • Software

The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.

Top Skills: AI Bots, Datadog, Jira, Jira Service Management, MS Teams, Opsgenie, PagerDuty

United States Cold Storage

Site Reliability Engineer

Reposted 15 Days Ago

Hybrid

Camden, NJ, USA

130K-150K Annually

Mid level

Information Technology • Logistics • Transportation • Analytics • Business Intelligence • 3PL: Third Party Logistics • Industrial

As a Site Reliability Engineer, you'll enhance reliability for Phenix WMS and automation systems, focusing on incident reduction and system health through observability and automation. Responsibilities include defining SLIs and SLOs, participating in incident response, and testing disaster recovery plans.

Top Skills: Ansible, Azure, Bash, CI/CD, Kubernetes, PowerShell, Python, Terraform

CME Group

Site Reliability Engineer II

Reposted 15 Days Ago

In-Office

Wacker, IL, USA

94K-157K Annually

Junior

Financial Services

As a Site Reliability Engineer II, you will build, operate, and scale systems for CME Group's Clearing portfolio. Responsibilities include collaborating with teams, monitoring services, scripting for efficiency, and improving system performance, particularly during the migration to Google Cloud Platform.

Top Skills: Bash, Google Cloud Platform, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Python, Splunk

Allegion

Director, DevSecOps & SRE

Reposted 15 Days Ago

In-Office

2 Locations

164K-222K Annually

Expert/Leader

Security

The Director of DevSecOps and SRE will lead teams in SRE, Cloud Infrastructure, and DevOps practices, focusing on automation, infrastructure reliability, and security policies while mentoring engineers and managing software projects.

Top Skills: AWS Cloud Technologies, GitLab, Grafana, Java, Kubernetes, Loki, Material UI, Postgres, Prometheus, RabbitMQ, React, Redux, Sentry, Spring, Tailwind, Terraform

Figure.ai

Staff Site Reliability Engineer

Reposted 15 Days Ago

In-Office

San Jose, CA, USA

175K-250K Annually

Senior level

Artificial Intelligence • Robotics • Automation • Manufacturing

Responsible for managing and setting up internal systems infrastructure, migrating SaaS to self-hosted solutions, implementing monitoring systems, and ensuring security compliance.

Top Skills: Ansible, AWS, Azure, CloudFormation, Datadog, DNS, GCP, Grafana, HTTP, Linux/Unix, Prometheus, TCP/IP, Terraform

NVIDIA

Site Reliability Engineer - Hardware Infrastructure

Reposted 15 Days Ago

In-Office

Santa Clara, CA, USA

168K-334K Annually

Expert/Leader

Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse

Responsible for developing incident management guidelines, supporting production systems, defining reliability metrics, and driving automation for high service availability.

Top Skills: Go, Grafana, Perl, Prometheus, Python, Ruby

Booz Allen Hamilton

Site Reliability Engineer

Reposted 15 Days Ago

In-Office

Herndon, VA, USA

87K-198K Annually

Senior level

Information Technology

As a Site Reliability Engineer, you'll develop resilient infrastructure, automate tasks, handle incident response, and support classified environments for the Intelligence Community.

Top Skills: Argo CD, Bitbucket, Elasticsearch, GitLab, Java Spring Boot, Kafka, Kubernetes, MongoDB, NiFi

Booz Allen Hamilton

Site Reliability Engineer

Reposted 15 Days Ago

In-Office

2 Locations

62K-141K Annually

Mid level

Information Technology

The Site Reliability Engineer will enhance infrastructure resilience, automate processes, and implement monitoring tools to support the Intelligence Community.

Top Skills: AWS, Confluence, Docker, Git, Jenkins, Jira, Kubernetes, Linux, Nessus, Packer

SitusAMC

Site Reliability Engineer - AWS - Remote

Reposted 16 Days Ago

Remote

USA

110K-140K Annually

Senior level

Real Estate • Financial Services • PropTech

Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.

Top Skills: AMIs, Argo CD, AWS, AWS Elastic Beanstalk, AWS Transfer Family, Azure DevOps, Bash, CloudWatch, curl, Docker, EC2, EKS, Flux CD, Git, GitOps, HTTP, Istio, Kubernetes, Linkerd, Load Balancer, PowerShell, Python, RDS, SQL, Terraform, Wget