Top Site Reliability Engineer Jobs

Reposted 3 Days Ago
In-Office
North Bethesda, MD, USA
135K-165K Annually
Mid level
Artificial Intelligence
In this role, the Site Reliability Engineer will improve reliability and performance of infrastructure, write clean code, collaborate across teams, and maintain platforms for deployed software.
Top Skills: AWS, CI/CD, Docker, JavaScript, Kubernetes, Python, Terraform, Unix Shell
Reposted 3 Days Ago
In-Office
2 Locations
Senior level
Artificial Intelligence • Blockchain • Information Technology • Consulting
Design, scale, and operate multi-region, high-availability cloud infrastructure; lead incident response and on-call rotations; build automation and tooling in Python/Go; enforce risk, security, and operational standards; mentor teams and drive infrastructure architecture decisions.
Top Skills: AWS, Crypto, GCP, Go, IAM, ISO, Kafka, Kubernetes, OpenTofu, Postgres, Python, Redpanda, SOC 2, Terraform, Web3
Reposted 3 Days Ago
In-Office
2 Locations
Mid level
Artificial Intelligence
The Deployment Engineer will manage AI inference clusters, optimizing deployment, capacity allocation, and ensuring reliability of pipeline operations across datacenters.
Top Skills: Docker, Grafana, InfluxDB, K8s, Linux, Prometheus, Python
Reposted 3 Days Ago
Hybrid
San Francisco, CA, USA
200K-240K Annually
Mid level
Artificial Intelligence • Logistics • Software
The Site Reliability Engineer will enhance operational resilience, ensuring system stability, observability, and debugging workflows for complex failures while improving developer focus and uptime.
Top Skills: Datadog, Go, Prometheus, Python, Sentry
Reposted 3 Days Ago
In-Office
Palo Alto, CA, USA
232K-263K Annually
Senior level
Cybersecurity
As a Sr. Staff Site Reliability Engineer, you will define the reliability vision for a multi-tenant SaaS platform, lead the architecture of detection systems, and partner across teams to improve incident management and system resilience, ensuring issues are resolved before affecting customers.
Top Skills: Argo CD, AWS, GCP, GitLab CI/CD, Grafana, Helm, Kubernetes, Prometheus
Reposted 3 Days Ago
In-Office
Birmingham, AL, USA
Senior level
Automotive • Hardware • Logistics
The Site Reliability Engineer III enhances system reliability by building automation and supporting large-scale systems, ensuring critical platforms function optimally.
Top Skills: APIs, Azure DevOps, Dynatrace, Google Cloud Platform, Grafana, HTTP, Java, Kubernetes, Microservices, Prometheus, Terraform
Reposted 3 Days Ago
In-Office or Remote
7 Locations
200K Annually
Mid level
Cloud • Software
The Site Reliability Engineer will ensure reliable cloud operations by applying Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.
Top Skills: Kubernetes, Linux, OpenStack, Python
Reposted 3 Days Ago
In-Office
Pittsburgh, PA, USA
146K-162K Annually
Senior level
Financial Services
The Lead Site Reliability Engineer will establish the SRE operating model, implement AI-enabled reliability use cases, manage reliability metrics, and oversee operational readiness while collaborating with teams and mentoring engineers.
Top Skills: AI/ML, Ansible, Azure DevOps, Docker, GitHub Actions, GitLab CI, Jenkins, Kubernetes, Terraform, VMware
Reposted 3 Days Ago
In-Office
San Francisco, CA, USA
230K-390K Annually
Senior level
Artificial Intelligence • Software
As a Software Engineer on the Site Reliability team, you'll ensure system reliability, scalability, and observability while partnering with engineering teams and improving incident management processes.
Top Skills: AWS, CI/CD Tooling, Container Orchestration, Datadog, Grafana, Prometheus, Terraform
Reposted 3 Days Ago
In-Office
San Francisco, CA, USA
Mid level
Information Technology • Software • Big Data Analytics
The Site Reliability Engineer will design, analyze, and troubleshoot large-scale distributed systems, focusing on operating systems and performance tuning.
Top Skills: Apache, Java
Reposted 3 Days Ago
In-Office
Mountlake Terrace, WA, USA
136K-231K Annually
Senior level
Insurance • Financial Services
Drive reliability and operational excellence across cloud, on-premise, and hybrid platforms. Build automation and AI-powered tooling, design observability and self-healing systems, standardize CI/CD and incident practices, lead post-incident reviews, and support production systems through on-call rotation while advising engineering teams on reliability, compliance, and modern DevOps practices.
Top Skills: AI Platforms, C#, CI/CD, Cloud, Container Platforms, Docker, Event Streaming, Infrastructure-as-Code, Java, JavaScript, Kubernetes, LLMs, Observability, PowerShell, Python, Telemetry
Reposted 3 Days Ago
In-Office
Golden, CO, USA
103K-136K Annually
Senior level
Manufacturing
The Site Reliability Engineer will ensure the reliability, security, and support of Databricks applications while collaborating with various teams to optimize data workflows and incident management.
Top Skills: Azure, CI/CD, Databricks, Delta Lake, PySpark, Python, SQL, Unity Catalog
Reposted 3 Days Ago
In-Office
Alpharetta, GA, USA
Senior level
Information Technology • Software
As Sr. Director, Platform Engineering & SRE, you will lead the reliability and operational excellence of the platform, establishing practices for site reliability engineering and managing cloud engineering across a multi-cloud SaaS portfolio.
Top Skills: AWS, Azure, CI/CD, Datadog, GCP, Grafana, Infrastructure-as-Code, OpenTelemetry, Prometheus
Reposted 3 Days Ago
In-Office
San Francisco, CA, USA
238K-290K Annually
Expert/Leader
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Staff Software Engineer in Site Reliability, you'll manage infrastructure for reliability and scalability, lead incident management, and automate operational tasks.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, incident.io, PagerDuty, Pulumi, Python, Sentry, Terraform
Reposted 3 Days Ago
In-Office
San Francisco, CA, USA
200K-260K Annually
Mid level
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Software Engineer in Site Reliability, you will ensure the reliability and performance of our AI platform through automation and strategic infrastructure management.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Kubernetes, PagerDuty, Python, Sentry, Terraform
Reposted 3 Days Ago
In-Office or Remote
2 Locations
95K-171K Annually
Junior
Cloud • Security • Software • Cybersecurity
As a Site Reliability Engineer II, you'll automate tasks, monitor AI workloads, enhance dashboards, support CI/CD processes, and collaborate with engineering teams on complex issues while participating in on-call rotations.
Top Skills: Go, Grafana, Kubernetes, Linux, Prometheus, Python, SaltStack, Terraform
Reposted 3 Days Ago
Remote
USA
Mid level
Software • Analytics
The role involves automating and managing AWS infrastructure, ensuring reliability and scalability of stateful systems, and optimizing deployment processes. You'll also handle incident responses and improve operational tooling.
Top Skills: AWS, Kubernetes, Terraform, Terragrunt
Reposted 3 Days Ago
In-Office
Denver, CO, USA
160K-200K Annually
Mid level
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing observability infrastructure, and supporting SRE practices and cloud deployments.
Top Skills: AWS, Azure, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Node.js, OpenTelemetry, Pulumi, Python, Rust, Terraform
Reposted 3 Days Ago
In-Office
San Francisco, CA, USA
Senior level
Artificial Intelligence • Software
The Site Reliability Engineer ensures the reliability and performance of the products Devin and Windsurf by managing incident response, CI/CD pipelines, and infrastructure as code, and by fostering a reliability culture within the engineering team.
Top Skills: AWS, Azure, CI/CD, GCP, Kubernetes, Terraform
Reposted 3 Days Ago
Remote
United States
220K-250K Annually
Expert/Leader
Cloud • Software • Database
Lead the design, build, and operation of the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.
Top Skills: AKS, Ansible, AWS, Azure, Bash, Docker, EKS, GCP, Git, GitHub Actions, GKE, Java, Kubernetes, Linux, Postgres, Prometheus, Python, Shell, Terraform
Reposted 3 Days Ago
Remote
United States
133K-211K Annually
Mid level
Cloud • Security • Software • Generative AI
Design, build, and automate large-scale multi-cloud infrastructure and internal SRE tools. Improve host lifecycle, observability, alerting, and reliability; operate containerized workloads; participate in on-call rotations, incident response, runbooks, postmortems, code reviews, and mentoring.
Top Skills: Ansible, Argo CD, Argo Workflows, CUE, Docker, Elastic Stack, Go, Graphite, Influx, Kubernetes, Linux, Prometheus, Puppet, Terraform, Ubuntu, Ubuntu Live Patch
Reposted 3 Days Ago
In-Office or Remote
2 Locations
165K-215K Annually
Senior level
Software • Cybersecurity
This role involves managing Kubernetes clusters, cloud infrastructure, and CI/CD pipelines. The engineer will enhance system reliability and efficiency while troubleshooting production issues.
Top Skills: Alertmanager, AWS, Azure, Bash, CI/CD, Docker, Elastic Stack, Elasticsearch, GCP, Go, Grafana, Helm, Kafka, Kubernetes, Loki, MongoDB, OCI, Prometheus, Python, Redis, Spark, Terraform
Reposted 3 Days Ago
In-Office
Reston, VA, USA
136K-184K Annually
Senior level
Information Technology • Software
The Systems Engineer manages Linux systems, designs CI/CD pipelines, administers application security platforms, and ensures compliance with security standards.
Top Skills: Ansible, Bash, CloudBees Jenkins, Docker, ELK, Git, GitHub Actions, GitHub Advanced Security, JFrog Artifactory, JFrog Xray, Jira, Kubernetes, Linux, Nagios, Nexus IQ, Prometheus, Python, Terraform
Reposted 3 Days Ago
In-Office
Washington, DC, USA
188K-259K Annually
Senior level
Cloud
The Staff Site Reliability Engineer will lead the design of AWS solutions, manage incident responses, and mentor junior engineers, ensuring reliability and security in federal environments.
Top Skills: AWS, Databricks, Go, Helm, Kubernetes, Redshift, Snowflake, Terraform
Reposted 3 Days Ago
In-Office
Palo Alto, CA, USA
180K-360K Annually
Mid level
Information Technology
The role involves securing and maintaining the reliability of X Money's infrastructure, focusing on AWS, Kubernetes, and code security while implementing best practices and collaborative problem-solving.
Top Skills: AWS, DynamoDB, Kubernetes, Python, RDS, Terraform