Top Site Reliability Engineer Jobs

Reposted 18 Days Ago
Remote
United States
Expert/Leader
Cloud • Security • Software • Cybersecurity
The Staff Site Reliability Engineer will enhance AI/ML infrastructure, manage CI/CD pipelines, ensure system reliability, and troubleshoot applications, focusing on cloud-based operations.
Top Skills: AWS, Azure, Bash, Docker, Git, GCP, Grafana, Hugging Face Transformers, Kubernetes, LLM, Prometheus, Python, PyTorch, TensorRT, Terraform
Reposted 18 Days Ago
In-Office
2 Locations
Senior level
eCommerce
The Staff Back-end Engineer (SRE) will build, run, and scale ecommerce systems, ensuring reliability and performance for customer-facing services, while utilizing automation and best practices.
Top Skills: AWS, Azure, Datadog, Docker, Elastic Stack, Go, Google Cloud Platform, Grafana, Java, Kubernetes, New Relic, Prometheus, Python, Ruby
Reposted 18 Days Ago
In-Office
Lehi, UT, USA
Mid level
Healthtech • Payments • Software
The SRE Specialist ensures the reliability and performance of data systems, collaborates with teams, and handles incident response and system monitoring.
Top Skills: AWS, Azure, CloudFormation, GCP, Grafana, Kubernetes, PowerShell, Prometheus, Python, Splunk, Terraform
Reposted 18 Days Ago
In-Office
2 Locations
147K-230K Annually
Senior level
Insurance
The Senior Product Manager will drive core reliability platforms and services, guiding developer engineering products from conception to launch, improving system availability, incident management, and developer workflows.
Top Skills: AWS, Azure, Cloud Infrastructure, Developer Tools, Grafana, Kubernetes, Observability
Reposted 18 Days Ago
Hybrid
Atlanta, GA, USA
Senior level
Software
The Principal Site Reliability Engineer will enhance system reliability, implement monitoring systems, collaborate across teams, and ensure platform uptime and performance.
Top Skills: AWS, Azure, Datadog, GCP, Grafana, Java, Kubernetes, Node.js, Prometheus, Python
Reposted 18 Days Ago
Easy Apply
In-Office
Midtown, TN, USA
Mid level
Gaming
Manage operational tasks for gaming services, design runtime environments, monitor metrics, optimize architecture, and research software solutions.
Top Skills: C/C++, Go, Istio, Java, K8s, Linux, MySQL, Nginx, Python, Rust, Shell
Reposted 18 Days Ago
In-Office or Remote
2 Locations
Senior level
Artificial Intelligence • Software • Generative AI
As a Site Reliability Engineer, you'll design and maintain cloud infrastructure, automate provisioning, ensure system reliability, and mentor junior engineers while leveraging various technologies to optimize performance and security.
Top Skills: AWS, Azure, Docker, ELK Stack, GCP, Go, Grafana, Java, Kubernetes, Prometheus, Python, Scala, Terraform
Reposted 18 Days Ago
Remote
United States
201K-287K Annually
Senior level
Cloud • Security • Software • Cybersecurity
As a Staff Site Reliability Engineer, you will lead SRE initiatives, mentor engineers, ensure system reliability, and drive strategic engineering practices globally.
Top Skills: C#, Go, Grafana, Java, JavaScript, Kubernetes, OpenTelemetry, Prometheus, Pulumi, Terraform, TypeScript
Reposted 18 Days Ago
Remote
United States
215K-307K Annually
Expert/Leader
Cloud • Security • Software • Cybersecurity
The Principal Site Reliability Engineer will lead Veeam's global SRE efforts, focusing on architecture, reliability strategies, and mentorship while influencing cross-functional teams.
Top Skills: Automation Tooling, Cloud Infrastructure, Cloud-Native Development, Distributed Systems
19 Days Ago
Easy Apply
In-Office
Santa Clara, CA, USA
Expert/Leader
Cloud • Software • Analytics
The Principal Cloud Site Reliability Engineer will lead the design and implementation of cloud infrastructure, manage CI/CD pipelines, mentor teams, and ensure secure, performant systems in AWS and Azure environments.
Top Skills: Ansible, AWS, Azure, Bash, Chef, Docker, ELK, Grafana, Jenkins, Kubernetes, MongoDB, MySQL, Postgres, Prometheus, Puppet, Python, RDS, Salt, Terraform
19 Days Ago
Remote
United States
160K-200K Annually
Senior level
Information Technology • Security • Cybersecurity
Lead a Site Reliability Engineering team to ensure product reliability, oversee incident management, and collaborate with other engineering teams on performance issues.
Top Skills: AWS, CI/CD, GCP, Grafana, Kubernetes, Prometheus, Terraform
19 Days Ago
In-Office
Honolulu, HI, USA
100K-170K Annually
Junior
Fintech
The Site Reliability Engineer will manage Kubernetes clusters, automate infrastructure, ensure cloud resource reliability, and collaborate across teams to enhance operational efficiency.
Top Skills: Amazon S3, Apache Mesos, AWS, Azure, C/C++, Ceph, Cloud Infrastructure, Docker, HDFS, Helm, Infrastructure as Code, Java, JavaScript, Kubernetes, Linux, NFS, Postgres, Python, Ruby, Terraform, Yarn
Reposted 19 Days Ago
In-Office or Remote
San Francisco, CA, USA
Mid level
Artificial Intelligence • Generative AI
Lead GPU cluster design and operations, manage Kubernetes, implement Infrastructure-as-Code, and develop observability stacks for high-performance AI models.
Top Skills: Ansible, Argo CD, Bash, eBPF, Flux, GitOps, Grafana, Helm, InfiniBand, Kubernetes, NVIDIA DCGM, OpenTelemetry, Prometheus, Python, RDMA, Terraform
Reposted 19 Days Ago
In-Office
2 Locations
159K-230K Annually
Senior level
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform
Reposted 19 Days Ago
Easy Apply
In-Office
Reston, VA, USA
109K-147K Annually
Senior level
Information Technology • Software
The Site Reliability Engineer will manage and scale infrastructure, automate deployments, and lead efforts in operational process management while participating in a 24x7 on-call rotation.
Top Skills: Ansible, Docker, FreeBSD, FreeIPA, Jenkins, Kubernetes, Linux, OpenStack, Python, Red Hat Enterprise Linux, Terraform
Reposted 20 Days Ago
Remote
USA
100K-720K Annually
Senior level
News + Entertainment
The role involves designing scalable infrastructure, collaborating for reliability, automating monitoring and response tools, managing incidents, and promoting reliability culture at Netflix.
Top Skills: AWS, Azure, GCP, Go, Java, Kubernetes, Python, Terraform
Reposted 20 Days Ago
In-Office or Remote
2 Locations
150K-350K Annually
Senior level
Travel
Seeking a Senior Site Reliability Engineer to enhance platform infrastructure for scaling services in Google Cloud. Responsibilities include automation, incident response, and supporting engineering teams with reliable tools and systems.
Top Skills: Bash, Datadog, Google Cloud Platform, Helm, Istio, Kubernetes, Kustomize, Python, Terraform
Reposted 20 Days Ago
In-Office or Remote
3 Locations
Senior level
Travel
The Senior Site Reliability Engineer will enhance platform tooling, drive automation of infrastructure components, and support teams by ensuring reliable and scalable cloud infrastructure on Google Cloud.
Top Skills: Bash, Datadog, Google Cloud Platform, Helm, Istio, Kubernetes, Kustomize, Python, Terraform
Reposted 20 Days Ago
In-Office
Chevy Chase, MD, USA
100K-215K Annually
Senior level
Insurance
The Senior Engineer SRE Incident Response (NOC) at GEICO is responsible for overseeing incident response operations, ensuring efficient resolution of technical issues, and maintaining system integrity. The role involves collaboration with various teams and continuous improvement of incident management processes.
Reposted 20 Days Ago
Easy Apply
Remote
USA
184K-240K Annually
Senior level
Information Technology • Security • Cybersecurity
The Staff/Principal Site Reliability Engineer leads infrastructure initiatives, architects solutions for cloud and SaaS, and collaborates cross-functionally to enhance reliability and innovation.
Top Skills: AWS, Bash, Bazel, Cuelang, Datadog, GitOps, Go, Grafana, Helm, Kubernetes, Linux, Prometheus, Python, Terraform
Reposted 20 Days Ago
Easy Apply
In-Office
Korea, KY, USA
Senior level
Software
As a Lead SRE at Commvault, you'll ensure the quality and reliability of the Clumio Data Platform in AWS, collaborating across teams to enhance infrastructure and maintain SLAs.
Top Skills: AWS, Docker, IP Networking, ITIL, Kubernetes, Linux, Python, Terraform
Reposted 20 Days Ago
In-Office
Palo Alto, CA, USA
120K-140K Annually
Senior level
Hardware • Manufacturing
As an SRE, you'll maintain service reliability, operate monitoring tools, automate tasks in Python, and manage incident responses.
Top Skills: Ansible, AWS, Bash, GitLab, Grafana, Kubernetes, Loki, Prometheus, Python, Tempo, Terraform
Reposted 20 Days Ago
In-Office
Seattle, WA, USA
176K-221K Annually
Senior level
eCommerce
Responsible for platform reliability, monitoring, automation, and system health for Coupang's customer-facing services, ensuring scalable solutions and handling production incidents.
Top Skills: AWS, Azure, Datadog, Docker, Elastic Stack, Go, Google Cloud Platform, Grafana, Java, Kubernetes, New Relic, Prometheus, Python, Ruby
Reposted 20 Days Ago
Remote
US
175K-200K Annually
Senior level
Blockchain • Software
As a Senior Engineer, SRE/DevOps, you will enhance blockchain infrastructure reliability, automate deployment, and collaborate on CI/CD practices while ensuring security and performance optimization.
Top Skills: Ansible, AWS, Bash, CloudTrail, CloudWatch, Cosmos, Docker, ELK Stack, Ethereum, GCP, K8s, Kubernetes, Opsgenie, Pingdom, Python, Terraform
Reposted 20 Days Ago
Easy Apply
In-Office or Remote
47 Locations
Senior level
Artificial Intelligence • Blockchain • Internet of Things • Machine Learning • Software • App development • Automation
As a Staff SRE, you will ensure the reliability, scalability, and performance of systems, lead incident management, and drive automation efforts.
Top Skills: Ansible, AWS, Azure, Bash, Docker, ELK Stack, GCP, GitLab CI, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Terraform