Get the job you really want.
Top Site Reliability Engineer Jobs
Cloud • Security • Software • Cybersecurity
The Staff Site Reliability Engineer will enhance AI/ML infrastructure, manage CI/CD pipelines, ensure system reliability, and troubleshoot applications, focusing on cloud-based operations.
Top Skills:
AWS, Azure, Bash, Docker, Git, GCP, Grafana, Hugging Face Transformers, Kubernetes, LLM, Prometheus, Python, PyTorch, TensorRT, Terraform
eCommerce
The Staff Back-end Engineer (SRE) will build, run, and scale ecommerce systems, ensuring reliability and performance for customer-facing services, while utilizing automation and best practices.
Top Skills:
AWS, Azure, Datadog, Docker, Elastic Stack, Go, Google Cloud Platform, Grafana, Java, Kubernetes, New Relic, Prometheus, Python, Ruby
Healthtech • Payments • Software
The SRE Specialist ensures the reliability and performance of data systems, collaborates with teams, and handles incident response and system monitoring.
Top Skills:
AWS, Azure, CloudFormation, GCP, Grafana, Kubernetes, PowerShell, Prometheus, Python, Splunk, Terraform
Reposted 18 Days Ago
Insurance
The Senior Product Manager will drive core reliability platforms and services, guiding developer engineering products from conception to launch, improving system availability, incident management, and developer workflows.
Top Skills:
AWS, Azure, Cloud Infrastructure, Developer Tools, Grafana, Kubernetes, Observability
Software
The Principal Site Reliability Engineer will enhance system reliability, implement monitoring systems, collaborate across teams, and ensure platform uptime and performance.
Top Skills:
AWS, Azure, Datadog, GCP, Grafana, Java, Kubernetes, Node.js, Prometheus, Python
Gaming
Manage operational tasks for gaming services, design runtime environments, monitor metrics, optimize architecture, and research software solutions.
Top Skills:
C/C++, Go, Istio, Java, K8s, Linux, MySQL, Nginx, Python, Rust, Shell
Artificial Intelligence • Software • Generative AI
As a Site Reliability Engineer, you'll design and maintain cloud infrastructure, automate provisioning, ensure system reliability, and mentor junior engineers while leveraging various technologies to optimize performance and security.
Top Skills:
AWS, Azure, Docker, ELK Stack, GCP, Go, Grafana, Java, Kubernetes, Prometheus, Python, Scala, Terraform
Cloud • Security • Software • Cybersecurity
As a Staff Site Reliability Engineer, you will lead SRE initiatives, mentor engineers, ensure system reliability, and drive strategic engineering practices globally.
Top Skills:
C#, Go, Grafana, Java, JavaScript, Kubernetes, OpenTelemetry, Prometheus, Pulumi, Terraform, TypeScript
Cloud • Security • Software • Cybersecurity
The Principal Site Reliability Engineer will lead Veeam's global SRE efforts, focusing on architecture, reliability strategies, and mentorship while influencing cross-functional teams.
Top Skills:
Automation Tooling, Cloud Infrastructure, Cloud-Native Development, Distributed Systems
Cloud • Software • Analytics
The Principal Cloud Site Reliability Engineer will lead the design and implementation of cloud infrastructure, manage CI/CD pipelines, mentor teams, and ensure secure, performant systems in AWS and Azure environments.
Top Skills:
Ansible, AWS, Azure, Bash, Chef, Docker, ELK, Grafana, Jenkins, Kubernetes, MongoDB, MySQL, Postgres, Prometheus, Puppet, Python, RDS, Salt, Terraform
Information Technology • Security • Cybersecurity
Lead a Site Reliability Engineering team to ensure product reliability, oversee incident management, and collaborate with other engineering teams on performance issues.
Top Skills:
AWS, CI/CD, GCP, Grafana, Kubernetes, Prometheus, Terraform
Fintech
The Site Reliability Engineer will manage Kubernetes clusters, automate infrastructure, ensure cloud resource reliability, and collaborate across teams to enhance operational efficiency.
Top Skills:
Amazon S3, Apache Mesos, AWS, Azure, C/C++, Ceph, Cloud Infrastructure, Docker, HDFS, Helm, Infrastructure as Code, Java, JavaScript, Kubernetes, Linux, NFS, Postgres, Python, Ruby, Terraform, Yarn
Artificial Intelligence • Generative AI
Lead GPU cluster design and operations, manage Kubernetes, implement Infrastructure-as-Code, and develop observability stacks for high-performance AI models.
Top Skills:
Ansible, Argo CD, Bash, eBPF, Flux, GitOps, Grafana, Helm, InfiniBand, Kubernetes, NVIDIA DCGM, OpenTelemetry, Prometheus, Python, RDMA, Terraform
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills:
Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform
Information Technology • Software
The Site Reliability Engineer will manage and scale infrastructure, automate deployments, and lead efforts in operational process management while participating in a 24x7 on-call rotation.
Top Skills:
Ansible, Docker, FreeBSD, FreeIPA, Jenkins, Kubernetes, Linux, OpenStack, Python, Red Hat Enterprise Linux, Terraform
News + Entertainment
The role involves designing scalable infrastructure, collaborating for reliability, automating monitoring and response tools, managing incidents, and promoting reliability culture at Netflix.
Top Skills:
AWS, Azure, GCP, Go, Java, Kubernetes, Python, Terraform
Reposted 20 Days Ago
Travel
Seeking a Senior Site Reliability Engineer to enhance platform infrastructure for scaling services in Google Cloud. Responsibilities include automation, incident response, and supporting engineering teams with reliable tools and systems.
Top Skills:
Bash, Datadog, Google Cloud Platform, Helm, Istio, Kubernetes, Kustomize, Python, Terraform
Reposted 20 Days Ago
Travel
The Senior Site Reliability Engineer will enhance platform tooling, drive automation of infrastructure components, and support teams by ensuring reliable and scalable cloud infrastructure on Google Cloud.
Top Skills:
Bash, Datadog, Google Cloud Platform, Helm, Istio, Kubernetes, Kustomize, Python, Terraform
Insurance
The Senior Engineer SRE Incident Response (NOC) at GEICO is responsible for overseeing incident response operations, ensuring efficient resolution of technical issues, and maintaining system integrity. The role involves collaboration with various teams and continuous improvement of incident management processes.
Reposted 20 Days Ago
Information Technology • Security • Cybersecurity
The Staff/Principal Site Reliability Engineer leads infrastructure initiatives, architects solutions for cloud and SaaS, and collaborates cross-functionally to enhance reliability and innovation.
Top Skills:
AWS, Bash, Bazel, Cuelang, Datadog, GitOps, Go, Grafana, Helm, Kubernetes, Linux, Prometheus, Python, Terraform
Software
As a Lead SRE at Commvault, you'll ensure the quality and reliability of the Clumio Data Platform in AWS, collaborating across teams to enhance infrastructure and maintain SLAs.
Top Skills:
AWS, Docker, IP Networking, ITIL, Kubernetes, Linux, Python, Terraform
Hardware • Manufacturing
As an SRE, you'll maintain service reliability, operate monitoring tools, automate tasks in Python, and manage incident responses.
Top Skills:
Ansible, AWS, Bash, GitLab, Grafana, Kubernetes, Loki, Prometheus, Python, Tempo, Terraform
eCommerce
Responsible for platform reliability, monitoring, automation, and system health for Coupang's customer-facing services, ensuring scalable solutions and handling production incidents.
Top Skills:
AWS, Azure, Datadog, Docker, Elastic Stack, Go, Google Cloud Platform, Grafana, Java, Kubernetes, New Relic, Prometheus, Python, Ruby
Blockchain • Software
As a Senior Engineer, SRE/DevOps, you will enhance blockchain infrastructure reliability, automate deployment, and collaborate on CI/CD practices while ensuring security and performance optimization.
Top Skills:
Ansible, AWS, Bash, CloudTrail, CloudWatch, Cosmos, Docker, ELK Stack, Ethereum, GCP, K8s, Kubernetes, Opsgenie, Pingdom, Python, Terraform
Artificial Intelligence • Blockchain • Internet of Things • Machine Learning • Software • App development • Automation
As a Staff SRE, you will ensure the reliability, scalability, and performance of systems, lead incident management, and drive automation efforts.
Top Skills:
Ansible, AWS, Azure, Bash, Docker, ELK Stack, GCP, GitLab CI, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Terraform