Top Site Reliability Engineer Jobs
Energy
The Senior Site Reliability Engineer improves infrastructure reliability and scalability, partners with various teams, implements IaC and CI/CD, and ensures business continuity through effective BCP/DR planning.
Top Skills:
AWS, Bash, CloudFormation, Datadog, ELK, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, OpenSearch, Prometheus, Python, Terraform
Artificial Intelligence • Software
Design, build, and scale control- and data-plane infrastructure for distributed AI workloads. Improve reliability, performance, scheduling, and observability for Ray clusters across cloud and on-prem environments. Support accelerator integration, container image management, and provide on-call troubleshooting and cross-team collaboration.
Top Skills:
AWS, Azure, Containers, GCP, Go, GPUs, Grafana, Kubernetes, Linux, Prometheus, Python, Ray, TPUs, VMs
Fintech • Payments • Software • Financial Services
Senior SRE responsible for ensuring platform scalability and reliability on AWS, owning CI/CD and GitHub workflows, leading incident response and post-mortems, implementing observability and logging, and serving as a bilingual (Mandarin/English) technical liaison with international engineering teams.
Top Skills:
AWS, CI/CD, Git, Logging, Monitoring, Scripting
Edtech
Lead SRE work to improve availability, reliability, observability, and security for a distributed SaaS platform. Build and maintain IaC (Terraform, CloudFormation), support CI/CD, manage containerized production environments (Kubernetes/EKS), run disaster recovery exercises, participate in on-call rotation, collaborate cross-functionally, and mentor teams while integrating tooling including AI into SRE workflows.
Top Skills:
.NET, Ansible, AWS EKS, CI/CD, CloudFormation, Docker, Java, JavaScript, Kubernetes, Python, Terraform
Artificial Intelligence • Software • Generative AI • Automation
Lead design, build, and operation of scalable, fault-tolerant cloud infrastructure. Define SLOs/SLAs, improve observability and incident response, own CI/CD and deployment automation, partner with engineering teams on reliability, capacity planning, performance benchmarking, cost optimization, and security for an AI platform.
Top Skills:
AWS, Azure, Bash, CI/CD, Datadog, eBPF, GCP, Go, GPU, Grafana, Istio, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Pulumi, Python, Terraform
Big Data
You will manage AWS infrastructure, automate deployments, debug application issues, and improve the operational health of Metabase Cloud.
Top Skills:
AWS, Datadog, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Other
The Senior Site Reliability Engineer ensures the integrity, performance, and reliability of cloud infrastructure, overseeing software development, maintenance, and site reliability issues while promoting industry best practices.
Top Skills:
Cloud Infrastructure, DevOps, Software Development
Artificial Intelligence • Healthtech
The role involves improving operational reliability, managing production environments, enhancing observability, automating tasks, and collaborating with engineering teams, requiring 3-6 years of relevant experience.
Top Skills:
AWS, Bash, Datadog, Kubernetes, Prometheus, Python, Terraform
Big Data • Cloud • Marketing Tech • Social Impact • Software
As a Senior Site Reliability Engineer, you will support product deployments, provide engineering support, maintain systems, and collaborate with teams globally to enhance infrastructure reliability.
Top Skills:
AWS, Cassandra, CircleCI, DynamoDB, GCP, Go, Jenkins, Kubernetes, NoSQL Databases, Python, ScyllaDB, SingleStore DB, Terraform
Healthtech • Travel
The Senior Site Reliability Engineer leads reliability engineering for Azure, focusing on scripting, automation, observability, and incident response, ensuring service quality and uptime.
Top Skills:
AKS, App Services, Application Insights, Azure, Azure DevOps, Azure Monitor, Bicep, Functions, GitHub Actions, Grafana, ITRS Geneos, Jira, Log Analytics, PowerShell, Python, ServiceNow, Terraform, VM Scale Sets
Artificial Intelligence • Software
As a Software Engineer in Reliability, you'll architect and manage multi-cloud GPU infrastructure, ensuring performance, security, and scale while debugging complex hardware/software issues.
Top Skills:
AMD, AWS, Bash, Go, GPU, InfiniBand, Linux, NVIDIA, OCI, Python, RDMA
Legal Tech • Software
As a Site Reliability Engineer, you'll develop autonomous systems, improve CI/CD pipelines, mentor junior engineers, and ensure software reliability and security in a 24/7 environment.
Top Skills:
Bash, PowerShell, Python
Information Technology • Consulting
As a Sr. Site Reliability Engineer, you'll design, deploy, and maintain applications in virtualized environments, develop CI/CD pipelines, and ensure operational observability and performance of production systems.
Top Skills:
Ansible, Bash, F5, GitLab CI/CD, Kubernetes, MinIO, Portworx, S3-Compatible Services, VMware
Artificial Intelligence • Information Technology • Software
The Site Reliability Engineer will ensure high availability and performance of CodeRabbit's AI-powered code review platform, enhancing system reliability through infrastructure ownership, performance engineering, and automation.
Top Skills:
AWS, Datadog, Docker, ELK Stack, Google Cloud Platform, Grafana, Kubernetes, Linux, Node.js, Prometheus, Terraform, TypeScript
Cloud
The Staff Site Reliability Engineer will manage large-scale cloud production systems, ensuring reliability and performance, while automating processes and responding to incidents.
Top Skills:
AWS, Bash, CloudFormation, Docker, Go, Helm, Kubernetes, Python, Ruby, Terraform
Other • Software • Analytics
The Sr. Site Reliability Engineer will manage SaaS capabilities, implement monitoring systems, automate operational tasks, and provide on-call support.
Top Skills:
AWS, Bash, Docker, EKS, ELK, Git, Java, Kubernetes, Prometheus, Python, Terraform
Other • Software • Analytics
The role involves deploying and managing SaaS solutions, automating infrastructure processes, troubleshooting system issues, and collaborating with a team of engineers.
Top Skills:
ArcGIS Velocity, ArcGIS Workflow Manager, AWS, AWS Lambda, Bash, Docker, EKS, ELK, Git, Kafka, Kubernetes, OpenSearch, Prometheus, Python, Terraform
Other • Software • Analytics
As a Sr. Site Reliability Engineer, you will manage cloud-based SaaS products, automate infrastructure, troubleshoot issues, and provide technical support while collaborating with a team of engineers.
Top Skills:
AWS, AWS Lambda, Bash, Docker, ECS, EKS, ELK, Git, Java, Kafka, Kubernetes, OpenSearch, Prometheus, Python, Security Groups, Terraform, VPC
Other • Software • Analytics
The role involves deploying and managing SaaS capabilities on AWS, including monitoring systems, automation solutions, and troubleshooting incidents. Collaboration with SRE engineers is key to operational success across multiple regions.
Top Skills:
AWS, AWS Lambda, Bash, Docker, ECS, ELK, Git, Kafka, Kubernetes, OpenSearch, Prometheus, Python, Terraform
AdTech • Marketing Tech
The role involves enhancing the reliability and performance of media measurement platforms, managing incidents, implementing observability practices, automating processes, and ensuring high availability of cloud and on-premises infrastructures.
Top Skills:
Ansible, AWS, Bash, GCP, GitLab, Go, Grafana, Helm, Kubernetes, Linux, MongoDB, Nagios, NoSQL, OCI, Prometheus, Python, Snowflake, Splunk, SQL, Terraform, Unix, Vertica
Digital Media • Events • Music
Lead and manage a team of SRE/DevOps engineers to ensure reliability, availability, and performance of cloud-based systems. Oversee incident response, operational troubleshooting, process improvements, and cross-team collaboration while mentoring and delegating tasks to meet business objectives.
Top Skills:
Cloud Services
Artificial Intelligence • HR Tech • Professional Services
Design, build, and operate scalable, reliable cloud infrastructure. Maintain AWS/GCP and Linux systems, Kubernetes clusters, CI/CD pipelines, and monitoring (Prometheus/ELK). Automate operations, troubleshoot production issues, run on-call, conduct reviews, and evaluate new technologies to improve availability and performance.
Top Skills:
Ansible, AWS, CI/CD, ELK, GCP, Jenkins, Kubernetes, Linux, Prometheus, Puppet, Terraform
Artificial Intelligence • HR Tech • Professional Services
Design, build, and operate reliable, scalable cloud infrastructure. Maintain AWS/GCP and Linux systems, manage Kubernetes clusters, implement IaC (Ansible/Puppet/Terraform), automate CI/CD (Jenkins), monitor with Prometheus/ELK, triage alerts, participate in design/reviews, migrate apps to Kubernetes, and improve operational automation.
Top Skills:
Ansible, AWS, C++, ELK, GCP, Go, Jenkins, Kubernetes, Linux, Prometheus, Puppet, Rust, Terraform, TypeScript
Healthtech • Pet • Biotech
Senior SRE responsible for designing and modernizing CI/CD and deployment systems, automating AWS Serverless infrastructure, improving observability and incident response, enforcing release and security practices, and guiding engineering teams to scale resilient global services.
Top Skills:
AuroraDB, AWS CloudFormation, AWS Lambda, Azure Entra ID, CloudFront, DynamoDB, EventBridge, Git, GitHub Actions, Maven, OAuth2, OpenID Connect, S3, SNS, SQS, Terraform
Software • Cybersecurity
Drive reliability, scalability, and performance of cloud-based systems on AWS/Azure. Monitor systems, handle on-call production support, lead incident response and root cause analysis, perform releases and hotfixes, implement cloud security controls, and automate infrastructure improvements.
Top Skills:
AWS, Azure, Azure DevOps, Cloud-Native, Docker, GitLab CI/CD, Go, Jenkins, Kubernetes, Microservices, PowerShell, Python