Top Site Reliability Engineer Jobs

Reposted 15 Days Ago
In-Office
2 Locations
Senior level
Energy
The Senior Site Reliability Engineer improves infrastructure reliability and scalability, partners with various teams, implements IaC and CI/CD, and ensures business continuity through effective BCP/DR planning.
Top Skills: AWS, Bash, CloudFormation, Datadog, Elk, Github Actions, Gitlab Ci, Go, Grafana, Jenkins, Kubernetes, Opensearch, Prometheus, Python, Terraform
Senior level
Artificial Intelligence • Software
Design, build, and scale control- and data-plane infrastructure for distributed AI workloads. Improve reliability, performance, scheduling, and observability for Ray clusters across cloud and on-prem environments. Support accelerator integration, container image management, and provide on-call troubleshooting and cross-team collaboration.
Top Skills: AWS, Azure, Containers, GCP, Go, Gpus, Grafana, Kubernetes, Linux, Prometheus, Python, Ray, Tpus, Vms
16 Days Ago
In-Office
Sunnyvale, CA, USA
Senior level
Fintech • Payments • Software • Financial Services
Senior SRE responsible for ensuring platform scalability and reliability on AWS, owning CI/CD and GitHub workflows, leading incident response and post-mortems, implementing observability and logging, and serving as a bilingual (Mandarin/English) technical liaison with international engineering teams.
Top Skills: AWS, Ci/Cd, Git, Logging, Monitoring, Scripting
17 Days Ago
Remote
US
110K-151K Annually
Senior level
Edtech
Lead SRE work to improve availability, reliability, observability, and security for a distributed SaaS platform. Build and maintain IaC (Terraform, CloudFormation), support CI/CD, manage containerized production environments (Kubernetes/EKS), run disaster recovery exercises, participate in on-call rotation, collaborate cross-functionally, and mentor teams while integrating tooling including AI into SRE workflows.
Top Skills: .Net, Ansible, Aws Eks, Ci/Cd, CloudFormation, Docker, Java, JavaScript, Kubernetes, Python, Terraform
17 Days Ago
In-Office
Cambridge, MA, USA
160K-180K Annually
Senior level
Artificial Intelligence • Software • Generative AI • Automation
Lead design, build, and operation of scalable, fault-tolerant cloud infrastructure. Define SLOs/SLAs, improve observability and incident response, own CI/CD and deployment automation, partner with engineering teams on reliability, capacity planning, performance benchmarking, cost optimization, and security for an AI platform.
Top Skills: AWS, Azure, Bash, Ci/Cd, Datadog, Ebpf, GCP, Go, Gpu, Grafana, Istio, Kubernetes, Linkerd, Opentelemetry, Prometheus, Pulumi, Python, Terraform
Reposted 17 Days Ago
Remote
United States
Senior level
Big Data
You will manage AWS infrastructure, automate deployments, debug application issues, and improve the operational health of Metabase Cloud.
Top Skills: AWS, Datadog, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Reposted 18 Days Ago
In-Office
Edmond, OK, USA
Senior level
Other
The Senior Site Reliability Engineer ensures the integrity, performance, and reliability of cloud infrastructure, overseeing software development, maintenance, and site reliability issues while promoting industry best practices.
Top Skills: Cloud Infrastructure, DevOps, Software Development
Reposted 19 Days Ago
In-Office
San Francisco, CA, USA
140K-185K Annually
Mid level
Artificial Intelligence • Healthtech
The role involves improving operational reliability, managing production environments, enhancing observability, automating tasks, and collaborating with engineering teams, requiring 3-6 years of relevant experience.
Top Skills: AWS, Bash, Datadog, Kubernetes, Prometheus, Python, Terraform
Reposted 19 Days Ago
In-Office
San Francisco, CA, USA
127K-192K Annually
Senior level
Big Data • Cloud • Marketing Tech • Social Impact • Software
As a Senior Site Reliability Engineer, you will support product deployments, provide engineering support, maintain systems, and collaborate with teams globally to enhance infrastructure reliability.
Top Skills: AWS, Cassandra, CircleCI, DynamoDB, GCP, Go, Jenkins, Kubernetes, Nosql Databases, Python, Scylladb, Singlestore Db, Terraform
Reposted 19 Days Ago
In-Office
Dallas, TX, USA
Senior level
Healthtech • Travel
The Senior Site Reliability Engineer leads reliability engineering for Azure, focusing on scripting, automation, observability, and incident response, ensuring service quality and uptime.
Top Skills: Aks, App Services, Application Insights, Azure, Azure Devops, Azure Monitor, Bicep, Functions, Github Actions, Grafana, Itrs Geneos, JIRA, Log Analytics, Powershell, Python, Servicenow, Terraform, Vm Scale Sets
Reposted 20 Days Ago
In-Office or Remote
9 Locations
170K-290K Annually
Expert/Leader
Artificial Intelligence • Software
As a Software Engineer in Reliability, you'll architect and manage multi-cloud GPU infrastructure, ensuring performance, security, and scale while debugging complex hardware/software issues.
Top Skills: Amd, AWS, Bash, Go, Gpu, Infiniband, Linux, Nvidia, Oci, Python, Rdma
Reposted 20 Days Ago
Remote
United States
Expert/Leader
Legal Tech • Software
As a Site Reliability Engineer, you'll develop autonomous systems, improve CI/CD pipelines, mentor junior engineers, and ensure software reliability and security in a 24/7 environment.
Top Skills: Bash, Powershell, Python
Reposted 21 Days Ago
In-Office
Washington, DC, USA
170K-220K Annually
Senior level
Information Technology • Consulting
As a Sr. Site Reliability Engineer, you'll design, deploy, and maintain applications in virtualized environments, develop CI/CD pipelines, and ensure operational observability and performance of production systems.
Top Skills: Ansible, Bash, F5, Gitlab Ci/Cd, Kubernetes, Minio, Portworx, S3-Compatible Services, VMware
Reposted 21 Days Ago
Hybrid
San Francisco, CA, USA
250K-350K Annually
Senior level
Artificial Intelligence • Information Technology • Software
The Site Reliability Engineer will ensure high availability and performance of CodeRabbit's AI-powered code review platform, enhancing system reliability through infrastructure ownership, performance engineering, and automation.
Top Skills: AWS, Datadog, Docker, Elk Stack, Google Cloud Platform, Grafana, Kubernetes, Linux, Node.js, Prometheus, Terraform, Typescript
Reposted 21 Days Ago
In-Office
Washington, DC, USA
147K-202K Annually
Senior level
Cloud
The Staff Site Reliability Engineer will manage large-scale cloud production systems, ensuring reliability and performance, while automating processes and responding to incidents.
Top Skills: AWS, Bash, CloudFormation, Docker, Go, Helm, Kubernetes, Python, Ruby, Terraform
Reposted 21 Days Ago
In-Office
Vienna, VA, USA
84K-142K Annually
Senior level
Other • Software • Analytics
The Sr. Site Reliability Engineer will manage SaaS capabilities, implement monitoring systems, automate operational tasks, and provide on-call support.
Top Skills: AWS, Bash, Docker, Eks, Elk, Git, Java, Kubernetes, Prometheus, Python, Terraform
Reposted 21 Days Ago
In-Office
Charlotte, NC, USA
84K-142K Annually
Senior level
Other • Software • Analytics
The role involves deploying and managing SaaS solutions, automating infrastructure processes, troubleshooting system issues, and collaborating with a team of engineers.
Top Skills: Arcgis Velocity, Arcgis Workflow Manager, AWS, Aws Lambda, Bash, Docker, Eks, Elk, Git, Kafka, Kubernetes, Opensearch, Prometheus, Python, Terraform
Reposted 21 Days Ago
In-Office
St. Louis, MO, USA
84K-142K Annually
Senior level
Other • Software • Analytics
As a Sr. Site Reliability Engineer, you will manage cloud-based SaaS products, automate infrastructure, troubleshoot issues, and provide technical support while collaborating with a team of engineers.
Top Skills: AWS, Aws Lambda, Bash, Docker, Ecs, Eks, Elk, Git, Java, Kafka, Kubernetes, Opensearch, Prometheus, Python, Security Groups, Terraform, Vpc
Reposted 21 Days Ago
In-Office
Redlands, CA, USA
84K-142K Annually
Senior level
Other • Software • Analytics
The role involves deploying and managing SaaS capabilities on AWS, including monitoring systems, automation solutions, and troubleshooting incidents. Collaboration with SRE engineers is key to operational success across multiple regions.
Top Skills: AWS, Aws Lambda, Bash, Docker, Ecs, Elk, Git, Kafka, Kubernetes, Opensearch, Prometheus, Python, Terraform
Reposted 21 Days Ago
In-Office
New York, NY, USA
89K-178K Annually
Senior level
AdTech • Marketing Tech
The role involves enhancing the reliability and performance of media measurement platforms, managing incidents, implementing observability practices, automating processes, and ensuring high availability of cloud and on-premises infrastructures.
Top Skills: Ansible, AWS, Bash, GCP, Gitlab, Go, Grafana, Helm, Kubernetes, Linux, MongoDB, Nagios, NoSQL, Oci, Prometheus, Python, Snowflake, Splunk, SQL, Terraform, Unix, Vertica
23 Days Ago
In-Office
Oakland Estates, San Antonio, TX, USA
Senior level
Digital Media • Events • Music
Lead and manage a team of SRE/DevOps engineers to ensure reliability, availability, and performance of cloud-based systems. Oversee incident response, operational troubleshooting, process improvements, and cross-team collaboration while mentoring and delegating tasks to meet business objectives.
Top Skills: Cloud Services
23 Days Ago
Hybrid
San Francisco, CA, USA
165K-235K Annually
Senior level
Artificial Intelligence • HR Tech • Professional Services
Design, build, and operate scalable, reliable cloud infrastructure. Maintain AWS/GCP and Linux systems, Kubernetes clusters, CI/CD pipelines, and monitoring (Prometheus/ELK). Automate operations, troubleshoot production issues, run on-call, conduct reviews, and evaluate new technologies to improve availability and performance.
Top Skills: Ansible, AWS, Ci/Cd, Elk, GCP, Jenkins, Kubernetes, Linux, Prometheus, Puppet, Terraform
23 Days Ago
Hybrid
San Francisco, CA, USA
205K-225K Annually
Senior level
Artificial Intelligence • HR Tech • Professional Services
Design, build, and operate reliable, scalable cloud infrastructure. Maintain AWS/GCP and Linux systems, manage Kubernetes clusters, implement IaC (Ansible/Puppet/Terraform), automate CI/CD (Jenkins), monitor with Prometheus/ELK, triage alerts, participate in design/reviews, migrate apps to Kubernetes, and improve operational automation.
Top Skills: Ansible, AWS, C++, Elk, GCP, Go, Jenkins, Kubernetes, Linux, Prometheus, Puppet, Rust, Terraform, Typescript
23 Days Ago
In-Office or Remote
3 Locations
100K-125K Annually
Senior level
Healthtech • Pet • Biotech
Senior SRE responsible for designing and modernizing CI/CD and deployment systems, automating AWS Serverless infrastructure, improving observability and incident response, enforcing release and security practices, and guiding engineering teams to scale resilient global services.
Top Skills: Auroradb, Aws Cloudformation, Aws Lambda, Azure Entra Id, Cloudfront, DynamoDB, Eventbridge, Git, Github Actions, Maven, Oauth2, Openid Connect, S3, Sns, Sqs, Terraform
23 Days Ago
In-Office
Sunnyvale, CA, USA
170K-196K Annually
Senior level
Software • Cybersecurity
Drive reliability, scalability, and performance of cloud-based systems on AWS/Azure. Monitor systems, handle on-call production support, lead incident response and root cause analysis, perform releases and hotfixes, implement cloud security controls, and automate infrastructure improvements.
Top Skills: AWS, Azure, Azure Devops, Cloud-Native, Docker, Gitlab Ci/Cd, Go, Jenkins, Kubernetes, Microservices, Powershell, Python