Top Site Reliability Engineer Jobs
Artificial Intelligence
In this role, the Site Reliability Engineer will improve the reliability and performance of infrastructure, write clean code, collaborate across teams, and maintain platforms for deployed software.
Top Skills:
AWS, CI/CD, Docker, JavaScript, Kubernetes, Python, Terraform, Unix Shell
Artificial Intelligence • Blockchain • Information Technology • Consulting
Design, scale, and operate multi-region, high-availability cloud infrastructure; lead incident response and on-call rotations; build automation and tooling in Python/Go; enforce risk, security, and operational standards; mentor teams and drive infrastructure architecture decisions.
Top Skills:
AWS, Crypto, GCP, Go, IAM, ISO, Kafka, Kubernetes, OpenTofu, Postgres, Python, Redpanda, SOC 2, Terraform, Web3
Artificial Intelligence
The Deployment Engineer will manage AI inference clusters, optimize deployment and capacity allocation, and ensure the reliability of pipeline operations across datacenters.
Top Skills:
Docker, Grafana, InfluxDB, K8s, Linux, Prometheus, Python
Artificial Intelligence • Logistics • Software
The Site Reliability Engineer will enhance operational resilience, ensuring system stability, observability, and debugging workflows for complex failures while improving developer focus and uptime.
Top Skills:
Datadog, Go, Prometheus, Python, Sentry
Cybersecurity
As a Sr. Staff Site Reliability Engineer, you will define the reliability vision for a multi-tenant SaaS platform, lead the architecture of detection systems, and partner across teams to improve incident management and system resilience, ensuring issues are resolved before affecting customers.
Top Skills:
Argo CD, AWS, GCP, GitLab CI/CD, Grafana, Helm, Kubernetes, Prometheus
Automotive • Hardware • Logistics
The Site Reliability Engineer III enhances system reliability by building automation and supporting large-scale systems, ensuring critical platforms function optimally.
Top Skills:
APIs, Azure DevOps, Dynatrace, Google Cloud Platform, Grafana, HTTP, Java, Kubernetes, Microservices, Prometheus, Terraform
Cloud • Software
The Site Reliability Engineer will ensure reliable cloud operations by using Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.
Top Skills:
Kubernetes, Linux, OpenStack, Python
Financial Services
The Lead Site Reliability Engineer will establish the SRE operating model, implement AI-enabled reliability use cases, manage reliability metrics, and oversee operational readiness while collaborating with teams and mentoring engineers.
Top Skills:
AI/ML, Ansible, Azure DevOps, Docker, GitHub Actions, GitLab CI, Jenkins, Kubernetes, Terraform, VMware
Artificial Intelligence • Software
As a Software Engineer on the Site Reliability team, you'll ensure system reliability, scalability, and observability while partnering with engineering teams and improving incident management processes.
Top Skills:
AWS, CI/CD Tooling, Container Orchestration, Datadog, Grafana, Prometheus, Terraform
Information Technology • Software • Big Data Analytics
The Site Reliability Engineer will design, analyze, and troubleshoot large-scale distributed systems, focusing on operating systems and performance tuning.
Top Skills:
Apache, Java
Insurance • Financial Services
Drive reliability and operational excellence across cloud, on-premise, and hybrid platforms. Build automation and AI-powered tooling, design observability and self-healing systems, standardize CI/CD and incident practices, lead post-incident reviews, and support production systems through on-call rotation while advising engineering teams on reliability, compliance, and modern DevOps practices.
Top Skills:
AI Platforms, C#, CI/CD, Cloud, Container Platforms, Docker, Event Streaming, Infrastructure-as-Code, Java, JavaScript, Kubernetes, LLMs, Observability, PowerShell, Python, Telemetry
Manufacturing
The Site Reliability Engineer will ensure the reliability, security, and support of Databricks applications while collaborating with various teams to optimize data workflows and incident management.
Top Skills:
Azure, CI/CD, Databricks, Delta Lake, PySpark, Python, SQL, Unity Catalog
Information Technology • Software
As Sr. Director, Platform Engineering & SRE, you will lead the reliability and operational excellence of the platform, establishing practices for site reliability engineering and managing cloud engineering across a multi-cloud SaaS portfolio.
Top Skills:
AWS, Azure, CI/CD, Datadog, GCP, Grafana, Infrastructure-as-Code, OpenTelemetry, Prometheus
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Staff Software Engineer in Site Reliability, you'll manage infrastructure for reliability and scalability, lead incident management, and automate operational tasks.
Top Skills:
AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, incident.io, PagerDuty, Pulumi, Python, Sentry, Terraform
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Software Engineer in Site Reliability, you will ensure the reliability and performance of our AI platform through automation and strategic infrastructure management.
Top Skills:
AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Kubernetes, PagerDuty, Python, Sentry, Terraform
Cloud • Security • Software • Cybersecurity
As a Site Reliability Engineer II, you'll automate tasks, monitor AI workloads, enhance dashboards, support CI/CD processes, and collaborate with engineering teams on complex issues while participating in on-call rotations.
Top Skills:
Go, Grafana, Kubernetes, Linux, Prometheus, Python, SaltStack, Terraform
Software • Analytics
The role involves automating and managing AWS infrastructure, ensuring the reliability and scalability of stateful systems, and optimizing deployment processes. You'll also handle incident response and improve operational tooling.
Top Skills:
AWS, Kubernetes, Terraform, Terragrunt
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing observability infrastructure, and supporting SRE practices and cloud deployments.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Node.js, OpenTelemetry, Pulumi, Python, Rust, Terraform
Artificial Intelligence • Software
The Site Reliability Engineer ensures the reliability and performance of the products Devin and Windsurf, manages incident response, CI/CD pipelines, and infrastructure as code, and fosters a reliability culture within the engineering team.
Top Skills:
AWS, Azure, CI/CD, GCP, Kubernetes, Terraform
Cloud • Software • Database
Lead the design, build, and operation of the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance tasks, manage incidents and on-call rotations, implement security and encryption processes, and optimize reliability using SRE principles and observability.
Top Skills:
AKS, Ansible, AWS, Azure, Bash, Docker, EKS, GCP, Git, GitHub Actions, GKE, Java, Kubernetes, Linux, Postgres, Prometheus, Python, Shell, Terraform
Cloud • Security • Software • Generative AI
Design, build, and automate large-scale multi-cloud infrastructure and internal SRE tools. Improve host lifecycle, observability, alerting, and reliability; operate containerized workloads; participate in on-call rotations, incident response, runbooks, postmortems, code reviews, and mentoring.
Top Skills:
Ansible, Argo CD, Argo Workflows, CUE, Docker, Elastic Stack, Go, Graphite, Influx, Kubernetes, Linux, Prometheus, Puppet, Terraform, Ubuntu, Ubuntu Live Patch
Software • Cybersecurity
This role involves managing Kubernetes clusters, cloud infrastructure, and CI/CD pipelines. The engineer will enhance system reliability and efficiency while troubleshooting production issues.
Top Skills:
Alertmanager, AWS, Azure, Bash, CI/CD, Docker, Elastic Stack, Elasticsearch, GCP, Go, Grafana, Helm, Kafka, Kubernetes, Loki, MongoDB, OCI, Prometheus, Python, Redis, Spark, Terraform
Information Technology • Software
The Systems Engineer manages Linux systems, designs CI/CD pipelines, administers application security platforms, and ensures compliance with security standards.
Top Skills:
Ansible, Bash, CloudBees Jenkins, Docker, ELK, Git, GitHub Actions, GitHub Advanced Security, JFrog Artifactory, JFrog Xray, Jira, Kubernetes, Linux, Nagios, Nexus IQ, Prometheus, Python, Terraform
Cloud
The Staff Site Reliability Engineer will lead the design of AWS solutions, manage incident responses, and mentor junior engineers, ensuring reliability and security in federal environments.
Top Skills:
AWS, Databricks, Go, Helm, Kubernetes, Redshift, Snowflake, Terraform
Information Technology
The role involves securing and maintaining the reliability of X Money's infrastructure, focusing on AWS, Kubernetes, and code security while implementing best practices and collaborative problem-solving.
Top Skills:
AWS, DynamoDB, Kubernetes, Python, RDS, Terraform