Top Site Reliability Engineer Jobs
Hardware • Quantum Computing
Lead integration, maintenance, and automation of heterogeneous hardware and software control systems for quantum computers. Manage networked lab infrastructure, CI/CD pipelines, observability, and provisioning. Support incident response, testing, and orchestration, collaborating with software, hardware, and test teams to ensure reliability and operational readiness of development and production environments.
Top Skills:
Ansible, Bash, CI/CD, Debian, DHCP, DNS, Docker, ELK, Git, GitLab CI, Go, Grafana, Hardware-in-the-Loop (HIL), Jenkins, Kubernetes, LAN, Logging Systems, Prometheus, Python, Rack-Mount Servers, Red Hat, Routers, Switches, TCP/IP, Terraform, Ubuntu, VLAN, WAN, Windows
Other
Design, build, and maintain highly available cloud-native systems. Improve reliability through automation, CI/CD, Kubernetes, observability, and incident management. Collaborate with developers, security, and product teams to define SLOs, implement self-healing, debug production issues, and ensure secure deployments.
Top Skills:
AWS, Azure Cloud Services, Datadog, GCP, GitHub Actions, GitLab CI, Go, Infrastructure as Code, Kubernetes, Opsgenie, PagerDuty, Python, Ruby, Site Reliability Engineering Foundation
Software
Support senior SREs to maintain availability, performance, and reliability of VA enterprise platforms. Assist with monitoring, incident response, automation, CI/CD, cloud/container operations (AWS, containers), documentation, and security/compliance under Federal requirements while developing SRE skills.
Top Skills:
AWS, Azure, Bash, CI/CD, CloudWatch, Docker, ECS, EKS, ELK, Git, GCP, Grafana, Kubernetes, Linux, PowerShell, Prometheus, Python, Splunk, Terraform
Artificial Intelligence • Information Technology • Software
Design, implement, and maintain observability, auto-remediation, and deployment automation for production systems. Develop and maintain deployment scripts and automation in Python, PowerShell, Groovy, and Bash. Automate infrastructure across AWS, vCenter, and network/storage services. Participate in on-call rotations and collaborate with R&D and Cloud teams to improve reliability and CI/CD delivery.
Top Skills:
Ansible, AWS, Bash, BIG-IP, Dyn, Git, Groovy, Jenkins, Kubernetes, PowerShell, Python, Route 53, Ruby, Terraform, vCenter
Big Data • Real Estate • Software
Senior SRE responsible for reliability, observability, and operational excellence of a large AWS/Kubernetes platform. Duties include maintaining EKS/Fargate infrastructure, monitoring SLIs/SLOs, implementing observability with NewRelic, driving cost optimization and FinOps practices, executing chaos engineering and incident response, contributing automation and IaC, and supporting security/compliance and developer experience.
Top Skills:
Apollo GraphQL, Argo CD, AWS, AWS Secrets Manager, CircleCI, CloudFormation, CloudFront, CloudWatch, Datadog, Docker, EC2, ECS, EKS, Fargate, GitHub Actions, GitOps, Go, Grafana, Helm, IAM, Istio, Java, Jenkins, Kubernetes, Kustomize, Lambda, New Relic, Opsgenie, PagerDuty, Prometheus, Python, RDS, Route 53, S3, ServiceNow, Splunk, Terraform, Tyk Gateway, Vault, VPC
Artificial Intelligence • Software • Generative AI
As a Principal SRE, you will lead reliability, scalability, and operational health of Gradial's platform, driving improvements and collaborating with engineering.
Top Skills:
CI/CD, Infrastructure as Code, Kubernetes, Observability, Python, TypeScript
Software • Defense
Own reliability, scalability, and security for on-prem and AWS deployments. Build observability (Prometheus/Loki/Grafana/ELK), define SLOs/SLIs, lead incident response and postmortems, automate infrastructure (Terraform/Ansible), operate Kubernetes clusters, embed security/compliance controls, eliminate operational toil, and mentor teams.
Top Skills:
Alloy, Ansible, AWS, AWS GovCloud, Bash, CloudFormation, Datadog, ELK, GitHub Actions, GitLab CI/CD, Go, Grafana, Jenkins, Kubernetes, Loki, Prometheus, Python, RMF, STIGs, Terraform
Gaming • Information Technology • Mobile • Software • Esports
Lead design, build, and operation of multi-cloud hybrid infrastructure and Kubernetes platforms. Drive observability, SLI/SLOs, incident response, automation, CI/CD hardening, secrets/policy-as-code, and promote SRE practices across studios.
Top Skills:
1Password, Ansible, Argo CD, AWS, AWS Secrets Manager, AWS Systems Manager, Bare Metal, Cilium, Datadog, EKS, Flux, GCP, GitHub Actions, GKE, Go, Grafana, Helm, Istio, Jenkins, Kubernetes, OPA/Gatekeeper, OpenTelemetry, Passwordstate, Prometheus, Pulumi, Puppet, Python, Terraform, Terragrunt, TypeScript, VMware
Artificial Intelligence • Automotive • Greentech • Information Technology • Machine Learning • Software • Cybersecurity
Lead reliability efforts for cloud-native production systems: design and operate infrastructure, define SLOs/SLIs, lead incident response, build IaC and CI/CD, improve observability and automate toil, and mentor SRE engineers.
Top Skills:
AWS, Azure, Cassandra, CDN, CloudFormation, DNS, ECS, ELK, GCP, GitHub Actions, GitOps, Go, Grafana, Java, Jenkins, Kubernetes, Linux, MySQL, New Relic, Oracle, PagerDuty, Postgres, Prometheus, Python, Redis, Splunk, TCP/IP, Terraform
Fintech • Information Technology • Payments • Productivity • Software • Travel • Automation
As a Senior Site Reliability Engineer, you will design and develop tooling and automation for infrastructure services, collaborate with multiple teams, automate processes, and ensure system reliability in a production environment.
Top Skills:
AWS, CloudFormation, Datadog, Go, Java, Jenkins, Kibana, Maven, New Relic, Node.js, Python, SignalFx, Terraform
Computer Vision • Hardware • Machine Learning • Robotics • Software
The role involves maintaining cloud infrastructure, collaborating with engineering teams, troubleshooting issues, deploying solutions, and ensuring system reliability.
Top Skills:
Ansible, C++, Grafana, Helm, Kubernetes, PagerDuty, Python, Terraform, TypeScript
Artificial Intelligence • Cloud • Information Technology • Legal Tech • Productivity • Software
The Senior Site Reliability Engineer will focus on automating infrastructure, enhancing cloud resilience, supporting deployments, and mentoring teams in reliability best practices, while participating in on-call rotations.
Top Skills:
Azure, Bash, CI/CD, Docker, Go, Grafana, Java, Kubernetes, PowerShell, Prometheus, Python, Ruby, Terraform
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills:
AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform
Security • Software • Cybersecurity • Automation
As a Senior Site Reliability Engineer, you will enhance the reliability of Drata’s product teams through automation, architecture reviews, and operational excellence using cloud-native technologies.
Top Skills:
AIOps, AWS, Bash, Datadog, Docker, Git, GitHub Actions, Kubernetes, Linux, MySQL, Python, Terraform
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
Blockchain, GitOps, Infrastructure-as-Code, Kubernetes, Programming Languages
Artificial Intelligence • Cloud • Information Technology • Security • Social Impact • Software • Cybersecurity
Design, deliver, and maintain a high-performance application platform. Automate processes, improve customer experience, and implement observability tools. Collaborate with teams to manage SLIs, SLOs, and ensure application performance.
Top Skills:
AWS, Azure, Bash, Datadog, GCP, GitLab, Grafana, Helm, Java, Jenkins, Kubernetes, NoSQL, Prometheus, Python, Ruby, Spring, SQL, Terraform
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing telemetry infrastructure, establishing SRE practices, and managing observability across cloud platforms.
Top Skills:
Argo CD, AWS, Azure, Bash, CloudFormation, Docker, GCP, GitHub Actions, GitLab CI, Go, Java, Jenkins, Node.js, OpenTelemetry, PowerShell, Pulumi, Python, Rust, Terraform
Insurance
As a Staff Cyber SRE at GEICO, you will improve the reliability and performance of security platforms by writing production-quality code and automating workflows. Responsibilities include defining reliability standards, partnering with developers, driving observability, leading incident response, and embracing agile methodologies while enhancing system resilience.
Top Skills:
Ansible, AWS, Azure, GCP, GitHub Actions, Go, Grafana, Jenkins, Prometheus, Python, Terraform
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Senior Site Reliability Engineer will manage and enhance cloud infrastructure, focusing on automation, performance, and security while collaborating with software and DevOps teams.
Top Skills:
Argo CD, Azure, Azure Monitor, Dynatrace, Flux, Grafana, Helm, Kubernetes, Prometheus, Pulumi, RESTful Services, Splunk, Terraform
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
As a Senior Site Reliability Engineer, you will enhance platform reliability, lead incident management, and drive AI-driven improvements in operational workflows.
Top Skills:
Amazon Web Services, Datadog, DynamoDB, Envoy, Event-Driven Architectures, gRPC, HTTP, Istio, JSON, Kotlin, Kubernetes, LaunchDarkly, Modern Java, MySQL, Protocol Buffers, Terraform, Vitess
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
The Senior Site Reliability Engineer will enhance reliability of Block's platform, improve incident response using AI tools, and coordinate incident management. Responsibilities include building reliable systems, standardizing tools, and leading high-severity incidents during on-call rotations.
Top Skills:
Amazon Web Services, Datadog, DynamoDB, gRPC, HTTP, Istio, Java, JSON, Kotlin, Kubernetes, LaunchDarkly, MySQL, Protocol Buffers, Terraform, Vitess
Automotive • Hardware • Logistics
The Manager of Site Reliability Engineering leads a team to enhance cloud infrastructure reliability, automate processes, and collaborate with various teams to improve service delivery and operations.
Top Skills:
Argo CD, CI/CD, Datadog, Dynatrace, GCP, Google Cloud Platform, Kubernetes, Terraform
Big Data • Energy • Big Data Analytics
The Staff Site Reliability Engineer will lead in designing and maintaining cloud infrastructure on GCP, drive IaC strategy, manage Kubernetes operations, ensure security compliance, and mentor engineers.
Top Skills:
Bash, Go, Google Cloud Platform, Grafana, Kubernetes, OpenTelemetry, Postgres, Python, Terraform
Cloud • Security • Software
As a Site Reliability Engineer, you will design, deliver, and maintain cloud-based infrastructure, ensuring resilient and secure enterprise software solutions through optimized CI/CD processes.
Top Skills:
CI/CD, Docker, GCP, Git, Go, Kubernetes
Aerospace • Defense • Manufacturing
Lead and build the deployment engineering function to operate mission-critical software in accredited, air-gapped, and high-side environments. Own full deployment lifecycle across cloud, on-prem, and disconnected networks; manage Kubernetes/OpenShift and Linux infrastructure; build CI/CD and IaC workflows; integrate security tooling; diagnose and prevent production issues; produce ATO-related artifacts and maintain compliance.
Top Skills:
Alertmanager, AWS, Azure, Bash, Docker, GCP, GitLab CI, Go, Grafana, Groovy, Helm, Java, Jenkins, Kubernetes, OpenShift, PagerDuty, Podman, Prometheus, Python, RHEL, Ruby, Service Mesh, Splunk, Terraform