Top Site Reliability Engineer Jobs

5 Days Ago
In-Office
Boston, MA, USA
160K-225K Annually
Senior level
Hardware • Quantum Computing
Lead integration, maintenance, and automation of heterogeneous hardware and software control systems for quantum computers. Manage networked lab infrastructure, CI/CD pipelines, observability, and provisioning. Support incident response, testing, and orchestration, collaborating with software, hardware, and test teams to ensure reliability and operational readiness of development and production environments.
Top Skills: Ansible, Bash, CI/CD, Debian, DHCP, DNS, Docker, ELK, Git, GitLab CI, Go, Grafana, Hardware-in-the-Loop (HIL), Jenkins, Kubernetes, LAN, Logging Systems, Prometheus, Python, Rack-Mount Servers, Red Hat, Routers, Switches, TCP/IP, Terraform, Ubuntu, VLAN, WAN, Windows
5 Days Ago
Remote
USA
130K-160K Annually
Senior level
Other
Design, build, and maintain highly available cloud-native systems. Improve reliability through automation, CI/CD, Kubernetes, observability, and incident management. Collaborate with developers, security, and product teams to define SLOs, implement self-healing, debug production issues, and ensure secure deployments.
Top Skills: AWS, Azure Cloud Services, Datadog, GCP, GitHub Actions, GitLab CI, Go, Infrastructure as Code, Kubernetes, Opsgenie, PagerDuty, Python, Ruby, Site Reliability Engineering Foundation
5 Days Ago
Remote
USA
Junior
Software
Support senior SREs to maintain availability, performance, and reliability of VA enterprise platforms. Assist with monitoring, incident response, automation, CI/CD, cloud/container operations (AWS, containers), documentation, and security/compliance under Federal requirements while developing SRE skills.
Top Skills: AWS, Azure, Bash, CI/CD, CloudWatch, Docker, ECS, EKS, ELK, Git, GCP, Grafana, Kubernetes, Linux, PowerShell, Prometheus, Python, Splunk, Terraform
5 Days Ago
In-Office
Atlanta, GA, USA
218K-274K Annually
Expert/Leader
Artificial Intelligence • Information Technology • Software
Design, implement, and maintain observability, auto-remediation, and deployment automation for production systems. Develop and maintain deployment scripts and automation in Python, PowerShell, Groovy, and Bash. Automate infrastructure across AWS, vCenter, and network/storage services. Participate in on-call rotations and collaborate with R&D and Cloud teams to improve reliability and CI/CD delivery.
Top Skills: Ansible, AWS, Bash, BIG-IP, Dyn, Git, Groovy, Jenkins, Kubernetes, PowerShell, Python, Route53, Ruby, Terraform, vCenter
11 Days Ago
Hybrid
Austin, TX, USA
Senior level
Big Data • Real Estate • Software
Senior SRE responsible for reliability, observability, and operational excellence of a large AWS/Kubernetes platform. Duties include maintaining EKS/Fargate infrastructure, monitoring SLIs/SLOs, implementing observability with NewRelic, driving cost optimization and FinOps practices, executing chaos engineering and incident response, contributing automation and IaC, and supporting security/compliance and developer experience.
Top Skills: Apollo GraphQL, Argo CD, AWS, AWS Secrets Manager, CircleCI, CloudFormation, CloudFront, CloudWatch, Datadog, Docker, EC2, ECS, EKS, Fargate, GitHub Actions, GitOps, Go, Grafana, Helm, IAM, Istio, Java, Jenkins, Kubernetes, Kustomize, Lambda, New Relic, Opsgenie, PagerDuty, Prometheus, Python, RDS, Route53, S3, ServiceNow, Splunk, Terraform, Tyk Gateway, Vault, VPC
Reposted 5 Days Ago
In-Office
Seattle, WA, USA
180K-240K Annually
Senior level
Artificial Intelligence • Software • Generative AI
As a Principal SRE, you will lead reliability, scalability, and operational health of Gradial's platform, driving improvements and collaborating with engineering.
Top Skills: CI/CD, Infrastructure as Code, Kubernetes, Observability, Python, TypeScript
11 Days Ago
In-Office
Washington, VA, USA
180K-220K Annually
Senior level
Software • Defense
Own reliability, scalability, and security for on-prem and AWS deployments. Build observability (Prometheus/Loki/Grafana/ELK), define SLOs/SLIs, lead incident response and postmortems, automate infrastructure (Terraform/Ansible), operate Kubernetes clusters, embed security/compliance controls, eliminate operational toil, and mentor teams.
Top Skills: Alloy, Ansible, AWS, AWS GovCloud, Bash, CloudFormation, Datadog, ELK, GitHub Actions, GitLab CI/CD, Go, Grafana, Jenkins, Kubernetes, Loki, Prometheus, Python, RMF, STIGs, Terraform
11 Days Ago
Hybrid
Austin, TX, USA
Senior level
Gaming • Information Technology • Mobile • Software • Esports
Lead design, build, and operation of multi-cloud hybrid infrastructure and Kubernetes platforms. Drive observability, SLI/SLOs, incident response, automation, CI/CD hardening, secrets/policy-as-code, and promote SRE practices across studios.
Top Skills: 1Password, Ansible, Argo CD, AWS, AWS Secrets Manager, AWS Systems Manager, Bare Metal, Cilium, Datadog, EKS, Flux, GCP, GitHub Actions, GKE, Go, Grafana, Helm, Istio, Jenkins, Kubernetes, OPA/Gatekeeper, OpenTelemetry, Passwordstate, Prometheus, Pulumi, Puppet, Python, Terraform, Terragrunt, TypeScript, VMware
Reposted 12 Days Ago
Hybrid
Austin, TX, USA
112K-186K Annually
Senior level
Artificial Intelligence • Automotive • Greentech • Information Technology • Machine Learning • Software • Cybersecurity
Lead reliability efforts for cloud-native production systems: design and operate infrastructure, define SLOs/SLIs, lead incident response, build IaC and CI/CD, improve observability and automate toil, and mentor SRE engineers.
Top Skills: AWS, Azure, Cassandra, CDN, CloudFormation, DNS, ECS, ELK, GCP, GitHub Actions, GitOps, Go, Grafana, Java, Jenkins, Kubernetes, Linux, MySQL, New Relic, Oracle, PagerDuty, Postgres, Prometheus, Python, Redis, Splunk, TCP/IP, Terraform
Reposted 12 Days Ago
Easy Apply
Hybrid
2 Locations
Senior level
Fintech • Information Technology • Payments • Productivity • Software • Travel • Automation
As a Senior Site Reliability Engineer, you will design and develop tooling and automation for infrastructure services, collaborate with multiple teams, automate processes, and ensure system reliability in a production environment.
Top Skills: AWS, CloudFormation, Datadog, Go, Java, Jenkins, Kibana, Maven, New Relic, Node.js, Python, SignalFx, Terraform
Reposted 12 Days Ago
Easy Apply
Hybrid
Austin, TX, USA
Senior level
Computer Vision • Hardware • Machine Learning • Robotics • Software
The role involves maintaining cloud infrastructure, collaborating with engineering teams, troubleshooting issues, deploying solutions, and ensuring system reliability.
Top Skills: Ansible, C++, Grafana, Helm, Kubernetes, PagerDuty, Python, Terraform, TypeScript
Reposted 12 Days Ago
Hybrid
Chicago, IL, USA
130K-180K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Legal Tech • Productivity • Software
The Senior Site Reliability Engineer will focus on automating infrastructure, enhancing cloud resilience, supporting deployments, and mentoring teams in reliability best practices, while participating in on-call rotations.
Top Skills: Azure, Bash, CI/CD, Docker, Go, Grafana, Java, Kubernetes, PowerShell, Prometheus, Python, Ruby, Terraform
Reposted 12 Days Ago
Remote or Hybrid
United States
175K-200K Annually
Senior level
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills: AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform
Reposted 12 Days Ago
Hybrid
San Francisco, CA, USA
167K-226K Annually
Senior level
Security • Software • Cybersecurity • Automation
As a Senior Site Reliability Engineer, you will enhance the reliability of Drata’s product teams through automation, architecture reviews, and operational excellence using cloud-native technologies.
Top Skills: AIOps, AWS, Bash, Datadog, Docker, Git, GitHub Actions, Kubernetes, Linux, MySQL, Python, Terraform
Reposted 6 Days Ago
In-Office or Remote
2 Locations
Senior level
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills: Blockchain, GitOps, Infrastructure-as-Code, Kubernetes, Programming Languages
Reposted 6 Days Ago
Hybrid
Atlanta, GA, USA
116K-175K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Security • Social Impact • Software • Cybersecurity
Design, deliver, and maintain a high-performance application platform. Automate processes, improve customer experience, and implement observability tools. Collaborate with teams to manage SLIs, SLOs, and ensure application performance.
Top Skills: AWS, Azure, Bash, Datadog, GCP, GitLab, Grafana, Helm, Java, Jenkins, Kubernetes, NoSQL, Prometheus, Python, Ruby, Spring, SQL, Terraform
Reposted 6 Days Ago
In-Office
Chicago, IL, USA
130K-170K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing telemetry infrastructure, establishing SRE practices, and managing observability across cloud platforms.
Top Skills: Argo CD, AWS, Azure, Bash, CloudFormation, Docker, GCP, GitHub Actions, GitLab CI, Go, Java, Jenkins, Node.js, OpenTelemetry, PowerShell, Pulumi, Python, Rust, Terraform
Reposted 12 Days Ago
In-Office
4 Locations
110K-230K Annually
Senior level
Insurance
As a Staff Cyber SRE at GEICO, you will improve the reliability and performance of security platforms by writing production-quality code and automating workflows. Responsibilities include defining reliability standards, partnering with developers, driving observability, leading incident response, and embracing agile methodologies while enhancing system resilience.
Top Skills: Ansible, AWS, Azure, GCP, GitHub Actions, Go, Grafana, Jenkins, Prometheus, Python, Terraform
Reposted 12 Days Ago
In-Office or Remote
Eden Prairie, MN, USA
92K-164K Annually
Senior level
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Senior Site Reliability Engineer will manage and enhance cloud infrastructure, focusing on automation, performance, and security while collaborating with software and DevOps teams.
Top Skills: Argo CD, Azure, Azure Monitor, Dynatrace, Flux, Grafana, Helm, Kubernetes, Prometheus, Pulumi, RESTful Services, Splunk, Terraform
Reposted 12 Days Ago
In-Office
New York, NY, USA
161K-284K Annually
Senior level
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
As a Senior Site Reliability Engineer, you will enhance platform reliability, lead incident management, and drive AI-driven improvements in operational workflows.
Top Skills: Amazon Web Services, Datadog, DynamoDB, Envoy, Event Driven Architectures, gRPC, HTTP, Istio, JSON, Kotlin, Kubernetes, LaunchDarkly, Modern Java, MySQL, Protocol Buffers, Terraform, Vitess
Reposted 12 Days Ago
In-Office or Remote
8 Locations
161K-284K Annually
Senior level
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
The Senior Site Reliability Engineer will enhance reliability of Block's platform, improve incident response using AI tools, and coordinate incident management. Responsibilities include building reliable systems, standardizing tools, and leading high-severity incidents during on-call rotations.
Top Skills: Amazon Web Services, Datadog, DynamoDB, gRPC, HTTP, Istio, Java, JSON, Kotlin, Kubernetes, LaunchDarkly, MySQL, Protocol Buffers, Terraform, Vitess
Reposted 7 Days Ago
In-Office
Birmingham, AL, USA
Senior level
Automotive • Hardware • Logistics
The Manager of Site Reliability Engineering leads a team to enhance cloud infrastructure reliability, automate processes, and collaborate with various teams to improve service delivery and operations.
Top Skills: Argo CD, CI/CD, Datadog, Dynatrace, GCP, Google Cloud Platform, Kubernetes, Terraform
Reposted 7 Days Ago
In-Office or Remote
San Francisco, CA, USA
136K-180K Annually
Senior level
Big Data • Energy • Big Data Analytics
The Staff Site Reliability Engineer will lead in designing and maintaining cloud infrastructure on GCP, drive IaC strategy, manage Kubernetes operations, ensure security compliance, and mentor engineers.
Top Skills: Bash, Go, Google Cloud Platform, Grafana, Kubernetes, OpenTelemetry, Postgres, Python, Terraform
Reposted 7 Days Ago
Remote or Hybrid
USA
136K-170K Annually
Mid level
Cloud • Security • Software
As a Site Reliability Engineer, you will design, deliver, and maintain cloud-based infrastructure, ensuring resilient and secure enterprise software solutions through optimized CI/CD processes.
Top Skills: CI/CD, Docker, GCP, Git, Go, Kubernetes
8 Days Ago
In-Office
Washington, DC, USA
135K-150K Annually
Mid level
Aerospace • Defense • Manufacturing
Lead and build the deployment engineering function to operate mission-critical software in accredited, air-gapped, and high-side environments. Own full deployment lifecycle across cloud, on-prem, and disconnected networks; manage Kubernetes/OpenShift and Linux infrastructure; build CI/CD and IaC workflows; integrate security tooling; diagnose and prevent production issues; produce ATO-related artifacts and maintain compliance.
Top Skills: Alertmanager, AWS, Azure, Bash, Docker, GCP, GitLab CI, Go, Grafana, Groovy, Helm, Java, Jenkins, Kubernetes, OpenShift, PagerDuty, Podman, Prometheus, Python, RHEL, Ruby, Service Mesh, Splunk, Terraform