Top Site Reliability Engineer Jobs

16 Days Ago
In-Office or Remote
San Francisco, CA, USA
114K-235K Annually
Mid level
Social Media
Operate, scale, and improve a cloud-native platform on AWS and Kubernetes. Manage GitOps deployments with ArgoCD and Helm, provision infra with Terraform/Terragrunt, build CI/CD automation, enhance observability, respond to incidents, reduce operational toil through scripting, and collaborate with security and application teams to improve reliability and platform guardrails.
Top Skills: ArgoCD, AWS, Bash, Containers, EKS, GitHub Actions, GitOps, Helm, IAM, Kubernetes, Linux, Python, Terraform, Terragrunt
16 Days Ago
In-Office
New York, NY, USA
Senior level
Cloud • Information Technology • Consulting • Cybersecurity
Design, templatize and deploy scalable infrastructure in public clouds (AWS, GCP) using IaC (CloudFormation). Support architects, troubleshoot developer escalations, ensure compliance, and build stable platform services; work within agile teams to create configuration templates and automated deployments.
Top Skills: AWS, AWS CloudFormation, AWS EFS, EC2, GCP, Python, RDS, Ruby
Senior level
Financial Services
Drive reliability, scalability, and performance for mission-critical applications using SRE principles. Implement monitoring, SLIs/SLOs, automation, and fault-tolerance strategies. Lead incident response, RCA, and embed reliability practices into the SDLC while collaborating across development, infrastructure, network, and security teams.
Top Skills: AutoSys, AWS, DevOps Tools, Dynatrace, Jira, Mainframe, Monitoring/Observability, PL/SQL, Python, ServiceNow, Shell, Splunk, SQL, Unix/Linux, Windows
16 Days Ago
In-Office
Reston, VA, USA
136K-184K Annually
Senior level
Information Technology • Software
Operate, provision, and secure IBM POWER and AIX infrastructure for mission-critical services. Install, configure, and maintain physical hosts, PowerVM/PowerVC environments, and AIX images to meet security controls. Troubleshoot production issues, document procedures, coordinate deployments with engineering teams, and participate in a 24x7 on-call rotation.
Top Skills: AIX, CIS, HMC, IBM Power, Linux, NIM, PowerVC, PowerVM, VIOS
16 Days Ago
In-Office
New York, NY, USA
131K-164K Annually
Expert/Leader
Software
Design, deploy, and automate VMware-based private cloud infrastructure across global datacenters. Administer Linux and Windows Server platforms, integrate Active Directory, manage storage, networking, ADCs (F5/AVI), and ensure availability, security, and compliance. Build automation (PowerCLI/Ansible/Python), participate in on-call rotations, document systems, and mentor junior engineers while driving infrastructure modernization and reliability improvements.
Top Skills: Active Directory, Ansible, Avi (NSX Advanced Load Balancer), CentOS, CI/CD, DNS, F5 BIG-IP, Git, NAS, PowerCLI, PowerShell, Python, RHEL, SAN, TCP/IP, Ubuntu, VMware vSphere (ESXi, vCenter), VPN, Windows Server
Reposted 21 Days Ago
In-Office or Remote
Eden Prairie, MN, USA
92K-164K Annually
Senior level
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Senior Site Reliability Engineer will architect and maintain cloud infrastructure, collaborating with software and DevOps engineers while ensuring security and performance.
Top Skills: ArgoCD, AWS, Azure, Azure Monitor, Dynatrace, Flux, Grafana, Helm, Kubernetes, Prometheus, Pulumi, RESTful Services, Splunk, Terraform
Reposted 16 Days Ago
In-Office
San Francisco, CA, USA
350K-475K Annually
Mid level
Artificial Intelligence • Information Technology
The Site Reliability Engineer will drive reliability for the Tinker platform, focusing on incident response, monitoring, and ensuring system resilience while collaborating across teams.
Top Skills: Cloud Infrastructure, Kubernetes
Reposted 16 Days Ago
In-Office
Cape Canaveral, FL, USA
Senior level
Aerospace • Other
The Sr. IT Linux Site Reliability Engineer will manage and optimize Kubernetes clusters, automate systems, and foster collaboration to support SpaceX's engineering teams and infrastructure needs.
Top Skills: Ansible, Docker, Git, Go, Grafana, Helm, JSON, Kubernetes, Linux, Prometheus, Python, Terraform, YAML
Reposted 16 Days Ago
Remote
Texas, USA
Mid level
Blockchain
The Blockchain Site Reliability Engineer is responsible for maintaining blockchain nodes' reliability, monitoring, incident response, and building automation tools to enhance operations.
Top Skills: Docker, ELK, Go, Grafana, JavaScript, Kubernetes, Linux, Prometheus, Python, Rust, Shell
Reposted 16 Days Ago
In-Office
Centre, Green, OH, USA
125K-185K Annually
Senior level
Fintech • Financial Services
Lead the Site Reliability Engineering efforts for the Investment Banking CTO team, focusing on resilience, architectural guidance, and SRE adoption across applications and platforms.
Top Skills: ArgoCD, FluxCD, GCP, Go, Java, Kubernetes, Python, Terraform
Reposted 17 Days Ago
In-Office
5 Locations
194K-267K Annually
Senior level
Cloud
The role involves building and managing observability infrastructure in GCP, automating deployments, and optimizing data processes for high reliability.
Top Skills: GKE, Go, GCP, Grafana, Kubernetes, OpenTelemetry, Python, Ruby, Splunk, Terraform
17 Days Ago
In-Office
Hawthorne, CA, USA
125K-175K Annually
Junior
Aerospace • Other
Design, operate, and scale on-premise infrastructure for the Starshield satellite constellation. Build automation for Kubernetes cluster deployment and management, operate core infrastructure (databases, monitoring, distributed storage), collaborate with software teams, troubleshoot across the stack, improve service lifecycle, and ensure high availability through monitoring and performance improvements.
Top Skills: Ansible, Bash, C++, Go, Kubernetes, Linux, OCI Containers, Python, Terraform
17 Days Ago
In-Office
Redmond, WA, USA
125K-175K Annually
Junior
Aerospace • Other
Design, deploy, and operate on-premises Kubernetes clusters and core infrastructure (databases, monitoring, distributed storage). Build automation, troubleshoot across the Starshield stack, collaborate with software teams to ensure scalable, highly available services, and improve lifecycle processes.
Top Skills: Ansible, Bash, Bazel, C++, Databases, Distributed Storage, Go, Kubernetes, Linux, Makefiles, Monitoring, OCI Containers, Python, TCP/IP, Terraform
17 Days Ago
Remote
United States
170K-210K Annually
Senior level
eCommerce
Ensure reliability and availability of Tradeweb's global AWS platform through IaC automation, observability and SLO definition, incident triage and resolution, on-call duties, collaboration with development teams, and security-focused platform improvements.
Top Skills: ArgoCD, AWS, AWS Lambda, EKS, GitSecOps, Infrastructure as Code (IaC), Kubernetes (K8s), Kustomize, LGTM, Linux/Unix, Pulumi, Python, SMS, SNS
17 Days Ago
In-Office or Remote
2 Locations
80K-133K Annually
Mid level
Consulting
Maintain and improve reliability of cloud-based enterprise systems by implementing SRE practices. Participate in design and code reviews, incident management, automation (IaC/CI-CD), monitoring, documentation, and collaboration with cross-functional teams to reduce downtime and improve scalability and security.
Top Skills: Ansible Automation Platform, Artifactory, AWS, Azure, Bash, CI/CD, GitLab, IaC, Linux, Packer, PowerShell, Python, Terraform, Windows
17 Days Ago
In-Office
Santa Monica, CA, USA
31-56 Hourly
Junior
Gaming • Hardware
Entry-level Site Reliability Engineer responsible for monitoring service health, incident response, troubleshooting Kubernetes, networking, DNS, and application issues, building observability (dashboards, alerts, runbooks), automating repetitive tasks, and supporting release reliability and post-incident remediation.
Top Skills: Bash, Cloud, Containers, Dashboards, DNS, Git, HTTP, Kubernetes, Linux, Logging, Metrics, Monitoring, Python
Reposted 17 Days Ago
In-Office
San Francisco, CA, USA
200K-350K Annually
Senior level
Artificial Intelligence
The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.
Top Skills: AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform
Reposted 17 Days Ago
In-Office
Bay Lake, FL, USA
Expert/Leader
Digital Media • Gaming • News + Entertainment • Sports
The Principal Site Reliability Engineer will lead DevOps culture, architect security solutions, and monitor emerging technologies for Disney Experiences Technology.
Top Skills: AI, Akamai, Mobile Technologies, Security Tools, Web Technologies
Reposted 17 Days Ago
In-Office
New York, NY, USA
123K-165K Annually
Mid level
Digital Media • Gaming • News + Entertainment • Sports
The Site Reliability Engineer II contributes to system stability and scalability by implementing automation, enhancing observability, and participating in incident response and root cause analysis.
Top Skills: Argo CD, AWS, Azure, Bash, CI/CD, CloudFormation, Datadog, Docker, EFK, ELK, Flux, GCP, GitHub Actions, GitLab CI, Go, Grafana, JavaScript, Jenkins, Kubernetes, Linux, New Relic, Prometheus, Python, Splunk, Terraform
Reposted 17 Days Ago
In-Office
Bay Lake, FL, USA
Expert/Leader
Digital Media • Gaming • News + Entertainment • Sports
Lead SRE culture, mentor teams, manage observability and reliability. Design and support cloud-agnostic systems, and automate infrastructure using advanced tools while enhancing organizational performance.
Top Skills: AI, Ansible, AWS, Azure, Chef, CloudFormation, DevOps Tools, GCP, Linux, LLMs, Terraform, Windows
Reposted 17 Days Ago
In-Office
Tyson's Corner, VA, USA
159K-230K Annually
Senior level
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform
Reposted 17 Days Ago
Hybrid
Arlington, TX, USA
Senior level
Fintech • Financial Services
The Lead Site Reliability Engineer will manage and optimize Kubernetes and Spark environments, supporting data processing and machine learning platforms while collaborating with diverse teams to enhance system reliability and performance.
Top Skills: Ansible, Spark, Azure, Azure DevOps, Chef, Docker, Go, Java, Jenkins, Kubernetes, Object Storage, Puppet, Python, Ruby, Terraform
Reposted 17 Days Ago
Hybrid
2 Locations
160K-200K Annually
Mid level
Aerospace • Cloud • Software • Defense • Automation
The Site Reliability Engineer will design, automate, and operate cloud systems, focusing on DevSecOps and operational stability, while improving reliability and collaborating with engineers across the platform.
Top Skills: AWS, AWS IAM, Datadog, GitLab, Grafana, Kubernetes, Linux/Unix, Python, Rancher, Terraform
Reposted 17 Days Ago
Hybrid
New York, NY, USA
Senior level
Fintech
The SRE/DevOps Engineer will enhance observability and monitoring tools, improve system reliability, conduct post-incident reviews, and collaborate with developers to optimize workflows and CI/CD processes.
Top Skills: AWS, Azure, Azure Bicep, Azure DevOps, Chaos Mesh, CloudFormation, Datadog, Docker, Elasticsearch, GCP, GitHub Actions, GitLab CI/CD, Grafana, Gremlin, Jenkins, Kafka, Kubernetes, Terraform
Reposted 17 Days Ago
In-Office
Columbus, OH, USA
124K-186K Annually
Senior level
Information Technology • Security • Software • Consulting
As a Site Reliability Engineer, you will design and support AWS infrastructure, build CI/CD pipelines, debug systems, and promote self-service for product teams. Collaborate with teams for continuous deployment and automation in a cloud environment, leveraging AI tools for efficiency.
Top Skills: AI Tooling, AWS, CI/CD, GitHub Actions, Kubernetes, Postgres, Terraform