Top Site Reliability Engineer Jobs

Reposted 11 Days Ago
In-Office or Remote
Plano, TX, USA
117K-209K Annually
Senior level
Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
Lead reliability for production services in Autodesk GovCloud by building automation, SLO/SLI practices, observability, incident response, resilience testing, and runbooks. Deploy, operate, and improve cloud services while ensuring compliance (FedRAMP) and participating in 24x7 on-call rotations and cross-team collaboration.
Top Skills: APIs, AWS, AWS GovCloud, Azure, Bash, Caching Technologies, CI/CD, CloudWatch, Containers, Databases, Datadog, Deployment Automation, Distributed Systems, DNS, Dynatrace, Go, Infrastructure as Code, Java, Kubernetes, Load Balancing, Messaging Systems, Networking, PowerShell, Python, Splunk, Storage Platforms
Reposted 11 Days Ago
Remote
United States
160K-190K Annually
Senior level
Legal Tech • Software
As a Senior Site Reliability Engineer, you will lead reliability initiatives, design and maintain systems, enhance CI/CD pipelines, and mentor junior engineers while ensuring system availability and performance.
Top Skills: AWS, Bash, CloudWatch, EC2, EKS, IAM, Kubernetes, Lambda, PowerShell, Python, S3
Reposted 11 Days Ago
Remote
USA
125K-165K Annually
Senior level
Healthtech
Design, scale, and operate secure AWS cloud infrastructure (EKS, IAM, RBAC); build and maintain IaC (Terraform/Terragrunt), GitHub Actions CI/CD, Datadog observability, and Python automation; document runbooks and participate in on-call rotations, postmortems, and Agile workflows to improve reliability and security.
Top Skills: AWS, Datadog, EC2, EKS, Fargate, GitHub Actions, GitHub Advanced Security, Helm, IAM, Jira, Kubernetes, Lambda, Python, RBAC, Secrets Manager, Serverless, Terraform, Terragrunt, VPC
Reposted 11 Days Ago
In-Office
3 Locations
152K-288K Annually
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Design, build, and operate global, multi-cloud HPC service platforms. Own IaC-driven provisioning, reliability, observability, capacity planning, incident response, and automation to ensure high uptime and QoS for internal customers.
Top Skills: AIOps, AWS, CI/CD, Container Management, GCP, Go, Infrastructure as Code, Kubernetes, Log Collection, LSF, Metrics, Monitoring, Observability, OCI, Perl, Python, Ruby, Slurm
Reposted 11 Days Ago
In-Office or Remote
9 Locations
105K-198K Annually
Senior level
Artificial Intelligence • Healthtech • Information Technology • Other • Analytics
The Senior Site Reliability Engineer will manage and optimize cloud infrastructure on AWS, design Kubernetes clusters, and automate workflows while mentoring junior members.
Top Skills: AWS, GitHub Actions, Kubernetes, New Relic, Python
Reposted 11 Days Ago
In-Office
San Francisco, CA, USA
Senior level
Artificial Intelligence • Healthtech • Software • Automation
Design, build, and operate Optura's multi-cloud, HIPAA-aware platform: run Kubernetes across cloud and customer on-prem/air-gapped environments, create unified deployment tooling (Helm/operators/GitOps), own SLOs/capacity/incident response, drive reliability, implement identity/networking/security controls, and build IaC/GitOps patterns in partnership with product and security teams.
Top Skills: AKS, Argo CD, AWS, Azure, Backstage, Cluster API, Crossplane, Distributed Tracing, EKS, GCP, GitOps, GKE, Go, Grafana, Helm, KMS, Kubernetes, mTLS, OIDC, OpenShift, OpenTelemetry, Operators, Prometheus, Pulumi, Python, Rancher, Replicated, Secrets Management, Service Mesh, Talos, Terraform, VPC
Reposted 11 Days Ago
In-Office
Austin, TX, USA
152K-195K Annually
Senior level
Information Technology • Security • Cybersecurity
Lead the design, build, and scaling of Kubernetes-based, multi-tenant infrastructure and AI tooling. Own CI/CD, IaC, GitOps, and streaming analytics (Kafka/Flink/ClickHouse). Improve observability, SLOs, automated testing, progressive delivery, and incident response, and mentor teams on reliability, security, and automation.
Top Skills: AI/LLM Tooling, AKS, Argo CD, Bash, ClickHouse, Datadog, EKS, Flink, GitHub Actions, GitLab CI, GitOps, GKE, Go, Grafana, Helm, Jenkins, Kafka, Kubernetes, MCP Servers, OpenTelemetry, Prometheus, Pulumi, Python, Terraform
Reposted 11 Days Ago
In-Office
Seattle, WA, USA
175K-200K Annually
Senior level
Fintech
Design, build, and maintain platform services and observability tooling to ensure reliable, scalable, and recoverable production systems. Define SLOs/SLIs, automate infrastructure-as-code, manage Kubernetes/Docker environments, perform incident management and root cause analysis, mentor engineers, and improve developer experience and reliability through automation and tooling.
Top Skills: Amazon RDS, Ansible, AWS, Chef, Claude Code Plugins, Cloud-Native, Docker, Elasticsearch, Event-Driven Architecture, Excel, GCP, GCP Cloud SQL, Go, Java, Kotlin, Kubernetes, Kubernetes Operators, Lucene, MCP Servers, Microsoft Outlook, MS SQL Server, MySQL, Observability, Postgres, Puppet, Python, Redis, Solr, Spring Boot, Terraform, Word
Reposted 11 Days Ago
In-Office
Seattle, WA, USA
175K-200K Annually
Senior level
Fintech
Design, build, and operate reliable, scalable production systems and SRE tooling. Define SLOs, implement observability, automate infrastructure-as-code, manage incident response and recoverability, mentor engineers, and maintain cloud-native services with capacity planning and failover strategies.
Top Skills: Amazon RDS, Ansible, AWS, Chef, Docker, Elasticsearch, GCP, GCP Cloud SQL, Kubernetes, Linux, Lucene, Excel, Microsoft Outlook, Microsoft Word, MS SQL Server, MySQL, Postgres, Puppet, Redis, Solr, Terraform, Unix
Reposted 11 Days Ago
In-Office or Remote
Denver, CO, USA
149K-158K Annually
Senior level
Semiconductor • Manufacturing
The role involves leading reliability initiatives, designing patterns for AI operations, managing SLOs, and mentoring junior engineers. You'll ensure platform resilience and optimize CI/CD pipelines for an AI-first intelligence platform.
Top Skills: AWS, Bash, CloudWatch, Datadog, Docker, EKS, GitOps, Java, Kubernetes, Lambda, Python, Spring Boot, Terraform
Reposted 11 Days Ago
Hybrid
Palo Alto, CA, USA
30K-30K Annually
Senior level
Artificial Intelligence • eCommerce
As a Senior Site Reliability Engineer, you'll ensure the reliability and scalability of production systems, define SLOs and SLIs, lead incident responses, and improve cloud infrastructure performance.
Top Skills: AWS, ClickHouse, Postgres, Pulumi, Temporal, Turbopuffer
11 Days Ago
In-Office
Lehi, UT, USA
125K-145K Annually
Senior level
Security • Software • Cybersecurity
Design, build, and operate highly available, fault-tolerant cloud-native systems. Implement observability, automation, CI/CD, and IaC; respond to incidents, run RCAs, and drive reliability improvements.
Top Skills: AWS, Azure, Bash, CI/CD, Datadog, GCP, Go, Grafana, Infrastructure as Code (IaC), Kubernetes, New Relic, OpenTelemetry, Prometheus, Python, Splunk, Terraform
Reposted 11 Days Ago
In-Office
Washington, DC, USA
185K-230K Annually
Senior level
Information Technology • Consulting
Design, deploy, and maintain mission-critical containerized and virtualized workloads; build and operate CI/CD pipelines, monitoring, and configuration management; provision developer toolchains; reduce developer friction; lead incident response and root cause analysis; ensure scalability, availability, and government compliance in production environments.
Top Skills: Ansible, Bash, CI/CD, Desired State Configuration, F5, GitLab CI/CD, Kubernetes, MinIO, Monitoring/Observability, Portworx, S3-Compatible Services, VMware
Reposted 11 Days Ago
Remote
United States of America
185K-227K Annually
Senior level
Other
The Senior Site Reliability Engineer at Juul Labs ensures operational stability and performance of hybrid cloud infrastructure, leads automation, and handles critical incidents.
Top Skills: AWS, Bash, CloudFormation, GCP, Nutanix, PowerShell, Python, Terraform
Reposted 2 Days Ago
In-Office or Remote
7 Locations
Mid level
Cloud • Software
As a Site Reliability / GitOps Engineer, you will automate operations, develop Infrastructure as Code, maintain core services, and collaborate on service architecture.
Top Skills: CI/CD, Cloud Computing, Elasticsearch, Grafana, Infrastructure as Code, Linux, Prometheus, Python
13 Days Ago
In-Office
Washington, DC, USA
165K-230K Annually
Senior level
Aerospace • Other
Design, deploy, and automate on‑prem and cloud infrastructure; manage Kubernetes and core services (databases, monitoring, storage); collaborate with software engineers to build scalable, operable systems; own service lifecycle from design to deployment and refinement.
Top Skills: Ansible, Bash, Bazel, Databases, Distributed Databases, Kubernetes, Linux, Makefiles, Monitoring, Python, Storage, TCP/IP, Terraform
Reposted 13 Days Ago
In-Office
San Francisco, CA, USA
195K-240K Annually
Senior level
Software
The Site Reliability Engineer will enhance reliability, observability, and incident response of You.com's production services, while collaborating with teams to implement best practices and improve operational efficiency through tooling and automation.
Top Skills: AWS, Bash, CI/CD, EKS, GHA, Git, Grafana, OpenTelemetry, Prometheus, Python, Terraform
Reposted 13 Days Ago
In-Office
Seattle, WA, USA
134K-215K Annually
Senior level
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
The Senior Site Reliability Engineer I will enhance Axon's observability platform, work on distributed tracing, log aggregation, and metrics infrastructure, and develop internal tools while collaborating with engineering teams.
Top Skills: Argo CD, CDK, Cortex, Go, Grafana, Helm, Jaeger, Java, Loki, OpenTelemetry, Prometheus, Python, Terraform
Reposted 2 Days Ago
In-Office or Remote
7 Locations
200K-200K Annually
Senior level
Cloud • Software
The Senior Site Reliability / GitOps Engineer will drive automation and collaboration within the IS team, enhancing Canonical's IT operations and services while managing infrastructure as code and cloud technologies.
Top Skills: Cloud Computing, Docker, Elasticsearch, GitOps, Grafana, IaC, Kubernetes, Linux, Prometheus, Python
Reposted 14 Days Ago
In-Office or Remote
Austin, TX, USA
50K-80K Annually
Senior level
Artificial Intelligence • Information Technology • Software
The Senior SRE will manage multi-cloud infrastructure, ensuring reliability and scalability. Responsibilities include building CI/CD pipelines, defining SLOs, and implementing automation.
Top Skills: AI-Assisted Development, AWS, Azure, Claude Code, Datadog, GCP, Grafana, Kubernetes, Terraform
Reposted 14 Days Ago
In-Office
Santa Clara, CA, USA
148K-276K Annually
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Operate and improve an AI Data Center AIOps platform: monitor health, own SLOs/SLIs, handle incidents, manage Kubernetes deployments, maintain IaC/CI-CD (Helm/Terraform), and produce runbooks/automation to ensure reliability and scalable telemetry processing.
Top Skills: Bash, CI/CD, ClickHouse, Elastic/Elasticsearch, Flink, Grafana, Helm, Kafka, Kubernetes, Object Storage, Prometheus, Pulsar, Python, Spark, Terraform, TSDBs
15 Days Ago
In-Office
Redmond, WA, USA
165K-230K Annually
Senior level
Aerospace • Other
Design, deploy, and automate infrastructure for on‑prem and cloud compute. Manage core services (databases, monitoring, storage), collaborate with software teams to build scalable, operable systems, and own the service lifecycle from design through deployment, operation, and refinement to ensure secure, reliable, and autonomous satellite software services.
Top Skills: Ansible, Bash, Bazel, Cloud, Databases, Kubernetes, Linux, Makefiles, Monitoring, Python, TCP/IP, Terraform
15 Days Ago
In-Office
Washington, DC, USA
165K-230K Annually
Senior level
Aerospace • Other
Design, deploy, and automate core infrastructure (on‑prem and cloud) and manage Kubernetes and Linux fleets. Collaborate with engineers to build scalable, operable services and improve lifecycle: testing, CI/CD, deployment, monitoring, and performance.
Top Skills: Ansible, Bash, Bazel, CI/CD, Cloud, Databases, Distributed Databases, Kubernetes, Linux, Makefiles, Monitoring, Python, Storage, TCP/IP, Terraform
15 Days Ago
In-Office
Hawthorne, CA, USA
165K-230K Annually
Senior level
Aerospace • Other
Design, deploy, and automate on‑prem and cloud compute infrastructure; manage core infrastructure (databases, monitoring, storage); collaborate with software teams to build scalable, operable systems; improve service lifecycle from design through deployment, operation, and refinement.
Top Skills: Ansible, Bash, Bazel, Databases, Kubernetes, Linux, Make, Makefiles, Monitoring, Python, Storage, TCP/IP, Terraform
Reposted 15 Days Ago
Hybrid
3 Locations
182K-225K Annually
Senior level
Fintech • Software
As a Senior Site Reliability Engineer, you'll build and scale internal platform offerings, design monitoring systems, and collaborate with software engineers to ensure application performance and reliability.
Top Skills: Ansible, AWS, CloudFormation, Datadog, Docker, ELK Stack, Grafana, gRPC, Java, Kubernetes, Postgres, Prometheus, Python, Terraform