Top Site Reliability Engineer Jobs

8 Days Ago
Remote
Location, WV, USA
164K-215K Annually
Expert/Leader
Aerospace • Information Technology • Professional Services • Security • Software
Design, build, and maintain highly available cloud and on‑prem systems. Automate operations, implement monitoring/alerting, tune performance, and drive incident response and root cause fixes. Collaborate on reliable architectures and CI/CD pipelines, champion SRE best practices (SLIs/SLOs, error budgets), and support proposal technical content.
Top Skills: AWS, Azure, Bash, CI/CD Pipelines, Container Orchestration, Datadog, ELK, Grafana, Kubernetes, Linux, Networking, PowerShell, Prometheus, Python, Splunk
8 Days Ago
In-Office or Remote
2 Locations
Mid level
Hardware
Lead technical services engineer guiding and training engineers, designing IT architecture, troubleshooting network security and third-party control integrations, coordinating projects, providing customer training and field support, and managing personnel and resources.
Top Skills: 802.1X, AMX, Crestron, Excel, Microsoft Outlook, Microsoft PowerPoint, Microsoft Word, RADIUS, Security Certificate Management
8 Days Ago
In-Office
Bastrop, TX, USA
Senior level
Aerospace • Other
Design, build, operate, scale, and optimize Kubernetes and RKE clusters and Linux infrastructure using automation (Ansible, Terraform). Collaborate with engineers to deploy resilient, high-performance systems, drive automation, define standards, upskill the team, and participate in on-call rotation.
Top Skills: Ansible, Argo CD, AWX/Tower, Ceph, cgroups, CI/CD, Cilium, cloud-init, CNI, CRI, CRI-O, CSI, Docker, Git, GitOps, Go, Grafana, Helm, InfluxDB, iptables, Istio, Jenkins, Jinja, Jsonnet, Kernel Modules, Kubernetes, Linux, MetalLB, PKI, Prometheus, Puppet, Python, Redfish, RKE, Rook-Ceph, Shell, Subversion, Terraform, Vagrant, VMware, YAML
Reposted 8 Days Ago
Remote
United States
90K-159K Annually
Mid level
Fitness • Healthtech • Information Technology • Payments • Software
The Site Reliability Engineer will enhance system reliability, manage cloud infrastructure, automate processes, support CI/CD pipelines, and troubleshoot production issues.
Top Skills: Ansible, AWS, Bash, Chef, Docker, Git, GitLab, Jenkins, Kubernetes, MySQL, Postgres, Python, SQL Server, Terraform, VMware
Reposted 13 Days Ago
In-Office
Louisville, KY, USA
Senior level
Healthtech • Payments • Software
The Senior SRE I will design and maintain automation for infrastructure provisioning, monitor system health, resolve production incidents, and mentor junior SREs, ensuring reliability and operational efficiency across cloud platforms.
Top Skills: Ansible, AWS, Azure, Bash, CloudFormation, Datadog, Docker, GCP, GitHub Actions, GitLab CI, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Ruby, Splunk, Terraform
Reposted 8 Days Ago
Remote
United States
170K-200K Annually
Senior level
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills: AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Reposted 8 Days Ago
Hybrid
Denver, CO, USA
160K-200K Annually
Mid level
Aerospace • Cloud • Software • Defense • Automation
Design and automate cloud systems for U.S. Government, focusing on DevSecOps, reliability, deployment automation, and observability. Participate in on-call rotations, supporting production environments and improving system resilience.
Top Skills: AWS EKS, Datadog, GitLab, Grafana, Kubernetes, Linux/Unix, Python, Terraform
Reposted 8 Days Ago
Remote
USA
180K-210K Annually
Senior level
Artificial Intelligence • Insurance • Software • Automation
The Staff Site Reliability Engineer will build and scale infrastructure for Assured's platform, automate delivery, enhance observability, and lead mentoring initiatives.
Top Skills: AWS, Kubernetes, Postgres, Terraform
Reposted 8 Days Ago
In-Office
Overland Park, KS, USA
Senior level
Healthtech • Professional Services • Software
The Sr Software Engineer leads complex software development, ensuring solution scalability, collaborating with teams, solving technical problems, and advocating for high-quality software solutions.
Top Skills: Angular, Argo CD, Azure DevOps, CI/CD, Google Cloud Platform, Kubernetes, New Relic, OpenTelemetry, Ruby on Rails, Terraform
Reposted 8 Days Ago
In-Office
Secaucus, NJ, USA
150K-170K Annually
Expert/Leader
Healthtech • Database
Seeking a Principal Site Reliability Engineer to build an SRE practice, enhance reliability, mentor teams, and drive performance engineering to optimize Quest products and services.
Top Skills: Ansible, Aurora, AWS, Azure, Bigtable, Cassandra, CI/CD, Cloud Pub/Sub, Cloud Spanner, Cloud SQL, Docker, DynamoDB, Dynatrace, GitLab, Go, GCP, Java, JMS, Kafka, Kinesis, Kubernetes, MQ, Perl, Python, RDS, Ruby, Shell Scripting, Terraform
Reposted 8 Days Ago
In-Office
Aliso Viejo, CA, USA
146K-219K Annually
Senior level
Gaming
The role involves ensuring production quality, owning system reliability, and participating in decision-making. Responsibilities include incident response and lifecycle management in cloud gaming technologies.
Top Skills: Bash, C++, Elasticsearch, Go, Istio, Java, Kafka, Kong API Gateway, Kubernetes, Kuma, Linkerd, MongoDB, MySQL, Postgres, Python, Redis, Rust
Reposted 8 Days Ago
Remote
United States
205K-270K Annually
Senior level
Artificial Intelligence • Other • Sales • Software
The role involves designing and advancing infrastructure for the engineering team, ensuring the reliability of Kubernetes clusters, automating operations, and building machine learning infrastructure.
Top Skills: Argo, AWS, Azure, CloudFormation, Flux, GitHub Actions, Go, GCP, Kubernetes, Postgres, Python, Terraform
Reposted 8 Days Ago
In-Office or Remote
11 Locations
160K-179K Annually
Senior level
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills: AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
9 Days Ago
In-Office
Irvine, CA, USA
100K-140K Annually
Junior
Hardware • Manufacturing
Operate and harden a multi-cloud microservices platform: deploy on Kubernetes, run load/chaos tests, build observability, automate with scripts, define SLO/SLA, ensure security/compliance, participate in incident response, disaster recovery, on-call rotation, and mentor junior team members.
Top Skills: AWS, Azure, Bash, GCP, Go, HPA, Java, JVM, Kubernetes, Microservices, OCI, PowerShell, Python
9 Days Ago
In-Office
5 Locations
128K-173K Annually
Senior level
Aerospace • Information Technology • Professional Services • Security • Software
Maintain and improve reliability, scalability, and performance of enterprise infrastructure across global sites. Implement automation and infrastructure-as-code, build monitoring and observability, perform RCA and incident response, support patching and RMF changes, integrate new capabilities, and maintain operational documentation and ITIL/ITSM processes to ensure mission-ready, high-availability environments.
Top Skills: Ansible, ELK, Nagios, PowerShell, Python, SCOM, SolarWinds, Splunk, Terraform
Senior level
Fintech • Financial Services
Lead SRE technical strategy and architecture for highly available, scalable enterprise platforms. Build automation, observability, and incident response practices; mentor senior engineers; drive capacity planning, production reliability, and adoption of SRE best practices across cloud and on-prem environments.
Top Skills: Ansible, AWS, BigQuery, Chef, CloudFormation, Datadog, Docker, Elasticsearch, ELK Stack, GCP, GitLab, Go, Grafana, Java, Jenkins, Kafka, Kubernetes, Linux, Maven, PagerDuty, Prometheus, Prompt Engineering, Puppet, Python, Retrieval-Augmented Generation (RAG), Terraform
Mid level
Fintech • Financial Services
Site Reliability Engineer on the Compliance Engineering team responsible for ensuring production service health, capacity planning, monitoring, incident management, SLIs/SLOs, automation to reduce toil, and collaborating with engineers to improve scalability, reliability, and observability across distributed, cloud-native and big-data systems.
Top Skills: Automated Testing, AWS, Azure, Distributed Tracing, ELK, GCP, Grafana, Hadoop, Java, Linux, Logging, Metrics, Observability, OpenTelemetry, Perl, Prometheus, Python, Relational Databases
9 Days Ago
In-Office
Park, MI, USA
Expert/Leader
Fintech • Financial Services
VP-level SRE/DevOps leader responsible for global strategy and delivery of CI/CD, IaC, cloud-native platforms, observability, reliability engineering (SRE), security/compliance, automation, incident management, and mentoring teams to enable migration to microservices and optimize costs and resilience.
Top Skills: AKS, App Insights, ARM, AWS, Azure, Azure DevOps, Bash, CloudFormation, Docker, Dynatrace, EKS, ELK, GCP, GitHub Actions, GKE, Grafana, Jenkins, Kubernetes, New Relic, Oracle, Prometheus, Python, Splunk, SQL, Terraform
9 Days Ago
Hybrid
Chicago, IL, USA
165K-288K Annually
Senior level
Artificial Intelligence • Cloud • Fintech • Information Technology • Analytics • Financial Services • Cybersecurity
Lead adoption and standardization of SRE practices across the enterprise. Establish SRE governance, define reliability metrics (SLOs/SLIs), build a Community of Practice, run training/forums, enable automation and tooling, partner with platform teams on observability, chaos engineering, and self-healing, and drive cross-functional alignment for resilience and incident management.
Top Skills: Automation, Azure Monitor, Chaos Engineering, CI/CD, Cloud-Native, DevOps, Dynatrace, Hybrid Architectures, Incident Management, Observability, Platform Engineering, Prometheus, Release Engineering, Self-Healing, Splunk, SRE
9 Days Ago
Hybrid
2 Locations
130K-160K Annually
Mid level
Cloud • Security
Build and operate the production platform (Kubernetes, AWS, IaC, CI/CD, observability), automate self-service deployment, embed security and secrets management, run and modernize on-call, drive cost efficiency, mentor teammates, and maintain runbooks and post-incident reviews.
Top Skills: AWS, Bash, CI/CD, Claude, Git, Grafana, Kubernetes, Linux, Prometheus, Python, Salt, Terraform
9 Days Ago
Remote
United States
Senior level
Agency • Information Technology
Lead SRE role designing and maintaining CI/CD pipelines (GitHub Actions), containerized deployments (Docker, Kubernetes, AKS, Helm), web/mobile app releases, observability, automated testing, and DevOps best practices across cloud environments with cross-functional collaboration and regulatory compliance.
Top Skills: AKS, Android, Azure Application Insights, Azure Log Analytics, Azure Monitor, Bash, Branching, Docker, Docker Compose, Git, Git Hooks, GitHub Actions, Google Play, Helm, Heroku, iOS, iOS App Store, Java, Kubernetes, npm, PowerShell, Pull Requests, Python, SonarQube, Veracode, Vercel
9 Days Ago
In-Office
Westerville, OH, USA
Senior level
Hardware • Software • Analytics
Owner of cross-platform observability and incident management for Vertiv Digital platforms. Design and operate monitoring, SLOs/SLIs, incident response, SLA governance, capacity planning, automation to reduce toil, CI/CD reliability, and enforce DevSecOps and operational governance across cloud and containerized environments.
Top Skills: Ansible, AWS, Azure DevOps, Azure Monitor, C#, CI/CD, Compass AI, Cursor, DAST, Datadog, Docker, Feature Flags, GitHub Actions, GitLab, Grafana, Java, JavaScript, Jenkins, Kubernetes, Power Automate, PowerShell, Prometheus, Python, Ruby, SAST, Secrets Management, Site Scope, Splunk, Terraform, UiPath, Workato, Writer AI
9 Days Ago
Hybrid
Houston, TX, USA
Senior level
Hardware • Other • Energy
Maintain and monitor production systems for availability and performance; lead incident response and postmortems; implement observability, alerting, and automated remediation; optimize distributed systems (Akka.NET) and PostgreSQL; build CI/CD pipelines and infrastructure-as-code.
Top Skills: Akka.NET, AWS, Azure, Azure DevOps, Azure Pipelines, Bash, C#, Datadog, Docker, ELK, GCP, Git, GitHub Actions, GitLab, GitLab CI, Grafana, Kubernetes, OpenTelemetry, Phobos, Postgres, PowerShell, Prometheus, Python, Terraform
Reposted 9 Days Ago
In-Office
San Jose, CA, USA
207K-259K Annually
Senior level
Aerospace
Responsible for the reliability, scalability, performance, and security of core systems, implementing infrastructure, maintaining cloud-native services, and developing automation solutions.
Top Skills: Airflow, Amazon EKS, Argo CD, AWS, Bash, Docker, ELK Stack, GitLab CI, Grafana, Jenkins, Kafka, PowerShell, Prometheus, Python, Spark
10 Days Ago
In-Office
Lehi, UT, USA
160K-190K Annually
Senior level
Security • Software • Cybersecurity
Lead platform reliability and cloud modernization across multi-cloud (AWS/Azure/GCP). Define SLIs/SLOs, run incident response, build observability and IaC (Terraform), champion Kubernetes and GitOps, automate operational workflows, and mentor engineers to reduce toil and improve platform reliability and developer velocity.
Top Skills: AKS, AWS, Azure, Bash, CI/CD, CNI, Containers, DNS, EKS, GCP, GitHub Actions, GitOps, GKE, Go, Grafana, Kubernetes, Load Balancing, OpenTelemetry, PKI, Pod Security, Prometheus, Python, RBAC, Serverless, Service Mesh, Splunk, Terraform, TLS, Zero-Trust Networking