Top Site Reliability Engineer Jobs
Aerospace • Information Technology • Professional Services • Security • Software
Design, build, and maintain highly available cloud and on‑prem systems. Automate operations, implement monitoring/alerting, tune performance, and drive incident response and root cause fixes. Collaborate on reliable architectures and CI/CD pipelines, champion SRE best practices (SLIs/SLOs, error budgets), and support proposal technical content.
Top Skills:
AWS, Azure, Bash, CI/CD Pipelines, Container Orchestration, Datadog, ELK, Grafana, Kubernetes, Linux, Networking, PowerShell, Prometheus, Python, Splunk
Hardware
Lead technical services engineer guiding and training engineers, designing IT architecture, troubleshooting network security and third-party control integrations, coordinating projects, providing customer training and field support, and managing personnel and resources.
Top Skills:
802.1X, AMX, Crestron, Excel, Microsoft Outlook, Microsoft PowerPoint, Microsoft Word, RADIUS, Security Certificate Management
Aerospace • Other
Design, build, operate, scale, and optimize Kubernetes and RKE clusters and Linux infrastructure using automation (Ansible, Terraform). Collaborate with engineers to deploy resilient, high-performance systems, drive automation, define standards, upskill the team, and participate in on-call rotation.
Top Skills:
Ansible, Argo CD, AWX/Tower, Ceph, cgroups, CI/CD, Cilium, cloud-init, CNI, CRI, CRI-O, CSI, Docker, Git, GitOps, Go, Grafana, Helm, InfluxDB, iptables, Istio, Jenkins, Jinja, Jsonnet, Kernel Modules, Kubernetes, Linux, MetalLB, PKI, Prometheus, Puppet, Python, Redfish, RKE, Rook-Ceph, Shell, Subversion, Terraform, Vagrant, VMware, YAML
Fitness • Healthtech • Information Technology • Payments • Software
The Site Reliability Engineer will enhance system reliability, manage cloud infrastructure, automate processes, support CI/CD pipelines, and troubleshoot production issues.
Top Skills:
Ansible, AWS, Bash, Chef, Docker, Git, GitLab, Jenkins, Kubernetes, MySQL, Postgres, Python, SQL Server, Terraform, VMware
Healthtech • Payments • Software
The Senior SRE I will design and maintain automation for infrastructure provisioning, monitor system health, resolve production incidents, and mentor junior SREs, ensuring reliability and operational efficiency across cloud platforms.
Top Skills:
Ansible, AWS, Azure, Bash, CloudFormation, Datadog, Docker, GCP, GitHub Actions, GitLab CI, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Ruby, Splunk, Terraform
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills:
AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Aerospace • Cloud • Software • Defense • Automation
Design and automate cloud systems for U.S. Government, focusing on DevSecOps, reliability, deployment automation, and observability. Participate in on-call rotations, supporting production environments and improving system resilience.
Top Skills:
AWS EKS, Datadog, GitLab, Grafana, Kubernetes, Linux/Unix, Python, Terraform
Artificial Intelligence • Insurance • Software • Automation
The Staff Site Reliability Engineer will build and scale infrastructure for Assured's platform, automate delivery, enhance observability, and lead mentoring initiatives.
Top Skills:
AWS, Kubernetes, Postgres, Terraform
Healthtech • Professional Services • Software
The Sr Software Engineer leads complex software development, ensuring solution scalability, collaborating with teams, solving technical problems, and advocating for high-quality software solutions.
Top Skills:
Angular, Argo CD, Azure DevOps, CI/CD, Google Cloud Platform, Kubernetes, New Relic, OpenTelemetry, Ruby on Rails, Terraform
Healthtech • Database
Seeking a Principal Site Reliability Engineer to build an SRE practice, enhance reliability, mentor teams, and drive performance engineering to optimize Quest products and services.
Top Skills:
Ansible, Aurora, AWS, Azure, Bigtable, Cassandra, CI/CD, Cloud Pub/Sub, Cloud Spanner, Cloud SQL, Docker, DynamoDB, Dynatrace, GitLab, Go, GCP, Java, JMS, Kafka, Kinesis, Kubernetes, MQ, Perl, Python, RDS, Ruby, Shell Scripting, Terraform
Gaming
The role involves ensuring production quality, owning system reliability, and participating in decision-making. Responsibilities include incident response and lifecycle management in cloud gaming technologies.
Top Skills:
Bash, C++, Elasticsearch, Go, Istio, Java, Kafka, Kong API Gateway, Kubernetes, Kuma, Linkerd, MongoDB, MySQL, Postgres, Python, Redis, Rust
Artificial Intelligence • Other • Sales • Software
The role involves designing and advancing infrastructure for the engineering team, ensuring the reliability of Kubernetes clusters, automating operations, and building machine learning infrastructure.
Top Skills:
Argo, AWS, Azure, CloudFormation, Flux, GitHub Actions, Go, GCP, Kubernetes, Postgres, Python, Terraform
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills:
AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
Hardware • Manufacturing
Operate and harden a multi-cloud microservices platform: deploy on Kubernetes, run load/chaos tests, build observability, automate with scripts, define SLO/SLA, ensure security/compliance, participate in incident response, disaster recovery, on-call rotation, and mentor junior team members.
Top Skills:
AWS, Azure, Bash, GCP, Go, HPA, Java, JVM, Kubernetes, Microservices, OCI, PowerShell, Python
9 Days Ago
Aerospace • Information Technology • Professional Services • Security • Software
Maintain and improve reliability, scalability, and performance of enterprise infrastructure across global sites. Implement automation and infrastructure-as-code, build monitoring and observability, perform RCA and incident response, support patching and RMF changes, integrate new capabilities, and maintain operational documentation and ITIL/ITSM processes to ensure mission-ready, high-availability environments.
Top Skills:
Ansible, ELK, Nagios, PowerShell, Python, SCOM, SolarWinds, Splunk, Terraform
9 Days Ago
Fintech • Financial Services
Lead SRE technical strategy and architecture for highly available, scalable enterprise platforms. Build automation, observability, and incident response practices; mentor senior engineers; drive capacity planning, production reliability, and adoption of SRE best practices across cloud and on-prem environments.
Top Skills:
Ansible, AWS, BigQuery, Chef, CloudFormation, Datadog, Docker, Elasticsearch, ELK Stack, GCP, GitLab, Go, Grafana, Java, Jenkins, Kafka, Kubernetes, Linux, Maven, PagerDuty, Prometheus, Prompt Engineering, Puppet, Python, Retrieval-Augmented Generation (RAG), Terraform
9 Days Ago
Fintech • Financial Services
Site Reliability Engineer on the Compliance Engineering team responsible for ensuring production service health, capacity planning, monitoring, incident management, SLIs/SLOs, automation to reduce toil, and collaborating with engineers to improve scalability, reliability, and observability across distributed, cloud-native and big-data systems.
Top Skills:
Automated Testing, AWS, Azure, Distributed Tracing, ELK, GCP, Grafana, Hadoop, Java, Linux, Logging, Metrics, Observability, OpenTelemetry, Perl, Prometheus, Python, Relational Databases
Fintech • Financial Services
VP-level SRE/DevOps leader responsible for global strategy and delivery of CI/CD, IaC, cloud-native platforms, observability, reliability engineering (SRE), security/compliance, automation, incident management, and mentoring teams to enable migration to microservices and optimize costs and resilience.
Top Skills:
AKS, App Insights, ARM, AWS, Azure, Azure DevOps, Bash, CloudFormation, Docker, Dynatrace, EKS, ELK, GCP, GitHub Actions, GKE, Grafana, Jenkins, Kubernetes, New Relic, Oracle, Prometheus, Python, Splunk, SQL, Terraform
Artificial Intelligence • Cloud • Fintech • Information Technology • Analytics • Financial Services • Cybersecurity
Lead adoption and standardization of SRE practices across the enterprise. Establish SRE governance, define reliability metrics (SLOs/SLIs), build a Community of Practice, run training/forums, enable automation and tooling, partner with platform teams on observability, chaos engineering, and self-healing, and drive cross-functional alignment for resilience and incident management.
Top Skills:
Automation, Azure Monitor, Chaos Engineering, CI/CD, Cloud-Native, DevOps, Dynatrace, Hybrid Architectures, Incident Management, Observability, Platform Engineering, Prometheus, Release Engineering, Self-Healing, Splunk, SRE
Cloud • Security
Build and operate the production platform (Kubernetes, AWS, IaC, CI/CD, observability), automate self-service deployment, embed security and secrets management, run and modernize on-call, drive cost efficiency, mentor teammates, and maintain runbooks and post-incident reviews.
Top Skills:
AWS, Bash, CI/CD, Claude, Git, Grafana, Kubernetes, Linux, Prometheus, Python, Salt, Terraform
Agency • Information Technology
Lead SRE role designing and maintaining CI/CD pipelines (GitHub Actions), containerized deployments (Docker, Kubernetes, AKS, Helm), web/mobile app releases, observability, automated testing, and DevOps best practices across cloud environments with cross-functional collaboration and regulatory compliance.
Top Skills:
AKS, Android, Azure Application Insights, Azure Log Analytics, Azure Monitor, Bash, Branching, Docker, Docker Compose, Git, Git Hooks, GitHub Actions, Google Play, Helm, Heroku, iOS, iOS App Store, Java, Kubernetes, npm, PowerShell, Pull Requests, Python, SonarQube, Veracode, Vercel
Hardware • Software • Analytics
Owner of cross-platform observability and incident management for Vertiv Digital platforms. Design and operate monitoring, SLOs/SLIs, incident response, SLA governance, capacity planning, automation to reduce toil, CI/CD reliability, and enforce DevSecOps and operational governance across cloud and containerized environments.
Top Skills:
Ansible, AWS, Azure DevOps, Azure Monitor, C#, CI/CD, Compass AI, Cursor, DAST, Datadog, Docker, Feature Flags, GitHub Actions, GitLab, Grafana, Java, JavaScript, Jenkins, Kubernetes, Power Automate, PowerShell, Prometheus, Python, Ruby, SAST, Secrets Management, SiteScope, Splunk, Terraform, UiPath, Workato, Writer AI
Hardware • Other • Energy
Maintain and monitor production systems for availability and performance; lead incident response and postmortems; implement observability, alerting, and automated remediation; optimize distributed systems (Akka.NET) and PostgreSQL; build CI/CD pipelines and infrastructure-as-code.
Top Skills:
Akka.NET, AWS, Azure, Azure DevOps, Azure Pipelines, Bash, C#, Datadog, Docker, ELK, GCP, Git, GitHub Actions, GitLab, GitLab CI, Grafana, Kubernetes, OpenTelemetry, Phobos, Postgres, PowerShell, Prometheus, Python, Terraform
Aerospace
Responsible for the reliability, scalability, performance, and security of core systems, implementing infrastructure, maintaining cloud-native services, and developing automation solutions.
Top Skills:
Airflow, Amazon EKS, Argo CD, AWS, Bash, Docker, ELK Stack, GitLab CI, Grafana, Jenkins, Kafka, PowerShell, Prometheus, Python, Spark
Security • Software • Cybersecurity
Lead platform reliability and cloud modernization across multi-cloud (AWS/Azure/GCP). Define SLIs/SLOs, run incident response, build observability and IaC (Terraform), champion Kubernetes and GitOps, automate operational workflows, and mentor engineers to reduce toil and improve platform reliability and developer velocity.
Top Skills:
AKS, AWS, Azure, Bash, CI/CD, CNI, Containers, DNS, EKS, GCP, GitHub Actions, GitOps, GKE, Go, Grafana, Kubernetes, Load Balancing, OpenTelemetry, PKI, Pod Security, Prometheus, Python, RBAC, Serverless, Service Mesh, Splunk, Terraform, TLS, Zero-Trust Networking