Top Senior Site Reliability Engineer Jobs
Insurance • Cybersecurity
The Site Reliability Engineer II will build and operate infrastructure, improve system reliability, and enhance developer tools while collaborating across teams using AWS, Terraform, and IaC principles.
Top Skills:
AWS, ECS, GitHub Actions, Go, Kafka, Kinesis, Kubernetes, Python, Terraform
Fintech • Financial Services
Responsible for network deployments, automation, and system monitoring. Collaborates with teams to enhance network design and performance, ensuring scalability and security.
Top Skills:
Ansible, Arista, BGP, Cisco, CloudFormation, Datadog, Fortinet, Git, JSON, Juniper, Linux, MPLS, OSPF, Prometheus, Python, STP, Terraform, Unix, VXLAN, YAML
Automotive
The Staff Site Reliability Engineer will optimize cloud-native systems for vehicle telemetry using Kubernetes and AWS, ensuring reliability and operational excellence through advanced observability and automation.
Top Skills:
Airflow, AWS, CI/CD, Datadog, Grafana, gRPC, Java, Kafka, Kinesis, Kubernetes, Python, REST, Scala, Terraform
Fintech • Information Technology • Payments
The Staff Platform Engineer is responsible for maintaining and improving cloud-native platforms, managing operations, ensuring reliability, and implementing automation, particularly on Azure while also supporting AWS environments.
Top Skills:
AWS, Azure, Kubernetes, Terraform
Fintech • Software
The Senior Site Reliability Engineer ensures fast, stable SaaS products through automation, collaboration, monitoring, and implementing AI tools to enhance performance and reliability.
Top Skills:
AI Tools, Ansible, AppDynamics, AWS, Azure, Azure DevOps, Bash, C# .NET, Cosmos, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, PowerShell, Python, SaaS, SQL, Terraform
Aerospace • Hardware • Information Technology • Security • Software • Cybersecurity • Defense
The Senior Site Reliability Engineer will oversee the deployment and reliability of digital engineering tools, enhance performance, and mentor junior engineers.
Top Skills:
Ansible, Fluent Bit, Grafana, Loki, Postgres, Prometheus, Python
Computer Vision • Hardware • Machine Learning • Robotics • Software
The role involves maintaining cloud infrastructure, collaborating with engineering teams, troubleshooting issues, deploying solutions, and ensuring system reliability.
Top Skills:
Ansible, C++, Grafana, Helm, Kubernetes, PagerDuty, Python, Terraform, TypeScript
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills:
AWS, Bash, CI/CD, CloudWatch, ELK Stack, Go, Grafana, Helm, Istio, Kubernetes, Prometheus, Python, Terraform
Financial Services
The Site Reliability Engineer will enhance global infrastructure through coding, monitoring tools, and optimizing systems to ensure efficiency and resilience.
Top Skills:
Apache Kafka, Bigtable, C/C++, Cassandra, CI/CD, ClickHouse, Go, Kubernetes, Linux, Python, RabbitMQ, Rust
Artificial Intelligence • Software
As a Senior Staff SRE Tech Lead, you'll oversee reliability and scalability, mentor engineers, optimize systems, and enhance data infrastructure.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Cloud • Software
The Site Reliability Engineer (SRE) will manage reliable, scalable systems, focusing on software development, infrastructure automation, and incident response. Responsibilities include monitoring, CI/CD pipeline management, security compliance, and cost optimization while collaborating with various teams.
Top Skills:
AWS, Azure, Docker, ELK Stack, GCP, Git, Grafana, Java, Kubernetes, PHP, Prometheus, Python, Shell, Terraform
Other
As a Platform Engineer/DevOps, you will expand cloud infrastructure, implement monitoring systems, manage databases, and leverage CI/CD tools, working collaboratively with various teams.
Top Skills:
AWS, Azure, Bash, Datadog, ELK Stack, Kubernetes, OpenTofu, Prometheus, Python, Terraform
Security • Software • Analytics
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Top Skills:
Amazon EKS, AWS, CircleCI, Docker, GitHub Actions, GitLab, Go, Kubernetes, Linux, LLMs, Monitoring and Observability Tools, Pulumi, Terraform
Cloud • Information Technology • Security • Software
Lead and grow a global Cloud Support/SRE team to ensure SaaS and self-hosted infrastructure reliability. Own incident response for Severity 1 events, refine support workflows, track KPIs (CSAT, MTTR, first-response), and collaborate with Product, Engineering, and Solutions teams to drive product improvements and operational excellence.
Top Skills:
AWS, Azure, Bash, DNS, GCP, Go, Kubernetes, Linux, Load Balancing, Python, SSL/TLS, TCP/IP
Blockchain • Fintech • Social Media • Cryptocurrency • NFT • Web3
Design, build, and operate scalable, highly available infrastructure and platform software for Zora's blockchain services (indexer, APIs, data pipelines). Automate workflows, maintain core systems, improve developer experience, participate in on-call rotation, and contribute strategic technical direction.
Top Skills:
asyncio, Base, Bridges, Ceph, Cloudflare Pages Functions, Datadog, Docker, Ethereum, Go, IPFS, Kubernetes, MongoDB, OpenTelemetry, Optimism, Optimistic Rollups, Plasma, Polygon, Postgres, Python, RPC Nodes, Sidechains, Vercel, ZK-Rollups
Security • Software
The Site Reliability Engineer will enhance service reliability, automate tasks, manage production services, and collaborate with Dev teams and DevOps Engineers.
Top Skills:
Ansible, AWS, Azure, Bash, Chef, CloudFormation, GCP, Linux, PowerShell, Puppet, Python, Ruby, Terraform, Unix, Windows
Blockchain • Financial Services • Cryptocurrency • Web3
As a Senior Site Reliability Engineer, you will manage the reliability and efficiency of Kraken's Data platform, working with multiple teams to ensure high performance and scalability. Responsibilities include designing data governance mechanisms, managing CI/CD pipelines, implementing monitoring solutions, and collaborating on various data projects.
Top Skills:
Apache Airflow, Spark, AWS, Debezium, Docker, Kafka, Kubernetes, Python, Terraform
Blockchain • Financial Services • Cryptocurrency • Web3
As an SRE/DevOps Engineer at Kraken, you will build infrastructure, support tools, drive standardization, and guide engineers in an efficient remote environment.
Top Skills:
Bash, Continuous Integration, Docker, Git, Grafana, Linux, Prometheus, Python, Rust, Terraform
Software • Cybersecurity
As an SRE Engineer II, you'll manage multi-cloud infrastructure on Azure, AWS, and GCP, enhance system reliability, and support engineering teams by automating processes and implementing best practices for cloud applications.
Top Skills:
ARM Templates, AWS, AWS CodePipeline, Azure, Azure DevOps, CloudFormation, GCP, Go, Jenkins, PowerShell, Python, Terraform
Insurance
As a Lead Site Reliability Engineer, you will ensure Azure cloud environments are resilient, secure and observable. You will implement CI/CD automation, coach teams on best practices, and contribute to architecture reviews.
Top Skills:
Application Insights, Azure, Azure DevOps, Azure Monitor, Bash, Datadog, Defender for Cloud, Docker, EFK, ELK, GitHub Actions, Go, Grafana, Kubernetes, Log Analytics, PowerShell, Prometheus, Python, Terraform
Fintech • Payments
Lead an SRE team to improve reliability, observability, and automation across Azure. Define SLOs/SLIs, manage error budgets and on-call, standardize IaC/GitOps, build self-healing systems, and collaborate on cost, governance, and operational strategy.
Top Skills:
Azure, Azure Monitor, Azure Policies, Bash, Bicep, GitHub Actions, GitOps, Go, Grafana, IaC, KQL, Log Analytics, PowerShell, Prometheus, Python, Terraform
AdTech • Big Data • Digital Media • Software
The Senior Site Reliability Engineer role involves ensuring the reliability and performance of production systems, leading automation efforts, architecting scalable solutions, and mentoring other engineers. You'll collaborate with various teams and participate in on-call rotations to maintain system availability.
Top Skills:
Ansible, Argo CD, AWS ECR, Git, GitHub Actions, Hadoop, Jenkins, Kafka, Kubernetes, Nexus, Puppet, Terraform, Yum
Information Technology • Software • Automation
The Site Reliability Engineer ensures operational stability in a cloud environment, providing customer support and troubleshooting while collaborating in a fast-paced team.
Top Skills:
Accumulo, Ansible, AWS, Bash, Docker, Grafana, Hadoop, Hadoop Distributed File System, Java, JIRA, Kubernetes, Linux, OpenStack, Prometheus, Python, Salt, Virtualization
Information Technology • Software • Automation
The Senior Site Reliability Engineer will manage AWS environments, develop Infrastructure as Code, and automate operational tasks to ensure high availability in cloud systems.
Top Skills:
Amazon Web Services (AWS), Ansible, AWS Certified Developer-Associate, AWS Certified Solutions Architect-Associate, AWS Certified Solutions Architect-Professional, AWS Certified SysOps Administrator-Associate, Certified Kubernetes Administrator (CKAD), CI/CD, Docker, Elastic Certified Engineer, Elastic Certified Observability Engineer, Kubernetes, Terraform
Artificial Intelligence • Software
As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructures, and optimize systems while implementing automation and observability practices.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript