Top Senior Site Reliability Engineer Jobs
Computer Vision • Hardware • Machine Learning • Robotics • Software
The role involves maintaining cloud infrastructure, collaborating with engineering teams, troubleshooting issues, deploying solutions, and ensuring system reliability.
Top Skills:
Ansible, C++, Grafana, Helm, Kubernetes, PagerDuty, Python, Terraform, TypeScript
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills:
AWS, Bash, CI/CD, CloudWatch, ELK Stack, Go, Grafana, Helm, Istio, Kubernetes, Prometheus, Python, Terraform
Financial Services
The Site Reliability Engineer will enhance global infrastructure through coding, monitoring tools, and optimizing systems to ensure efficiency and resilience.
Top Skills:
Apache Kafka, Bigtable, C/C++, Cassandra, CI/CD, ClickHouse, Go, Kubernetes, Linux, Python, RabbitMQ, Rust
Artificial Intelligence • Software
As a Senior Staff SRE Tech Lead, you'll oversee reliability and scalability, mentor engineers, optimize systems, and enhance data infrastructure.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Cloud • Software
The Site Reliability Engineer (SRE) will manage reliable, scalable systems, focusing on software development, infrastructure automation, and incident response. Responsibilities include monitoring, CI/CD pipeline management, security compliance, and cost optimization while collaborating with various teams.
Top Skills:
AWS, Azure, Docker, ELK Stack, GCP, Git, Grafana, Java, Kubernetes, PHP, Prometheus, Python, Shell, Terraform
Other
As a Platform Engineer/DevOps, you will expand cloud infrastructure, implement monitoring systems, manage databases, and use CI/CD tools, working collaboratively with various teams.
Top Skills:
AWS, Azure, Bash, Datadog, ELK Stack, Kubernetes, OpenTofu, Prometheus, Python, Terraform
Security • Software • Analytics
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Top Skills:
Amazon EKS, AWS, CircleCI, Docker, GitHub Actions, GitLab, Go, Kubernetes, Linux, LLMs, Monitoring and Observability Tools, Pulumi, Terraform
Cloud • Information Technology • Security • Software
Lead and grow a global Cloud Support/SRE team to ensure SaaS and self-hosted infrastructure reliability. Own incident response for Severity 1 events, refine support workflows, track KPIs (CSAT, MTTR, first-response), and collaborate with Product, Engineering, and Solutions teams to drive product improvements and operational excellence.
Top Skills:
AWS, Azure, Bash, DNS, GCP, Go, Kubernetes, Linux, Load Balancing, Python, SSL/TLS, TCP/IP
Blockchain • Fintech • Social Media • Cryptocurrency • NFT • Web3
Design, build, and operate scalable, highly available infrastructure and platform software for Zora's blockchain services (indexer, APIs, data pipelines). Automate workflows, maintain core systems, improve developer experience, participate in on-call rotation, and contribute strategic technical direction.
Top Skills:
Asyncio, Base, Bridges, Ceph, Cloudflare Pages Functions, Datadog, Docker, Ethereum, Go, IPFS, Kubernetes, MongoDB, OpenTelemetry, Optimism, Optimistic Rollups, Plasma, Polygon, Postgres, Python, RPC Nodes, Sidechains, Vercel, ZK-Rollups
Security • Software
The Site Reliability Engineer will enhance service reliability, automate tasks, manage production services, and collaborate with Dev teams and DevOps Engineers.
Top Skills:
Ansible, AWS, Azure, Bash, Chef, CloudFormation, GCP, Linux, PowerShell, Puppet, Python, Ruby, Terraform, Unix, Windows
Blockchain • Financial Services • Cryptocurrency • Web3
As a Senior Site Reliability Engineer, you will manage the reliability and efficiency of Kraken's Data platform, working with multiple teams to ensure high performance and scalability. Responsibilities include designing data governance mechanisms, managing CI/CD pipelines, implementing monitoring solutions, and collaborating on various data projects.
Top Skills:
Apache Airflow, Spark, AWS, Debezium, Docker, Kafka, Kubernetes, Python, Terraform
Blockchain • Financial Services • Cryptocurrency • Web3
As an SRE/DevOps Engineer at Kraken, you will build infrastructure, support tools, drive standardization, and guide engineers in an efficient remote environment.
Top Skills:
Bash, Continuous Integration, Docker, Git, Grafana, Linux, Prometheus, Python, Rust, Terraform
Software • Cybersecurity
As an SRE Engineer II, you'll manage multi-cloud infrastructure on Azure, AWS, and GCP, enhance system reliability, and support engineering teams by automating processes and implementing best practices for cloud applications.
Top Skills:
ARM Templates, AWS, AWS CodePipeline, Azure, Azure DevOps, CloudFormation, GCP, Go, Jenkins, PowerShell, Python, Terraform
Insurance
As a Lead Site Reliability Engineer, you will ensure Azure cloud environments are resilient, secure and observable. You will implement CI/CD automation, coach teams on best practices, and contribute to architecture reviews.
Top Skills:
Application Insights, Azure, Azure DevOps, Azure Monitor, Bash, Datadog, Defender for Cloud, Docker, EFK, ELK, GitHub Actions, Go, Grafana, Kubernetes, Log Analytics, PowerShell, Prometheus, Python, Terraform
Fintech • Payments
Lead an SRE team to improve reliability, observability, and automation across Azure. Define SLOs/SLIs, manage error budgets and on-call, standardize IaC/GitOps, build self-healing systems, and collaborate on cost, governance, and operational strategy.
Top Skills:
Azure, Azure Monitor, Azure Policies, Bash, Bicep, GitHub Actions, GitOps, Go, Grafana, IaC, KQL, Log Analytics, PowerShell, Prometheus, Python, Terraform
AdTech • Big Data • Digital Media • Software
The Senior Site Reliability Engineer role involves ensuring the reliability and performance of production systems, leading automation efforts, architecting scalable solutions, and mentoring other engineers. You'll collaborate with various teams and participate in on-call rotations to maintain system availability.
Top Skills:
Ansible, Argo CD, AWS ECR, Git, GitHub Actions, Hadoop, Jenkins, Kafka, Kubernetes, Nexus, Puppet, Terraform, Yum
Information Technology • Software • Automation
The Site Reliability Engineer ensures operational stability in a cloud environment, providing customer support and troubleshooting while collaborating in a fast-paced team.
Top Skills:
Accumulo, Ansible, AWS, Bash, Docker, Grafana, Hadoop, Hadoop Distributed File System, Java, JIRA, Kubernetes, Linux, OpenStack, Prometheus, Python, Salt, Virtualization
Information Technology • Software • Automation
The Senior Site Reliability Engineer will manage AWS environments, develop Infrastructure as Code, and automate operational tasks to ensure high availability in cloud systems.
Top Skills:
Amazon Web Services (AWS), Ansible, AWS Certified Developer-Associate, AWS Certified Solutions Architect-Associate, AWS Certified Solutions Architect-Professional, AWS Certified SysOps Administrator-Associate, Certified Kubernetes Administrator (CKAD), CI/CD, Docker, Elastic Certified Engineer, Elastic Certified Observability Engineer, Kubernetes, Terraform
Artificial Intelligence • Software
As a Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructure, and optimize systems while implementing automation and observability practices.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Artificial Intelligence • Software
The Senior Site Reliability Engineer will ensure the reliability and scalability of our Generative AI SaaS platform, implement automation, and support incident response efforts.
Top Skills:
AWS, Azure, Bash, CloudFormation, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Artificial Intelligence • Software
The Site Reliability Engineer will manage infrastructure, drive enterprise deployments, and ensure the reliability of the Freeplay platform by working closely with customers and optimizing cloud architectures.
Top Skills:
AWS, Azure, Datadog, Elasticsearch, GCP, Helm, KOTS, NATS JetStream, Postgres, Replicated, Terraform
Fintech • Financial Services
Design, automate, and maintain reliable, scalable systems; monitor and respond to incidents; perform capacity planning and performance tuning; build operational tooling; collaborate with development teams and lead/coach staff to improve resilience and operational practices.
AdTech • Marketing Tech • Analytics
As a Staff Software Engineer - SRE, you'll manage cloud infrastructure, improve application reliability, collaborate across teams, and support back-office systems.
Top Skills:
AWS, Datadog, Docker, Kafka, Kibana, Kubernetes, Linux, Postgres, Python, RDS, Redshift, Shell/Bash, Spark, Terraform
AdTech • Marketing Tech • Analytics
Manage and support customer applications, improve system reliability, collaborate with teams on infrastructure needs, and help drive architectural decisions.
Top Skills:
Auto Scaling, AWS, CDNs, Datadog, DNS, Docker, Kafka, Kibana, Kubernetes, Linux, Load Balancers, Postgres, Proxy Servers, Python, RDS, Redshift, Shell/Bash, Spark, Terraform, WAFs
AdTech • Marketing Tech • Analytics
The Staff SRE DevOps Engineer will manage customer applications, improve system reliability, collaborate on architecture discussions, and support infrastructure needs across teams.
Top Skills:
AWS, Bash, Datadog, Docker, Kafka, Kibana, Kubernetes, Linux, Postgres, Python, Redshift, Spark, Terraform