Top Senior Site Reliability Engineer Jobs
3D Printing • Artificial Intelligence • Software • Design
The role involves building reliable platforms for 3D/4D content delivery to AR/VR devices, monitoring system health, and improving operational practices in collaboration with the engineering team.
Top Skills:
AWS Fargate, CoreWeave, Grafana, Kubernetes, Prometheus, Terraform
Cloud
The Staff Site Reliability Engineer will manage large-scale cloud production systems, ensuring reliability and performance, while automating processes and responding to incidents.
Top Skills:
AWS, Bash, CloudFormation, Docker, Go, Helm, Kubernetes, Python, Ruby, Terraform
Financial Services
As a Staff SRE Engineer, you'll lead the Data Infra team in improving reliability, architecture, and automation for the Data Platform while mentoring engineers.
Top Skills:
AWS, Clojure, Datomic, EC2, Kubernetes, Lambdas, Scala, Spark, Step Functions
Artificial Intelligence • Cloud • Information Technology • Security • Software
The Site Reliability Engineer ensures system reliability and performance, manages incidents, implements automation, and collaborates with teams for software delivery and operational readiness.
Top Skills:
Automation, Cloud Services, Infrastructure-as-Code, Observability Tools
Financial Services
The Staff Engineer will support and optimize messaging platforms, design solutions to improve operational efficiency, and collaborate with teams on business-focused solutions.
Top Skills:
AMPS, AWS, EKS, FIX, Java, Kafka, Kubernetes, Linux, MQ, Spring, SQL
Artificial Intelligence • Fintech • Payments • Social Impact • Analytics • Financial Services • Automation
As a Senior SRE, you'll ensure reliable and scalable systems, develop observability solutions and infrastructure as code, and lead incident response efforts.
Top Skills:
AWS, CloudFormation, Datadog, ELK, Prometheus, Terraform
Fintech
The Principal Site Reliability Engineer at Fidelity will enhance system reliability, manage large-scale infrastructures, and automate processes using various technologies.
Top Skills:
Ansible, AWS, CI/CD, Datadog, Grafana, Jenkins, Python, Terraform, Yugabyte
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Information Technology • Mobile • Software
As a Site Reliability Engineer, you'll ensure system reliability and scalability, automate processes, optimize performance, and collaborate on system design.
Top Skills:
AWS, Azure, Bash, CloudFormation, Datadog, Docker, ELK, Go, Google Cloud Platform, Grafana, Helm, Kubernetes, New Relic, Prometheus, Pulumi, Python, Terraform
AdTech • Big Data • eCommerce • Marketing Tech • Real Estate • Software
The Site Reliability Engineer will manage AWS infrastructure, optimize Kubernetes environments, build CI/CD pipelines, and enhance system security and performance.
Top Skills:
Ansible, AWS, Bash, Cloudflare, CloudWatch, Docker, GitLab, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Insurance • Cybersecurity
The Site Reliability Engineer II will build and operate infrastructure, improve system reliability, and enhance developer tools while collaborating across teams using AWS, Terraform, and IaC principles.
Top Skills:
AWS, ECS, GitHub Actions, Go, Kafka, Kinesis, Kubernetes, Python, Terraform
Fintech • Financial Services
Responsible for network deployments, automation, and system monitoring. Collaborates with teams to enhance network design and performance, ensuring scalability and security.
Top Skills:
Ansible, Arista, BGP, Cisco, CloudFormation, Datadog, Fortinet, Git, JSON, Juniper, Linux, MPLS, OSPF, Prometheus, Python, STP, Terraform, Unix, VXLAN, YAML
Automotive
The Staff Site Reliability Engineer will optimize cloud-native systems for vehicle telemetry using Kubernetes and AWS, ensuring reliability and operational excellence through advanced observability and automation.
Top Skills:
Airflow, AWS, CI/CD, Datadog, Grafana, gRPC, Java, Kafka, Kinesis, Kubernetes, Python, REST, Scala, Terraform
Fintech • Information Technology • Payments
The Staff Platform Engineer is responsible for maintaining and improving cloud-native platforms, managing operations, ensuring reliability, and implementing automation, particularly on Azure while also supporting AWS environments.
Top Skills:
AWS, Azure, Kubernetes, Terraform
Fintech • Software
The Senior Site Reliability Engineer ensures fast, stable SaaS products through automation, collaboration, monitoring, and implementing AI tools to enhance performance and reliability.
Top Skills:
AI Tools, Ansible, AppDynamics, AWS, Azure, Azure DevOps, Bash, C# .NET, Cosmos, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, PowerShell, Python, SaaS, SQL, Terraform
Aerospace • Hardware • Information Technology • Security • Software • Cybersecurity • Defense
The Senior Site Reliability Engineer will oversee the deployment and reliability of digital engineering tools, enhance performance, and mentor junior engineers.
Top Skills:
Ansible, Fluent Bit, Grafana, Loki, Postgres, Prometheus, Python
Computer Vision • Hardware • Machine Learning • Robotics • Software
The role involves maintaining cloud infrastructure, collaborating with engineering teams, troubleshooting issues, deploying solutions, and ensuring system reliability.
Top Skills:
Ansible, C++, Grafana, Helm, Kubernetes, PagerDuty, Python, Terraform, TypeScript
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills:
AWS, Bash, CI/CD, CloudWatch, ELK Stack, Go, Grafana, Helm, Istio, Kubernetes, Prometheus, Python, Terraform
Financial Services
The Site Reliability Engineer will enhance global infrastructure through coding, monitoring tools, and optimizing systems to ensure efficiency and resilience.
Top Skills:
Apache Kafka, Bigtable, C/C++, Cassandra, CI/CD, ClickHouse, Go, Kubernetes, Linux, Python, RabbitMQ, Rust
Artificial Intelligence • Software
As a Senior Staff SRE Tech Lead, you'll oversee reliability and scalability, mentor engineers, optimize systems, and enhance data infrastructure.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Cloud • Software
The Site Reliability Engineer (SRE) will manage reliable, scalable systems, focusing on software development, infrastructure automation, and incident response. Responsibilities include monitoring, CI/CD pipeline management, security compliance, and cost optimization while collaborating with various teams.
Top Skills:
AWS, Azure, Docker, ELK Stack, GCP, Git, Grafana, Java, Kubernetes, PHP, Prometheus, Python, Shell, Terraform
Other
As a Platform Engineer/DevOps, you will expand cloud infrastructure, implement monitoring systems, manage databases, and use CI/CD tools, working collaboratively with various teams.
Top Skills:
AWS, Azure, Bash, Datadog, ELK Stack, Kubernetes, OpenTofu, Prometheus, Python, Terraform
Security • Software • Analytics
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Top Skills:
Amazon EKS, AWS, CircleCI, Docker, GitHub Actions, GitLab, Go, Kubernetes, Linux, LLMs, Monitoring and Observability Tools, Pulumi, Terraform
Cloud • Information Technology • Security • Software
Lead and grow a global Cloud Support/SRE team to ensure SaaS and self-hosted infrastructure reliability. Own incident response for Severity 1 events, refine support workflows, track KPIs (CSAT, MTTR, first-response), and collaborate with Product, Engineering, and Solutions teams to drive product improvements and operational excellence.
Top Skills:
AWS, Azure, Bash, DNS, GCP, Go, Kubernetes, Linux, Load Balancing, Python, SSL/TLS, TCP/IP
Gaming • Mobile
The Site Reliability Engineer (SRE) will enhance production system stability and performance, collaborate with DevOps, manage on-call responsibilities, and improve observability. Responsibilities include monitoring, reliability engineering, incident management, and documentation.
Top Skills:
Argo CD, AWS, Bash, EC2, EKS, GitHub Actions, GitLab CI/CD, Graylog, HashiCorp Vault, Helm, IAM, Kubernetes, New Relic, Python, Route 53, S3, Terraform