Top Senior Site Reliability Engineer Jobs
Hardware • Information Technology • Other • Software • Analytics
Responsible for developing and maintaining FedRAMP-compliant SaaS and PaaS systems in AWS GovCloud, ensuring reliability, automation, and security of cloud infrastructure.
Top Skills:
Ansible • AWS • Azure • Bash • Ecs • Eks • Grafana • Kubernetes • Perl • Powershell • Prtg • Python • Sumo Logic • Terraform
Fitness • Healthtech • Retail • Pharmaceutical
Lead the reliability and scalability of integration platforms, manage operations teams, define SLOs/SLIs, and improve automation and system resilience.
Top Skills:
Ace • Apic • Apigee • Apim • Datapower • Jwt • Kong • Kubernetes • Mq • Oauth 2.0 • Splunk
Reposted 2 Days Ago
Analytics
The Site Reliability Engineer will ensure the reliability and performance of IaaS services, perform incident resolution, and enhance system reliability through automation while supporting mobility across hybrid infrastructures and collaborating extensively with various teams.
Top Skills:
Ansible • AWS • Azure • Bash • Gitlab Ci • Jenkins • Kubernetes • Linux • Openshift • Python • Terraform • Vmware Vsphere
Cloud • Information Technology
The Site Reliability Engineer will support IaaS services, monitor infrastructure health, perform root cause analysis, automate processes, and collaborate with teams for service reliability.
Top Skills:
Ansible • AWS • Azure • Bash • Gitlab Ci • Jenkins • Kubernetes • Linux • Openshift • Python • Terraform • Vmware Vsphere
Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
The Senior Site Reliability Engineer will ensure the performance, availability, and resilience of GM Motorsports' data platforms, focusing on high-throughput telemetry and analytics. Responsibilities include designing reliability practices, managing data pipelines, building observability frameworks, and driving infrastructure automation while mentoring team engineers.
Top Skills:
Databricks • Datadog • DevOps • Flink • Grafana • Kafka • Kubernetes • Linux • Opentelemetry • Platform Engineering • Prometheus • Site Reliability Engineering • Spark • Terraform
Consulting
As a Site Reliability Engineer, you'll enhance system performance and reliability through automation, monitor service levels, manage incidents, and improve application stability while collaborating with agile teams.
Top Skills:
.Net Core • Api Gateway • Appdynamics • AWS • C# • Datadog • Docker • Dynatrace • Ec2 • Eks • Hibernate • J2Ee • JavaScript • Jdbc • Jenkins • Jquery • Kubernetes • Lambda • New Relic • Node.js • React • Splunk • Spring • Tomcat
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills:
AWS • Docker • GCP • Kubernetes
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
Helm • Kafka • Kubernetes • Postgres • Python • Redis • Terraform • Typescript
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills:
AWS • Azure • C++ • GCP • Go • Kubernetes • Oci
Information Technology • Software
As a DevOps Engineer, you'll design and scale secure systems, manage AWS environments, automate operations, and ensure operational excellence for revenue teams.
Top Skills:
Amazon Aurora • AWS • Docker • DynamoDB • Github Actions • Kafka • S3 • Snowflake • Spark • Sqs • Terraform
Software • Cybersecurity
As an SRE Engineer II, manage multi-cloud infrastructure, enhance reliability, design services, implement IaC, develop CI/CD pipelines, and automate tasks with a focus on security and scalability.
Top Skills:
Arm Templates • AWS • Aws Codepipeline • Azure • Azure Devops • CloudFormation • GCP • Go • Jenkins • Powershell • Python • Terraform
Information Technology • Consulting
The Lead Site Reliability Engineer will create an IT support automation strategy to reduce ticket volume and automate workflows, focusing on eliminating recurring issues and driving measurable outcomes across IT support systems.
Top Skills:
AWS • Azure • GCP • Go • Kubernetes • Oauth2 • Pulumi • Python • Rest Apis • SAML • Terraform
Software
As a Site Reliability Engineer, you will manage the reliability and scalability of platform infrastructure, build observability tools, and automate processes to enhance operational excellence.
Top Skills:
AWS • GCP • Go • Kubernetes • Pulumi • Python • Terraform
Artificial Intelligence • Machine Learning • Security • Database • Analytics • Big Data Analytics
As a Site Reliability Engineer, you'll ensure the availability and performance of AI applications, maintain infrastructure, automate tasks, and troubleshoot issues in high-scale environments.
Top Skills:
Ansible • AWS • Azure • Bash • CircleCI • CloudFormation • Datadog • Docker • Dynatrace • Ec2 • Elk Stack • GCP • Gitlab Ci • Go • Grafana • Jenkins • Kubernetes • Lambda • Linux • Prometheus • Python • S3 • Terraform • Unix
Financial Services
The Senior Cluster Site Reliability Engineer will enhance the research compute cluster's uptime, reliability, and performance through engineering and operational improvements, ensuring high availability for researchers working on machine learning problems.
Top Skills:
Ansible • AWS • Ceph • Docker • Elk • GCP • Grafana • Horovod • Hpc • Infiniband • Kubeflow • Kueue • Loki • Lustre • Mlflow • Opentelemetry • Podman • Prometheus • Python • Rdma • Ruby • S3 • Singularity • Slurm • Terraform
Artificial Intelligence • Information Technology • Software
The role involves developing and maintaining tools for data center networks, ensuring performance and security while collaborating with teams on network management.
Top Skills:
Azure Devops • Gitlab • Go • Grafana • Jenkins • Linux • Nagios • Prometheus • Python • Solarwinds • Zabbix
Cloud • Information Technology • Security • Software
The Site Reliability Engineer III role involves developing services, automating infrastructure, managing CI/CD environments, and improving system reliability for applications and technology stacks.
Top Skills:
Ansible • Azure • Chef • Git • Gitlab • Go • JavaScript • Kubernetes • Ms Sql Server • Postgres • Puppet • Python • Terraform
Reposted 2 Days Ago
Software
The Senior SRE Manager will establish an SRE team, implement best practices, manage incidents, and enhance system reliability, scaling operations effectively.
Top Skills:
Cloud Infrastructure • Distributed Systems • Observability
Information Technology • Security • Software
As a Site Reliability Engineer, you'll build and maintain cloud infrastructure, support Kubernetes workloads, and enhance CI/CD pipelines within a collaborative team environment.
Top Skills:
Ansible • AWS • GCP • Go • Kubernetes • Linux • Python • Terraform
Cloud • Software
Responsible for maintaining FedRAMP-compliant infrastructure, collaborating with software engineers, and ensuring system availability and security. Duties include infrastructure design, automation, monitoring, and incident response.
Top Skills:
AWS • Go • Kubernetes • Puppet • Python • Terraform
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
Ai/Llm • Argocd • AWS • Ci/Cd • Datadog • Github Actions • Gitops • Go • Kubernetes • Python
Reposted 8 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Senior Site Reliability Engineer will build and scale identity management tools, automate operations, ensure security, and support AWS, GCP, and Azure environments.
Top Skills:
Ansible • AWS • Azure • C# • Cloud Identity Providers • Docker • GCP • Go • Infrastructure As Code • Java • Kubernetes • Python • Ruby • Terraform
Information Technology • Productivity • Software • Infrastructure as a Service (IaaS)
The role involves diagnosing infrastructure issues, participating in on-call rotations, improving application availability, and enhancing automation in cloud environments.
Top Skills:
Ansible • AWS • C++ • CloudFormation • Datadog • Go • Helm • Java • Kotlin • Kubernetes • New Relic • Postgres • Splunk • Terraform
Information Technology • Other • Software • Consulting
The Site Reliability Engineer at CardioOne will enhance the reliability and performance of production systems, implement automation, and lead incident response efforts while collaborating with development teams.
Top Skills:
Ansible • AWS • Azure • Datadog • Docker • Ecs • Java • Kubernetes • Python • Terraform • Terragrunt
Software
Design, implement, and maintain scalable backend systems and APIs; build cloud infrastructure (preferably GCP) using Terraform; operate containerized workloads with Kubernetes; ensure reliability, security, and performance; participate in on-call rotations, architecture discussions, and cross-functional delivery.
Top Skills:
Ci/Cd • Cloud Automation • Container Orchestration • Go • Google Cloud Platform • Iam • Infrastructure As Code • Kubernetes • Microservices • Python • Service-Oriented Architecture • Terraform