Get the job you really want.
Top Senior Site Reliability Engineer Jobs
Aerospace • Big Data • Greentech • Hardware • Social Impact
The Site Reliability Engineer will build, deploy, and operate computing services for satellite imaging, ensuring reliable and scalable infrastructure while collaborating with cross-functional teams.
Top Skills:
Alloy, Ansible, Bash, Cloud-Native Infrastructure, Grafana, Helm, K3S, Kubernetes, Kustomize, Opentelemetry, Prometheus, Proxmox, Python, Rke2, Talos, Terraform
Artificial Intelligence • Cloud • Machine Learning • Software • Database • App development • Generative AI
As a Site Reliability Engineer at Replit, you'll enhance system reliability through observability, automation, incident management, and performance optimization, serving millions globally.
Top Skills:
Ansible, Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Pulumi, Python, Terraform
Artificial Intelligence • Cloud • Machine Learning • Software • Database • App development • Generative AI
As a Staff Site Reliability Engineer at Replit, you will ensure infrastructure reliability, drive automation, lead incident management, and mentor the engineering team while enhancing system performance and observability.
Top Skills:
Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, Opentelemetry, Prometheus, Python, Terraform
Healthtech • Database
Seeking a Principal Site Reliability Engineer to build an SRE practice, enhance reliability, mentor teams, and drive performance engineering to optimize Quest products and services.
Top Skills:
Ansible, Aurora, AWS, Azure, Bigtable, Cassandra, Ci/Cd, Cloud Pub/Sub, Cloud Spanner, Cloud Sql, Docker, DynamoDB, Dynatrace, Gitlab, Go, GCP, Java, Jms, Kafka, Kinesis, Kubernetes, Mq, Perl, Python, Rds, Ruby, Shell Scripting, Terraform
Financial Services
As a Principal Site Reliability Engineer, you'll lead a team focusing on observability and automating solutions for cloud and on-prem infrastructures, enhancing reliability and incident response across T. Rowe Price's tech ecosystem.
Top Skills:
.Net Core, Amazon Aws, Ansible, Elastic Stack, Go, Grafana, Java, MySQL, New Relic, Node.js, Postgres, Prometheus, Python, Solarwinds Dpa, Splunk, SQL Server, Terraform, Vagrant, Vault
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
As a Site Reliability Engineer II, you will develop automation workflows and services, manage cloud operations, participate in incident response, and influence architectural patterns for improved efficiency.
Top Skills:
AWS, Aws Cloudformation, Azure, C#, Ci/Cd, Go, Java, Kubernetes, Python, Temporal, Terraform
Fintech • Information Technology • Payments
The Staff Site Reliability Engineer designs and builds cloud-native infrastructure on Azure for data services, ensuring reliability, security, and scalability.
Top Skills:
Automation, Azure Kubernetes Service, Configuration Management, Container Orchestration, Infrastructure As Code, Azure
Fintech • Information Technology • Payments
The Staff Site Reliability Engineer will lead CI/CD initiatives, automate infrastructure, mentor engineers, and enhance platform resilience through technical contributions and problem-solving.
Top Skills:
AWS, Ci/Cd, Docker, Grafana, Grafana Loki, Honeycomb, Infrastructure As Code, Istio, Kubernetes, Opentelemetry, Prometheus, Terraform
Fintech • Payments • Financial Services
The Site Reliability Engineer will assist clients with Redline products, manage production environments, troubleshoot issues, and ensure automation and customer satisfaction.
Top Skills:
C/C++, Java, Linux, Python
Aerospace • Security • Energy • Industrial
Support machine learning operations and cloud projects, maintain CI/CD infrastructure, manage API deployment, and collaborate with cross-functional teams.
Top Skills:
Azure, Azure Devops, Azure Ml, Django, Docker, Fastapi, Flask, GCP, Github Actions, Gitlab Ci, Grafana, Jenkins, Kubeflow, Kubernetes, Mlflow, Prometheus, Python, Terraform, Vertex Ai
Aerospace • Security • Energy • Industrial
The candidate will design and maintain cloud infrastructure, implement CI/CD pipelines, optimize system performance, and mentor junior engineers in SRE and DevOps practices.
Top Skills:
Ansible, AWS, Azure, Bash, CloudFormation, Datadog, Docker, Elk, GCP, Github Actions, Gitlab Ci, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
Fintech • Financial Services
The role involves shaping release engineering practices, implementing AI-driven solutions, and ensuring software reliability through collaboration and automation.
Top Skills:
Ai-Powered Tools, Azure, Bash, C#, Github Copilot, Java, Powershell
Artificial Intelligence • Cloud • Events • Productivity • Software • Business Intelligence • Conversational AI
The Site Reliability Engineer will maintain and improve large-scale communication infrastructure, troubleshoot SIP and RTP flows, manage Kubernetes services, and enhance operational reliability through automation and observability improvements.
Top Skills:
AWS, Elk Stack, GCP, Gitops, Grafana, Helm, Kubernetes, Linux, Prometheus, Python, Rtp, Shell Scripting, Sip, Tcp/Ip, Tcpdump, Wireshark
Fintech • Analytics
The Site Reliability Engineer will support and automate critical real-time applications, ensuring service availability and quality across cloud and on-premises deployments, while collaborating with various teams on operational documentation and incident management.
Top Skills:
AWS, Azure, Datadog, Docker, Git, Kubernetes, Python, Unix/Linux
Cloud
The Staff Site Reliability Engineer will lead the design of AWS solutions, manage incident responses, and mentor junior engineers, ensuring reliability and security in federal environments.
Top Skills:
AWS, Databricks, Go, Helm, Kubernetes, Redshift, Snowflake, Terraform
Artificial Intelligence
The Deployment Engineer will manage AI inference clusters, optimizing deployment, capacity allocation, and ensuring reliability of pipeline operations across datacenters.
Top Skills:
Docker, Grafana, Influxdb, K8S, Linux, Prometheus, Python
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing observability infrastructure, and supporting SRE practices and cloud deployments.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Node.js, Opentelemetry, Pulumi, Python, Rust, Terraform
Artificial Intelligence • Information Technology • Software • Generative AI
The Site Reliability Engineer will ensure the reliability and performance of SaaS production systems, manage deployments and incident responses, and improve operational processes within a dynamic AI environment.
Top Skills:
AWS, Azure, Bash, Docker, Elk, GCP, Git, Go, Grafana, Kubernetes, Prometheus, Pulumi, Python, Terraform
Information Technology
The role involves securing and maintaining the reliability of X Money's infrastructure, focusing on AWS, Kubernetes, and code security while implementing best practices and collaborative problem-solving.
Top Skills:
AWS, DynamoDB, Kubernetes, Python, Rds, Terraform
Cloud • Information Technology • Biotech
The Site Reliability Engineer will build and deploy Linux servers, research technologies, monitor system performance, and resolve technical incidents.
Top Skills:
Infrastructure-As-Code, Linux, Networking, Virtualization
Digital Media
In this role, you'll design and manage cloud infrastructure, focus on CI/CD, and support observability for scalable applications while collaborating with engineering teams.
Top Skills:
Amazon Web Services, Apache Airflow, Apache Kafka, Argo Cd, Chronosphere, Dbt, Docker, DynamoDB, Fastapi, Flask, Github Actions, Google Cloud Platform, Helm, Istio, Kubernetes, Kustomize, Node.js, Opensearch, Opentelemetry, Pagerduty, Pandas, Postgres, Prometheus, Python, Qdrant, Ray, React, Redis, Snowflake, Terraform, Terragrunt
Digital Media
As a Principal Site Reliability Engineer, you'll lead reliability practices, mentor engineers, and manage cloud infrastructure in a multi-cloud environment, focusing on continuous improvement and innovation.
Top Skills:
Amazon Web Services, Apache Airflow, Apache Kafka, Argo Cd, Aws Aurora, Chronosphere, Dbt, Docker, DynamoDB, Fastapi, Flask, Gcp Cloud Sql, Github Actions, Google Cloud Platform, Helm, Istio, Kubernetes, Kustomize, Node.js, Opensearch, Opentelemetry, Pagerduty, Pandas, Postgres, Prometheus, Python, Qdrant, Ray, React, Redis, Snowflake, Terraform, Terragrunt
Software
The Staff Systems Engineer is responsible for architecting and maintaining VMware-based infrastructure, automating operations, and collaborating with cross-functional teams to enhance system performance and reliability.
Top Skills:
Active Directory, Ansible, Automation Frameworks, Avi, Azure Devops, F5 Big-Ip, Git, Jenkins, Linux, Powercli, Python, Tcp/Ip, Terraform, VMware, Windows Server
Financial Services
Design, develop, and deploy robust platform solutions while ensuring reliability, scalability, and security of the system. Collaborate with teams to enhance tooling and automation.
Top Skills:
GCP, Kubernetes, Terraform
Artificial Intelligence • Software • Generative AI
As a Principal SRE, you will lead the reliability, scalability, and operational health of Gradial's platform, driving improvements and collaborating with engineering.
Top Skills:
Ci/Cd, Infrastructure As Code, Kubernetes, Observability, Python, Typescript