Top Site Reliability Engineer Jobs
Aerospace • Defense
Lead design, implementation, and operation of scalable, secure hybrid-cloud infrastructure for satellite ground systems. Improve developer experience, automate CI/CD and IaC, own observability, troubleshoot reliability issues, and collaborate with developers and satellite operators to advance SatDevOps practices.
Top Skills:
C/C++, CI/CD, GCP, Go, Grafana, Infrastructure as Code (IaC), Java, Kubernetes, Loki, Prometheus, Python, Rust, Software Defined Networking (SDN)
Software
Own and improve platform performance, reliability, and deployment automation. Manage cloud infrastructure, implement IaC, monitor systems with observability tools, provide operational support for distributed applications, and integrate production learnings into development workflows.
Top Skills:
AIOps Tooling, AWS Elastic Containers, AWS RDS, AWS S3, Claude Code, Claude Cowork, Datadog, Harness Engineering, Infrastructure as Code, Kubernetes, LLMs, Prompt Engineering, Rigor, Splunk
Cloud • Fintech • HR Tech
The Senior Site Reliability Engineer will ensure platform health, automate operations, maintain security, and support development teams, optimizing CI/CD processes and collaborating across time zones.
Top Skills:
Amazon Web Services, Argo CD, C#, Go, Kubernetes, Python, Ruby, Rust, Terraform
Real Estate • Software
As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.
Top Skills:
Ansible, Datadog, ELK, Grafana, Kubernetes, Linux, MariaDB, MySQL, Postgres, Prometheus, Puppet, Python, Ruby on Rails, Ruby, Terraform, Terragrunt
Cloud • Information Technology • Internet of Things • Professional Services • Software
Operate and scale ThousandEyes Federal region infrastructure in a FedRAMP-compliant AWS environment. Design, deploy, and automate cloud-native services, implement IaC, monitor and audit systems, collaborate with security teams to remediate vulnerabilities, participate in 24x7 incident response and capacity planning, and ensure platform reliability, performance, and compliance.
Top Skills:
AWS, FedRAMP, Go, Kubernetes, Linux, Puppet, Python, Terraform, Unix, US GovCloud
Information Technology • Consulting
As a Senior Site Reliability Engineer, you'll design and maintain critical applications, develop CI/CD pipelines, and ensure high availability while leading incident response and providing innovative solutions to meet customer needs.
Top Skills:
Ansible, Bash, Desired State Configuration, GitLab CI/CD, Kubernetes, VMware
Fintech • Financial Services
The Senior Site Reliability Engineer will enhance system reliability, automate operations, ensure compliance, and collaborate with engineering teams to improve production systems at AssetMark.
Top Skills:
Alerting Tools, AWS, Azure, C#, CI/CD, Docker, GCP, Infrastructure as Code, Java, Kubernetes, Logging Tools, Monitoring Tools, Python, Tracing Tools
Aerospace • Other
The Sr. Site Reliability Engineer at SpaceX is responsible for enhancing distributed systems, managing large data clusters, and ensuring software reliability on the Starlink project, focusing on customer experience and operational efficiency.
Top Skills:
Apache Kafka, C#, Flink, Go, HBase, HDFS, Istio, Java, Kubernetes, Linux, Python, Scala, Spark
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
The Senior Site Reliability Engineer ensures the reliability and performance of cloud-native Kubernetes platforms by building tools, facilitating self-service for engineers, and promoting best practices.
Top Skills:
Argo CD, AWS, Azure, C#, CI/CD, Git, Go, Java, Kubernetes, Pulumi, Python, Terraform
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
Design and build cloud infrastructure, automate platforms, mentor engineers, and enhance reliability and performance for Axon's products.
Top Skills:
APM, AWS, Azure, CI/CD, CloudFormation, Go, Kubernetes, Python, Terraform
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
As a Senior Site Reliability Engineer, you will design cloud infrastructure, develop automation tools, write production code, and mentor engineers while managing multi-cloud environments and improving reliability.
Top Skills:
APM, AWS, Azure, CDK, CI/CD, CloudFormation, Go, Kubernetes, Python, Terraform
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
As a Senior Site Reliability Engineer, you'll design cloud infrastructure, lead automation initiatives, and enhance operational efficiency while mentoring others and handling incident responses.
Top Skills:
AWS, Azure, CI/CD, CloudFormation, Go, Kubernetes, Python, Terraform
Database • Analytics
This role involves ensuring the reliability and performance of ClickHouse's cloud infrastructure, collaborating with engineering teams, incident management, and driving continuous improvement in service availability.
Top Skills:
Ansible, AWS, Azure, ClickHouse, Docker Swarm, Go, Google Cloud Platform, Kubernetes, Puppet, Python, Terraform
Artificial Intelligence • Information Technology • Consulting
Own and operate production infrastructure: manage Kubernetes across regions, maintain IaC and GitOps CI/CD workflows, optimize real-time data pipelines, build observability and alerting, debug incidents, and lead cloud cost and capacity planning for a small engineering team.
Top Skills:
CI/CD, GitOps, Kubernetes, Observability (Logging, Metrics, Alerting), Terraform
Software
Lead architecture, design, and evolution of a global multi-region cloud SRE platform for GPU/AI compute. Author and maintain platform architecture, enforce design invariants, review framework changes, run plugin framework, decide tier placements, coordinate with cloud teams and security, produce pre-flight designs, and shepherd implementations through engineering squads.
Top Skills:
BMC, DCGM, DDN, GitOps, GPU Operator, InfiniBand, IPMI, KubeRay, Kubernetes, Kueue, Lustre, MIG, NCCL, NetApp, NVLink, NVMe-oF, NVSwitch, Pure, Ray, Redfish, RoCE, Slurm, Subnet Manager, VAST, vGPU, Volcano, Xid, ZTP
Software
Lead the design and implementation of a global public cloud SRE platform for AI and compute workloads. Own architecture and production engineering for observability, cluster health, remediation, lifecycle, secrets, CI/CD, backup/DR, and automation. Collaborate with cross-functional teams to build scalable, reliable multi-region services and run them in production (on-call).
Top Skills:
Argo, AWS KMS, BMC, Cosign, CRDT, Datadog, DCGM, DDN, Elasticsearch, Flux, GCP KMS, Go, HashiCorp Vault, Helm, InfiniBand, IPMI, Jaeger, Java, KubeRay, Kubernetes, Kubernetes Operator (CRD/Controller), Kueue, Kustomize, Loki, Lustre, Mimir, mTLS, NCCL, NetApp, NVMe-oF, OpenTelemetry, Paxos, Prometheus, Prometheus Query, Pure, Python, Raft, Ray, Redfish, RoCE, Rust, Slurm, SQL, Tempo, Thanos, VAST, VictoriaMetrics, Volcano
Information Technology • Security • Cybersecurity
Operate and harden regulated cloud platforms (FedRAMP/DoD IL) by owning production reliability, designing resilient infrastructure, leading incident response and postmortems, automating compliance (NIST 800-53/STIG), supporting ATO and continuous monitoring, building secure IaC and CI/CD pipelines, and improving observability and operational tooling.
Top Skills:
AWS GovCloud, Bash, CI/CD, Container Hardening, DoD IL4, DoD IL5, FedRAMP High, GitOps, Go, Grafana, Image Security, Kubernetes, Linux/Unix, NIST 800-53, Prometheus, Python, STIG, Terraform
Fintech • Payments • Software • Financial Services
Lead Site Reliability Engineer responsible for ensuring platform scalability and uptime on AWS. Own CI/CD and GitHub repository practices, run deployment pipelines, manage incidents and post-mortems, implement observability and logging, and coordinate technical alignment across US and international teams with bilingual communication.
Top Skills:
Alerting, AWS, CI/CD, Deployment Pipelines, Git, GitHub Actions, Log Management, Monitoring Tools, Observability, Scripting
Fintech • Payments • Software • Financial Services
Senior SRE responsible for ensuring platform scalability, reliability, and runtime efficiency on AWS. Own CI/CD and GitHub repo workflows, lead incident response and post-mortems, implement observability/monitoring and logging, and collaborate cross-border using bilingual Mandarin and English.
Top Skills:
Alerting, AWS, CI/CD, Deployment Pipelines, Git, GitHub Actions, Logging, Monitoring, Observability, Scripting
eCommerce • Other • Retail
As a Senior Site Reliability Engineer, you will build and support platforms for reliable digital experiences, improve system reliability, and guide technical decisions within the team.
Top Skills:
AWS, Azure, Bash, Docker, Fastly, GCP, Git, GitHub Actions, Go, Kubernetes, Next.js, Node.js, React
Artificial Intelligence • Machine Learning • Security • Software
The Senior Staff Site Reliability Engineer will be responsible for ensuring system reliability, debugging issues, mentoring the engineering team, and maintaining infrastructure and CI/CD pipelines.
Top Skills:
AWS, Datadog, Docker, GitHub Actions, Grafana, Helm, Kotlin, Kubernetes, Postgres, Prometheus, Python, Rust, Terraform, Terragrunt, TypeScript
Big Data • Analytics
Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.
Top Skills:
.NET, Ansible, C#, Datadog, GPU-Enabled Workloads, Grafana, Helm, Istio, Kubernetes, Loki, Longhorn, Azure, NATS, Octopus Deploy, OpenTelemetry, PostGIS, Postgres, Prometheus, RabbitMQ, Rancher, RKE2, Terraform
Artificial Intelligence
Own operational excellence for cloud infrastructure: run incident management, improve reliability through automation, own a platform domain (e.g., Kubernetes, Temporal, observability), manage vendor and cost relationships, and deliver measurable reductions in incidents and costs within 12 months.
Top Skills:
AWS, Kubernetes, LLM APIs, MongoDB, Observability, Python, Temporal
Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
Lead reliability for Autodesk GovCloud services by deploying, operating, and automating production systems. Define SLOs/SLIs, build observability and automation, run incident response and on-call rotation, ensure compliance (FedRAMP), perform resilience testing and toil reduction, and collaborate across engineering, security, and platform teams to improve service reliability and operability.
Top Skills:
APIs, AWS, AWS GovCloud, Azure, Bash, Caching Technologies, CI/CD, CloudWatch, Containers, Databases, Datadog, DNS, Dynatrace, FedRAMP, Go, IL4, IL5, Infrastructure as Code, Java, Kubernetes, Load Balancing, Messaging Systems, Networking, PowerShell, Python, Splunk, Storage Platforms
Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
Lead reliability for production services in Autodesk GovCloud: deploy, operate, and automate cloud services; define SLOs/SLIs and observability; drive incident response, resilience testing, and toil reduction; ensure compliance (FedRAMP) and participate in 24x7 on-call rotation.
Top Skills:
APIs, AWS, AWS GovCloud, Azure, Bash, CI/CD, CloudWatch, Containers, Datadog, DNS, Dynatrace, Go, Infrastructure as Code, Java, Kubernetes, Load Balancing, Networking, PowerShell, Python, Splunk