Top Site Reliability Engineer Jobs

Loft Orbital

Senior Site Reliability Engineer

8 Days Ago

Remote or Hybrid

180K-240K Annually

Senior level

Aerospace • Defense

Lead design, implementation, and operation of scalable, secure hybrid-cloud infrastructure for satellite ground systems. Improve developer experience, automate CI/CD and IaC, own observability, troubleshoot reliability issues, and collaborate with developers and satellite operators to advance SatDevOps practices.

Top Skills: C/C++, CI/CD, GCP, Go, Grafana, Infrastructure as Code (IaC), Java, Kubernetes, Loki, Prometheus, Python, Rust, Software-Defined Networking (SDN)

Arkestro

Senior Site Reliability Engineer

8 Days Ago

Remote

United States

160K-180K Annually

Senior level

Software

Own and improve platform performance, reliability, and deployment automation. Manage cloud infrastructure, implement IaC, monitor systems with observability tools, provide operational support for distributed applications, and integrate production learnings into development workflows.

Top Skills: AIOps Tooling, AWS Elastic Containers, AWS RDS, AWS S3, Claude Code, Claude Cowork, Datadog, Harness Engineering, Infrastructure as Code, Kubernetes, LLMs, Prompt Engineering, Rigor, Splunk

Workday

Senior Site Reliability Engineer

Reposted 8 Days Ago

In-Office

Reston, VA, USA

133K-238K Annually

Senior level

Cloud • Fintech • HR Tech

The Senior Site Reliability Engineer will ensure platform health, automate operations, maintain security, and support development teams, optimizing CI/CD processes and collaborating across time zones.

Top Skills: Amazon Web Services, Argo CD, C#, Go, Kubernetes, Python, Ruby, Rust, Terraform

OfficeSpace Software

Senior Site Reliability Engineer

Reposted 8 Days Ago

Remote

United States

Senior level

Real Estate • Software

As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.

Top Skills: Ansible, Datadog, ELK, Grafana, Kubernetes, Linux, MariaDB, MySQL, Postgres, Prometheus, Puppet, Python, Ruby on Rails, Ruby, Terraform, Terragrunt

Cisco

Senior Site Reliability Engineer (FedRAMP) - ThousandEyes

9 Days Ago

In-Office

3 Locations

147K-278K Annually

Senior level

Cloud • Information Technology • Internet of Things • Professional Services • Software

Operate and scale ThousandEyes Federal region infrastructure in a FedRAMP-compliant AWS environment. Design, deploy, and automate cloud-native services, implement IaC, monitor and audit systems, collaborate with security teams to remediate vulnerabilities, participate in 24x7 incident response and capacity planning, and ensure platform reliability, performance, and compliance.

Top Skills: AWS, FedRAMP, Go, Kubernetes, Linux, Puppet, Python, Terraform, Unix, US GovCloud

MetroStar

Sr. Site Reliability Engineer III (6448)

Reposted 9 Days Ago

In-Office

Washington, DC, USA

185K-230K Annually

Senior level

Information Technology • Consulting

As a Senior Site Reliability Engineer, you'll design and maintain critical applications, develop CI/CD pipelines, and ensure high availability while leading incident response and providing innovative solutions to meet customer needs.

Top Skills: Ansible, Bash, Desired State Configuration, GitLab CI/CD, Kubernetes, VMware

AssetMark

Senior Site Reliability Engineer

Reposted 9 Days Ago

In-Office

Charlotte, NC, USA

160K-180K Annually

Senior level

Fintech • Financial Services

The Senior Site Reliability Engineer will enhance system reliability, automate operations, ensure compliance, and collaborate with engineering teams to improve production systems at AssetMark.

Top Skills: Alerting Tools, AWS, Azure, C#, CI/CD, Docker, GCP, Infrastructure-as-Code, Java, Kubernetes, Logging Tools, Monitoring Tools, Python, Tracing Tools

SpaceX

Sr. Site Reliability Engineer (Starlink)

Reposted 9 Days Ago

In-Office

Hawthorne, CA, USA

160K-220K Annually

Senior level

Aerospace • Other

The Sr. Site Reliability Engineer at SpaceX is responsible for enhancing distributed systems, managing large data clusters, and ensuring software reliability on the Starlink project, focusing on customer experience and operational efficiency.

Top Skills: Apache Kafka, C#, Flink, Go, HBase, HDFS, Istio, Java, Kubernetes, Linux, Python, Scala, Spark

Axon

Senior Site Reliability Engineer I

Reposted 9 Days Ago

In-Office

Boston, MA, USA

134K-215K Annually

Senior level

Artificial Intelligence • Cloud • Social Impact • Software • Wearables

The Senior Site Reliability Engineer ensures the reliability and performance of cloud-native Kubernetes platforms by building tools, facilitating self-service for engineers, and promoting best practices.

Top Skills: Argo CD, AWS, Azure, C#, CI/CD, Git, Go, Java, Kubernetes, Pulumi, Python, Terraform

Axon

Senior Site Reliability Engineer I

Reposted 9 Days Ago

In-Office

Atlanta, GA, USA

Senior level

Artificial Intelligence • Cloud • Social Impact • Software • Wearables

Design and build cloud infrastructure, automate platforms, mentor engineers, and enhance reliability and performance for Axon's products.

Top Skills: APM, AWS, Azure, CI/CD, CloudFormation, Go, Kubernetes, Python, Terraform

Axon

Senior Site Reliability Engineer I

Reposted 9 Days Ago

In-Office

Boston, MA, USA

150K-180K Annually

Senior level

Artificial Intelligence • Cloud • Social Impact • Software • Wearables

As a Senior Site Reliability Engineer, you will design cloud infrastructure, develop automation tools, write production code, and mentor engineers while managing multi-cloud environments and improving reliability.

Top Skills: APM, AWS, Azure, CDK, CI/CD, CloudFormation, Go, Kubernetes, Python, Terraform

Axon

Senior Site Reliability Engineer I

Reposted 9 Days Ago

In-Office

Seattle, WA, USA

150K-180K Annually

Senior level

Artificial Intelligence • Cloud • Social Impact • Software • Wearables

As a Senior Site Reliability Engineer, you'll design cloud infrastructure, lead automation initiatives, and enhance operational efficiency while mentoring others and handling incident responses.

Top Skills: AWS, Azure, CI/CD, CloudFormation, Go, Kubernetes, Python, Terraform

ClickHouse

Senior Site Reliability Engineer- Remote

Reposted 9 Days Ago

Remote

United States

141K-208K Annually

Senior level

Database • Analytics

This role involves ensuring the reliability and performance of ClickHouse's cloud infrastructure, collaborating with engineering teams, incident management, and driving continuous improvement in service availability.

Top Skills: Ansible, AWS, Azure, ClickHouse, Docker Swarm, Go, Google Cloud Platform, Kubernetes, Puppet, Python, Terraform

Nebius

Senior Site Reliability Engineer (In-Office Required)

10 Days Ago

In-Office

New York City, NY, USA

156K-262K Annually

Senior level

Artificial Intelligence • Information Technology • Consulting

Own and operate production infrastructure: manage Kubernetes across regions, maintain IaC and GitOps CI/CD workflows, optimize real-time data pipelines, build observability and alerting, debug incidents, and lead cloud cost and capacity planning for a small engineering team.

Top Skills: CI/CD, GitOps, Kubernetes, Observability (Logging, Metrics, Alerting), Terraform

Bitdeer Group

Sr. SRE Platform Architect

10 Days Ago

In-Office

San Jose, CA, USA

Senior level

Software

Lead architecture, design, and evolution of a global multi-region cloud SRE platform for GPU/AI compute. Author and maintain platform architecture, enforce design invariants, review framework changes, run plugin framework, decide tier placements, coordinate with cloud teams and security, produce pre-flight designs, and shepherd implementations through engineering squads.

Top Skills: BMC, DCGM, DDN, GitOps, GPU Operator, InfiniBand, IPMI, KubeRay, Kubernetes, Kueue, Lustre, MIG, NCCL, NetApp, NVLink, NVMe-oF, NVSwitch, Pure, Ray, Redfish, RoCE, Slurm, Subnet Manager, VAST, vGPU, Volcano, XID, ZTP

Bitdeer Group

Sr. SRE Platform Software Engineer

10 Days Ago

In-Office

San Jose, CA, USA

Senior level

Software

Lead the design and implementation of a global public cloud SRE platform for AI and compute workloads. Own architecture and production engineering for observability, cluster health, remediation, lifecycle, secrets, CI/CD, backup/DR, and automation. Collaborate with cross-functional teams to build scalable, reliable multi-region services and run them in production (on-call).

Top Skills: Argo, AWS KMS, BMC, Cosign, CRDT, Datadog, DCGM, DDN, Elasticsearch, Flux, GCP KMS, Go, HashiCorp Vault, Helm, InfiniBand, IPMI, Jaeger, Java, KubeRay, Kubernetes, Kubernetes Operator (CRD/Controller), Kueue, Kustomize, Loki, Lustre, Mimir, mTLS, NCCL, NetApp, NVMe-oF, OpenTelemetry, Paxos, Prometheus, Prometheus Query, Pure, Python, Raft, Ray, Redfish, RoCE, Rust, Slurm, SQL, Tempo, Thanos, VAST, VictoriaMetrics, Volcano

Cato Networks

Senior SRE - Government Cloud Operations

10 Days Ago

Remote

United States

Senior level

Information Technology • Security • Cybersecurity

Operate and harden regulated cloud platforms (FedRAMP/DoD IL) by owning production reliability, designing resilient infrastructure, leading incident response and postmortems, automating compliance (NIST 800-53/STIG), supporting ATO and continuous monitoring, building secure IaC and CI/CD pipelines, and improving observability and operational tooling.

Top Skills: AWS GovCloud, Bash, CI/CD, Container Hardening, DoD IL4, DoD IL5, FedRAMP High, GitOps, Go, Grafana, Image Security, Kubernetes, Linux/Unix, NIST 800-53, Prometheus, Python, STIG, Terraform

Kody

Senior Site Reliability Engineer- Palo Alto, the US

10 Days Ago

In-Office

Palo Alto, CA, USA

Senior level

Fintech • Payments • Software • Financial Services

Lead Site Reliability Engineer responsible for ensuring platform scalability and uptime on AWS. Own CI/CD and GitHub repository practices, run deployment pipelines, manage incidents and post-mortems, implement observability and logging, and coordinate technical alignment across US and international teams with bilingual communication.

Top Skills: Alerting, AWS, CI/CD, Deployment Pipelines, Git, GitHub Actions, Log Management, Monitoring Tools, Observability, Scripting

Kody

Senior Site Reliability Engineer- San Francisco, CA, the US

10 Days Ago

In-Office

San Francisco, CA, USA

Senior level

Fintech • Payments • Software • Financial Services

Senior SRE responsible for ensuring platform scalability, reliability, and runtime efficiency on AWS. Own CI/CD and GitHub repo workflows, lead incident response and post-mortems, implement observability/monitoring and logging, and collaborate cross-border using bilingual Mandarin and English.

Top Skills: Alerting, AWS, CI/CD, Deployment Pipelines, Git, GitHub Actions, Logging, Monitoring, Observability, Scripting

Target

Senior Site Reliability Engineer - Target.com Web Enablement

Reposted 10 Days Ago

In-Office

55445, Minneapolis, MN, USA

98K-176K Annually

Senior level

eCommerce • Other • Retail

As a Senior Site Reliability Engineer, you will build and support platforms for reliable digital experiences, improve system reliability, and guide technical decisions within the team.

Top Skills: AWS, Azure, Bash, Docker, Fastly, GCP, Git, GitHub Actions, Go, Kubernetes, Next.js, Node.js, React

HiveWatch

Senior Site Reliability Engineer

Reposted 10 Days Ago

In-Office

El Segundo, CA, USA

183K-235K Annually

Senior level

Artificial Intelligence • Machine Learning • Security • Software

The Senior Staff Site Reliability Engineer will be responsible for ensuring system reliability, debugging issues, mentoring the engineering team, and maintaining infrastructure and CI/CD pipelines.

Top Skills: AWS, Datadog, Docker, GitHub Actions, Grafana, Helm, Kotlin, Kubernetes, Postgres, Prometheus, Python, Rust, Terraform, Terragrunt, TypeScript

Climavision

Senior Site Reliability Engineer (C#, .NET)

11 Days Ago

Remote

United States

135K-170K Annually

Senior level

Big Data • Analytics

Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.

Top Skills: .NET, Ansible, C#, Datadog, GPU-Enabled Workloads, Grafana, Helm, Istio, Kubernetes, Loki, Longhorn, Azure, NATS, Octopus Deploy, OpenTelemetry, PostGIS, Postgres, Prometheus, RabbitMQ, Rancher, RKE2, Terraform

Synthesia

Senior Site Reliability Engineer

Reposted 11 Days Ago

Remote

Senior level

Artificial Intelligence

Own operational excellence for cloud infrastructure: run incident management, improve reliability through automation, own a platform domain (e.g., Kubernetes, Temporal, observability), manage vendor and cost relationships, and deliver measurable reductions in incidents and costs within 12 months.

Top Skills: AWS, Kubernetes, LLM APIs, MongoDB, Observability, Python, Temporal

Autodesk

Senior Site Reliability Engineer

Reposted 11 Days Ago

In-Office

San Francisco, CA, USA

117K-209K Annually

Senior level

Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial

Lead reliability for Autodesk GovCloud services by deploying, operating, and automating production systems. Define SLOs/SLIs, build observability and automation, run incident response and on-call rotation, ensure compliance (FedRAMP), perform resilience testing and toil reduction, and collaborate across engineering, security, and platform teams to improve service reliability and operability.

Top Skills: APIs, AWS, AWS GovCloud, Azure, Bash, Caching Technologies, CI/CD, CloudWatch, Containers, Databases, Datadog, DNS, Dynatrace, FedRAMP, Go, IL4, IL5, Infrastructure as Code, Java, Kubernetes, Load Balancing, Messaging Systems, Networking, PowerShell, Python, Splunk, Storage Platforms

Autodesk

Senior Site Reliability Engineer

Reposted 11 Days Ago

Remote

Idaho, USA

117K-209K Annually

Senior level

Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial

Lead reliability for production services in Autodesk GovCloud: deploy, operate, and automate cloud services; define SLOs/SLIs and observability; drive incident response, resilience testing, and toil reduction; ensure compliance (FedRAMP) and participate in 24x7 on-call rotation.

Top Skills: APIs, AWS, AWS GovCloud, Azure, Bash, CI/CD, CloudWatch, Containers, Datadog, DNS, Dynatrace, Go, Infrastructure as Code, Java, Kubernetes, Load Balancing, Networking, PowerShell, Python, Splunk