Top Site Reliability Engineer Jobs

EarnIn

Staff Site Reliability Engineer

Reposted 17 Days Ago

In-Office

Mountain View, CA, USA

252K-308K Annually

Senior level

Fintech • Payments • Financial Services

Lead EarnIn's AI-first reliability engineering, enhancing incident response, automation, and resilience in operations while mentoring engineers.

Top Skills: AI, AWS, CloudWatch, Datadog, Go, Kubernetes, OpenTelemetry, Python, Terraform

SpaceX

Sr. IT Linux Site Reliability Engineer

Reposted 17 Days Ago

In-Office

Star, TX, USA

Senior level

Aerospace • Other

The role involves managing Kubernetes and Linux servers, supporting containerized applications, implementing automation solutions, and mentoring peers in a fast-paced environment.

Top Skills: Ansible, Docker, Go, Grafana, Helm, InfluxDB, JSON, Kubernetes, Linux, Prometheus, Python, RKE, Terraform, YAML

SpaceX

Sr. IT Linux Site Reliability Engineer

Reposted 17 Days Ago

In-Office

Hawthorne, CA, USA

160K-220K Annually

Senior level

Aerospace • Other

The Sr. IT Linux Site Reliability Engineer will manage and optimize Kubernetes clusters, automate systems, and collaborate with teams to ensure system resilience and performance.

Top Skills: Ansible, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform

Workday

(Sr) Site Reliability Engineer (US Federal)

18 Days Ago

In-Office

Reston, VA, USA

148K-264K Annually

Senior level

Cloud • Fintech • HR Tech

Operate, monitor, and maintain Workday's Core Platform to ensure high availability and security. Automate infrastructure and CI/CD pipelines, improve observability and incident response, troubleshoot platform issues, document systems, and collaborate with engineering teams. Support federal contracts requiring U.S. citizenship and potential security clearance; may require onsite presence in the DC/MD/VA area.

Top Skills: Automated Testing, CI/CD, Concurrency, Distributed Systems, Infrastructure Automation, Java, Kotlin, Load Testing, Multithreading, Observability, Scala

Freddie Mac

Site Reliability Engineer Tech Lead

18 Days Ago

In-Office

2 Locations

145K-217K Annually

Senior level

Financial Services

Lead SRE designing and automating infrastructure and operations to ensure availability, scalability, and performance. Own incident response, monitoring/observability, capacity planning, performance optimization, CI/CD automation, and documentation while mentoring teams and driving automation-first improvements.

Top Skills: Ansible, AutoSys, AWS, Azure, Bash, CI/CD, Configuration Management, Control-M, Docker, Dremio, EKS, Elasticsearch APM, Elasticsearch Observability, GCP, Go, Java, Jenkins, Kubernetes, MongoDB, OpenTelemetry, Python, Qlik Replicate, Snowflake, SQL, Terraform

Canonical

Senior Site Reliability Engineer

Reposted 3 Days Ago

In-Office or Remote

7 Locations

Senior level

Cloud • Software

The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.

Top Skills: Kubernetes, Linux, OpenStack, Python

Circle (circle.so)

Senior Site Reliability Engineer

24 Days Ago

Easy Apply

Remote

31 Locations

130K-140K Annually

Senior level

Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software

Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.

Top Skills: AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis

SimSpace

Staff Site Reliability Engineer

Reposted 18 Days Ago

Remote

U.S.

165K-230K Annually

Senior level

Information Technology • Security

The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.

Top Skills: Argo CD, GitHub Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python

Refine Technology Inc

Site Reliability Engineer (In-Person)

18 Days Ago

Hybrid

New York, NY, USA

95K-125K Annually

Mid level

Artificial Intelligence • eCommerce • Retail • Software

Build and maintain CI/CD pipelines, manage and automate cloud infrastructure and configurations, implement monitoring/logging and alerting for reliability, enforce security and compliance practices, and collaborate with development teams to support scaling and operations.

Top Skills: SOC I, SOC II

Latent

Site Reliability Engineer

Reposted 18 Days Ago

In-Office

San Francisco, CA, USA

200K-275K Annually

Senior level

Artificial Intelligence • Healthtech • Information Technology • Software

As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.

Top Skills: Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript

Andromeda (andromeda.ai)

Staff SRE, AI Infrastructure

Reposted 18 Days Ago

In-Office or Remote

3 Locations

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.

Top Skills: Ansible, CUDA, Go, Helm, Kubernetes, Linux, NCCL, NVIDIA, Python, Rust, Slurm, Terraform

Arista Networks

FedRAMP Site Reliability Engineer (FedSRE) - CloudVision

Reposted 18 Days Ago

Remote

101K-161K Annually

Senior level

Cloud • Software • Analytics

Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.

Top Skills: Ansible, Bash, GCP, GKE, Go, Kubernetes, Pulumi, Python

LSEG (London Stock Exchange Group)

CDSClear IT Site Reliability Engineer

Reposted 18 Days Ago

In-Office

Saint Louis, MO, USA

Senior level

Fintech • Analytics

The role involves managing application services, driving improvements, handling incidents, and leveraging domain knowledge to enhance service quality and efficiency.

Top Skills: Datadog, ITRS

VERISIGN

Site Reliability Engineer

Reposted 18 Days Ago

In-Office

Reston, VA, USA

136K-184K Annually

Senior level

Information Technology • Software

The Site Reliability Engineer will support critical services management and deployment, coordinate with teams, and participate in 24x7 on-call rotations.

Top Skills: Ansible, Docker, Jenkins, Kubernetes, Linux, OpenStack, Python, RHEL, SELinux

Early Warning

Principal Site Reliability Engineer

Reposted 18 Days Ago

In-Office

Scottsdale, AZ, USA

194K-237K Annually

Expert/Leader

Fintech

The Principal Site Reliability Engineer designs and improves software and tools for performance, scalability, and availability, while leading incident management and collaborating with development teams.

Top Skills: Aurora, AWS, Chef, Docker, DynamoDB, Git, Go, Java, Jenkins, JMS, Kafka, Kubernetes, Maven, Memcached, Oracle, Python, Redis, SQS, Swarm

Autodesk

Site Reliability Engineer

Reposted 18 Days Ago

In-Office

Atlanta, GA, USA

117K-209K Annually

Senior level

Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial

The Site Reliability Engineer will architect solutions for SaaS applications, maintain cloud infrastructure, implement security best practices, and collaborate with teams for product quality. Responsibilities include incident management, monitoring, and automation of processes.

Top Skills: AWS, Bash, CloudFormation, CloudWatch, Docker, Dynatrace, Grafana, Jenkins, Kubernetes, MSSQL, MySQL, New Relic, Perl, Postgres, Python, Splunk, Terraform

Solidus Labs

DevOps/SRE

19 Days Ago

Hybrid

New York, NY, USA

Mid level

Cryptocurrency

Own production reliability, availability, and performance for cloud-native systems. Operate and scale Kubernetes (EKS) clusters, manage AWS infrastructure, implement IaC with Terraform and Helm, improve CI/CD, build observability with Prometheus/Grafana/EFK, lead incident response and RCA, participate in on-call rotations, and support security and compliance.

Top Skills: Airflow, AWS Batch, AWS EC2, AWS Lambda, AWS Organizations, Bash, ClickHouse, CloudWatch, Databricks, Docker, DynamoDB, EFK (Elasticsearch, Fluentd, Kibana), EKS, ElastiCache, EMR, GitLab CI/CD, GitOps, Grafana, Helm, HPA, Kafka, Karpenter, KEDA, Kubernetes, Load Balancing, NAT, Postgres, Prometheus, Python, RDS, Redis, S3, Snowflake, Spark, SQS, Terraform, TLS, VPC, VPN

Rogue Fitness

DevOps Site Reliability Engineer

19 Days Ago

In-Office

Columbus, OH, USA

Senior level

Fitness • Retail • Sports • Manufacturing

Design, implement, and maintain highly available, scalable infrastructure and CI/CD for applications. Automate deployments, monitor performance, troubleshoot incidents, manage IaC with Terraform, support disaster recovery, and collaborate with dev and ops teams to improve reliability and security.

Top Skills: Application Insights, Azure, Azure DevOps, Bash, Bitbucket, Docker, GCP, GCP Cloud Monitoring, Git, Grafana, Helm, Jenkins, Kubernetes, PowerShell, Prometheus, Terraform

Resideo

Director, Site Reliability Engineering & Cloud Operations (SRE)

Reposted 19 Days Ago

Hybrid

2 Locations

Expert/Leader

Internet of Things • Software • Manufacturing

Lead and oversee cloud operations and Site Reliability Engineering for a global IoT ecosystem, architecting strategies for performance, security, and innovation while mentoring a team of professionals in multi-cloud environments.

Top Skills: Ansible, Azure, CI/CD, Cloud, ELK, Grafana, IoT, Kubernetes, Prometheus, SRE, Terraform

Astera

Site Reliability Engineer

Reposted 19 Days Ago

Hybrid

Emeryville, CA, USA

Entry level

Artificial Intelligence • Machine Learning • Biotech • Generative AI

The Site Reliability Engineer will manage digital infrastructure, ensuring access to compute resources, automating processes, and maintaining resource visibility for researchers.

Top Skills: Ansible, Docker, Grafana, Kubernetes, Prometheus, Python, Tailscale, Talos Linux

Lovelace AI

Software Engineer - Site Reliability Engineer (SRE)

Reposted 19 Days Ago

In-Office

Lovelace, NC, USA

Senior level

Artificial Intelligence • Machine Learning • Security • Database • Analytics • Big Data Analytics

As a Site Reliability Engineer, you'll ensure the availability and performance of AI applications, maintain infrastructure, automate tasks, and troubleshoot issues in high-scale environments.

Top Skills: Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, Docker, Dynatrace, EC2, ELK Stack, GCP, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Lambda, Linux, Prometheus, Python, S3, Terraform, Unix

Cohere AI

Site Reliability Engineer, Inference Infrastructure

Reposted 19 Days Ago

In-Office or Remote

5 Locations

Senior level

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI

The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.

Top Skills: AWS, Azure, C++, GCP, Go, Kubernetes, OCI

Regrello

Senior Site Reliability Engineer

Reposted Yesterday

Remote

United States

150K-200K Annually

Mid level

Software

As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.

Top Skills: AWS, Azure, CircleCI, GCP, GitHub Actions, GitLab CI, Go, Kubernetes, Terraform

Coinbase

Senior Site Reliability Engineer, Workforce Identity

25 Days Ago

Easy Apply

Remote

USA

186K-219K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.

Top Skills: ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform

Coinbase

Senior Site Reliability Engineer, Core AI Infrastructure

25 Days Ago

Easy Apply

Remote

USA

186K-219K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.