Top Site Reliability Engineer Jobs

Reposted 17 Days Ago
In-Office
Mountain View, CA, USA
252K-308K Annually
Senior level
Fintech • Payments • Financial Services
Lead EarnIn's AI-first reliability engineering, enhancing incident response, automation, and resilience in operations while mentoring engineers.
Top Skills: AI, AWS, CloudWatch, Datadog, Go, Kubernetes, OpenTelemetry, Python, Terraform
Reposted 17 Days Ago
In-Office
Star, TX, USA
Senior level
Aerospace • Other
The role involves managing Kubernetes and Linux servers, supporting containerized applications, implementing automation solutions, and mentoring peers in a fast-paced environment.
Top Skills: Ansible, Docker, Go, Grafana, Helm, InfluxDB, JSON, Kubernetes, Linux, Prometheus, Python, RKE, Terraform, YAML
Reposted 17 Days Ago
In-Office
Hawthorne, CA, USA
160K-220K Annually
Senior level
Aerospace • Other
The Sr. IT Linux Site Reliability Engineer will manage and optimize Kubernetes clusters, automate systems, and collaborate with teams to ensure system resilience and performance.
Top Skills: Ansible, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform
18 Days Ago
In-Office
Reston, VA, USA
148K-264K Annually
Senior level
Cloud • Fintech • HR Tech
Operate, monitor, and maintain Workday's Core Platform to ensure high availability and security. Automate infrastructure and CI/CD pipelines, improve observability and incident response, troubleshoot platform issues, document systems, and collaborate with engineering teams. Support federal contracts requiring U.S. citizenship and potential security clearance; may require onsite presence in the DC/MD/VA area.
Top Skills: Automated Testing, CI/CD, Concurrency, Distributed Systems, Infrastructure Automation, Java, Kotlin, Load Testing, Multithreading, Observability, Scala
18 Days Ago
In-Office
2 Locations
145K-217K Annually
Senior level
Financial Services
Lead SRE designing and automating infrastructure and operations to ensure availability, scalability, and performance. Own incident response, monitoring/observability, capacity planning, performance optimization, CI/CD automation, and documentation while mentoring teams and driving automation-first improvements.
Top Skills: Ansible, AutoSys, AWS, Azure, Bash, CI/CD, Configuration Management, Control-M, Docker, Dremio, EKS, Elasticsearch APM, Elasticsearch Observability, GCP, Go, Java, Jenkins, Kubernetes, MongoDB, OpenTelemetry, Python, Qlik Replicate, Snowflake, SQL, Terraform
Reposted 3 Days Ago
In-Office or Remote
7 Locations
Senior level
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills: Kubernetes, Linux, OpenStack, Python
24 Days Ago
Easy Apply
Remote
31 Locations
130K-140K Annually
Senior level
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills: AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
Reposted 18 Days Ago
Remote
U.S.
165K-230K Annually
Senior level
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills: Argo CD, GitHub Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python
18 Days Ago
Hybrid
New York, NY, USA
95K-125K Annually
Mid level
Artificial Intelligence • eCommerce • Retail • Software
Build and maintain CI/CD pipelines, manage and automate cloud infrastructure and configurations, implement monitoring/logging and alerting for reliability, enforce security and compliance practices, and collaborate with development teams to support scaling and operations.
Top Skills: SOC I, SOC II
Reposted 18 Days Ago
In-Office
San Francisco, CA, USA
200K-275K Annually
Senior level
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills: Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript
Reposted 18 Days Ago
In-Office or Remote
3 Locations
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills: Ansible, CUDA, Go, Helm, Kubernetes, Linux, NCCL, NVIDIA, Python, Rust, Slurm, Terraform
Reposted 18 Days Ago
Remote
US
101K-161K Annually
Senior level
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills: Ansible, Bash, GCP, GKE, Go, Kubernetes, Pulumi, Python
Reposted 18 Days Ago
In-Office
Saint Louis, MO, USA
Senior level
Fintech • Analytics
The role involves managing application services, driving improvements, handling incidents, and leveraging domain knowledge to enhance service quality and efficiency.
Top Skills: Datadog, ITRS
Reposted 18 Days Ago
In-Office
Reston, VA, USA
136K-184K Annually
Senior level
Information Technology • Software
The Site Reliability Engineer will support critical services management and deployment, coordinate with teams, and participate in 24x7 on-call rotations.
Top Skills: Ansible, Docker, Jenkins, Kubernetes, Linux, OpenStack, Python, RHEL, SELinux
Reposted 18 Days Ago
In-Office
Scottsdale, AZ, USA
194K-237K Annually
Expert/Leader
Fintech
The Principal Site Reliability Engineer designs and improves software and tools for performance, scalability, and availability, while leading incident management and collaborating with development teams.
Top Skills: Aurora, AWS, Chef, Docker, DynamoDB, Git, Go, Java, Jenkins, JMS, Kafka, Kubernetes, Maven, Memcached, Oracle, Python, Redis, SQS, Swarm
Reposted 18 Days Ago
In-Office
Atlanta, GA, USA
117K-209K Annually
Senior level
Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
The Site Reliability Engineer will architect solutions for SaaS applications, maintain cloud infrastructure, implement security best practices, and collaborate with teams for product quality. Responsibilities include incident management, monitoring, and automation of processes.
Top Skills: AWS, Bash, CloudFormation, CloudWatch, Docker, Dynatrace, Grafana, Jenkins, Kubernetes, MSSQL, MySQL, New Relic, Perl, Postgres, Python, Splunk, Terraform
19 Days Ago
Hybrid
New York, NY, USA
Mid level
Cryptocurrency
Own production reliability, availability, and performance for cloud-native systems. Operate and scale Kubernetes (EKS) clusters, manage AWS infrastructure, implement IaC with Terraform and Helm, improve CI/CD, build observability with Prometheus/Grafana/EFK, lead incident response and RCA, participate in on-call rotations, and support security and compliance.
Top Skills: Airflow, AWS Batch, AWS EC2, AWS Lambda, AWS Organizations, Bash, ClickHouse, CloudWatch, Databricks, Docker, DynamoDB, EFK (Elasticsearch, Fluentd, Kibana), EKS, ElastiCache, EMR, GitLab CI/CD, GitOps, Grafana, Helm, HPA, Kafka, Karpenter, KEDA, Kubernetes, Load Balancing, NAT, Postgres, Prometheus, Python, RDS, Redis, S3, Snowflake, Spark, SQS, Terraform, TLS, VPC, VPN
19 Days Ago
In-Office
Columbus, OH, USA
Senior level
Fitness • Retail • Sports • Manufacturing
Design, implement, and maintain highly available, scalable infrastructure and CI/CD for applications. Automate deployments, monitor performance, troubleshoot incidents, manage IaC with Terraform, support disaster recovery, and collaborate with dev and ops teams to improve reliability and security.
Top Skills: Application Insights, Azure, Azure DevOps, Bash, Bitbucket, Docker, GCP, GCP Cloud Monitoring, Git, Grafana, Helm, Jenkins, Kubernetes, PowerShell, Prometheus, Terraform
Reposted 19 Days Ago
Hybrid
2 Locations
Expert/Leader
Internet of Things • Software • Manufacturing
Lead and oversee cloud operations and Site Reliability Engineering for a global IoT ecosystem, architecting strategies for performance, security, and innovation while mentoring a team of professionals in multi-cloud environments.
Top Skills: Ansible, Azure, CI/CD, Cloud, ELK, Grafana, IoT, Kubernetes, Prometheus, SRE, Terraform
Reposted 19 Days Ago
Hybrid
Emeryville, CA, USA
Entry level
Artificial Intelligence • Machine Learning • Biotech • Generative AI
The Site Reliability Engineer will manage digital infrastructure, ensuring access to compute resources, automating processes, and maintaining resource visibility for researchers.
Top Skills: Ansible, Docker, Grafana, Kubernetes, Prometheus, Python, Tailscale, Talos Linux
Reposted 19 Days Ago
In-Office
Lovelace, NC, USA
Senior level
Artificial Intelligence • Machine Learning • Security • Database • Analytics • Big Data Analytics
As a Site Reliability Engineer, you'll ensure the availability and performance of AI applications, maintain infrastructure, automate tasks, and troubleshoot issues in high-scale environments.
Top Skills: Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, Docker, Dynatrace, EC2, ELK Stack, GCP, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Lambda, Linux, Prometheus, Python, S3, Terraform, Unix
Reposted 19 Days Ago
In-Office or Remote
5 Locations
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills: AWS, Azure, C++, GCP, Go, Kubernetes, OCI
Reposted Yesterday
Remote
United States
150K-200K Annually
Mid level
Software
As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.
Top Skills: AWS, Azure, CircleCI, GCP, GitHub Actions, GitLab CI, Go, Kubernetes, Terraform
25 Days Ago
Easy Apply
Remote
USA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills: ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform
25 Days Ago
Easy Apply
Remote
USA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform