Top Site Reliability Engineer Jobs

Reposted 2 Days Ago
Easy Apply
In-Office
Fort Meade, MD, USA
Senior level
Information Technology • Security • Software
Manage daily operations of a classified NOC, focusing on Kubernetes services, incident response, system monitoring, and ensuring security and availability.
Top Skills: AWS GovCloud, Azure Government, C2E, C2S, Docker, Elastic Stack, Fluentd, Flux, Grafana, Helm, JIRA, JWCC, Kubernetes, osTicket, Prometheus, Terraform
2 Days Ago
Hybrid
2 Locations
15-15 Annually
Expert/Leader
Software
The Principal Site Reliability Engineer will design and improve systems for reliability in payments software, guiding development cycles and incident response, while ensuring service health and organizational efficiency.
Top Skills: Cassandra, Go, Java, Kafka, Oracle, Postgres, Python, RabbitMQ, Shell
2 Days Ago
Hybrid
2 Locations
Expert/Leader
Software
The Principal Site Reliability Engineer will enhance system reliability, promote SRE practices, lead organizational improvements, and ensure efficient software development and incident response processes.
Top Skills: Cassandra, Go, Java, Kafka, Oracle, Postgres, Python, RabbitMQ, Shell
2 Days Ago
Easy Apply
In-Office
Seattle, WA, USA
125K-150K Annually
Senior level
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
As a Site Reliability Engineer II, you'll develop automation workflows, manage cloud operations, and enhance service reliability while participating in incident response and code reviews.
Top Skills: APM, AWS, AWS CloudFormation, Azure, C#, CI/CD, Go, Java, Kubernetes, Observability Tools, Python, Temporal, Terraform
2 Days Ago
In-Office
Houston, TX, USA
Senior level
Other • Energy
Lead SRE practices for GCP-based data platforms, automate workflows, design reliable architectures, mentor engineers, and improve operational processes.
Top Skills: BigQuery, CI/CD, Cloud Logging, Cloud Monitoring, Cloud Storage, Compute Engine, Dataflow, Datastream, GitHub Actions, GitLab CI, GKE, Google Cloud Platform, IAM, Kubernetes, Pub/Sub, Python, Terraform
Reposted 2 Days Ago
In-Office
West Chester, PA, USA
Mid level
Aerospace • Energy
The Site Reliability Engineer ensures performance and availability of compute and network infrastructure, automating solutions and addressing potential issues proactively while mentoring junior staff.
Top Skills: .NET, Ansible, AWS, Azure, Chef, Datadog, FTP, Go, Grafana, Java, Node.js, Puppet, Python, Salt, Sensu, SNMP, Splunk, TCP/IP, Terraform
Reposted 2 Days Ago
In-Office
30005, Alpharetta, GA, USA
Mid level
Fintech • Consulting
The SRE at Equifax ensures reliability and performance of large-scale systems, automating operational tasks and collaborating with dev and ops teams in a hybrid work environment.
Top Skills: Ansible, Bash, Chef, Docker, GitHub Actions, Go, Java, JavaScript, Jenkins, Kubernetes, Node.js, Python, Terraform
Reposted 2 Days Ago
In-Office or Remote
11 Locations
160K-179K Annually
Senior level
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills: AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
2 Days Ago
In-Office or Remote
26 Locations
107K-284K Annually
Senior level
Fitness • Healthtech • Retail • Pharmaceutical
The Senior Manager, SRE Release Engineering oversees Release Engineering for the Pharmacy & Consumer Wellness line, ensuring high-quality technology releases through collaboration with IT teams and managing end-to-end change releases.
Top Skills: AWS, Azure, Docker, GCP, Kubernetes, ServiceNow, SharePoint
Reposted 2 Days Ago
In-Office
Headquarters, AZ, USA
Expert/Leader
Retail • Sports
Lead global D2C Site Reliability and Platform Operations to ensure availability, performance, and scalability of eCommerce and omnichannel systems. Define SRE strategy, SLIs/SLOs, incident management, observability, cloud operations, FinOps, vendor management, and global on-call models while building and developing high-performing teams and operational playbooks.
Top Skills: Alerting, CI/CD, Cloud Infrastructure, Error Budgets, FinOps, Incident Management, Monitoring, Observability, Site Reliability Engineering (SRE), SLAs, SLIs, SLOs
Reposted 2 Days Ago
Easy Apply
Remote
USA
Senior level
Artificial Intelligence • eCommerce • Retail
Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.
Top Skills: Azure, Bash, CI/CD, Datadog, Docker, ELK Stack, Go, Grafana, Kubernetes, PowerShell, Prometheus, Python, Terraform
Reposted 2 Days Ago
Easy Apply
Remote
United States
172K-215K Annually
Senior level
Aerospace • Big Data • Greentech • Hardware • Social Impact
Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.
Top Skills: Alloy, Ansible, Bash, CUDA, GitOps, Grafana, Helm, JIRA, K3s, Kubernetes, Kustomize, OpenTelemetry, Prometheus, Proxmox, Python, RKE2, Talos, Terraform
Reposted 2 Days Ago
Easy Apply
In-Office
Denver, CO, USA
130K-170K Annually
Mid level
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing observability solutions using OpenTelemetry, managing platform engineering tasks, and ensuring site reliability through various engineering practices.
Top Skills: AWS, Azure, CI/CD, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Node.js, OpenTelemetry, Pulumi, Python, Rust, Terraform
Reposted 2 Days Ago
Easy Apply
In-Office
Chicago, IL, USA
130K-170K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing telemetry infrastructure, establishing SRE practices, and managing observability across cloud platforms.
Top Skills: ArgoCD, AWS, Azure, Bash, CloudFormation, Docker, GCP, GitHub Actions, GitLab CI, Go, Java, Jenkins, Node.js, OpenTelemetry, PowerShell, Pulumi, Python, Rust, Terraform
Reposted 2 Days Ago
Easy Apply
Remote
United States
150K-185K Annually
Mid level
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills: Ansible, ArgoCD, AWS, AWS Lambda, Azure, Bash, Bitbucket, C++, Chef, Coralogix, Datadog, Docker, Gerrit, Git, GitLab, GCP, Helm, JavaScript, Kubernetes, Linux, MySQL, New Relic, Nginx, Node.js, Phabricator, Prometheus, Puppet, Python, Shell, Splunk
Reposted 2 Days Ago
Hybrid
2 Locations
135K-285K Annually
Mid level
Software
As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.
Top Skills: TensorRT
Reposted 2 Days Ago
Remote
USA
110K-130K Annually
Senior level
Real Estate • Financial Services • PropTech
As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.
Top Skills: ArgoCD, AWS, Azure DevOps, Bash, CI/CD, CloudWatch, Docker, EKS, FluxCD, Git, Kubernetes, PowerShell, Python, SQL, Terraform
Reposted 2 Days Ago
In-Office
St. Louis, MO, USA
100K-120K Annually
Senior level
Fintech • Analytics
As a Senior Site Reliability Engineer, you'll lead incident recovery, enhance production stability, automate processes, and collaborate with development teams to improve operational efficiency.
Top Skills: AWS, Azure, BigPanda, Cloud-Native Applications, Datadog, DNS, Docker, Git, HTTP, Kubernetes, Shell Scripting, TCP/IP, Unix
Reposted 2 Days Ago
In-Office
San Francisco, CA, USA
238K-290K Annually
Expert/Leader
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Staff Software Engineer in Site Reliability, you'll manage infrastructure for reliability and scalability, lead incident management, and automate operational tasks.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, incident.io, PagerDuty, Pulumi, Python, Sentry, Terraform
Reposted 2 Days Ago
In-Office
San Francisco, CA, USA
200K-260K Annually
Mid level
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Software Engineer in Site Reliability, you will ensure the reliability and performance of our AI platform through automation and strategic infrastructure management.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Kubernetes, PagerDuty, Python, Sentry, Terraform
3 Days Ago
In-Office
Overland Park, KS, USA
20-40 Hourly
Internship
Other • Utilities
The intern will support the maintenance of critical internet systems, focusing on automation, monitoring, and testing for performance and uptime. Responsibilities include collaborating with teams, managing infrastructure, and conducting operational analysis to improve services.
Top Skills: Configuration Management, DevOps-Centric Automation Tools
Reposted 8 Days Ago
Easy Apply
Hybrid
Austin, TX, USA
129K-232K Annually
Senior level
Marketing Tech • Mobile • Software
As a Senior Site Reliability Engineer, you'll ensure site reliability, improve infrastructure automation, manage incidents, and collaborate with engineering teams to enhance systems.
Top Skills: Docker, Go, Kafka, Kubernetes, Linux, MongoDB, Postgres, Redis, Ruby, Terraform
Reposted 8 Days Ago
Easy Apply
Hybrid
Chicago, IL, USA
129K-232K Annually
Senior level
Marketing Tech • Mobile • Software
As a Senior Site Reliability Engineer, you will ensure the reliability of internal services, improve automation and infrastructure, and collaborate with engineering teams to resolve issues and enhance product performance.
Top Skills: Docker, Go, Kafka, Kubernetes, MongoDB, Postgres, Redis, Ruby, Terraform
Reposted 8 Days Ago
Easy Apply
Hybrid
San Francisco, CA, USA
129K-232K Annually
Senior level
Marketing Tech • Mobile • Software
As a Senior Site Reliability Engineer at Braze, you'll ensure uptime for internal services, improve automation, and develop infrastructure tools, collaborating across teams to enhance reliability and scalability.
Top Skills: Chef, Docker, Kafka, Kubernetes, MongoDB, Redis, Ruby on Rails, Terraform
3 Days Ago
Easy Apply
Remote
US
110K-175K Annually
Senior level
Cloud • Software
In this role, you'll support large-scale applications, improve observability, mentor team members, and ensure reliability by collaborating on deployments and writing automation scripts while providing 24/7 support.
Top Skills: Ansible, AWS, Bash, Confluence, Docker, ELK Stack, GCP, GitLab CI/CD, Grafana, Jenkins, JIRA, Kubernetes, Linux, MongoDB, MySQL, Nagios, OCI, Perl, Postgres, Prometheus, Puppet, Python, Terraform