Top Site Reliability Engineer Jobs

13 Days Ago
In-Office or Remote
Eden Prairie, MN, USA
92K-164K Annually
Senior level
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Senior Site Reliability Engineer will manage and enhance cloud infrastructure, focusing on automation, performance, and security while collaborating with software and DevOps teams.
Top Skills: Argocd, Azure, Azure Monitor, Dynatrace, Flux, Grafana, Helm, Kubernetes, Prometheus, Pulumi, Restful Services, Splunk, Terraform
7 Days Ago
Hybrid
Arlington, TX, USA
Mid level
Fintech • Financial Services
The Site Reliability Engineer I will support cloud infrastructure and assist in cloud transformation initiatives, focusing on performance and delivery of public cloud solutions, primarily in Azure. Responsibilities include troubleshooting, monitoring, automation, and contributing to operational readiness practices for cloud services.
Top Skills: .Net, Ansible, AWS, Azure, Azure Cli, GCP, Jenkins, Kubernetes, Linux, Powershell, Terraform, Windows
Reposted 7 Days Ago
Remote
United States
150K-185K Annually
Mid level
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills: Ansible, Argocd, AWS, Aws Lambda, Azure, Bash, Bitbucket, C++, Chef, Coralogix, Datadog, Docker, Gerrit, Git, Gitlab, GCP, Helm, JavaScript, Kubernetes, Linux, MySQL, New Relic, Nginx, Node.js, Phabricator, Prometheus, Puppet, Python, Shell, Splunk
Reposted 8 Days Ago
Remote
2 Locations
Senior level
Big Data • Cloud • Information Technology
The Site Reliability Engineer at Iron Mountain will troubleshoot escalated tickets, manage Windows Server builds, perform security patching, and collaborate with customers and vendors to resolve issues and maintain systems.
Top Skills: Cloud, Compute, Hyper-Converged Infrastructure, Linux, Microsoft Endpoint Configuration Manager, Network, Nutanix, Powershell, Rubrik, Storage, Virtualization, Windows Server
8 Days Ago
Remote
US
Senior level
Information Technology • Software • Cybersecurity • Automation
Design, build, and operate an agentic platform to automate vulnerability remediation and incident response while ensuring reliability in security operations.
Top Skills: Datadog, Git, Grafana, Linear, Llms, Opentelemetry, Prometheus, Slack
8 Days Ago
In-Office
Birmingham, AL, USA
Senior level
Automotive • Hardware • Logistics
The Manager of Site Reliability Engineering leads a team to enhance cloud infrastructure reliability, automate processes, and collaborate with various teams to improve service delivery and operations.
Top Skills: Argocd, Ci/Cd, Datadog, Dynatrace, GCP, Google Cloud Platform, Kubernetes, Terraform
8 Days Ago
In-Office
Pittsburgh, PA, USA
146K-162K Annually
Senior level
Financial Services
The Lead Site Reliability Engineer will establish the SRE operating model, implement AI-enabled reliability use cases, manage reliability metrics, and oversee operational readiness while collaborating with teams and mentoring engineers.
Top Skills: Ai/Ml, Ansible, Azure Devops, Docker, Github Actions, Gitlab Ci, Jenkins, Kubernetes, Terraform, VMware
Reposted 13 Days Ago
Hybrid
New York City, NY, USA
205K-225K Annually
Senior level
Artificial Intelligence • Fintech • Payments • Social Impact • Analytics • Financial Services • Automation
As a Senior SRE, you'll ensure reliable and scalable systems, develop observability solutions and infrastructure as code, and lead incident response efforts.
Top Skills: AWS, CloudFormation, Datadog, Elk, Prometheus, Terraform
8 Days Ago
In-Office
Palo Alto, CA, USA
232K-263K Annually
Senior level
Cybersecurity
As a Sr. Staff Site Reliability Engineer, you will define the reliability vision for a multi-tenant SaaS platform, lead the architecture of detection systems, and partner across teams to improve incident management and system resilience, ensuring issues are resolved before affecting customers.
Top Skills: Argocd, AWS, GCP, Gitlab Ci/Cd, Grafana, Helm, Kubernetes, Prometheus
Reposted 8 Days Ago
In-Office
Chicago, IL, USA
130K-170K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing OpenTelemetry solutions, optimizing telemetry infrastructure, establishing SRE practices, and managing observability across cloud platforms.
Top Skills: Argocd, AWS, Azure, Bash, CloudFormation, Docker, GCP, Github Actions, Gitlab Ci, Go, Java, Jenkins, Node.js, Opentelemetry, Powershell, Pulumi, Python, Rust, Terraform
Reposted 8 Days Ago
In-Office
Houston, TX, USA
Senior level
Other • Energy
Lead SRE practices for GCP-based data platforms, automate workflows, design reliable architectures, mentor engineers, and improve operational processes.
Top Skills: BigQuery, Ci/Cd, Cloud Logging, Cloud Monitoring, Cloud Storage, Compute Engine, Dataflow, Datastream, Github Actions, Gitlab Ci, Gke, Google Cloud Platform, Iam, Kubernetes, Pub/Sub, Python, Terraform
Reposted 8 Days Ago
In-Office
San Francisco, CA, USA
238K-290K Annually
Expert/Leader
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Staff Software Engineer in Site Reliability, you'll manage infrastructure for reliability and scalability, lead incident management, and automate operational tasks.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Incidentio, Pagerduty, Pulumi, Python, Sentry, Terraform
Reposted 8 Days Ago
In-Office
San Francisco, CA, USA
200K-260K Annually
Mid level
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Software Engineer in Site Reliability, you will ensure the reliability and performance of our AI platform through automation and strategic infrastructure management.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Kubernetes, Pagerduty, Python, Sentry, Terraform
Reposted 8 Days Ago
Hybrid
San Francisco, CA, USA
190K-220K Annually
Senior level
Artificial Intelligence • Big Data • Software
You will manage the infrastructure for the Data Replication team, focusing on Kubernetes, reliability standards, and integrating product features with infrastructure. You'll enhance observability and tooling using AI, ensuring engineers can effectively manage their stack.
Top Skills: AI, AWS, Ci/Cd, Datadog, GCP, Grafana, Kubernetes, Prometheus, Terraform
Reposted 8 Days Ago
In-Office or Remote
11 Locations
160K-179K Annually
Senior level
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills: Ai/Ml, AWS, Azure, Docker, Elk Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
9 Days Ago
Remote
USA
180K-210K Annually
Senior level
Artificial Intelligence • Insurance • Software • Automation
The Staff Site Reliability Engineer will build and scale infrastructure for Assured's platform, automate delivery, enhance observability, and lead mentoring initiatives.
Top Skills: AWS, Kubernetes, Postgres, Terraform
Reposted 9 Days Ago
In-Office
Secaucus, NJ, USA
150K-170K Annually
Expert/Leader
Healthtech • Database
Seeking a Principal Site Reliability Engineer to build an SRE practice, enhance reliability, mentor teams, and drive performance engineering to optimize Quest products and services.
Top Skills: Ansible, Aurora, AWS, Azure, Bigtable, Cassandra, Ci/Cd, Cloud Pub/Sub, Cloud Spanner, Cloud Sql, Docker, DynamoDB, Dynatrace, Gitlab, Go, GCP, Java, Jms, Kafka, Kinesis, Kubernetes, Mq, Perl, Python, Rds, Ruby, Shell Scripting, Terraform
9 Days Ago
In-Office
Atlanta, GA, USA
99K-124K Annually
Senior level
Fintech • Insurance • Financial Services
The Senior Site Reliability Engineer will design and maintain scalable infrastructure, develop software for reliability, implement CI/CD pipelines, monitor performance, collaborate on AI/ML workloads, and lead incident response efforts.
Top Skills: Ansible, AWS, Azure, Dynatrace, Git, Java, Python, Terraform
9 Days Ago
Hybrid
Arvada, CO, USA
160K-200K Annually
Mid level
Aerospace • Cloud • Software • Defense • Automation
Design and automate cloud systems for U.S. Government, focusing on DevSecOps, reliability, deployment automation, and observability. Participate in on-call rotations, supporting production environments and improving system resilience.
Top Skills: Aws Eks, Datadog, Gitlab, Grafana, Kubernetes, Linux/Unix, Python, Terraform
9 Days Ago
In-Office
Los Angeles, CA, USA
130K-145K Annually
Mid level
Events
The Site Reliability Engineer II designs and maintains scalable systems, focusing on automation, monitoring, incident response, and collaboration with developers to enhance operational practices and efficiency.
Top Skills: Bash, Cloud Service Operations, Containers, Continuous Delivery, Continuous Integration, Go, Infrastructure As Code, Orchestration Platforms, Python
Reposted 9 Days Ago
In-Office
San Francisco, CA, USA
Senior level
Artificial Intelligence • Software
The Site Reliability Engineer ensures the reliability and performance of products Devin and Windsurf, managing incident response, CI/CD pipelines, infrastructure as code, and fostering a reliability culture within the engineering team.
Top Skills: AWS, Azure, Ci/Cd, GCP, Kubernetes, Terraform
Reposted 9 Days Ago
In-Office
Overland Park, KS, USA
Senior level
Healthtech • Professional Services • Software
The Sr Software Engineer leads complex software development, ensuring solution scalability, collaborating with teams, solving technical problems, and advocating for high-quality software solutions.
Top Skills: Angular, Argo Cd, Azure Devops, Ci/Cd, Google Cloud Platform, Kubernetes, New Relic, Opentelemetry, Ruby On Rails, Terraform
Reposted 9 Days Ago
In-Office
New York, NY, USA
177K-265K Annually
Senior level
Fintech • Financial Services
The Site Reliability Engineer Lead oversees daily operations and architectural resilience, driving SRE principles for application performance and efficiency, and fostering a culture of technical excellence.
Top Skills: Ansible, Appdynamics, Go, Grafana, Java, Kubernetes, Loki, Mimir, Openshift, Prometheus, Python, Tempo, Terraform
Reposted 9 Days Ago
Remote
United States
170K-200K Annually
Senior level
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills: AWS, Aws Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Reposted 9 Days Ago
In-Office
6 Locations
90K-122K Annually
Mid level
Fintech • Analytics
The Site Reliability Engineer will manage production monitoring, incident response, and enhance automation using various tools. They will ensure observability and participate in SRE process improvements.
Top Skills: AWS, Cucumber, Datadog Apm, Datadog Dbm, DynamoDB, Ec2, Ecs, Elk, Java, Jenkins, Pagerduty, Playwright, Rds, S3, Secrets Manager, Selenium, Servicenow, Splunk, Spring Boot