Top Site Reliability Engineer Jobs
Automotive • Hardware • Logistics
Builds and supports large-scale, distributed, fault-tolerant systems to improve reliability and automation. Administers networks and databases, monitors system health, configures load and data communications, coordinates equipment and vendor orders, and participates in change management to reduce incidents and support cloud transformations.
Top Skills:
Cloud, Distributed Computing, Internet Security, Monitoring Tools, Oracle ERP, Unix, Version Control Systems, Windows 2000, Windows 98, Windows NT
Information Technology
Design and build resilient infrastructure, implement monitoring and SLIs/SLOs, automate operations and self-healing, reduce toil with scripting, support enterprise-scale application reliability, act as subject matter expert for engineering teams, and meet government vetting and U.S. citizenship requirements.
Top Skills:
AWS, CI/CD, Cloud-Native, CloudTrail, CloudWatch, Git, GitHub Actions, GitLab Runners, ITSI, Jenkins, Linux, Microservices, PaaS, PagerDuty, SaaS, Splunk, Unix
Artificial Intelligence • Cloud • Software • Cybersecurity
Operate and tune AWS environments to meet SLAs, build observability and alerts, automate infrastructure with IaC and CI/CD, define SLIs/SLOs, support security/compliance within a FISMA Moderate boundary, design resilience and DR plans, and own incident response and post-mortems.
Top Skills:
Ansible, AWS, AWS CloudWatch, AWS Trusted Advisor, CI/CD, CloudFormation, Docker, GitLab CI, Jenkins, New Relic, Python, Splunk, Terraform
Fintech
Build production-quality software to improve reliability, reduce operational toil, and scale systems. Own end-to-end features, participate in on-call rotations, analyze incidents, implement observability, and build automations using Node.js/TypeScript, Python, and AI-assisted tools.
Top Skills:
AI-Assisted Development Tools, AWS, Azure, CI/CD, GitHub Copilot, JavaScript, Node.js, PowerShell, Python, SQL, TypeScript, VS Code
Information Technology • Software • Consulting
Join Solvd as an Infrastructure/SRE Engineer to design and manage cloud infrastructure, build CI/CD pipelines, automate deployments, and ensure system reliability through observability and performance tuning.
Top Skills:
Argo CD, AWS, Azure, Bash, Datadog, Docker, Flux, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Memcached, New Relic, OpenTofu, Postgres, Prometheus, Python, RDS, Redis, Terraform
Digital Media • Social Media • Software • Sports
Lead the technical architecture and execution of migration to AWS, drive developer enablement, and automate infrastructure using code-first principles.
Top Skills:
AWS EKS, Datadog, GitHub Actions, Go, Istio, k6, Kubernetes, Node.js, Terraform
Healthtech
The Senior Software Engineer will enhance system reliability, manage Kubernetes and AWS environments, oversee incident responses, and implement observability measures.
Top Skills:
AWS, CloudWatch, ELB, GitHub Actions, Kubernetes, Observability Tooling, Terraform, VPC
Reposted 10 Days Ago
Fintech • Analytics
As a Senior Site Reliability Engineer, you'll lead incident recovery, enhance production stability, automate processes, and collaborate with development teams to improve operational efficiency.
Top Skills:
AWS, Azure, BigPanda, Cloud-Native Applications, Datadog, DNS, Docker, Git, HTTP, Kubernetes, Shell Scripting, TCP/IP, Unix
Fintech • Analytics
The Site Reliability Engineer will support and automate critical real-time applications, ensuring service availability and quality across cloud and on-premises deployments, while also collaborating with various teams on operational documentation and incident management.
Top Skills:
AWS, Azure, Datadog, Docker, Git, Kubernetes, Python, Unix/Linux
Cloud • Software • Database
The Site Reliability Engineer will optimize and scale managed services across cloud providers, automate infrastructure, enhance monitoring, and ensure system reliability.
Top Skills:
AWS, Azure, Bash, GCP, Grafana, Kubernetes, Loki, Mimir, Prometheus, Python
Information Technology • Consulting
As a Senior Staff Site Reliability Engineer, you will lead the SRE team, advocate best practices, ensure resilience in cloud architecture, and mentor team members.
Top Skills:
Argo CD, CircleCI, Google Cloud Platform, Kubernetes, Pulumi, Terraform, TypeScript
Computer Vision • Machine Learning • Software
As a Site Reliability Engineer, ensure the reliability, performance, and scalability of Ditto's cloud infrastructure by developing observability solutions, leading incident management, and collaborating with product engineering teams.
Top Skills:
AWS, Azure, C, Datadog, GCP, Go, Grafana, Helm, Java, Kubernetes, Prometheus, Rust, Terraform
Artificial Intelligence • Software
As a Site Reliability Engineer at Mercor, you will ensure production reliability, develop the SRE function, and collaborate with engineering teams to maintain system performance.
Top Skills:
AWS, Kubernetes, Spacelift, Terraform
Enterprise Web • Information Technology • Software
As a Platform Engineer, you will enhance reliability and performance, design operational processes, and build monitoring systems while collaborating with a talented team.
Top Skills:
AI, Assistants, Backend, Developer Tools, Frontend, Infrastructure, MCPs, Monitoring Systems, Skills
Financial Services
The Staff Site Reliability Engineer will lead Platform Engineering's SRE efforts by defining technical strategy, overseeing architecture, and enhancing operational excellence through mentorship and governance.
Top Skills:
Argo CD, GCP, GKE, Go, Kafka, Node.js, Python, Terraform
Aerospace • Artificial Intelligence
The Site Reliability Engineer will architect and manage ground infrastructure for satellite systems, ensuring high availability, automating deployments, and optimizing data management systems.
Top Skills:
Ansible, AWS, Azure, C++, CloudFormation, EKS, ELK, GCP, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform
Software
Join a passionate team to enhance reliability and performance of the AI control plane, manage deployments, and respond to production incidents while ensuring service quality for customers.
Top Skills:
AI Control Plane, Developer Tools, Infrastructure
Other • Energy
The Site Reliability Engineer will build and maintain reliable systems on Google Cloud Platform, automate operations, and improve system performance and reliability.
Top Skills:
Airflow, BigQuery, Cloud Monitoring, Dataflow, Datastream, Docker, GitHub Actions, GitLab CI, Go, Google Cloud Platform, Grafana, IAM, Java, Kubernetes, Prometheus, Python, Terraform
AdTech • Big Data • Marketing Tech • Software
Responsible for owning and optimizing the Internal Developer Platform, improving reliability, scalability, and usability while supporting engineering teams and standardizing operational processes through automation and best practices.
Top Skills:
Arm, AWS, Azure, Bash, CloudFormation, Consul, Docker, GitHub Actions, HashiCorp, Jenkins, Kubernetes, Linux, Nomad, PowerShell, Python, Splunk, Sumo Logic, Terraform, Vault, Windows
Fintech • Payments • Financial Services
Build, operate, and scale AWS-based infrastructure using IaC (Terraform), manage EKS and serverless environments, create CI/CD pipelines, implement observability (OpenTelemetry/Prometheus/New Relic), support Postgres/RDS (Aurora), lead incident response, and define SRE practices (SLIs/SLOs/error budgets).
Top Skills:
Aurora, AWS, AWS RDS, Azure, CloudFormation, ECS, EKS, GitHub Actions, GitLab, Go, GCP, Java, Kubernetes, New Relic, OpenTelemetry, OpenTofu, Postgres, Prometheus, Python, Ruby, Serverless, Terraform, Terragrunt
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Business Intelligence
Lead architecture and implementation of reliability platforms and SRE practices for a production SaaS. Build self-service reliability tooling, drive AIOps automation, advance observability (monitoring, tracing, profiling), lead incident response and postmortems, mentor engineers, and embed production readiness across teams to achieve 99.99% uptime.
Top Skills:
AWS, Azure, Continuous Profiling, Datadog, DNS, ELK, GCP, Go, Grafana, HTTP/S, Kubernetes, Load Balancing, OpenTelemetry, Prometheus, Python, TCP/IP
Software
Operate and improve Accela's cloud-based SaaS platform to ensure availability, performance, security, and scalability. Build automation and tooling, monitor observability and SLOs, participate in incident response and RCA, support deployments and change management, and help maintain compliance for regulated environments.
Top Skills:
Ansible, Argo CD, Bash, Claude Code, Flux, Git, GitHub Copilot, Kubernetes, Linux, Azure, OpenTelemetry, PowerShell, Python, Terraform
AdTech • Beauty • Marketing Tech • Retail • Pharmaceutical
Lead incident response and root cause analysis, maintain platform reliability and performance, implement and improve observability solutions, collaborate with vendor teams, and contribute to continuous improvement of incident management and operational processes.
Top Skills:
Databricks, Grafana, Prometheus, Spyglass
Legal Tech • Software
Lead Site Reliability Engineer responsible for platform availability and reliability of RelativityOne. Drive SRE best practices, build tools, lead projects, coach SREs, work with stakeholders, support incidents, run postmortems, and improve monitoring, automation, and operational efficiency.
Top Skills:
CI/CD, DevOps, Jenkins, Jira, Kubernetes, Azure, Monitoring and Alerting, New Relic, NoSQL, PowerShell, Relativity Server, RelativityOne, SQL, Tableau
Information Technology • Insurance • Software
Responsible for the reliability and performance of production services, managing SLIs and SLOs, and leading incident responses while collaborating with various teams.
Top Skills:
.NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows