Top Senior Site Reliability Engineer Jobs

Reposted 14 Days Ago
Easy Apply
In-Office
San Francisco, CA, USA
180K-210K Annually
Senior level
AdTech • Marketing Tech • Analytics
As a Staff Software Engineer - SRE, you'll manage cloud infrastructure, improve application reliability, collaborate across teams, and support back-office systems.
Top Skills: AWS, Datadog, Docker, Kafka, Kibana, Kubernetes, Linux, Postgres, Python, RDS, Redshift, Shell/Bash, Spark, Terraform
Reposted 14 Days Ago
Easy Apply
In-Office
Los Angeles, CA, USA
180K-210K Annually
Senior level
AdTech • Marketing Tech • Analytics
Manage and support customer applications, improve system reliability, collaborate with teams on infrastructure needs, and help drive architectural decisions.
Top Skills: Auto Scaling, AWS, CDNs, Datadog, DNS, Docker, Kafka, Kibana, Kubernetes, Linux, Load Balancers, Postgres, Proxy Servers, Python, RDS, Redshift, Shell/Bash, Spark, Terraform, WAFs
Reposted 14 Days Ago
Easy Apply
In-Office
New York, NY, USA
180K-210K Annually
Senior level
AdTech • Marketing Tech • Analytics
The Staff SRE DevOps Engineer will manage customer applications, improve system reliability, collaborate on architecture discussions, and support infrastructure needs across teams.
Top Skills: AWS, Bash, Datadog, Docker, Kafka, Kibana, Kubernetes, Linux, Postgres, Python, Redshift, Spark, Terraform
Reposted 14 Days Ago
In-Office or Remote
San Diego, CA, USA
207K-261K Annually
Senior level
Information Technology
Lead technical strategy for observability, operational intelligence, and reliability. Architect telemetry and automation platforms, drive AIOps and large-scale IaC, lead incident response, mentor senior engineers, and standardize SLO/SLI and reliability practices across AWS cloud-native environments.
Top Skills: ALB, AWS, VPC, Bash, CloudFormation, Datadog, DNS, DynamoDB, EC2, ECS, EKS, GitOps, Go, Grafana, IAM, KMS, Kubernetes, Linux, Multi-Account Architectures, New Relic, NLB, OpenTelemetry, Policy-as-Code, Prometheus, Python, RDS, Route 53, S3, TCP/IP, Terraform, TLS
Reposted 14 Days Ago
Remote or Hybrid
US
132K-195K Annually
Senior level
Artificial Intelligence • Big Data • Computer Vision • Machine Learning • Natural Language Processing • Software • Cybersecurity
Maintain and improve the internal developer platform, observability stack, and AWS infrastructure (Terraform); manage Kubernetes at scale; troubleshoot distributed systems; drive security, reliability, cost and performance improvements; partner with product teams and participate in on-call support.
Top Skills: AWS, CKA, Containers, Go, Kubernetes, LGTM Stack, Linux, OpenSearch, Python, Serverless, TCP/IP, Terraform
15 Days Ago
In-Office
San Francisco, CA, USA
152K-253K Annually
Senior level
Cloud • Security • Software • Cybersecurity
Design and maintain reliable infrastructure solutions for a cloud data protection platform. Ensure application scalability and support through CI/CD and monitoring tools while collaborating in a global team.
Top Skills: AppInsights, AWS CloudFormation, Azure API Management, Azure ARM Templates, Azure Cosmos DB, Azure DevOps, Azure Entra ID, Azure Functions, Azure Monitor, Azure Storage Services, Bash, Bitbucket, Elastic Stack, Git, Go, Microsoft TFS, PowerShell, Python, Serverless Framework, Terraform
15 Days Ago
In-Office
Secaucus, NJ, USA
111K-130K Annually
Mid level
Healthtech • Database
Responsible for reliability engineering, monitoring system performance, automating processes, and collaborating with development teams to enhance operational efficiency.
Top Skills: AWS, Azure, Bash, CI/CD, CloudFormation, Docker, Dynatrace, GCP, Go, JMeter, Kubernetes, NeoLoad, Python, Splunk, Terraform
Reposted 15 Days Ago
In-Office
New York, NY, USA
200K-250K Annually
Expert/Leader
Payments • Software • Automation
Lead platform and infrastructure direction on AWS, evolve CI/CD and ephemeral environments, set observability and SLO standards, drive incident response and postmortems, mentor engineers, and build automation to reduce operational risk.
Top Skills: AWS, CI/CD, Distributed Systems, ECS, Ephemeral Environments/Preview Deploys, Fargate, GitHub Actions, Observability (Metrics, Logs, Tracing), SLOs/SLIs/Error Budgets
Reposted 15 Days Ago
Hybrid
San Francisco, CA, USA
Senior level
Artificial Intelligence • Information Technology • Software
Lead end-to-end platform reliability: define SLIs/SLOs, harden production architecture, ensure Kubernetes runtime and queue safety, run incident command for Sev1/Sev2, own observability/on-call/runbooks, and gate risky releases while delivering a prioritized reliability roadmap.
Top Skills: BullMQ, Koa, Kubernetes, Node.js, PostGraphile, Postgres, React, Redis, TypeScript
Reposted 15 Days Ago
Remote
United States
115K-135K Annually
Mid level
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills: Argo CD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform
Reposted 15 Days Ago
In-Office
Washington, DC, USA
Mid level
Aerospace • Defense • Manufacturing
As Lead Site Reliability Engineer, you'll ensure reliability and performance of AI infrastructure, manage deployments, and mentor junior engineers.
Top Skills: Ansible, BMC, CI/CD, CUDA, iDRAC, IPMI, Kubernetes, Linux, NVIDIA GPUs, OpenShift, Terraform
Reposted 15 Days Ago
In-Office
San Francisco, CA, USA
Mid level
Information Technology • Software • Big Data Analytics
The Site Reliability Engineer will design, analyze, and troubleshoot large-scale distributed systems, focusing on operating systems and performance tuning.
Top Skills: Apache, Java
Reposted 15 Days Ago
Easy Apply
In-Office
Boston, MA, USA
Senior level
Hardware • Quantum Computing
Maintain and integrate hardware and software systems for quantum controls, manage lab and test infrastructure (HIL, K8s, networking, rack servers), automate provisioning and CI/CD, implement monitoring/alerting and observability, support incident response and root-cause analysis, and define operational procedures to ensure reliability across development and production environments.
Top Skills: Ansible, Bash, Debian, DHCP, DNS, Docker, ELK Stack, Git, GitLab CI, Go, Grafana, Hardware-in-the-Loop (HIL), Jenkins, Kubernetes, LAN, Prometheus, Python, Rack Mount Servers, Red Hat, Routers, Switches, TCP/IP, Terraform, Ubuntu, VLAN, WAN, Windows
Reposted 15 Days Ago
In-Office
Honolulu, HI, USA
100K-170K Annually
Junior
Fintech
The Site Reliability Engineer will manage and optimize Kubernetes clusters and cloud infrastructure, focusing on reliability, monitoring, and automation processes.
Top Skills: AWS, Azure, C/C++, CloudFormation, Docker, GCP, Helm, Java, JavaScript, Kubernetes, Linux, Postgres, Python, Ruby, Terraform
Reposted 15 Days Ago
In-Office or Remote
Los Angeles, CA, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
The Senior Site Reliability Engineer will ensure the reliability and scalability of enterprise applications, lead incident management, develop automation tools, mentor team members, and collaborate with cross-functional teams.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Reposted 15 Days Ago
In-Office
Chicago, IL, USA
85K-135K Annually
Mid level
Artificial Intelligence • Automotive • Internet of Things • Software
The Site Reliability Engineer will ensure application reliability, performance, and availability, emphasizing incident response and collaboration with development teams.
Top Skills: ActiveMQ, Ansible, AppDynamics, AWS Lambda, CloudFormation, CloudWatch, EKS, Git, Java, JavaScript, Jenkins, jQuery, Kafka, Kubernetes, MSK, MySQL, Postgres, Python, RabbitMQ, REST APIs, Signals, Spinnaker, SQL, Terraform, Vue
Reposted 15 Days Ago
In-Office or Remote
Honolulu, HI, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
As a Senior Software Engineer, you'll ensure the scalability and reliability of enterprise applications, leading incident management, automation, and strategic engineering efforts while mentoring team members.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Reposted 15 Days Ago
In-Office or Remote
Portland, OR, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
The Senior Software Engineer - SRE will ensure the reliability and scalability of enterprise applications, handle incident management, and mentor team members, requiring expertise in Java and open-source technologies.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Reposted 15 Days Ago
In-Office or Remote
Boston, MA, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
As a Senior Site Reliability Engineer, you will ensure system reliability and scalability, lead incident management, develop automation tools, and mentor team members.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Reposted 15 Days Ago
In-Office or Remote
Boston, MA, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
As a Senior Site Reliability Engineer at Veeva, you will enhance the reliability and scalability of applications, lead incident management, and mentor team members while working with modern technologies.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Reposted 15 Days Ago
In-Office or Remote
Bend, OR, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
The role involves ensuring the scalability and reliability of enterprise applications through operational experience in Java environments, incident management, and full-stack diagnostics, in collaboration with cross-functional teams.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
Reposted 15 Days Ago
In-Office or Remote
San Luis Obispo, CA, USA
110K-270K Annually
Senior level
Big Data • Cloud • Healthtech • Software • Big Data Analytics
The Senior Site Reliability Engineer will ensure the scalability and reliability of enterprise applications, manage incidents, automate operations, mentor team members, and support cross-team collaborations across a technology stack, primarily focusing on backend development.
Top Skills: Ansible, AWS, Bash, Docker, Git, Go, Hibernate, Java, Kubernetes, Linux, Maven, MySQL, Python, Ruby, Shell, Solr, Spring, Tomcat, Vagrant
16 Days Ago
Remote
United States
208K-330K Annually
Senior level
Fintech
The Staff Site Reliability Engineer role involves leading architecture, automating the GCP environment, defining SLIs and SLOs, mentoring teammates, and enhancing system reliability and performance.
Top Skills: Argo CD, Datadog, GCP, Go, Helm, JavaScript, Kubernetes, Python, Terraform, TypeScript
Reposted 16 Days Ago
In-Office or Remote
Location, WV, USA
Mid level
Healthtech • Telehealth
Seeking a Site Reliability Engineer to ensure availability and performance of cloud infrastructure. Responsibilities include observability solutions, incident response, and collaboration with teams to improve reliability and service health.
Top Skills: Ansible, AWS, Azure, Azure Monitor, Bash, CloudWatch, Dynatrace, Elastic, Grafana, PowerShell, Python, Terraform
16 Days Ago
Hybrid
New York, NY, USA
158K-278K Annually
Senior level
Artificial Intelligence • Software • Generative AI
As a Staff Site Reliability Engineer, you will enhance system reliability and performance for WRITER's AI platform, utilizing cloud technologies, programming languages, and incident management practices.
Top Skills: AWS, Azure, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Python, Terraform