Top Site Reliability Engineer Jobs

24 Days Ago
Easy Apply
Remote
USA
218K-257K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, monitoring, and incident response for AI infrastructure; build automation and CI/CD tooling; manage Kubernetes/Docker production workloads; partner with infrastructure, security, and compliance; improve observability and documentation; develop internal full‑stack tooling in Go or Python.
Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Log Aggregation, Network Security, Puppet, Python, Ruby, Salt, Terraform
Reposted 24 Days Ago
Hybrid
O'Fallon, MO, USA
122K-207K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
Lead Site Reliability Engineer responsible for platform stability, mentoring, and improving application performance through automation, configuration management, and operational readiness.
Top Skills: Go, Java, Python, Spring Framework
25 Days Ago
Easy Apply
In-Office
Long Beach, CA, USA
140K-214K Annually
Senior level
Aerospace • Hardware • Robotics • Software • Manufacturing
Design, implement, and maintain a scalable SRE/DevOps platform across cloud and on-prem sites. Ensure uptime, automate deployments with IaC, define SLOs, leverage configuration management, and partner with development and manufacturing teams to increase automation and reliability.
Top Skills: GCP, Helm, Kubernetes, Terraform
Reposted 2 Days Ago
In-Office
New York City, NY, USA
237K-369K Annually
Senior level
Artificial Intelligence • Legal Tech • Software
As a Senior Site Reliability Engineer, you'll operate foundational platform services, enhance reliability standards, automate processes, and work with engineering teams to improve systems.
Top Skills: Cloud Infrastructure, Kubernetes, Observability Tools
Reposted 2 Days Ago
Remote or Hybrid
Salt Lake City, UT, USA
96K-163K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
Lead reliability, scalability, and production operations for a greenfield enterprise application. Influence design for production readiness, own incident response, define SLIs/SLOs, build observability and automation, enhance CI/CD, and improve developer experience across infrastructure and application stacks.
Top Skills: AWS, ChatGPT, Claude, Copilot, Docker, Elasticsearch, GitHub Actions, Go, Grafana, Kubernetes, OpenSearch, Opsgenie, Prometheus, Spring Boot
Reposted 2 Days Ago
Remote or Hybrid
United States
190K-235K Annually
Senior level
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills: AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Reposted 4 Days Ago
In-Office
New York, NY, USA
160K-230K Annually
Senior level
Artificial Intelligence • Information Technology • Software
The role involves designing and managing multi-cloud infrastructure, implementing CI/CD pipelines, ensuring platform reliability, scalability, and security, and optimizing performance for a SaaS platform used by enterprise customers.
Top Skills: Argo, AWS, Azure, Datadog, Docker, GCP, GitHub Actions, Go, Kubernetes, Python, Terraform
Reposted 4 Days Ago
Easy Apply
Remote or Hybrid
9 Locations
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Reposted 4 Days Ago
Hybrid
New York City, NY, USA
205K-225K Annually
Senior level
Artificial Intelligence • Fintech • Payments • Social Impact • Analytics • Financial Services • Automation
As a Senior SRE, you'll ensure reliable and scalable systems, develop observability solutions and infrastructure as code, and lead incident response efforts.
Top Skills: AWS, CloudFormation, Datadog, ELK, Prometheus, Terraform
Reposted 58 Minutes Ago
In-Office
Bedford, NH, USA
Senior level
Agency • Marketing Tech • Software • Consulting
Lead and maintain performance, security, and reliability of client hosting environments across multi-cloud platforms. Architect resilient infrastructure, manage IaC and CI/CD, administer Windows/IIS and WP Engine environments, handle SSL/DNS/SSO, participate in on-call rotations, and engage with clients as senior escalation and trusted advisor.
Top Skills: App Services, Application Insights, AWS, Azure, Azure DevOps, Azure SQL, CI/CD, DNS, EC2, IAM, IIS, Key Vaults, PowerShell, RDS, S3, SSL/TLS, SSO, Terraform, Vulnerability Management, Windows Server, WP Engine
2 Hours Ago
In-Office
Charlotte, NC, USA
79K-128K Annually
Senior level
Fintech • Insurance • Financial Services
Provide SRE support for platform-level applications: incident management, performance/availability monitoring, root cause analysis, automation to reduce toil, disaster recovery participation, and technical leadership for reliability improvements.
Top Skills: AWS, Azure, SQL Server, Windows
5 Days Ago
Hybrid
Reston, VA, USA
Senior level
Digital Media • Information Technology • News + Entertainment
Responsible for ensuring reliability, scalability, and performance of data platforms: monitoring, incident response, automation, performance tuning, capacity planning, security/compliance, documentation, and cross-team collaboration to support large-scale data pipelines and backend data systems.
Top Skills: Aerospike, Ansible, AWS, AWS S3, Azure, Cassandra, CI/CD, Containerization, Docker, ELK Stack, GCP, Go, Grafana, Hadoop, HDFS, Java, Kafka, Kubernetes, Microservices, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Snowflake, Spark, Terraform
5 Days Ago
Hybrid
Chicago, IL, USA
118K-176K Annually
Senior level
Digital Media • Information Technology • News + Entertainment
Responsible for ensuring reliability, scalability, and performance of data platforms. Design monitoring and alerting, automate deployments and recovery, optimize storage and query performance, troubleshoot incidents, plan capacity and scaling, document operations, enforce security/compliance, and collaborate with data engineering, product, and data science teams to maintain high availability of large-scale data systems.
Top Skills: Ansible, AWS, Azure, CI/CD, Docker, ELK Stack, GCP, Go, Grafana, Java, Kubernetes, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Terraform
14 Hours Ago
In-Office
Palo Alto, CA, USA
165K-190K Annually
Mid level
Cybersecurity
Ensure reliability, scalability, observability, and cost efficiency of a customer-facing SaaS security platform. Manage Kubernetes/Helm deployments, CI/CD (GitLab/ArgoCD), monitoring, and service verification. Embed with engineering teams, optimize developer CI/CD workflows, monitor and debug production on AWS/GCP, and participate in a 24/7 on-call rotation.
Top Skills: ArgoCD, AWS, GCP, GitLab CI/CD, Grafana, Helm, Kubernetes, Microservices, Prometheus
21 Hours Ago
In-Office
Dallas, TX, USA
Expert/Leader
Information Technology
Design and architect highly available OSS/BSS and mainframe systems using SRE principles. Lead reliability, observability, automation, disaster recovery, incident management, and cross-functional transformations across hybrid cloud and on‑prem environments for telecom operations.
Top Skills: AppDynamics, CI/CD, CICS, Db2, DevOps, Dynatrace, Grafana, Hybrid Cloud, IBM Mainframe, IBM Netcool, IBM z/OS, IMS, Infrastructure as Code, Instana, JCL, Linux, Solaris, Splunk, SRE, Telcordia FACS, Telcordia SOAC, Telcordia Switch, Telcordia TIRKS, Telcordia WFA, VSAM
21 Hours Ago
In-Office
San Francisco, CA, USA
Mid level
Artificial Intelligence • Big Data • Information Technology • Software • Analytics
Own reliability for a live fleet of Linux-based edge sensors and cloud infrastructure. Triage and recover field hardware, perform SSH-based diagnostics, build fleet management and OTA systems, implement observability and alerting, automate operational tasks, develop runbooks, and participate in on-call rotations to prevent and resolve incidents.
Top Skills: AWS, Bash, C, DNS, Docker, Firewalls, Go, IAM, Kubernetes, Linux, Python, Rust, SSH, VPN
Reposted 21 Hours Ago
Hybrid
San Francisco, CA, USA
Senior level
Artificial Intelligence • Information Technology • Software
Lead end-to-end platform reliability: define SLIs/SLOs, harden production architecture, ensure Kubernetes runtime and queue safety, run incident command for Sev1/Sev2, own observability/on-call/runbooks, and gate risky releases while delivering a prioritized reliability roadmap.
Top Skills: BullMQ, Koa, Kubernetes, Node.js, PostGraphile, Postgres, React, Redis, TypeScript
Reposted 21 Hours Ago
In-Office
Aurora, CO, USA
87K-198K Annually
Senior level
Information Technology
As a Senior Site Reliability Engineer, you'll enhance system resilience, automate tasks, and improve infrastructure for the Intelligence Community. You'll need significant Linux experience and programming knowledge.
Top Skills: Confluence, Docker, Git, Go, HP, Java, Jenkins, JIRA, Kubernetes, Linux, Nessus, Packer, Python, Rust
Reposted 21 Hours Ago
In-Office
Washington, DC, USA
Mid level
Aerospace • Defense • Manufacturing
As Lead Site Reliability Engineer, you'll ensure reliability and performance of AI infrastructure, manage deployments, and mentor junior engineers.
Top Skills: Ansible, BMC, CI/CD, CUDA, iDRAC, IPMI, Kubernetes, Linux, NVIDIA GPUs, OpenShift, Terraform
Reposted 21 Hours Ago
In-Office
Boston, MA, USA
Senior level
Hardware • Quantum Computing
Maintain and integrate hardware and software systems for quantum controls, manage lab and test infrastructure (HIL, K8s, networking, rack servers), automate provisioning and CI/CD, implement monitoring/alerting and observability, support incident response and root-cause analysis, and define operational procedures to ensure reliability across development and production environments.
Top Skills: Ansible, Bash, Debian, DHCP, DNS, Docker, ELK Stack, Git, GitLab CI, Go, Grafana, Hardware-in-the-Loop (HIL), Jenkins, Kubernetes, LAN, Prometheus, Python, Rack Mount Servers, Red Hat, Routers, Switches, TCP/IP, Terraform, Ubuntu, VLAN, WAN, Windows
Reposted 21 Hours Ago
In-Office
San Francisco, CA, USA
230K-310K Annually
Senior level
Artificial Intelligence • Software
Own the reliability and performance of backend systems at Gamma, building automation and tooling while leading incident response and improving system stability.
Top Skills: AWS, CloudFormation, Docker, Go, Kafka, Kubernetes, Node.js, Python, Terraform, TypeScript
Reposted 21 Hours Ago
Remote or Hybrid
2 Locations
250K-295K Annually
Senior level
Artificial Intelligence • Software
As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructure, and optimize systems while implementing automation and observability practices.
Top Skills: ClickHouse, Go, Postgres, Python, TypeScript
Reposted 21 Hours Ago
Remote
United States
115K-135K Annually
Mid level
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills: ArgoCD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform
Reposted 21 Hours Ago
In-Office
Raleigh, NC, USA
Senior level
Software
The role involves ensuring reliable SIP connectivity, conducting interoperability testing, troubleshooting SIP issues, and automating operations tasks, while mentoring junior staff.
Top Skills: Ansible, Bash, Empirix, HEPIC, Linux, Python, RTP, SBCs, SDPs, SIP, SIPp, VoIP, Wireshark
Reposted 21 Hours Ago
Remote
2 Locations
Senior level
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills: AWS, ClickHouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform