Top Site Reliability Engineer Jobs

5 Hours Ago
Hybrid
Oak Brook, IL, USA
103K-193K Annually
Senior level
Automotive • Hardware • Internet of Things • Mobile • Software • App development • PropTech
Design, implement, and optimize global cloud infrastructure and platforms for an IoT service. Lead platform improvement initiatives, automate infrastructure (IaC/GitOps), ensure observability and security, troubleshoot incidents, mentor SRE team members, and collaborate with executives, architects, and security stakeholders to execute the infrastructure roadmap.
Top Skills: Active Directory, Argo CD, AWS, Bash, Datadog, Edge Firewalls, GitOps, Go, Grafana, IaC, Kubernetes, Linux, New Relic, PowerShell, Prometheus, Python, SIEM, Terraform, VPC, Windows
Reposted 20 Hours Ago
Hybrid
6 Locations
113K-188K Annually
Senior level
Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics
The Staff Site Reliability Engineer will lead reliability strategies, manage high-risk initiatives, and enhance engineering standards while ensuring system reliability and operational excellence within a hybrid work environment.
Top Skills: Bash, CI/CD, Database Architecture, Go, Google Cloud Platform, Infrastructure-as-Code, Kubernetes, Monitoring Platforms, Pulumi, Python, Terraform
Reposted 20 Hours Ago
Hybrid
North Bergen, NJ, USA
103K-185K Annually
Senior level
eCommerce • Fashion • Retail • Sales • Wearables • Design
The Lead Site Reliability Engineer is responsible for ensuring system reliability, uptime, and performance across Tapestry brands. They will develop tools and automation, oversee monitoring solutions, and implement SRE practices to promote reliability. Additionally, the role involves production support and collaboration across engineering teams to manage the Salesforce Commerce Cloud platform effectively.
Top Skills: AppDynamics, AWS, Azure, Blue Triangle, Confluence, GCP, Java, JIRA, Node.js, Python, Quantum Metric, Splunk
Reposted 20 Hours Ago
Hybrid
2 Locations
184K-275K Annually
Senior level
Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
The Staff Engineer will define reliability architecture, automate foundational utilities, develop observability tools, ensure environment integrity, and mentor colleagues.
Top Skills: Ansible, Chef, DHCP, Kubernetes, Linux, NTP, PXE
Reposted 20 Hours Ago
Remote
USA
150K-220K Annually
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI
The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.
Top Skills: AWS, Bash, Go, Kubernetes, Python, Slurm, Terraform
Reposted 20 Hours Ago
Easy Apply
Remote or Hybrid
10 Locations
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Reposted 20 Hours Ago
Hybrid
Menlo Park, CA, USA
169K-224K Annually
Senior level
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Software • Biotech
Lead the design and operation of a fault-tolerant cloud infrastructure, implement infrastructure-as-code, manage Kubernetes reliability, and mentor engineers.
Top Skills: Ansible, AWS, Azure, Bash, CloudFormation, Datadog, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, OpenTelemetry, PowerShell, Prometheus, Python, Terraform
Reposted Yesterday
Easy Apply
Hybrid
Somerville, MA, USA
150K-185K Annually
Senior level
Enterprise Web • Hardware • Internet of Things • Software
The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.
Top Skills: Go, Kubernetes, LGTM Stack, OpenTelemetry, Prometheus, TypeScript
Reposted Yesterday
Hybrid
Fort Meade, MD, USA
150K-254K Annually
Expert/Leader
Aerospace • Hardware • Information Technology • Security • Software • Cybersecurity • Defense
Design, build, and maintain scalable, reliable infrastructure by applying software engineering to operations. Improve automation, observability, CI/CD, and security hardening; troubleshoot Linux/Unix systems and networking; reduce operational toil while supporting mission-critical defense systems requiring TS/SCI clearance and polygraph.
Top Skills: DNS, Git, GitHub Actions, GitLab CI, Go, Java, Jenkins, Linux, Python, TCP/IP, Unix
Reposted 2 Days Ago
In-Office
New York, NY, USA
100K-250K Annually
Mid level
Fintech • Payments • Financial Services
Design, build, and maintain highly reliable, observable production services. Automate operations, performance-tune cloud deployments, own incident response/on-call, mentor engineers, and improve system reliability and scalability.
Top Skills: AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kubernetes, Rust, Terraform
2 Days Ago
Hybrid
Chicago, IL, USA
85K-130K Annually
Mid level
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Analytics • Biotech • Generative AI
Join the SRE team to design, deploy, and operate resilient cloud infrastructure. Recommend solutions, automate workflows, configure Terraform and CI, implement monitoring and alerts, and support developers and users.
Top Skills: Ansible, Aurora MySQL, AWS, Azure, Bash, Chef, CloudFormation, Composer, Concourse, Dataproc, Docker, GCP, Go, HIPAA, HITRUST, ISO, Kubernetes, Packer, Postgres, Puppet, Python, Ruby, Salt, Slack, Terraform
Reposted 2 Days Ago
In-Office
New York City, NY, USA
332K-449K Annually
Senior level
Artificial Intelligence • Legal Tech • Software
The Staff Site Reliability Engineer will architect reliability strategies, manage observability, and drive operational excellence across teams, particularly in large-scale production systems.
Top Skills: Cloud Infrastructure, Distributed Systems
Reposted 2 Days Ago
Hybrid
Lebanon, NJ, USA
100K-143K Annually
Senior level
Artificial Intelligence • Cloud • Fintech • Information Technology • Insurance • Financial Services • Big Data Analytics
Design, automate, and maintain scalable, secure AWS infrastructure and CI/CD pipelines. Lead observability, incident response, and reliability improvements while modernizing mainframe workloads to AWS and collaborating with engineering, security, and data teams.
Top Skills: AI Tools, AWS, CI/CD, CloudFormation, Data Analytics, Databases, Event-Driven Architecture, Mainframe Development, Messaging Services, Observability and Alerting Tools, Terraform
Reposted 2 Days Ago
Easy Apply
Remote or Hybrid
US
200K-230K Annually
Senior level
Artificial Intelligence • Machine Learning
Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.
Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks
Reposted 2 Days Ago
Remote
United States
223K-302K Annually
Expert/Leader
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.
Top Skills: AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs
3 Days Ago
In-Office
Richardson, TX, USA
157K-210K Annually
Senior level
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
Lead SRE responsible for reliability, security, and efficiency of cloud environments for an Enterprise Imaging SaaS platform. Design, develop, automate, and operate cloud infrastructure (GCP/AWS/Azure), implement CI/CD and IaC (Terraform), manage Kubernetes, perform 24x7 on-call support, and automate IAM, monitoring, and security reporting.
Top Skills: Ansible, AWS, Azure, Bash, CI/CD, Cloud Logging, GCP, GCP Pub/Sub, Git, GitHub Actions, GitLab, Jenkins, Kubernetes, Node.js, PingOne, PowerShell, Python, Splunk, Terraform
3 Days Ago
In-Office
Eden Prairie, MN, USA
131K-195K Annually
Mid level
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
Design, develop, and maintain proprietary Optum software and systems with an emphasis on Salesforce (Aura, Lightning, Apex). Ensure security, compliance, performance, scalability, and accessibility. Implement data models, custom objects, triggers, batch automation, and REST integrations. Use Git workflows, troubleshoot incidents, support post-deployment, and collaborate with users and QA to meet SLAs and project protocols.
Top Skills: ALM, Apex, Apex REST, Aura Components, Git, JIRA, REST APIs, Salesforce, Salesforce Lightning Experience, ServiceNow
3 Days Ago
Remote or Hybrid
United States
200K-250K Annually
Senior level
Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics
Lead long-term strategy and architecture for cloud and on‑prem platform infrastructure, driving Kubernetes and multi‑cloud reliability, IaC/GitOps automation, observability, SLO/SLI/error‑budget practices, incident leadership, AI‑augmented tooling adoption, and mentorship of senior engineers to improve platform resilience and developer experience.
Top Skills: Amazon Elastic Kubernetes Service (EKS), Autoscaling, AWS, Capacity Planning, CI/CD, GitOps, Go, Google Cloud Platform, Google Kubernetes Engine (GKE), Identity and Access Management, Infrastructure as Code, Kubernetes, Linux, Networking, Observability, Operators, Pulumi, Python, RKE2, Storage, Terraform
Reposted 3 Days Ago
Easy Apply
Hybrid
New York, NY, USA
187K-240K Annually
Senior level
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
The role involves building AI-assisted product experiences for Datadog, including systems for chat, remediations, and code fixes, and collaborating with cross-functional teams to improve user outcomes.
Top Skills: AI Coding Tools, Go, Kubernetes, LLM-Based Systems
Reposted 4 Days Ago
In-Office
6 Locations
125K-350K Annually
Mid level
Information Technology • Software • Financial Services • Quantitative Trading
The Site Reliability Engineer will provide support and diagnose issues in a real-time, distributed environment, focusing on large-scale application and infrastructure management. Required baseline skills include UNIX/Linux, networking, SQL, and scripting languages.
Top Skills: Bash, Python, SQL, TCP/IP, UDP, Unix/Linux
Reposted 4 Days Ago
Remote or Hybrid
4 Locations
175K-175K Annually
Senior level
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills: AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
5 Days Ago
Hybrid
Reston, VA, USA
Mid level
Digital Media • Information Technology • News + Entertainment
Own and support production infrastructure for FreeWheel's Streaming Hub. Design and implement cloud and Kubernetes infrastructure, automate with Terraform/Ansible and Python/Go, troubleshoot incidents, improve observability and CI/CD, and ensure platform reliability and scalability during high-traffic streaming events.
Top Skills: Amazon EKS, Ansible, AWS, Docker, Go (Golang), IAM, Jenkins, Kubernetes, Load Balancer, Oracle Cloud Infrastructure (OCI), Python, Route 53, Security Groups, Terraform, VPC
5 Days Ago
Hybrid
Reston, VA, USA
Expert/Leader
Digital Media • Information Technology • News + Entertainment
Lead Data SRE responsible for ensuring reliability, scalability, and performance of data platforms. Design monitoring, automate operations, optimize performance, respond to incidents, plan capacity, enforce security/compliance, document systems, and collaborate with engineering, data science, and product teams.
Top Skills: Aerospike, Ansible, Apache Kafka, AWS, AWS S3, Azure, Cassandra, Containerization, Docker, ELK Stack, GCP, Go, Grafana, Hadoop, HDFS, Java, Kubernetes, Microservices, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Snowflake, Spark, Terraform
5 Days Ago
Hybrid
Reston, VA, USA
Mid level
Digital Media • Information Technology • News + Entertainment
Own and support production infrastructure for a high-scale streaming advertising platform. Design and implement cloud and Kubernetes infrastructure, automate with IaC and scripting, troubleshoot incidents, ensure reliability and performance, collaborate with engineering and offshore operations, and support live events and traffic spikes.
Top Skills: Amazon EKS, Ansible, AWS, Docker, Go, IAM, Jenkins, Kubernetes, Load Balancer, Networking, Oracle Cloud Infrastructure, Python, Route 53, Security Groups, Terraform, VPC
Reposted 5 Days Ago
Hybrid
O'Fallon, MO, USA
Mid level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Lead Site Reliability Engineer will ensure reliability, scalability, and performance of Mastercard's applications, enhancing operational practices and developer collaboration in a proactive environment.
Top Skills: CI/CD, DevOps, Go, Java, Python, Spring Framework