Top Site Reliability Engineer Jobs
Automotive • Hardware • Internet of Things • Mobile • Software • App development • PropTech
Design, implement, and optimize global cloud infrastructure and platforms for an IoT service. Lead platform improvement initiatives, automate infrastructure (IaC/GitOps), ensure observability and security, troubleshoot incidents, mentor SRE team members, and collaborate with executives, architects, and security stakeholders to execute the infrastructure roadmap.
Top Skills:
Active Directory, Argocd, AWS, Bash, Datadog, Edge Firewalls, Gitops, Go, Grafana, Iac, Kubernetes, Linux, New Relic, Powershell, Prometheus, Python, SIEM, Terraform, Vpc, Windows
Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics
The Staff Site Reliability Engineer will lead reliability strategies, manage high-risk initiatives, and enhance engineering standards while ensuring system reliability and operational excellence within a hybrid work environment.
Top Skills:
Bash, Ci/Cd, Database Architecture, Go, Google Cloud Platform, Infrastructure-As-Code, Kubernetes, Monitoring Platforms, Pulumi, Python, Terraform
eCommerce • Fashion • Retail • Sales • Wearables • Design
The Lead Site Reliability Engineer is responsible for ensuring system reliability, uptime, and performance across Tapestry brands. They will develop tools and automation, oversee monitoring solutions, and implement SRE practices to promote reliability. Additionally, the role involves production support and collaboration across engineering teams to manage the Salesforce Commerce Cloud platform effectively.
Top Skills:
Appdynamics, AWS, Azure, Blue Triangle, Confluence, GCP, Java, JIRA, Node.js, Python, Quantum Metric, Splunk
Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
The Staff Engineer will define reliability architecture, automate foundational utilities, develop observability tools, ensure environment integrity, and mentor colleagues.
Top Skills:
Ansible, Chef, Dhcp, Kubernetes, Linux, Ntp, Pxe
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI
The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.
Top Skills:
AWS, Bash, Go, Kubernetes, Python, Slurm, Terraform
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills:
AWS, Azure, Dns, GCP, Go, HTTP, Linux, Python, Ruby, Tls
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Software • Biotech
Lead the design and operation of a fault-tolerant cloud infrastructure, implement infrastructure-as-code, manage Kubernetes reliability, and mentor engineers.
Top Skills:
Ansible, AWS, Azure, Bash, CloudFormation, Datadog, GCP, Github Actions, Gitlab Ci, Go, Grafana, Jenkins, Kubernetes, Opentelemetry, Powershell, Prometheus, Python, Terraform
Enterprise Web • Hardware • Internet of Things • Software
The Senior Site Reliability Engineer will mentor teams on observability practices, architect systems for growth, automate developer tasks, and debug production issues.
Top Skills:
Go, Kubernetes, Lgtm Stack, Opentelemetry, Prometheus, Typescript
Aerospace • Hardware • Information Technology • Security • Software • Cybersecurity • Defense
Design, build, and maintain scalable, reliable infrastructure by applying software engineering to operations. Improve automation, observability, CI/CD, and security hardening; troubleshoot Linux/Unix systems and networking; reduce operational toil while supporting mission-critical defense systems requiring TS/SCI clearance and polygraph.
Top Skills:
Dns, Git, Github Actions, Gitlab Ci, Go, Java, Jenkins, Linux, Python, Tcp/Ip, Unix
Fintech • Payments • Financial Services
Design, build, and maintain highly reliable, observable production services. Automate operations, performance-tune cloud deployments, own incident response/on-call, mentor engineers, and improve system reliability and scalability.
Top Skills:
AWS, Azure, Datadog, Docker, Ec2, GCP, Go, Kubernetes, Rust, Terraform
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Analytics • Biotech • Generative AI
Join the SRE team to design, deploy, and operate resilient cloud infrastructure. Recommend solutions, automate workflows, configure Terraform and CI, implement monitoring and alerts, and support developers and users.
Top Skills:
Ansible, Aurora Mysql, AWS, Azure, Bash, Chef, CloudFormation, Composer, Concourse, Dataproc, Docker, GCP, Go, Hipaa, Hitrust, Iso, Kubernetes, Packer, Postgres, Puppet, Python, Ruby, Salt, Slack, Terraform
Artificial Intelligence • Legal Tech • Software
The Staff Site Reliability Engineer will architect reliability strategies, manage observability, and drive operational excellence across teams, particularly in large-scale production systems.
Top Skills:
Cloud Infrastructure, Distributed Systems
Artificial Intelligence • Cloud • Fintech • Information Technology • Insurance • Financial Services • Big Data Analytics
Design, automate, and maintain scalable, secure AWS infrastructure and CI/CD pipelines. Lead observability, incident response, and reliability improvements while modernizing mainframe workloads to AWS and collaborating with engineering, security, and data teams.
Top Skills:
Ai Tools, AWS, Ci/Cd, CloudFormation, Data Analytics, Databases, Event-Driven Architecture, Mainframe Development, Messaging Services, Observability And Alerting Tools, Terraform
Artificial Intelligence • Machine Learning
Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.
Top Skills:
Cloud Platforms, Go, Kubernetes, Linux, Llm/Ai Tooling, Logs And Tracing, Observability Tooling, Python, Slo/Sli Frameworks
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.
Top Skills:
Ai Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, Slas, Slos
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
Lead SRE responsible for reliability, security, and efficiency of cloud environments for an Enterprise Imaging SaaS platform. Design, develop, automate, and operate cloud infrastructure (GCP/AWS/Azure), implement CI/CD and IaC (Terraform), manage Kubernetes, perform 24x7 on-call support, and automate IAM, monitoring, and security reporting.
Top Skills:
Ansible, AWS, Azure, Bash, Ci/Cd, Cloud Logging, GCP, Gcp Pub/Sub, Git, Github Actions, Gitlab, Jenkins, Kubernetes, Node.js, Pingone, Powershell, Python, Splunk, Terraform
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
Design, develop, and maintain proprietary Optum software and systems with emphasis on Salesforce (Aura, Lightning, Apex). Ensure security, compliance, performance, scalability, and accessibility. Implement data models, custom objects, triggers, batch automation, and REST integrations. Use Git workflows, troubleshoot incidents, support post-deployment, and collaborate with users and QA to meet SLAs and project protocols.
Top Skills:
Alm, Apex, Apex Rest, Aura Components, Git, JIRA, Rest Apis, Salesforce, Salesforce Lightning Experience, Servicenow
Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics
Lead long-term strategy and architecture for cloud and on‑prem platform infrastructure, driving Kubernetes and multi‑cloud reliability, IaC/GitOps automation, observability, SLO/SLI/error‑budget practices, incident leadership, AI‑augmented tooling adoption, and mentorship of senior engineers to improve platform resilience and developer experience.
Top Skills:
Amazon Elastic Kubernetes Service (Eks), Autoscaling, AWS, Capacity Planning, Ci/Cd, Gitops, Go, Google Cloud Platform, Google Kubernetes Engine (Gke), Identity And Access Management, Infrastructure As Code, Kubernetes, Linux, Networking, Observability, Operators, Pulumi, Python, Rke2, Storage, Terraform
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
The role involves developing AI-assisted product experiences for Datadog by building systems for chat, remediations, and code fixes, and collaborating with cross-functional teams to improve user outcomes.
Top Skills:
Ai Coding Tools, Go, Kubernetes, Llm-Based Systems
Information Technology • Software • Financial Services • Quantitative Trading
The Site Reliability Engineer will provide support and diagnose issues within a real-time, distributed environment, focusing on large-scale application and infrastructure management. Required foundational skills include UNIX/Linux, networking, SQL, and scripting languages.
Top Skills:
Bash, Python, SQL, Tcp/Ip, Udp, Unix/Linux
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, Opentelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Digital Media • Information Technology • News + Entertainment
Own and support production infrastructure for FreeWheel's Streaming Hub. Design and implement cloud and Kubernetes infrastructure, automate with Terraform/Ansible and Python/Go, troubleshoot incidents, improve observability and CI/CD, and ensure platform reliability and scalability during high-traffic streaming events.
Top Skills:
Amazon Eks, Ansible, AWS, Docker, Go (Golang), Iam, Jenkins, Kubernetes, Load Balancer, Oracle Cloud Infrastructure (Oci), Python, Route 53, Security Groups, Terraform, Vpc
Digital Media • Information Technology • News + Entertainment
Lead Data SRE responsible for ensuring reliability, scalability, and performance of data platforms. Design monitoring, automate operations, optimize performance, respond to incidents, plan capacity, enforce security/compliance, document systems, and collaborate with engineering, data science, and product teams.
Top Skills:
Aerospike, Ansible, Apache Kafka, AWS, Aws S3, Azure, Cassandra, Containerization, Docker, Elk Stack, GCP, Go, Grafana, Hadoop, Hdfs, Java, Kubernetes, Microservices, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Snowflake, Spark, Terraform
Digital Media • Information Technology • News + Entertainment
Own and support production infrastructure for a high-scale streaming advertising platform. Design and implement cloud and Kubernetes infrastructure, automate with IaC and scripting, troubleshoot incidents, ensure reliability and performance, collaborate with engineering and offshore operations, and support live events and traffic spikes.
Top Skills:
Amazon Eks, Ansible, AWS, Docker, Go, Iam, Jenkins, Kubernetes, Load Balancer, Networking, Oracle Cloud Infrastructure, Python, Route 53, Security Groups, Terraform, Vpc
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Lead Site Reliability Engineer will ensure reliability, scalability, and performance of Mastercard's applications, enhancing operational practices and developer collaboration in a proactive environment.
Top Skills:
Ci/Cd, DevOps, Go, Java, Python, Spring Framework
Popular Job Searches
All Software Engineer Jobs
.NET Developer Jobs
Aerospace Thermal Engineering Jobs
AI Engineer Jobs
Android Developer Jobs
Automation Engineer Jobs
Backend Developer Jobs
Blockchain Developer Jobs
C# Jobs
C++ Jobs
Cloud Architect Jobs
Cloud Engineer Jobs
Design Engineer Jobs
DevOps Engineer Jobs
Director Of Engineering Jobs
Electrical Engineering Jobs
Embedded Software Engineer Jobs
Engineering Jobs
Engineering Manager Jobs
Environmental Engineering Jobs
Field Engineer Jobs
Front End Developer Jobs
Full Stack Developer Jobs
Game Developer Jobs
Golang Jobs
Hardware Engineer Jobs
Industrial Engineering Jobs
iOS Developer Jobs
Java Developer Jobs
Javascript Developer Jobs
Linux Jobs
Manufacturing Engineer Jobs
Mechanical Engineering Jobs
Network Engineer Jobs
PHP Developer Jobs
Process Engineer Jobs
Project Engineer Jobs
Prompt Engineering Jobs
Python Jobs
QA Jobs
Robotics Engineer Jobs
Ruby on Rails Jobs
Salesforce Administrator Jobs
Salesforce Developer Jobs
Scala Jobs
Sharepoint Developer Jobs
Site Reliability Engineer Jobs
Software Engineering Manager Jobs
Solutions Architect Jobs
SQL Developer Jobs
Structural Engineer Jobs
System Engineer Jobs
Test Engineer Jobs
Web Developer Jobs