Top Site Reliability Engineer Jobs
Mobile • Software
Site Reliability Engineers will work on production infrastructure, focusing on AWS and Kubernetes while ensuring high availability and customer satisfaction.
Top Skills:
Airflow, AWS, CircleCI, CloudWatch, EKS, Grafana, MongoDB, PagerDuty, Pingdom, Rust, Scala Spark, Terraform, TypeScript
Hardware • Healthtech • Software • Analytics
The Site Reliability Engineer will ensure high availability of Sage's platform, lead incident response, design reliable systems, and improve operational workflows.
Top Skills:
Amazon Web Services, Datadog, Go, Google Cloud Platform, Grafana, Java, Kubernetes, MySQL, Postgres, Prometheus, Pulumi, Python, Terraform
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Lead and manage an SRE/Platform engineering team to ensure reliability, scalability, and performance of CrowdStrike's cloud-native security platform. Provide technical leadership, incident command, SLO-driven reliability, capacity planning, automation, and mentorship while collaborating with cross-functional teams.
Top Skills:
Apache Flink, Apache Kafka, AWS, Azure, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Splunk
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
The role involves managing production infrastructure across multiple cloud providers and Kubernetes, building CI/CD pipelines, ensuring system reliability, and implementing security practices.
Top Skills:
ArgoCD, Flux, GitOps, Go, Grafana, Jaeger, Kubernetes, OpenTelemetry, Prometheus, Pulumi, Terraform
Machine Learning • Payments • Security • Software • Financial Services
The Technology Engineer - Mainframe Systems at PNC supports and enhances mainframe environments, ensuring system stability and performance, collaborating with various teams, and managing batch processes and file transfers.
Top Skills:
COBOL, Db2, File-AID, IBM Mainframe Technologies, TSO, VSAM
Cloud • Information Technology • Security • Software • Cybersecurity
The role involves creating scalable solutions using Linux and Kubernetes, troubleshooting performance issues, maintaining security, and writing automation tools.
Top Skills:
Ansible, Bash, Docker, Firewall Technologies, Go, Kubernetes, KVM, Linux, Multi-Factor Authentication, OpenStack, PGP, PKI, Python, SSH, Unix
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Lead Site Reliability Engineer will ensure the reliability and performance of Mastercard's applications, mentor junior engineers, and improve service lifecycle through automation and DevOps practices.
Top Skills:
Go, Java, Python, Spring Framework
Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth
The Staff Site Reliability Engineer at Sprinter Health will enhance the reliability and security of cloud infrastructure, automate processes, and improve system observability across healthcare delivery operations.
Top Skills:
Access Management, AWS, Bash, CI/CD Systems, Cloud Networking, Container Systems, GCP, Identity Management, Logging Platforms, Monitoring Platforms, Observability Platforms, Python, Secrets Management, Terraform, TypeScript
Aerospace • Cloud • Digital Media • Information Technology • Mobile • News + Entertainment • Generative AI
The Lead DevOps Site Reliability Engineer drives automation in software development, manages cloud stacks, supports containerization, and leads response to outages.
Top Skills:
Ansible, AWS, Bash, Docker, Dynatrace, GitLab CI, Go, GCP, Helm, Java, Jenkins, Kubernetes, Linux, OCI, PHP, Python, Rancher, Spring, Terraform
Artificial Intelligence • Marketing Tech • Software
Lead technical reliability initiatives across a multi-cloud, multi-region active-active content platform. Architect and evolve core services, observability and logging, automation and capacity planning. Mentor engineers, drive cross-team reliability projects, define standards (IaC, SLOs, on-call) and proactively improve platform scalability and incident outcomes.
Top Skills:
Apache Kafka, Apache Pulsar, AWS, Cassandra, Chef, EKS, GCP, GKE, Go, Grafana Alloy, Grafana Loki, Kubernetes, Linux, Node.js, Prometheus, Python, Ruby, ScyllaDB, Shell Scripting, Tempo, Terraform, Thanos
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
Ensure reliability, scalability, and performance of Mastercard applications by implementing observability, automation, CI/CD, and cloud infrastructure best practices. Support production readiness, triage incidents, perform root-cause analysis and blameless post-mortems, mentor developers, and drive operational standards, capacity planning, and risk/compliance activities to maximize service availability and customer experience.
Top Skills:
AWS, Azure, Bash, Bitbucket, CI/CD, Containerization, Dynatrace, GCP, Go, Jenkins, Linux/Unix, Orchestration, PCF, Python, Splunk, XLR
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
Define and scale SRE standards across teams, implement SLOs/SLIs/error budgets, build observability and resiliency patterns, drive automation and AIOps, improve reliability for large-scale Azure cloud systems, and influence engineering and platform teams.
Top Skills:
AI/ML, AIOps, Automation, Azure, Error Budgets, Incident Management, Observability (Metrics, Logs, Tracing), OpenTelemetry, SLIs, SLOs
Information Technology • Insurance • Software
The Principal Site Reliability Engineer will lead the enterprise's reliability, scalability, and performance efforts, influencing architecture, managing incidents, and fostering a proactive engineering culture across teams.
Top Skills:
.NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Relational Databases, Windows
Information Technology • Insurance • Software
The Site Reliability Engineer II ensures system reliability, participates in incident responses, and automates tasks to enhance operational health in production environments.
Top Skills:
.NET, AWS, C#, CI/CD Pipelines, GitLab, Java, Jenkins, Python, React
Big Data • Cloud • Software • Database
Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.
Top Skills:
Alerting, Argo Workflows, ArgoCD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical, second-line oversight of SRE and cloud engineering practices. Perform deep-dive risk analyses of cloud architectures, resiliency, CI/CD, observability, and Gen AI integrations. Produce data-driven risk findings, mitigation recommendations, and executive-facing reports while partnering with first-line engineers and leadership to ensure robust controls and operational reliability.
Top Skills:
AWS, Azure, CI/CD, Cloud-Native, Containerization, Datadog, ELK, GCP, Generative AI, Kubernetes, PagerDuty, Prometheus, Splunk
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform
Cloud • Information Technology • Security • Software • Cybersecurity
As an intern, manage operational tasks in classified environments, develop automation tools, create documentation, and enhance services for Zscaler's cloud security platform.
Top Skills:
AWS ECS, Kubernetes, Python
Cloud • Security • Software • Cybersecurity • Automation
As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.
Top Skills:
Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical risk advisory for SRE and cloud-native engineering, assess resiliency, SLIs/SLOs, CI/CD, and observability, perform independent risk reviews, drive AI/automation adoption, and deliver executive-facing risk reporting and remediation guidance.
Top Skills:
Automation, AWS, Azure, CI/CD, Cloud-Native Architectures, Containerization, Datadog, ELK, GCP, Gen AI, Observability, PagerDuty, Prometheus, Splunk
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Site Reliability Engineer will design, develop, and support a secure cloud infrastructure while collaborating with development and DevOps teams, ensuring high performance and reliability of systems.
Top Skills:
AWS, Azure, Dynatrace, Grafana, Kubernetes, Prometheus, Pulumi, Splunk, Terraform
Big Data • Cloud • Software • Database
The Site Reliability Engineer designs and builds infrastructure for a global cloud service, implements automation, and optimizes system performance while managing on-call operations.
Top Skills:
AWS, DNS, GCP, HTTP, Kubernetes, Linux, Azure, Programming Languages, TLS
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills:
Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
Logistics • Mobile • Productivity • Software • Transportation
The Senior Site Reliability Engineer will manage the reliability of Zello's data tier, contribute to monitoring and incident response while improving cloud infrastructure and database performance.
Top Skills:
Bash, Docker, Elasticsearch, Go, Kubernetes, Loki, MongoDB, MySQL, Prometheus, Python, Redis, ScyllaDB, Tempo
Artificial Intelligence • Cloud • Enterprise Web • Natural Language Processing • Software • App development • Automation
Design and implement large-scale distributed systems that integrate AI safely and reliably, focusing on infrastructure, observability, and security.
Top Skills:
Cloud Networking, Containers, Distributed Systems, Event Driven Runtimes, KEDA, Knative, Kubernetes, Multi Cloud Architecture, Operating Systems, Scalability