Top Site Reliability Engineer Jobs
Reposted 23 Hours Ago
Other • Social Impact
As a Senior Site Reliability Engineer, you will manage and improve Wikimedia's infrastructure, handle operational tasks, automate processes, and provide mentorship while participating in a 24/7 on-call rotation.
Top Skills:
Ansible, Bash, Debian, Go, Grafana, HHVM, Kubernetes, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby
Artificial Intelligence • Software
As a Senior SRE, you'll enhance data infrastructure, optimize performance, build reliability, automate processes, and manage incident responses while supporting enterprise clients' uptime requirements.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Artificial Intelligence • Marketing Tech • Software • Big Data Analytics
The Senior Site Reliability Engineer will design and maintain scalable infrastructure, improve system reliability, manage CI/CD pipelines, and collaborate across teams for operational excellence.
Top Skills:
Ansible, Argo CD, AWS, Bash, Datadog, Docker, ELK, GitHub Actions, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Python, Terraform
AdTech • Marketing Tech • Design
As a Senior Site Reliability Engineer, you will manage and improve infrastructure reliability, build cloud components, enhance developer experience, support ML infrastructure, and ensure security compliance, targeting high performance and uptime for Vibe’s streaming platform.
Top Skills:
CI/CD, Go, Observability, Python, Terraform
Information Technology
The Senior Site Reliability Engineer is responsible for architecting reliability strategies, implementing SRE frameworks, mentoring engineers, and ensuring system resilience and performance in government systems.
Top Skills:
Cloud Architecture, DevSecOps, Go, Infrastructure as Code (IaC), Java, Kubernetes, Linux, NIST 800-53, Python, RMF
Information Technology • Security • Software • Consulting
The Senior Site Reliability Engineer will manage AWS infrastructure, deploy Kubernetes workloads, build CI/CD pipelines, debug production issues, and collaborate across teams to improve processes and standards.
Top Skills:
AWS, CI/CD, GitHub Actions, GitLab CI, Jenkins, Kubernetes, Postgres, Terraform
Cloud • Security • Software • Cybersecurity
Design and maintain reliable infrastructure solutions for a cloud data protection platform. Ensure application scalability and support through CI/CD and monitoring tools while collaborating in a global team.
Top Skills:
AppInsights, AWS CloudFormation, Azure API Management, Azure ARM Templates, Azure Cosmos DB, Azure DevOps, Azure Entra ID, Azure Functions, Azure Monitor, Azure Storage Services, Bash, Bitbucket, Elastic Stack, Git, Go, Microsoft TFS, PowerShell, Python, Serverless Framework, Terraform
Cloud • Software
Design, implement, and support Kubernetes and compute platforms in a private cloud. Oversee architecture and standardization across hardware, OS, and cloud orchestration.
Top Skills:
Ansible, Bash, CI/CD, Helm, Kubernetes, Linux, OpenStack, Python, Terraform, Ubuntu
Information Technology • Software • Cryptocurrency • Web3
The Senior Site Reliability Engineer will design, build, and manage Azure infrastructure for HashSphere, ensuring secure and scalable deployments while enhancing system reliability and operational excellence in partnership with cross-functional teams.
Top Skills:
Argo, Azure, Go, Grafana, Kubernetes, Prometheus, Python, Spacelift, Terraform
Big Data • Cloud • Marketing Tech • Social Impact • Software
The Senior Site Reliability Engineer will support global product deployments, provide 24/7 operational support, maintain CI/CD tooling, and optimize system performance. They will apply SRE practices and leadership skills to improve product reliability and guide other engineers.
Top Skills:
AWS, Cassandra, CircleCI, DynamoDB, GCP, Go, Jenkins, Kubernetes, NoSQL Databases, Python, ScyllaDB, SingleStore DB, Terraform
Big Data • Cloud • Marketing Tech • Social Impact • Software
As a Senior Site Reliability Engineer, you will manage global product deployments, provide operational support, enhance CI/CD tooling, and optimize system performance, collaborating closely with distributed engineering teams.
Top Skills:
AWS, Cassandra, CircleCI, DynamoDB, GCP, Go, Jenkins, Kubernetes, NoSQL Databases, Python, ScyllaDB, SingleStore DB, Terraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Own and scale a multi-tenant CI-as-a-Service platform (GitLab CI and GitHub Actions) and Kubernetes substrate. Ensure reliability, autoscaling, SLO-driven capacity, observability, self-service pipelines, and developer experience across GPU/ARM/CPU build and test workloads.
Top Skills:
Ansible, Arc, Argo CD, Bash, Cluster-Autoscaler, Containerization, Docker, ELK, Flux, GitHub Actions, GitLab CI, GitLab Runner, Go, Grafana, Helm, HPA, Ingress, Kubernetes, Linux, Loki, Network Policies, OpenTelemetry, Prometheus, Python, RBAC, Service Mesh, Storage Classes, Terraform, VPA
Software
Design, build, and operate multi-account cloud infrastructure using IaC. Automate customer deployments, manage CI/CD, troubleshoot production across infra/data/app layers, and handle networking, security, and compliance for regulated environments while collaborating with platform and professional services teams.
Top Skills:
Airflow, Auth0, AWS, Azure, dbt, Docker, ECS, GCP, GitHub Actions, LLMs, Okta, Packer, Postgres, Snowflake, Tailscale, Terraform, WireGuard
Financial Services
The Site Reliability Engineer III designs secure, scalable technology solutions, ensures operational resiliency, and collaborates with teams to maintain high availability across environments.
Top Skills:
Automic, AWS, Azure, Bamboo, BigQuery, Docker, Git, Google Cloud Platform, Grafana, Java, JIRA, Kubernetes, Linux, OpenTelemetry, Oracle, Postgres, Prometheus, Python, Splunk, UC4, Unix
Fintech • Financial Services
As a Site Reliability Engineer focusing on frontend performance, you build infrastructure for self-service performance monitoring, optimize AI operations, and architect resilient systems for high-traffic applications.
Top Skills:
APM Instrumentation, AWS, CLI Tools, Datadog, Java, Kubernetes, RUM, TypeScript
Fintech
The Principal Site Reliability Engineer at Fidelity will enhance system reliability, manage large-scale infrastructure, and automate processes using a range of technologies.
Top Skills:
Ansible, AWS, CI/CD, Datadog, Grafana, Jenkins, Python, Terraform, Yugabyte
Artificial Intelligence • Software • Generative AI
The Lead Site Reliability Engineer will drive technical strategy, ensure high service availability, manage cloud infrastructure, and lead a team to optimize systems and automate processes.
Top Skills:
AWS, Azure, Docker, Google Cloud Platform, Kubernetes, Terraform
Software
Lead platform reliability for a multi-region SaaS Kubernetes platform: define SLOs/SLAs, build observability, run incident response and on-call, drive reliability improvements, and partner with engineering to bake reliability into features.
Top Skills:
Argo CD, AWS, Bash, Datadog, EC2, EKS, GitOps, Go, Grafana, IAM, Kargo, Kubernetes, NLB, OpenTelemetry, Prometheus, Python, RDS, Route 53, S3, Terraform, VPC
Artificial Intelligence • Healthtech • HR Tech • Software
Own the Heroku-to-GCP migration, maintain Postgres and data pipelines, optimize high‑traffic code paths, build monitoring/alerting, lead incident response and post‑mortems, reduce costs and scale proactively, and coach other infrastructure engineers.
Top Skills:
AppSignal, BigQuery, Bugsnag, Canny, Claude Code, Fivetran, Google Cloud Platform, Heroku, Hex, Hotwire, Infrastructure as Code, Postgres, Ruby on Rails
24 Days Ago
Fintech • Information Technology • Software • Financial Services
Design, build, and maintain real-time, secure distributed systems and observability UIs/APIs. Implement CI/CD, containerized deployments (Docker/Kubernetes/OpenShift), integrate observability stack (Elasticsearch/Logstash/Grafana), and apply secure coding and API security standards to ensure reliability, performance, and incident automation. Collaborate in Agile teams and explore AI to improve resiliency.
Top Skills:
Agentic AI, CI/CD, Docker, Elasticsearch, Grafana, Java Spring Boot, Kafka, Kubernetes, Logstash, MariaDB, Node.js, OAuth2, OpenShift, React, Secrets Management
Information Technology • Consulting
Design, build, customize, and support Oracle E-Business Suite and related financial applications. Gather requirements, implement EBS customizations and integrations, optimize performance and SQL, perform testing and upgrades, and provide tier-3 support and issue resolution with Oracle Support.
Top Skills:
Invoice Automation, Oracle Cloud, Oracle E-Business Suite, Oracle EBS APIs, Oracle Financials, Oracle Integration Cloud, Oracle Supplier Portal Cloud, SQL, WebCenter Content Imaging
Big Data • Fintech • Mobile • Payments • Financial Services • Data Privacy
Design and implement reliability engineering capabilities for Azure platforms: define SLIs/SLOs, drive observability and automation, reduce toil, lead incident triage and remediation, develop reusable IaC and CI/CD patterns, mentor SREs, and ensure operational readiness and resilience.
Top Skills:
Ansible, Azure, Azure Log Analytics, Azure Resource Graph, CI/CD, Dynatrace, Go (Golang), IAM, Infrastructure as Code (IaC), Kubernetes, Policy-as-Code, Private DNS, Python, Service Mesh, Terraform
Software
Lead the modernization of AWS cloud infrastructure, implement automation, ensure system reliability, and manage performance with a focus on security and incident response.
Top Skills:
Angular, Apex, AWS, C#, ElastiCache, New Relic, Node.js, npm, PM2, Python, Redis, Shell Scripting, Terraform
Aerospace • Hardware • Software • Defense • Manufacturing
Build scalable automated solutions for device fleet management, own and optimize MDM platforms, write OS-level scripts for self-healing, gather telemetry to prevent end-user disruption, translate compliance (CMMC) into code-managed baselines, and create dashboards and alerts measuring end-user SLOs.
Top Skills:
Ansible, Bash, Chef, FleetDM, Intune, Jamf, osquery, PowerShell, Pulumi, Puppet, Python, Salt, Terraform, Workspace ONE
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
Own reliability, observability, and operational excellence for the Unified Call (911) platform. Design monitoring, alerting, incident response, deployment automation, and dashboards. Improve system resiliency, analyze architecture for operational risks, and build tooling and practices to enable reliable production operations across a Kubernetes-based cloud environment.
Top Skills:
AWS, Datadog, Kafka, Kubernetes, RabbitMQ
Popular Job Searches
All Software Engineer Jobs
.NET Developer Jobs
Aerospace Thermal Engineering Jobs
AI Engineer Jobs
Android Developer Jobs
Automation Engineer Jobs
Backend Developer Jobs
Blockchain Developer Jobs
C# Jobs
C++ Jobs
Cloud Architect Jobs
Cloud Engineer Jobs
Design Engineer Jobs
DevOps Engineer Jobs
Director Of Engineering Jobs
Electrical Engineering Jobs
Embedded Software Engineer Jobs
Engineering Jobs
Engineering Manager Jobs
Environmental Engineering Jobs
Field Engineer Jobs
Front End Developer Jobs
Full Stack Developer Jobs
Game Developer Jobs
Golang Jobs
Hardware Engineer Jobs
Industrial Engineering Jobs
iOS Developer Jobs
Java Developer Jobs
Javascript Developer Jobs
Linux Jobs
Manufacturing Engineer Jobs
Mechanical Engineering Jobs
Network Engineer Jobs
PHP Developer Jobs
Process Engineer Jobs
Project Engineer Jobs
Prompt Engineering Jobs
Python Jobs
QA Jobs
Robotics Engineer Jobs
Ruby on Rails Jobs
Salesforce Administrator Jobs
Salesforce Developer Jobs
Scala Jobs
Sharepoint Developer Jobs
Site Reliability Engineer Jobs
Software Engineering Manager Jobs
Solutions Architect Jobs
SQL Developer Jobs
Structural Engineer Jobs
System Engineer Jobs
Test Engineer Jobs
Web Developer Jobs