Top Site Reliability Engineer Jobs
Cloud • Fintech • HR Tech
Support U.S. federal government contracts by operating production services. Collaborate with development teams to improve architecture and ensure service reliability.
Top Skills:
Cloud Infrastructure, Distributed Systems, IaC Tools, Observability, Programming Languages
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior Site Reliability Engineer will enhance service reliability, implement CI/CD using various tools, automate processes, and mentor junior resources.
Top Skills:
Artifactory, Bitbucket, C, C++, Chef, Git, Go, Java, Jenkins, Maven, Perl, Python, Ruby
Software
Define and drive reliability for Saviynt's SaaS platform by designing, building, and operating scalable, reusable platform services. Lead Kubernetes platform engineering, multi-region cloud architectures, event-driven systems, CI/CD pipelines, observability, service mesh, and shared relational data services. Provide tooling, APIs, on-call support, and cross-team guidance.
Top Skills:
Argo CD, AWS, Azure, Datadog, ELK (Elasticsearch/Logstash/Kibana), Envoy, GCP, GitLab CI, Go, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, NATS, Postgres, Prometheus, Python, RabbitMQ (RMQ), RESTful APIs, Service Mesh
Software • Web3
Lead reliability practices across teams: embed early in projects, define SLIs/SLOs, build multi-cloud paved roads with Terraform, run on-call, drive org-wide incident maturity and tooling.
Top Skills:
AWS, Azure, GCP, Ruby on Rails, Terraform, TypeScript, WebContainers
Healthtech • Pharmaceutical • Manufacturing
Support and maintain production Core Speech systems: deploy, monitor, alert, perform capacity planning, respond to on-call incidents, and drive system performance and architecture improvements.
Top Skills:
Ansible, AWS CloudFront, AWS DocumentDB, AWS EC2, AWS EFS, AWS EKS, AWS RDS, AWS S3, containerd, Docker, Elasticsearch, Filebeat, Git, GitLab, Go, GoCD, Grafana, Java, Jython, Kibana, Kubernetes, Logstash, MongoDB, Postgres, Python, Redis, Shell, Solr, Terraform
Fintech • Financial Services
Design, build, and maintain reliable, scalable virtual desktop infrastructure (VDI) and supporting platforms. Lead incident response, automate deployments and operations with IaC and CI/CD, implement secure configurations, monitor system health, collaborate cross-functionally, and drive continuous improvement and operational excellence.
Top Skills:
Active Directory, Ansible, ARM/Bicep, Azure DevOps, Citrix Cloud, Citrix Gateway, CVAD, DNS, DSC, GitHub Actions, GitLab CI, GPOs, Jenkins, PowerShell, SSL/TLS Certificates, Terraform, VDI Profile Management, Windows 11 Multi-Session, Windows Server
Fintech • Financial Services
The SRE Application Support Engineer is responsible for ensuring the operational reliability and stability of production systems, optimizing their performance, managing outages, troubleshooting issues, and developing documentation and standards for production applications.
Top Skills:
Aurora, AWS, EC2, ECS, Fargate, Grafana, Java, Kibana, Lambda, Postgres, Prometheus, Python, S3, Splunk
Aerospace • Other
Design, operate, scale, and automate HPC clusters and services for silicon design workflows. Manage infrastructure-as-code, CI/CD pipelines, observability, and storage automation. Collaborate with cross-functional teams to eliminate performance bottlenecks and accelerate simulation and regression turnaround times.
Top Skills:
Ansible, Ansys, Bamboo, Bash, Cadence, Claude Code, Docker, Grafana, Grok, Jenkins, Keysight, Kubernetes, Linux, LSF, MySQL, NetApp ONTAP, NFS, Postgres, Prometheus, Puppet, Python, REST API, Siemens, Slurm, SQLite, Synopsys, TCP/IP, Terraform
Fintech • Payments • Software • Financial Services
Lead the design and implementation of an automated patching service, ensuring reliability and compliance while driving continuous improvement and cross-functional collaboration.
Top Skills:
Ansible Automation Platform, CI/CD Orchestration, CloudBees, Linux, Power BI, Python, RHEL, ServiceNow
Financial Services
Provide frontline desktop support for employees (remote and in-person), triage and resolve hardware, Windows, application, phone, and market-data feed issues, manage tickets, perform firmware/patch deployments, and collaborate with IT teams. Support trading desk/C-suite users and maintain endpoint security and configuration management.
Top Skills:
Active Directory, Biometric Devices, Bloomberg, Cisco Phone Systems, Cisco Phones, Data Encryption, Endpoint Manager, Fidessa, Firmware Updates, Global Relay, ICE, Microsoft Office/Office 365, MS-900, OneDrive, Patch Management, Printers, REDI+, Scanners, ServiceNow, Softphones, Spyware/Malware Tools, System Center Configuration Manager, Thomson Reuters, Trading Turrets, VPN, Wi-Fi, Windows 10, Windows 11, Zoom
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior Site Reliability Engineer supports app and service operations, focusing on lifecycle management, system health, and automation. Responsibilities include incident response, CI/CD pipeline support, and mentoring junior resources.
Top Skills:
Apache NiFi, C, C++, Dynatrace, Git, Go, Java, Jenkins, Perl, Python, Ruby, Shell Scripting, Splunk, SQL, Unix, XLR
Cloud • Information Technology • Machine Learning
As a Senior Site Reliability Engineer, you'll ensure the reliability and performance of a Kubernetes-based data platform, focusing on scaling infrastructure, enhancing security, and optimizing deployment processes.
Top Skills:
Airflow, Argo CD, Flink, GitHub Actions, Grafana, Helm, Istio, Kafka, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Pulumi, Spark, Terraform
Fintech
Lead enterprise reliability strategy and architect resilient cloud and on-prem platforms. Manage DR events, observability (Datadog, Splunk), SSL lifecycle, performance testing (Java/JMeter), CI/CD automation, and Kubernetes operations. Advise leadership, mentor engineers, perform complex root-cause analysis, and build performance test frameworks with actionable reporting.
Top Skills:
Ansible AWX, Avi, AWS, AWS Route 53, Azure, Azure Load Balancer, CI/CD, Cloud-Test, Datadog, Deployment as a Service (DaaS), F5, Grafana, Java, Jenkins Core, JMeter, Kubernetes, Python, Rush-Hour, Shell, Splunk, Terraform, uDeploy
Cloud • Fintech • HR Tech
Operate and support production services as an SRE: drive reliability, performance, capacity planning, observability, automation (IaC), and incident response. Partner with development and infrastructure teams, handle on-call duties, and work on federal contracts requiring US-citizen personnel and clearance eligibility.
Top Skills:
Capacity Planning, CASP+, CompTIA CySA+, Distributed Systems, DoD 8570/8140, GICSP, IAT Level II, Incident Management, Infrastructure as Code (IaC) Tools, Observability (Metrics Collection, Logging, Tracing), Public Cloud
Fintech • Payments
Entry-level Site Reliability Engineer supporting system reliability, monitoring, incident triage, and root-cause analysis. Develop basic automation and scripts, follow deployment/change processes, collaborate with senior engineers, and contribute to observability and incident/problem management to improve system resilience and scalability.
Top Skills:
Bash, Docker, Kubernetes, Linux, PowerShell, Python, Unix
Hardware • Semiconductor • Manufacturing
The Site Reliability Engineer will design, implement, and manage reliable infrastructure and services, ensuring operational excellence and uptime.
Top Skills:
AWS, Bash, Docker, Grafana, Kubernetes, Linux, Azure, OpenShift, Prometheus, Proxmox, Python, VMware vSphere
Artificial Intelligence • Healthtech • Software • Telehealth
Own and evolve Fabric's AWS/EKS infrastructure, build Terraform-managed infrastructure, improve observability with Datadog, lead incident response and SLOs, automate operations with AI/agentic workflows, optimize AWS resources, and ensure HIPAA-compliant, high-availability platform architecture while mentoring engineers.
Top Skills:
Agentic Workflows, AI-Assisted Tooling, AWS, Bash, Datadog, EC2, EKS, GitHub Actions, Go, Kubernetes, Python, RDS, Ruby, S3, Semaphore, Terraform
Other
In this role, you will manage day-to-day operations of Internet-based enterprise systems, identify operational issues, develop tools for maintenance, and collaborate on infrastructure documentation and project execution.
Top Skills:
.NET, Ansible, Apache, Azure, Chef, IIS, JBoss, Perl, PowerShell, Puppet, Python, Ruby, Tomcat
Other
Responsible for monitoring, provisioning, and customer interactions, with a focus on maintaining high availability in complex web environments.
Top Skills:
.NET, Ansible, Apache, CFEngine, Chef, Dynatrace, Go, IIS, Java, JBoss, NAS, New Relic, Perl, PowerShell, Puppet, Python, RAID, Ruby, SAN, Splunk, Sumo Logic, Tomcat, Windows
Fintech • Financial Services
The Site Reliability Engineer I will support cloud infrastructure and assist in cloud transformation initiatives, focusing on performance and delivery of public cloud solutions, primarily in Azure. Responsibilities include troubleshooting, monitoring, automation, and contributing to operational readiness practices for cloud services.
Top Skills:
.NET, Ansible, AWS, Azure, Azure CLI, GCP, Jenkins, Kubernetes, Linux, PowerShell, Terraform, Windows
Fintech • Financial Services
The role involves shaping release engineering practices, implementing AI-driven solutions, and ensuring software reliability through collaboration and automation.
Top Skills:
AI-Powered Tools, Azure, Bash, C#, GitHub Copilot, Java, PowerShell
Fintech • Consulting
The Site Reliability Engineer at Equifax manages system uptime, builds infrastructure as code, develops CI/CD pipelines, automates deployment, solves complex issues, and leads postmortems for system reliability.
Top Skills:
Ansible, AWS, Bash, Chef, Docker, GCP, Go, Java, JavaScript, Jenkins, Kubernetes, Node.js, Python, Terraform
Artificial Intelligence • Information Technology • Cybersecurity • Defense
As a Site Reliability Engineer, you'll ensure system reliability in a government environment, manage incidents, and collaborate with engineering teams on operational tasks and improvements while maintaining security compliance.
Top Skills:
AWS, Bash, Docker, Docker Compose, Grafana, Linux/Unix, Loki, Mimir, Prometheus, Python, Terraform
Artificial Intelligence • Healthtech • Software • Automation
Design and own platform reliability: define SLOs, build observability, lead incident response and postmortems, evolve IaC and deployment pipelines, automate toil, and collaborate with engineers to improve operability and architecture for scaling.
Top Skills:
Alerting, Cloud Infrastructure, Containerized Services, Deployment Pipeline, Incident Response, Infrastructure as Code, Java, Monitoring, Observability, Python, SLOs, TypeScript
Artificial Intelligence
The Site Reliability Engineer II will enhance infrastructure and software reliability, write efficient code, collaborate across teams, and maintain platforms and monitoring tools.
Top Skills:
AWS, CI/CD, Coralogix, Docker, JavaScript, Kubernetes, Python, Sentry, Terraform, Unix Shell
Popular Job Searches
All Software Engineer Jobs
.NET Developer Jobs
Aerospace Thermal Engineering Jobs
AI Engineer Jobs
Android Developer Jobs
Automation Engineer Jobs
Backend Developer Jobs
Blockchain Developer Jobs
C# Jobs
C++ Jobs
Cloud Architect Jobs
Cloud Engineer Jobs
Design Engineer Jobs
DevOps Engineer Jobs
Director Of Engineering Jobs
Electrical Engineering Jobs
Embedded Software Engineer Jobs
Engineering Jobs
Engineering Manager Jobs
Environmental Engineering Jobs
Field Engineer Jobs
Front End Developer Jobs
Full Stack Developer Jobs
Game Developer Jobs
Golang Jobs
Hardware Engineer Jobs
Industrial Engineering Jobs
iOS Developer Jobs
Java Developer Jobs
Javascript Developer Jobs
Linux Jobs
Manufacturing Engineer Jobs
Mechanical Engineering Jobs
Network Engineer Jobs
PHP Developer Jobs
Process Engineer Jobs
Project Engineer Jobs
Prompt Engineering Jobs
Python Jobs
QA Jobs
Robotics Engineer Jobs
Ruby on Rails Jobs
Salesforce Administrator Jobs
Salesforce Developer Jobs
Scala Jobs
Sharepoint Developer Jobs
Site Reliability Engineer Jobs
Software Engineering Manager Jobs
Solutions Architect Jobs
SQL Developer Jobs
Structural Engineer Jobs
System Engineer Jobs
Test Engineer Jobs
Web Developer Jobs