Top Site Reliability Engineer Jobs

Reposted 2 Days Ago
In-Office
Reston, VA, USA
124K-222K Annually
Mid level
Cloud • Fintech • HR Tech
Support U.S. federal government contracts by managing operations of services. Collaborate with development teams to enhance architecture and ensure service reliability.
Top Skills: Cloud Infrastructure, Distributed Systems, IaC Tools, Observability, Programming Languages
Reposted 8 Days Ago
Hybrid
O'Fallon, MO, USA
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior Site Reliability Engineer will enhance service reliability, implement CI/CD using various tools, automate processes, and mentor junior resources.
Top Skills: Artifactory, Bitbucket, C, C++, Chef, Git, Go, Java, Jenkins, Maven, Perl, Python, Ruby
3 Days Ago
Hybrid
2 Locations
240K-250K Annually
Expert/Leader
Software
Define and drive reliability for Saviynt's SaaS platform by designing, building, and operating scalable, reusable platform services. Lead Kubernetes platform engineering, multi-region cloud architectures, event-driven systems, CI/CD pipelines, observability, service mesh, and shared relational data services. Provide tooling, APIs, on-call support, and cross-team guidance.
Top Skills: Argo CD, AWS, Azure, Datadog, ELK (Elasticsearch/Logstash/Kibana), Envoy, GCP, GitLab CI, Go, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, NATS, Postgres, Prometheus, Python, RabbitMQ (RMQ), RESTful APIs, Service Mesh
3 Days Ago
Remote
USA
Senior level
Software • Web3
Lead reliability practices across teams: embed early in projects, define SLIs/SLOs, build multi-cloud paved roads with Terraform, run on-call, drive org-wide incident maturity and tooling.
Top Skills: AWS, Azure, GCP, Ruby on Rails, Terraform, TypeScript, WebContainers
3 Days Ago
Remote
2 Locations
124K-171K Annually
Senior level
Healthtech • Pharmaceutical • Manufacturing
Support and maintain production Core Speech systems: deploy, monitor, alert, perform capacity planning, respond to on-call incidents, and drive system performance and architecture improvements.
Top Skills: Ansible, AWS CloudFront, AWS DocumentDB, AWS EC2, AWS EFS, AWS EKS, AWS RDS, AWS S3, containerd, Docker, Elasticsearch, Filebeat, Git, GitLab, Go, GoCD, Grafana, Java, Jython, Kibana, Kubernetes, Logstash, MongoDB, Postgres, Python, Redis, Shell, Solr, Terraform
3 Days Ago
In-Office
New York, NY, USA
120K-175K Annually
Senior level
Fintech • Financial Services
Design, build, and maintain reliable, scalable virtual desktop infrastructure (VDI) and supporting platforms. Lead incident response, automate deployments and operations with IaC and CI/CD, implement secure configurations, monitor system health, collaborate cross-functionally, and drive continuous improvement and operational excellence.
Top Skills: Active Directory, Ansible, ARM/Bicep, Azure DevOps, Citrix Cloud, Citrix Gateway, CVAD, DNS, DSC, GitHub Actions, GitLab CI, GPOs, Jenkins, PowerShell, SSL/TLS Certificates, Terraform, VDI Profile Management, Windows 11 Multi-Session, Windows Server
Reposted 3 Days Ago
In-Office
New York, NY, USA
120K-165K Annually
Senior level
Fintech • Financial Services
The SRE Application Support Engineer is responsible for ensuring operational reliability, stability, and optimizing performance of production systems, managing outages, troubleshooting issues, and developing documentation and standards for production applications.
Top Skills: Aurora, AWS, EC2, ECS, Fargate, Grafana, Java, Kibana, Lambda, Postgres, Prometheus, Python, S3, Splunk
3 Days Ago
In-Office
Redmond, WA, USA
125K-175K Annually
Junior
Aerospace • Other
Design, operate, scale, and automate HPC clusters and services for silicon design workflows. Manage infrastructure-as-code, CI/CD pipelines, observability, and storage automation. Collaborate with cross-functional teams to eliminate performance bottlenecks and accelerate simulation and regression turnaround times.
Top Skills: Ansible, Ansys, Bamboo, Bash, Cadence, Claude Code, Docker, Grafana, Grok, Jenkins, Keysight, Kubernetes, Linux, LSF, MySQL, NetApp ONTAP, NFS, Postgres, Prometheus, Puppet, Python, REST API, Siemens, Slurm, SQLite, Synopsys, TCP/IP, Terraform
Reposted 3 Days Ago
In-Office
Manassas, VA, USA
122K-226K Annually
Expert/Leader
Fintech • Payments • Software • Financial Services
Lead the design and implementation of an automated patching service, ensuring reliability and compliance while driving continuous improvement and cross-functional collaboration.
Top Skills: Ansible Automation Platform, CI/CD Orchestration, CloudBees, Linux, Power BI, Python, RHEL, ServiceNow
3 Days Ago
In-Office
San Francisco, CA, USA
85K-115K Annually
Mid level
Financial Services
Provide frontline desktop support for employees (remote and in-person), triage and resolve hardware, Windows, application, phone, and market-data feed issues, manage tickets, perform firmware/patch deployments, and collaborate with IT teams. Support trading desk/C-suite users and maintain endpoint security and configuration management.
Top Skills: Active Directory, Biometric Devices, Bloomberg, Cisco Phone Systems, Cisco Phones, Data Encryption, Endpoint Manager, Fidessa, Firmware Updates, Global Relay, ICE, Microsoft Office/Office 365, MS-900, OneDrive, Patch Management, Printers, Redi+, Scanners, ServiceNow, Softphones, Spyware/Malware Tools, System Center Configuration Manager, Thomson Reuters, Trading Turrets, VPN, WiFi, Windows 10, Windows 11, Zoom
Reposted 8 Days Ago
Hybrid
O'Fallon, MO, USA
96K-163K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior Site Reliability Engineer supports app and service operations, focusing on lifecycle management, system health, and automation. Responsibilities include incident response, CI/CD pipeline support, and mentoring junior resources.
Top Skills: Apache NiFi, C, C++, Dynatrace, Git, Go, Java, Jenkins, Perl, Python, Ruby, Shell Scripting, Splunk, SQL, Unix, XLR
Reposted 8 Days Ago
In-Office
2 Locations
165K-242K Annually
Senior level
Cloud • Information Technology • Machine Learning
As a Senior Site Reliability Engineer, you'll ensure the reliability and performance of a Kubernetes-based data platform, focusing on scaling infrastructure, enhancing security, and optimizing deployment processes.
Top Skills: Airflow, Argo CD, Flink, GitHub Actions, Grafana, Helm, Istio, Kafka, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Pulumi, Spark, Terraform
3 Days Ago
In-Office
Durham, NC, USA
Senior level
Fintech
Lead enterprise reliability strategy and architect resilient cloud and on-prem platforms. Manage DR events, observability (Datadog, Splunk), SSL lifecycle, performance testing (Java/JMeter), CI/CD automation, and Kubernetes operations. Advise leadership, mentor engineers, perform complex root-cause analysis, and build performance test frameworks with actionable reporting.
Top Skills: Ansible AWX, Avi, AWS, AWS Route53, Azure, Azure Load Balancer, CI/CD, Cloud-Test, Datadog, Deployment as a Service (DaaS), F5, Grafana, Java, Jenkins Core, JMeter, Kubernetes, Python, Rush-Hour, Shell, Splunk, Terraform, Udeploy
3 Days Ago
In-Office
Reston, VA, USA
124K-222K Annually
Mid level
Cloud • Fintech • HR Tech
Operate and support production services as an SRE: drive reliability, performance, capacity planning, observability, automation (IaC), and incident response. Partner with development and infrastructure teams, handle on-call duties, and work on federal contracts requiring US-citizen personnel and clearance eligibility.
Top Skills: Capacity Planning, CASP+, CompTIA CySA+, Distributed Systems, DoD 8570/8140, GICSP, IAT Level II, Incident Management, Infrastructure as Code (IaC) Tools, Observability (Metrics Collection, Logging, Tracing), Public Cloud
3 Days Ago
In-Office
4 Locations
75K-95K Annually
Entry level
Fintech • Payments
Entry-level Site Reliability Engineer supporting system reliability, monitoring, incident triage, and root-cause analysis. Develop basic automation and scripts, follow deployment/change processes, collaborate with senior engineers, and contribute to observability and incident/problem management to improve system resilience and scalability.
Top Skills: Bash, Docker, Kubernetes, Linux, PowerShell, Python, Unix
Reposted 3 Days Ago
In-Office
Sunnyvale, CA, USA
145K-175K Annually
Senior level
Hardware • Semiconductor • Manufacturing
The Site Reliability Engineer will design, implement, and manage reliable infrastructure and services, ensuring operational excellence and uptime.
Top Skills: AWS, Bash, Docker, Grafana, Kubernetes, Linux, Azure, OpenShift, Prometheus, Proxmox, Python, VMware vSphere
Reposted 3 Days Ago
In-Office or Remote
New York City, NY, USA
135K-160K Annually
Senior level
Artificial Intelligence • Healthtech • Software • Telehealth
Own and evolve Fabric's AWS/EKS infrastructure, build Terraform-managed infrastructure, improve observability with Datadog, lead incident response and SLOs, automate operations with AI/agentic workflows, optimize AWS resources, and ensure HIPAA-compliant, high-availability platform architecture while mentoring engineers.
Top Skills: Agentic Workflows, AI-Assisted Tooling, AWS, Bash, Datadog, EC2, EKS, GitHub Actions, Go, Kubernetes, Python, RDS, Ruby, S3, Semaphore, Terraform
Reposted 3 Days Ago
In-Office
Seattle, WA, USA
Senior level
Other
In this role, you will manage day-to-day operations of Internet-based enterprise systems, identify operational issues, develop tools for maintenance, and collaborate on infrastructure documentation and project execution.
Top Skills: .NET, Ansible, Apache, Azure, Chef, IIS, JBoss, Perl, PowerShell, Puppet, Python, Ruby, Tomcat
Reposted 3 Days Ago
In-Office
Seattle, WA, USA
Senior level
Other
Responsible for monitoring, provisioning, and customer interactions, with a focus on maintaining high availability in complex web environments.
Top Skills: .NET, Ansible, Apache, CFEngine, Chef, Dynatrace, Go, IIS, Java, JBoss, NAS, New Relic, Perl, PowerShell, Puppet, Python, RAID, Ruby, SAN, Splunk, Sumo Logic, Tomcat, Windows
Reposted 3 Days Ago
Hybrid
Arlington, TX, USA
Mid level
Fintech • Financial Services
The Site Reliability Engineer I will support cloud infrastructure and assist in cloud transformation initiatives, focusing on performance and delivery of public cloud solutions, primarily in Azure. Responsibilities include troubleshooting, monitoring, automation, and contributing to operational readiness practices for cloud services.
Top Skills: .NET, Ansible, AWS, Azure, Azure CLI, GCP, Jenkins, Kubernetes, Linux, PowerShell, Terraform, Windows
Reposted 3 Days Ago
Hybrid
2 Locations
Senior level
Fintech • Financial Services
The role involves shaping release engineering practices, implementing AI-driven solutions, and ensuring software reliability through collaboration and automation.
Top Skills: AI-Powered Tools, Azure, Bash, C#, GitHub Copilot, Java, PowerShell
Reposted 3 Days Ago
In-Office
2 Locations
60K-90K Annually
Senior level
Fintech • Consulting
The Site Reliability Engineer at Equifax manages system uptime, builds infrastructure as code, develops CI/CD pipelines, automates deployment, solves complex issues, and leads postmortems for system reliability.
Top Skills: Ansible, AWS, Bash, Chef, Docker, GCP, Go, Java, JavaScript, Jenkins, Kubernetes, Node.js, Python, Terraform
Reposted 3 Days Ago
In-Office
Arlington, VA, USA
Senior level
Artificial Intelligence • Information Technology • Cybersecurity • Defense
As a Site Reliability Engineer, you'll ensure system reliability in a government environment, manage incidents, and collaborate with engineering teams on operational tasks and improvements while maintaining security compliance.
Top Skills: AWS, Bash, Docker, Docker Compose, Grafana, Linux/Unix, Loki, Mimir, Prometheus, Python, Terraform
Reposted 3 Days Ago
In-Office
Tempe, AZ, USA
Senior level
Artificial Intelligence • Healthtech • Software • Automation
Design and own platform reliability: define SLOs, build observability, lead incident response and postmortems, evolve IaC and deployment pipelines, automate toil, and collaborate with engineers to improve operability and architecture for scaling.
Top Skills: Alerting, Cloud Infrastructure, Containerized Services, Deployment Pipeline, Incident Response, Infrastructure-as-Code, Java, Monitoring, Observability, Python, SLOs, TypeScript
Reposted 3 Days Ago
In-Office
Waltham, MA, USA
135K-165K Annually
Mid level
Artificial Intelligence
The Site Reliability Engineer II will enhance infrastructure and software reliability, write efficient code, collaborate across teams, and maintain platforms and monitoring tools.
Top Skills: AWS, CI/CD, Coralogix, Docker, JavaScript, Kubernetes, Python, Sentry, Terraform, Unix Shell