Top Site Reliability Engineer Jobs

Reposted 21 Hours Ago
In-Office or Remote
San Diego, CA, USA
Mid level
Information Technology
The Site Reliability Engineer will implement reliability engineering practices, develop automation, maintain CI/CD pipelines, and ensure system health through monitoring.
Top Skills: Ansible, AWS, Azure, Bash, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Reposted 21 Hours Ago
In-Office
2 Locations
147K-230K Annually
Senior level
Insurance
The Principal Product Manager will lead the development of reliability platforms, focusing on observability, incident management, and system availability while fostering a culture of operational excellence across engineering teams.
Top Skills: AWS, Azure, Cloud Infrastructure, Developer Tools, Grafana, Kubernetes, Observability, Site Reliability Engineering
Reposted 21 Hours Ago
In-Office
Secaucus, NJ, USA
80K-115K Annually
Mid level
Healthtech • Database
Responsible for reliability engineering, monitoring system performance, automating processes, and collaborating with development teams to enhance operational efficiency.
Top Skills: AWS, Azure, Bash, CI/CD, CloudFormation, Docker, Dynatrace, GCP, Go, JMeter, Kubernetes, NeoLoad, Python, Splunk, Terraform
Reposted 21 Hours Ago
In-Office
2 Locations
175K-225K Annually
Mid level
Fintech • Payments • Financial Services
The Site Reliability Engineer will automate processes, manage server deployments, and collaborate with teams to enhance operational workflows in a trading environment.
Top Skills: Ansible, C++, Chef, Cloud Infrastructure, Distributed Systems, Docker, Go, Grafana, HashiCorp Nomad, HPC Clusters, Kubernetes, Linux, Perl, Podman, Prometheus, Puppet, Python, Rancher, Rust, Salt
Reposted Yesterday
Remote
United States
Mid level
Blockchain • Software
Build, operate, and scale production Kubernetes infrastructure using GitOps and declarative IaC. Design CI/CD workflows, observability, and secure-by-default systems. Troubleshoot networking/storage, participate in on-call rotations, automate operational workflows, and drive postmortems and reliability improvements.
Top Skills: Arbitrum, Argo CD, Argo CD ApplicationSets, AWS, Azure, Bash, CloudWatch, CodeBuild, GCP, GitHub Actions, GitOps, Go, Grafana, K9s, Kubernetes, Linux, Loki, Mimir, Prometheus, Prysm, Python, Terraform, YAML, ZeroDev
Yesterday
In-Office
92660-6419, Newport Beach, CA, USA
150K-160K Annually
Mid level
Transportation • Travel • Hospitality
Ensure the reliability, scalability, performance, and availability of production systems through monitoring, incident response, root cause analysis, automation, IaC, container orchestration, and observability, and by partnering with engineering to improve deployment and operational practices. Participate in on-call rotations and maintain runbooks and operational standards.
Top Skills: AWS, Azure, Bash, CI/CD, CloudFormation, Docker, GCP, Go, Java, Kubernetes, Linux, Pulumi, Python, Terraform, Unix
Reposted Yesterday
In-Office
Dallas, TX, USA
50-53 Hourly
Senior level
Information Technology
Provide level-4 SWAT support for APSRE, perform production and lower-lane triage, execute restoral steps, identify root causes, and collaborate with ITIL and partner teams to improve environment stability.
Top Skills: APSRE, ITIL
Yesterday
In-Office
Plano, TX, USA
Senior level
Big Data • Fintech • Mobile • Payments • Financial Services • Data Privacy
Lead SRE partnering with development and infrastructure teams to implement monitoring, automation, reliability tooling, alerting, and on-call routines; develop reliability scripts and libraries; triage major incidents; reduce toil and improve observability; decompose work and mentor SRE resources.
Top Skills: Ansible, CI/CD Pipelines, Configuration Management Systems, Go, Identity Systems, Infrastructure as Code (IaC), Monitoring Systems, Networking, Python, Service Mesh Platforms, Terraform, Virtualization
Reposted 7 Days Ago
Hybrid
O'Fallon, MO, USA
96K-163K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior BizOps Engineer is responsible for ensuring platform stability and resilience, guiding teams in product development, and facilitating operational excellence throughout the software lifecycle.
Top Skills: Artifactory, Bitbucket, C, C++, Chef, Dynatrace, Git, Go, Java, Jenkins, Maven, Oracle, Perl, PL/SQL, Postgres, Python, Ruby, Splunk, SQL
Reposted 7 Days Ago
Hybrid
O'Fallon, MO, USA
96K-163K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior BizOps Engineer role involves improving service lifecycles, supporting CI/CD pipelines, and engaging in DevOps automation practices. Responsibilities include system design consulting, operational feedback, incident response, and mentoring junior resources.
Top Skills: Artifactory, Bitbucket, C, C++, Chef, Git, Go, Java, Jenkins, Maven, Perl, Python, Ruby
7 Days Ago
Hybrid
O'Fallon, MO, USA
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
Drive reliability, scalability, and performance of Mastercard applications as the production-readiness steward. Implement observability, automation, capacity planning, and monitoring. Support incident triage, root cause analysis and blameless post-mortems. Collaborate with developers to embed operational design, enforce standards, improve CI/CD, container orchestration, and cloud infrastructure, while managing risk, compliance, and continuous improvement.
Top Skills: AWS, Azure, Bash, CI/CD, Containerization, GCP, Go, Linux/Unix, Monitoring/Observability, Orchestration, Python
Reposted 7 Days Ago
Hybrid
O'Fallon, MO, USA
96K-163K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The role involves ensuring application health, automating deployment processes, leading DevOps initiatives, and fostering collaboration between development and operations teams to maintain system resilience and minimize downtime.
Top Skills: Automation, CI/CD, DevOps, Monitoring, Scripting, Software Design
Reposted 7 Days Ago
Hybrid
Austin, TX, USA
160K-250K Annually
Senior level
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
In this role, you'll ensure the reliability and scalability of the NG-SIEM platform, manage incident response, and collaborate across teams to enhance system performance.
Top Skills: Bash, C++, Go, Java, Kafka, Python, Rust
Reposted Yesterday
In-Office
New York, NY, USA
200K-250K Annually
Expert/Leader
Payments • Software • Automation
Lead platform and infrastructure direction on AWS, evolve CI/CD and ephemeral environments, set observability and SLO standards, drive incident response and postmortems, mentor engineers, and build automation to reduce operational risk.
Top Skills: AWS, CI/CD, Distributed Systems, ECS, Ephemeral Environments/Preview Deploys, Fargate, GitHub Actions, Observability (Metrics, Logs, Tracing, SLOs/SLIs/Error Budgets)
Reposted Yesterday
Remote
United States
Senior level
Automotive
Design and implement scalable cloud infrastructure, monitor performance, automate processes, ensure security and compliance, and lead a DevOps team.
Top Skills: AWS, Bash, CI/CD, Docker, ELK Stack, GCP, Grafana, Kubernetes, Prometheus, Python, Terraform
Reposted Yesterday
In-Office
Wacker, IL, USA
132K-220K Annually
Senior level
Financial Services
As a Staff Site Reliability Engineer, you will enhance system reliability, architect solutions, drive automation, and implement SRE principles within development processes.
Top Skills: Bash, Chef, CloudFormation, GKE, Go, Java, OpenTelemetry, Prometheus, Python, Rust, Terraform, TypeScript
Reposted Yesterday
In-Office
Annapolis Junction, MD, USA
165K-230K Annually
Expert/Leader
Information Technology • Software • Automation
The Senior Site Reliability Engineer will manage AWS environments, develop Infrastructure as Code, and automate operational tasks to ensure high availability in cloud systems.
Top Skills: Amazon Web Services (AWS), Ansible, AWS Certified Developer-Associate, AWS Certified Solutions Architect-Associate, AWS Certified Solutions Architect-Professional, AWS Certified SysOps Administrator-Associate, Certified Kubernetes Administrator (CKAD), CI/CD, Docker, Elastic Certified Engineer, Elastic Certified Observability Engineer, Kubernetes, Terraform
Reposted Yesterday
In-Office
Reston, VA, USA
124K-222K Annually
Mid level
Cloud • Fintech • HR Tech
Support U.S. federal government contracts by managing service operations. Collaborate with development teams to enhance architecture and ensure service reliability.
Top Skills: Cloud Infrastructure, Distributed Systems, IaC Tools, Observability, Programming Languages
Reposted 7 Days Ago
Hybrid
O'Fallon, MO, USA
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior Site Reliability Engineer will enhance service reliability, implement CI/CD using various tools, automate processes, and mentor junior resources.
Top Skills: Artifactory, Bitbucket, C, C++, Chef, Git, Go, Java, Jenkins, Maven, Perl, Python, Ruby
2 Days Ago
Hybrid
2 Locations
240K-250K Annually
Expert/Leader
Software
Define and drive reliability for Saviynt's SaaS platform by designing, building, and operating scalable, reusable platform services. Lead Kubernetes platform engineering, multi-region cloud architectures, event-driven systems, CI/CD pipelines, observability, service mesh, and shared relational data services. Provide tooling, APIs, on-call support, and cross-team guidance.
Top Skills: Argo CD, AWS, Azure, Datadog, ELK (Elasticsearch/Logstash/Kibana), Envoy, GCP, GitLab CI, Go, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, NATS, Postgres, Prometheus, Python, RabbitMQ (RMQ), RESTful APIs, Service Mesh
2 Days Ago
Remote
USA
Senior level
Software • Web3
Lead reliability practices across teams: embed early in projects, define SLIs/SLOs, build multi-cloud paved roads with Terraform, run on-call, drive org-wide incident maturity and tooling.
Top Skills: AWS, Azure, GCP, Ruby on Rails, Terraform, TypeScript, WebContainers
2 Days Ago
Remote
2 Locations
124K-171K Annually
Senior level
Healthtech • Pharmaceutical • Manufacturing
Support and maintain production Core Speech systems: deploy, monitor, alert, perform capacity planning, respond to on-call incidents, and drive system performance and architecture improvements.
Top Skills: Ansible, AWS CloudFront, AWS DocumentDB, AWS EC2, AWS EFS, AWS EKS, AWS RDS, AWS S3, containerd, Docker, Elasticsearch, Filebeat, Git, GitLab, Go, GoCD, Grafana, Java, Jython, Kibana, Kubernetes, Logstash, MongoDB, Postgres, Python, Redis, Shell, Solr, Terraform
2 Days Ago
In-Office
New York, NY, USA
120K-175K Annually
Senior level
Fintech • Financial Services
Design, build, and maintain reliable, scalable virtual desktop infrastructure (VDI) and supporting platforms. Lead incident response, automate deployments and operations with IaC and CI/CD, implement secure configurations, monitor system health, collaborate cross-functionally, and drive continuous improvement and operational excellence.
Top Skills: Active Directory, Ansible, ARM/Bicep, Azure DevOps, Citrix Cloud, Citrix Gateway, CVAD, DNS, DSC, GitHub Actions, GitLab CI, GPOs, Jenkins, PowerShell, SSL/TLS Certificates, Terraform, VDI Profile Management, Windows 11 Multi-Session, Windows Server
Reposted 2 Days Ago
In-Office
New York, NY, USA
120K-165K Annually
Senior level
Fintech • Financial Services
The SRE Application Support Engineer is responsible for ensuring the operational reliability and stability of production systems, optimizing their performance, managing outages, troubleshooting issues, and developing documentation and standards for production applications.
Top Skills: Aurora, AWS, EC2, ECS, Fargate, Grafana, Java, Kibana, Lambda, Postgres, Prometheus, Python, S3, Splunk
2 Days Ago
In-Office
Redmond, WA, USA
125K-175K Annually
Junior
Aerospace • Other
Design, operate, scale, and automate HPC clusters and services for silicon design workflows. Manage infrastructure-as-code, CI/CD pipelines, observability, and storage automation. Collaborate with cross-functional teams to eliminate performance bottlenecks and accelerate simulation and regression turnaround times.
Top Skills: Ansible, Ansys, Bamboo, Bash, Cadence, Claude Code, Docker, Grafana, Grok, Jenkins, Keysight, Kubernetes, Linux, LSF, MySQL, NetApp ONTAP, NFS, Postgres, Prometheus, Puppet, Python, REST API, Siemens, Slurm, SQLite, Synopsys, TCP/IP, Terraform