Top Site Reliability Engineer Jobs

22 Days Ago
Remote
United States
Mid level
Security • Software • Analytics
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Top Skills: Aws, Docker, Kubernetes, Amazon Eks, Terraform, Pulumi, Linux, Github Actions, Gitlab, Circleci, Llms, Golang, Monitoring And Observability Tools
22 Days Ago
In-Office
City of Broomfield, CO, USA
155K-233K Annually
Senior level
Cloud • Information Technology • Security • Software
Lead and grow a global Cloud Support/SRE team to ensure SaaS and self-hosted infrastructure reliability. Own incident response for Severity 1 events, refine support workflows, track KPIs (CSAT, MTTR, first-response), and collaborate with Product, Engineering, and Solutions teams to drive product improvements and operational excellence.
Top Skills: Aws, Azure, Gcp, Linux, Kubernetes, Tcp/Ip, Dns, Load Balancing, Ssl/Tls, Python, Bash, Go
Reposted 22 Days Ago
In-Office
6 Locations
136K-204K Annually
Senior level
Fintech • Payments • Financial Services
The Senior Data Platform Administrator will engineer and maintain scalable big data platforms, providing operational excellence and technical guidance while fostering a culture of collaboration and innovation within the team.
Top Skills: Ansible, Aws Emr, Bash, CloudFormation, Infrastructure As Code, Python, Spark, SQL, Terraform
Reposted 22 Days Ago
In-Office
Saratoga, CA, USA
100K-165K Annually
Senior level
Other
As a Platform Engineer/DevOps, you will expand cloud infrastructure, implement monitoring systems, manage databases, and leverage CI/CD tools, working collaboratively with various teams.
Top Skills: AWS, Azure, Bash, Datadog, Elk Stack, Kubernetes, Opentofu, Prometheus, Python, Terraform
Reposted 22 Days Ago
In-Office
Frisco, TX, USA
Mid level
Security • Software • Cybersecurity
The Site Reliability Engineer will manage software development tools, optimize configurations, respond to incidents, drive automation, and support migrations for efficiency.
Top Skills: Artifactory, AWS, Azure, Bash, Clickup, Confluence, Docker, Figma, Fullstory, GCP, Git, Grafana, Iam, JIRA, Kubernetes, Okta, Power BI, Prometheus, Python, Splunk, Terraform
Reposted 22 Days Ago
In-Office
Atlanta, GA, USA
99K-124K Annually
Senior level
Fintech • Insurance • Financial Services
The Senior Site Reliability Engineer will design and maintain scalable infrastructure, develop software solutions for reliability, manage CI/CD pipelines, and collaborate with AI teams to enhance operational excellence.
Top Skills: Ansible, AWS, Azure, Ci/Cd, Dynatrace, Git, Java, Python, Terraform
23 Days Ago
Remote or Hybrid
US
132K-195K Annually
Senior level
Artificial Intelligence • Big Data • Computer Vision • Machine Learning • Natural Language Processing • Software • Cybersecurity
Maintain and improve the internal developer platform, observability stack, and AWS infrastructure (Terraform); manage Kubernetes at scale; troubleshoot distributed systems; drive security, reliability, cost and performance improvements; partner with product teams and participate in on-call support.
Top Skills: AWS, Cka, Containers, Go, Kubernetes, Lgtm Stack, Linux, Opensearch, Python, Serverless, Tcp/Ip, Terraform
23 Days Ago
In-Office
Whippany, NJ, USA
140K-175K Annually
Senior level
Fintech • Financial Services
Design, automate, and maintain reliable, scalable systems; monitor and respond to incidents; perform capacity planning and performance tuning; build operational tooling; collaborate with development teams and lead/coach staff to improve resilience and operational practices.
23 Days Ago
In-Office or Remote
Kansas City, MO, USA
Senior level
Fintech • Software
Ensure availability, performance, scalability, and reliability of production systems by defining SLIs/SLOs, implementing monitoring and incident response, automating operations and CI/CD, managing cloud/hybrid infrastructure, capacity planning, and collaborating with engineering and security teams to improve reliability.
Top Skills: Ansible, AWS, Azure, Bash, Ci/Cd, Datadog, Dns, Docker, Elk, GCP, Go, Grafana, Infrastructure As Code, Kubernetes, Linux, Load Balancing, Prometheus, Puppet, Python, Splunk, Terraform, Unix, VMware, Windows
23 Days Ago
Easy Apply
Remote
2 Locations
215K-250K Annually
Senior level
Security • Cybersecurity
Lead the design and implementation of observability, SLO/SLA frameworks, and AI-enabled infrastructure automation. Architect scalable AWS infrastructure, improve incident management and on-call practices, and drive organization-wide adoption of telemetry and reliability standards.
Top Skills: Ai-Assisted Tooling, AWS, Ci/Cd, Claude, Codex, Cursor, Grafana, Honeycomb, Infrastructure-As-Code, Observability, Pulumi, Supabase, Telemetry, Terraform, Vercel
23 Days Ago
In-Office or Remote
San Diego, CA, USA
207K-261K Annually
Senior level
Information Technology
Lead technical strategy for observability, operational intelligence, and reliability. Architect telemetry and automation platforms, drive AIOps and large-scale IaC, lead incident response, mentor senior engineers, and standardize SLO/SLI and reliability practices across AWS cloud-native environments.
Top Skills: Alb, Aws, Vpc, Bash, CloudFormation, Datadog, Dns, DynamoDB, Ec2, Ecs, Eks, Gitops, Go, Grafana, Iam, Kms, Kubernetes, Linux, Multi-Account Architectures, New Relic, Nlb, Opentelemetry, Policy-As-Code, Prometheus, Python, Rds, Route 53, S3, Tcp/Ip, Terraform, Tls
23 Days Ago
In-Office
New York, NY, USA
197K-207K Annually
Expert/Leader
Cloud • Information Technology • Internet of Things • Software • Consulting • Infrastructure as a Service (IaaS) • Automation
Lead technical design and architecture for internal private and multi-cloud infrastructure, manage OpenShift/OpenStack platforms, automate operations, advise customers, and represent Red Hat at open-source events.
Top Skills: Linux, Osi Layers, Cisco, Juniper, Python, Golang, Rust, Openstack, Openshift, Openshift Virtualization, Rosa, Red Hat Openshift, Kubernetes, Dell, Cisco Ucs, Redfish, Netapp, Aws, Ibm Cloud, Azure, Ci/Cd, Sast, Linting, Unit Testing
Reposted 23 Days Ago
In-Office
Atlanta, GA, USA
Senior level
Healthtech • Payments • Software
The SRE Specialist will design reliability solutions, enhance system observability, respond to incidents, and collaborate with engineering teams to improve data platforms.
Top Skills: Apache Airflow, AWS, Azure, CloudFormation, GCP, Grafana, Kafka, Kubernetes, Powershell, Prometheus, Python, Spark, Splunk, Terraform
23 Days Ago
Easy Apply
In-Office or Remote
Remote, OR, USA
Senior level
Natural Language Processing • Software • Conversational AI
Maintain and improve reliability of the Echo platform by operating GKE production workloads, implementing GitOps deployments, defining SLOs/SLIs, enhancing observability with OpenTelemetry, troubleshooting incidents, and collaborating with developers on safe CI/CD and progressive delivery.
Top Skills: Google Kubernetes Engine (Gke), Kubernetes, Gitops, Argocd, Flux, Opentelemetry (Otel), Ci/Cd, Service Mesh, Ingress, Load Balancing, Dns, Cloud Networking
Reposted 23 Days Ago
In-Office
6 Locations
90K-122K Annually
Mid level
Fintech • Analytics
The Site Reliability Engineer will manage production monitoring, incident response, and enhance automation using various tools. They will ensure observability and participate in SRE process improvements.
Top Skills: AWS, Cucumber, Datadog Apm, Datadog Dbm, DynamoDB, Ec2, Ecs, Elk, Java, Jenkins, Pagerduty, Playwright, Rds, S3, Secrets Manager, Selenium, Servicenow, Splunk, Spring Boot
Reposted 23 Days Ago
In-Office
Aurora, CO, USA
87K-198K Annually
Senior level
Information Technology
As a Senior Site Reliability Engineer, you will enhance system resilience, automate tasks, and ensure robust infrastructure for national security.
Top Skills: Confluence, Docker, Git, Go, Java, Jenkins, JIRA, Kubernetes, Linux, Nessus, Packer, Python, Rust
Reposted 12 Hours Ago
In-Office
Seattle, WA, USA
Senior level
Other
The Sr. Site Reliability Engineer will maintain and administer enterprise systems, troubleshoot operational issues, and develop scripts. This role requires collaboration across teams and participation in project planning and execution.
Top Skills: Ansible, Apache, Azure, C#, Chef, Iis, Java, Jboss, Perl, Powershell, Puppet, Python, Ruby, Tomcat
Reposted 12 Hours Ago
In-Office
South San Francisco, CA, USA
160K-200K Annually
Senior level
Aerospace • Hardware • Logistics • Robotics • Software • Transportation
The Senior Site Reliability Engineer will lead cloud infrastructure initiatives, develop best practices, write software, and manage systems while working closely with developers. They will also participate in an on-call rotation and set high technical standards for interviews.
Top Skills: AWS, Kafka, Kubernetes
Reposted 12 Hours Ago
In-Office
3 Locations
160K-190K Annually
Senior level
Aerospace • Hardware • Defense
Lead design, build, and operation of scalable, reliable cloud infrastructure; mentor engineers; make architecture and technology decisions; introduce new tools; lead cross-team initiatives; participate in on-call rotations and incident response.
Top Skills: Alerting, AWS, Ec2, Gitops, Infrastructure-As-Code (Iac), Kubernetes, Lambda, Monitoring, On-Call, S3, Service Mesh, Service Registration, Terraform, Vpc
Reposted 12 Hours Ago
Easy Apply
Remote
United States
Senior level
Real Estate • Software
As a Senior Site Reliability Engineer, you'll enhance system performance, reliability, and cost efficiency in a large-scale production environment, shifting manual operations to AI-assisted engineering.
Top Skills: Ansible, Datadog, Elk, Grafana, Kubernetes, Linux, Prometheus, Python, Ruby, Terraform
Reposted 23 Days Ago
Easy Apply
In-Office
New York, NY, USA
180K-210K Annually
Senior level
AdTech • Marketing Tech • Analytics
The Staff SRE DevOps Engineer will manage customer applications, improve system reliability, collaborate on architecture discussions, and support infrastructure needs across teams.
Top Skills: AWS, Bash, Datadog, Docker, Kafka, Kibana, Kubernetes, Linux, Postgres, Python, Redshift, Spark, Terraform
Reposted 23 Days Ago
Easy Apply
In-Office
San Francisco, CA, USA
180K-210K Annually
Senior level
AdTech • Marketing Tech • Analytics
As a Staff Software Engineer - SRE, you'll manage cloud infrastructure, improve application reliability, collaborate across teams, and support back-office systems.
Top Skills: AWS, Datadog, Docker, Kafka, Kibana, Kubernetes, Linux, Postgres, Python, Rds, Redshift, Shell/Bash, Spark, Terraform
Reposted 23 Days Ago
Easy Apply
In-Office
Los Angeles, CA, USA
180K-210K Annually
Senior level
AdTech • Marketing Tech • Analytics
Manage and support customer applications, improve system reliability, collaborate with teams on infrastructure needs, and help drive architectural decisions.
Top Skills: Auto Scaling, AWS, Cdns, Datadog, Dns, Docker, Kafka, Kibana, Kubernetes, Linux, Load Balancers, Postgres, Proxy Servers, Python, Rds, Redshift, Shell/Bash, Spark, Terraform, Wafs
Reposted Yesterday
In-Office
30005, Alpharetta, GA, USA
Senior level
Fintech • Consulting
The Senior Site Reliability Engineer at Equifax ensures service reliability and performance, builds infrastructure as code, manages cloud systems, and leads incident resolution efforts.
Top Skills: Ansible, Argocd, AWS, Bash, Chef, Datadog, Docker, GCP, Github Actions, Go, Java, JavaScript, Jenkins, Kubernetes, Node.js, Python, Terraform
Reposted Yesterday
In-Office
San Jose, CA, USA
130K-192K Annually
Senior level
Fintech • Payments
Senior Site Reliability Engineers at PayPal ensure the reliability and performance of mobile and backend systems, implementing standards, automation, and observability while managing incidents and mentoring junior staff.
Top Skills: AWS, Azure, Datadog, Firebase Crashlytics, GCP, Go, Python, Sentry