Top Site Reliability Engineer Jobs

Reposted 24 Days Ago
In-Office
San Francisco, CA, USA
116K-200K Annually
Mid level
Information Technology • Mobile • Software
As a Site Reliability Engineer, you'll ensure system reliability and scalability, automate processes, optimize performance, and collaborate on system design.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, Docker, Elk, Go, Google Cloud Platform, Grafana, Helm, Kubernetes, New Relic, Prometheus, Pulumi, Python, Terraform
Reposted 24 Days Ago
In-Office
Saratoga, CA, USA
100K-165K Annually
Senior level
Other
As a Platform Engineer/Dev Ops, you will expand cloud infrastructure, implement monitoring systems, manage databases, and leverage CI/CD tools, working collaboratively with various teams.
Top Skills: AWS, Azure, Bash, Datadog, Elk Stack, Kubernetes, Opentofu, Prometheus, Python, Terraform
Reposted 24 Days Ago
In-Office or Remote
2 Locations
250K-295K Annually
Senior level
Artificial Intelligence • Software
As a Senior Staff SRE Tech Lead, you'll oversee reliability and scalability, mentor engineers, optimize systems, and enhance data infrastructure.
Top Skills: Clickhouse, Go, Postgres, Python, Typescript
Reposted 24 Days Ago
In-Office
5 Locations
194K-267K Annually
Senior level
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills: AWS, Bash, Ci/Cd, Cloudwatch, Elk Stack, Go, Grafana, Helm, Istio, Kubernetes, Prometheus, Python, Terraform
Reposted 24 Days Ago
In-Office
5 Locations
194K-267K Annually
Senior level
Cloud
The Senior Site Reliability Engineer will enhance the Splunk ecosystem and develop an Observability Platform by automating infrastructure and managing complex distributed systems, while optimizing log collection and incident response.
Top Skills: AWS, GCP, Go, Kubernetes, Linux, Opentelemetry, Python, Ruby, Splunk, Terraform
Reposted 24 Days Ago
Remote
United States
212K-265K Annually
Expert/Leader
Real Estate • Travel • PropTech
The Engineering Manager for Storage SRE will lead a team to ensure reliable database operations, improve developer experience, and expand tooling and operational models, focusing on mission-critical systems.
Top Skills: Cloud Infrastructure, Databases, Site Reliability Engineering, Storage Systems
Reposted 24 Days Ago
In-Office
San Francisco, CA, USA
181K-263K Annually
Senior level
Big Data • Cloud • Marketing Tech • Social Impact • Software
The Senior Staff Site Reliability Engineer at LiveRamp will define the SRE strategy, oversee critical automation, and lead operational excellence in a global infrastructure, influencing architectural decisions and mentoring teams.
Top Skills: Cassandra, CircleCI, Cloud Security (GCP, AWS), DynamoDB, Go, Jenkins, Kubernetes, Python, Scylladb, Singlestore, Terraform
2 Days Ago
In-Office
2 Locations
128K-216K Annually
Senior level
eCommerce • Fintech • Information Technology • Payments • Financial Services
Design, deploy, and operate highly available, scalable cloud infrastructure on AWS. Manage Kubernetes clusters, build CI/CD with GitHub Actions, automate via Terraform, optimize data layers (RDBMS/document stores), implement observability, and lead design reviews while mentoring teams on SRE practices.
Top Skills: AWS, Bash, Datadog, Dns, Docker, Document Storage, Dynatrace, Git, Github Actions, Kubernetes, Kubernetes Operators, Linux/Unix, Load Balancing, New Relic, Node.js, Python, Rdbms, Ruby On Rails, Terraform, Virtual Networking
25 Days Ago
In-Office
San Francisco, CA, USA
200K-275K Annually
Senior level
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Payments • Software • Financial Services
Lead SRE responsible for reliability strategy, architecting resilient AWS/Kubernetes infrastructure, building observability, driving incident response and postmortems, improving deployment safety and automation, mentoring engineers, and partnering across product and engineering to scale platform reliability.
Top Skills: AWS, Ci/Cd, Docker, Event-Driven Architectures, Fastapi, Infrastructure As Code, Kubernetes (Eks), Observability (Metrics, Logs, Traces), Postgresql (Rds), Python, Typescript, Vue
Reposted 25 Days Ago
In-Office
San Jose, CA, USA
149K-361K Annually
Senior level
News + Entertainment
As a Senior Machine Learning Engineer, you will develop advanced machine learning and deep learning models and platforms for optimizing advertising performance and conduct complex experiments.
Top Skills: AI, Control Systems, Deep Learning, Machine Learning, Reinforcement Learning, Statistical Techniques
25 Days Ago
In-Office
Austin, TX, USA
Senior level
News + Entertainment
Design, operate, and scale cloud-native ML infrastructure across GCP and AWS (GPU/TPU), build CI/CD for models, maintain low-latency real-time inference systems, define observability and monitoring for ML models, participate in on-call incident response, and partner with data scientists to improve MLOps and platform usability.
Top Skills: Aerospike, Apache Airflow, Apache Flink, Spark, AWS, Chronon, Datadog, Eks, GCP, Gitlab Runner, Gke, Gpu, Grafana, Java, Jenkins, Kafka, Kubernetes, Kv Store, Mlflow, Prometheus, Python, Ray, Scala, Terraform, Tpu, Vector Database
25 Days Ago
In-Office
2 Locations
62K-141K Annually
Mid level
Information Technology
Design, build, and maintain resilient cloud infrastructure for the Intelligence Community. Implement redundancy, monitoring, automation, patching, and hardening. Reduce toil via scripting and self-repair, and support security posture improvements using cloud and container tooling.
Top Skills: AWS, Confluence, Docker, Git, Jenkins, JIRA, Kubernetes, Linux, Nessus, Packer, Rhel
25 Days Ago
In-Office
Redmond, WA, USA
165K-230K Annually
Senior level
Aerospace • Other
Design, deploy, and scale on‑premise compute and core infrastructure for Starlink. Develop automation, manage databases/monitoring/distributed storage, collaborate with software teams, troubleshoot end-to-end, and improve deployment and developer velocity.
Top Skills: Ansible, Bash, C, C++, Databases, Distributed Storage, Docker, Go, Hypervisor Technologies, Kubernetes, Linux, Monitoring, Python, Tcp/Ip, Terraform, Virtualization
25 Days Ago
In-Office
Redmond, WA, USA
165K-230K Annually
Senior level
Aerospace • Other
Design, deploy, and scale on-prem Kubernetes clusters and core infrastructure for Starlink. Build automation, manage databases, monitoring, and distributed storage. Collaborate with engineers to improve service lifecycle, availability, and performance; troubleshoot across the Starlink stack and drive reliability improvements.
Top Skills: Ansible, Bash, Bazel, C++, Go, Kubernetes, Linux, Makefiles, Oci Containers, Python, Tcp/Ip, Terraform
2 Days Ago
In-Office
Atlanta, GA, USA
178K-205K Annually
Senior level
Cloud • Fintech • HR Tech
Design, automate, patch, and monitor infrastructure and services to keep production and non-production environments running. Create and maintain scripts and automation (CI/CD), manage containerized deployments (Docker/Kubernetes), provision baremetal and cloud infrastructure, and improve monitoring, alerting, and tracing to meet SLAs.
Top Skills: Ansible, Baremetal, Ci/Cd, Docker, Git, Gradle, Jenkins, Kafka, Kubernetes, Maven, Monitoring, Packer, Private Cloud, Public Cloud, Python, Shell Scripting, Terraform, Tracing
Reposted 2 Days Ago
In-Office
Seattle, WA, USA
Senior level
Other
The Sr. Site Reliability Engineer will maintain and administer enterprise systems, troubleshoot operational issues, and develop scripts. This role requires collaboration across teams and participation in project planning and execution.
Top Skills: Ansible, Apache, Azure, C#, Chef, Iis, Java, Jboss, Perl, Powershell, Puppet, Python, Ruby, Tomcat
Reposted 2 Days Ago
In-Office
San Mateo, CA, USA
130K-200K Annually
Senior level
Edtech
The Senior Site Reliability Engineer ensures system reliability and performance, develops monitoring solutions, identifies problems, and partners with engineering teams for scalable solutions.
Top Skills: AWS, Bash, C, C++, Docker, GCP, Java, Kubernetes, Perl, Python
Reposted 2 Days Ago
In-Office
Austin, TX, USA
Senior level
Financial Services
The Senior Site Reliability Engineer will own the operational reliability of developer tooling ecosystems and improve developer productivity through efficient processes and automation.
Top Skills: .Net, Bash, Powershell, Python
Reposted 2 Days Ago
Remote
5 Locations
150K-200K Annually
Senior level
Artificial Intelligence • Blockchain • Information Technology • Consulting
Lead design and build of production-grade Azure infrastructure using Terraform, ensuring scalable, secure, and repeatable deployments. Provide technical leadership, platform enhancements, observability and incident response improvements, and Tier 2 infrastructure support while collaborating with engineering, security, and product teams to meet enterprise readiness and feature parity goals.
Top Skills: Argo, Azure, Go, Grafana, Kubernetes, Prometheus, Python, Spacelift, Terraform
Reposted 2 Days Ago
In-Office or Remote
Atlanta, GA, USA
Senior level
Healthtech • Pharmaceutical
The Senior Site Reliability Engineer will ensure systems operate smoothly, improve performance, automate tasks, and coordinate with teams in a hybrid environment.
Top Skills: AWS, Azure, Azure Devops, Bamboo, Docker, Elixir, GCP, Github Actions, Go, Jenkins, Kubernetes, Python
Reposted 2 Days Ago
Hybrid
Palo Alto, CA, USA
140K-220K Annually
Senior level
Software
The Senior SRE will ensure reliability of production systems, design monitoring processes, and build automation tools while collaborating in a regulated environment. The role blends operational tasks and coding responsibilities.
Top Skills: AWS, Cdk, Cloudfront, Ecs, Github Actions, Honeycomb, Lambda, Nix, Opentelemetry, Python, Rds, SQL, Terraform, Typescript
Reposted 2 Days Ago
In-Office
Washington, DC, USA
Senior level
Big Data • Analytics • Business Intelligence • Big Data Analytics
Seeking a Site Reliability Engineer to manage AI platform reliability, automate tasks, optimize ML pipelines, and lead incident response in a hybrid engineering role.
Top Skills: Argocd, BigQuery, Cloud Build, Docker, Dvc, Github Actions, Go, Grafana, Kubeflow, Kubernetes, Mlflow, Prometheus, Pub/Sub, Python, Terraform, Vertex Ai
Reposted 2 Days Ago
In-Office
3 Locations
106K-156K Annually
Senior level
Fintech
Responsible for enhancing application infrastructure, ensuring reliability and scalability, automating processes, implementing observability, and collaborating with software development teams.
Top Skills: AWS, Docker, Git, Go, Java, JavaScript, Kubernetes, Linux, Python, Ruby, Swarm
Reposted 2 Days Ago
Remote
USA
Senior level
Digital Media • Software • Sports
Seeking a Senior Site Reliability Engineer to enhance system reliability, performance, and scalability. Focus on automation, observability, and improving CI/CD practices while collaborating with engineering teams for better incident response and metrics improvement.
Top Skills: AWS, Azure, C++, Ci/Cd, Datadog, Docker, Elk, GCP, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform
3 Days Ago
Remote
USA
164K-220K Annually
Senior level
Robotics • Software
Own reliability across vehicle and cloud stacks for AUV operations: onboard Jetson/ROS2 compute, topside systems, cloud ingestion/processing and customer platform. Build automation, observability, runbooks, and self-recovery to reduce on-call toil; manage AWS infrastructure, IaC, container orchestration, and reliability targets. Participate in shared 12-hour on-call shifts and field deployments, mentor team on operational excellence.
Top Skills: AWS, Bash, Containerization, Docker, Go, Grafana, Iam, Jetson, Kubernetes, Linux, Prometheus, Python, Ros, Ros 2, Terraform