Top Site Reliability Engineer Jobs
Fintech
Lead adoption of SRE practices to improve reliability, observability, automation, and incident response. Implement and maintain observability tooling, instrumentation, CI/CD, and infrastructure-as-code. Partner with developers, participate in on-call rotations, drive postmortems, and reduce operational overhead through automation.
Top Skills:
Anthropic, AWS, AWS ECS, AWS EKS, Azure, C#, Docker, GitLab CI, Grafana, Linux, OpenAI, Prometheus, Puppet, Python, Splunk, Terraform, TypeScript, Windows
Aerospace • Hardware • Software • Defense • Manufacturing
As a Site Reliability Engineer, you'll ensure robotics system reliability, build telemetry integration, and develop tools for diagnostics and automation, collaborating with engineering teams for enhanced production reliability.
Top Skills:
C++, Datadog, Go, Kubernetes, OpenTelemetry, Prometheus, Python, ROS 2, Telegraf, TypeScript
Software
As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.
Top Skills:
AWS, Datadog, Grafana, Kubernetes, OpenTelemetry, Prometheus, TypeScript
Fintech • Financial Services
The Site Reliability Engineer will support cloud infrastructure, automate deployments, and ensure operational efficiency and governance across public cloud platforms.
Top Skills:
Ansible, AWS, Azure, Azure CLI, Azure Functions, Azure Kubernetes Service, Cosmos DB, GCP, Git, Jenkins, Kubernetes, Linux, PowerShell, Terraform, Windows
Transportation
Design and develop Waabi's observability stack, optimize performance, build automation tooling, and support application requirements while leading projects and mentoring teams.
Top Skills:
AWS, C/C++, Docker, Go, Grafana, Java, Kubernetes, OpenTelemetry, Python, Rust
Fintech
The Site Reliability Engineer will manage AWS infrastructures, oversee application deployments, and ensure system reliability and security while collaborating with teams.
Top Skills:
AWS, Bash, CodeBuild, CodeDeploy, CodePipeline, EC2, IAM, Python, RDS, Route 53, S3, Terraform, VPC
Artificial Intelligence • Healthtech
The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.
Top Skills:
AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute
Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics
The Senior Observability Engineer maintains monitoring systems, designs log aggregation solutions, automates tasks with scripts, and ensures platform performance.
Top Skills:
Ansible, Bash, Dynatrace, Elasticsearch, ELK, Filebeat, Fluent Bit, Fluentd, Grafana, Linux, Logstash, OTel, PowerShell, Prometheus, Python, Terraform
Fintech • Information Technology • Professional Services • Software
The Site Reliability Engineer serves as a consultant for Taxwell, focusing on ensuring the reliability and performance of their tax preparation software.
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
The Site Reliability Engineer will ensure the reliability and performance of AI infrastructure, build core systems, handle incident response, and develop automation tools.
Top Skills:
AWS, Datadog, ELK, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Pulumi, Python, Rust, Terraform
Aerospace • Other
The Site Reliability Engineer, GNC at SpaceX oversees mission-critical GNC products, operates servers, maintains HPC clusters, and enhances services and infrastructure to support space operations.
Top Skills:
Ansible, Bazel, Docker, Gradle, Kubernetes, Linux, Make, npm, pip, Puppet, Python, Terraform, Vagrant
Artificial Intelligence • Hardware • Machine Learning • Natural Language Processing • Software • Generative AI
As a Cloud Site Reliability Engineer, you will ensure the reliability, performance, and scalability of AI inferencing services, participate in on-call rotations, manage cloud infrastructure, and automate CI/CD processes while collaborating on incident management and capacity planning.
Top Skills:
Argo CD, CloudFormation, Datadog, Docker, ELK Stack, GitHub Actions, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Terraform
Hardware • Information Technology • Other • Software • Analytics
Architect and operate ML/agent pipelines and infrastructure, deploy and monitor models at scale, pioneer MLOps/Agent Ops best practices, collaborate with domain experts, and test/optimize ML systems for production reliability and cost efficiency.
Top Skills:
Bash Scripting, Containerization (e.g., Docker), Git/GitHub, Linux, Model Versioning, Monitoring, NumPy, pandas, Python, PyTorch, scikit-learn
Information Technology • Software • Web3 • Infrastructure as a Service (IaaS)
Operate and improve the Pod platform: respond to incidents, investigate root causes, build automation and observability, design monitoring/alerting, reduce alert fatigue, and drive reliability improvements across production systems.
Top Skills:
Bash, CI/CD, Cloud, Docker, Grafana, Linux, PagerDuty, Prometheus, Python, Rust
Artificial Intelligence • Software • Generative AI
Lead reliability, scalability, and operational health of a production platform. Evolve Kubernetes, CI/CD, IaC, and observability. Build tooling and automation, improve monitoring/incident response, partner with engineering to identify and mitigate scaling risks, and influence platform direction across reliability, security, performance, and cost.
Top Skills:
CI/CD, Cloud-Native Architecture, Container Orchestration, GitOps, GPU Provisioning, Incident Response, Infrastructure as Code, Kubernetes, Logging, Metrics, Multi-Cloud, Observability, Python, Tracing, TypeScript
Fintech • Financial Services
The Director of Splunk Platform Engineering & SRE owns the enterprise Splunk platform, drives incident resolution, optimizes systems, and mentors engineers, focusing on automation and performance.
Top Skills:
Ansible, Git, Go, Java, Kubernetes, Linux/Unix, Moog, Prometheus, Python, Splunk
Healthtech • Software
The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.
Top Skills:
AI Bots, Datadog, Jira, Jira Service Management, MS Teams, Opsgenie, PagerDuty
Information Technology • Logistics • Transportation • Analytics • Business Intelligence • 3PL: Third Party Logistics • Industrial
As a Site Reliability Engineer, you'll enhance reliability for Phenix WMS and automation systems, focusing on incident reduction and system health through observability and automation. Responsibilities include defining SLIs and SLOs, participating in incident response, and testing disaster recovery plans.
Top Skills:
Ansible, Azure, Bash, CI/CD, Kubernetes, PowerShell, Python, Terraform
Financial Services
As a Site Reliability Engineer II, you will build, operate, and scale systems for CME Group's Clearing portfolio. Responsibilities include collaborating with teams, monitoring services, scripting for efficiency, and improving system performance, particularly during the migration to Google Cloud Platform.
Top Skills:
Bash, Google Cloud Platform, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Python, Splunk
Security
The Director of DevSecOps and SRE will lead teams in SRE, Cloud Infrastructure, and DevOps practices, focusing on automation, infrastructure reliability, and security policies while mentoring engineers and managing software projects.
Top Skills:
AWS Cloud Technologies, GitLab, Grafana, Java, Kubernetes, Loki, Material UI, Postgres, Prometheus, RabbitMQ, React, Redux, Sentry, Spring, Tailwind, Terraform
Artificial Intelligence • Robotics • Automation • Manufacturing
Responsible for managing and setting up internal systems infrastructure, migrating SaaS to self-hosted solutions, implementing monitoring systems, and ensuring security compliance.
Top Skills:
Ansible, AWS, Azure, CloudFormation, Datadog, DNS, GCP, Grafana, HTTP, Linux/Unix, Prometheus, TCP/IP, Terraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Responsible for developing incident management guidelines, supporting production systems, defining reliability metrics, and driving automation for high service availability.
Top Skills:
Go, Grafana, Perl, Prometheus, Python, Ruby
Information Technology
As a Site Reliability Engineer, you'll develop resilient infrastructure, automate tasks, handle incident response, and support classified environments for the Intelligence Community.
Top Skills:
Argo CD, Bitbucket, Elasticsearch, GitLab, Java Spring Boot, Kafka, Kubernetes, MongoDB, NiFi
Information Technology
The Site Reliability Engineer will enhance infrastructure resilience, automate processes, and implement monitoring tools to support the Intelligence Community.
Top Skills:
AWS, Confluence, Docker, Git, Jenkins, Jira, Kubernetes, Linux, Nessus, Packer
Real Estate • Financial Services • PropTech
Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.
Top Skills:
AMIs, Argo CD, AWS, AWS Elastic Beanstalk, AWS Transfer Family, Azure DevOps, Bash, CloudWatch, curl, Docker, EC2, EKS, Flux CD, Git, GitOps, HTTP, Istio, Kubernetes, Linkerd, Load Balancer, PowerShell, Python, RDS, SQL, Terraform, Wget