Top Site Reliability Engineer Jobs

Reposted 9 Days Ago
In-Office
New York, NY, USA
177K-265K Annually
Senior level
Fintech • Financial Services
The Site Reliability Engineer Lead oversees daily operations and architectural resilience, driving SRE principles for application performance and efficiency, and fostering a culture of technical excellence.
Top Skills: Ansible, Appdynamics, Go, Grafana, Java, Kubernetes, Loki, Mimir, Openshift, Prometheus, Python, Tempo, Terraform
Reposted 9 Days Ago
Remote
United States
170K-200K Annually
Senior level
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills: AWS, Aws Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Reposted 9 Days Ago
In-Office
6 Locations
90K-122K Annually
Mid level
Fintech • Analytics
The Site Reliability Engineer will manage production monitoring, incident response, and enhance automation using various tools. They will ensure observability and participate in SRE process improvements.
Top Skills: AWS, Cucumber, Datadog Apm, Datadog Dbm, DynamoDB, Ec2, Ecs, Elk, Java, Jenkins, Pagerduty, Playwright, Rds, S3, Secrets Manager, Selenium, Servicenow, Splunk, Spring Boot
Reposted 10 Days Ago
In-Office
St. Petersburg, FL, USA
86K-109K Annually
Senior level
Information Technology • Consulting
The Site Reliability Engineer will drive the observability roadmap, standardize monitoring practices, optimize alerting tools, and collaborate with teams to enhance operational efficiency and system reliability.
Top Skills: .Net, Asp.Net Core, AWS, Azure, C#, Datadog, Docker, GCP, Grafana, Kubernetes, New Relic, Powershell, Prometheus, React, Splunk, Web Apis
10 Days Ago
Remote
USA
Mid level
Information Technology • Software
As a DevOps/Site Reliability Engineer, you will manage cloud infrastructure, CI/CD pipelines, and improve system reliability and performance while supporting AI data pipelines.
Top Skills: AWS, Datadog, Ec2, Eks, Github Actions, Go, Grafana, Iam, Kubernetes, Prometheus, Python, Rds, S3, Terraform
10 Days Ago
In-Office
Norfolk, VA, USA
92K-167K Annually
Mid level
Information Technology • Software
The SRE Product Owner leads the SRE team, manages product strategy, engages stakeholders, optimizes reliability, and enhances automation processes.
Top Skills: Ansible, Atlassian Products, Dod 8570.01 Iat Level Ii, Powershell, Python
10 Days Ago
Remote
United States
120K-160K Annually
Senior level
Healthtech • Other • Software
The role involves managing PostgreSQL services, ensuring high availability and performance, driving incident response, automating tasks, and improving observability for a 24x7 SaaS platform.
Top Skills: Ansible, Bash, Datadog, Grafana, Haproxy, New Relic, Pgbackrest, Pgbouncer, Postgres, Powershell, Prometheus, Python, Repmgr, Terraform
Reposted 15 Days Ago
Hybrid
O'Fallon, MO, USA
96K-163K Annually
Senior level
Blockchain • Fintech • Payments • Consulting • Cryptocurrency • Cybersecurity • Quantum Computing
The Senior BizOps Engineer is responsible for ensuring platform stability and resilience, guiding teams in product development, and facilitating operational excellence throughout the software lifecycle.
Top Skills: Artifactory, Bitbucket, C, C++, Chef, Dynatrace, Git, Go, Java, Jenkins, Maven, Oracle, Perl, Pl/Sql, Postgres, Python, Ruby, Splunk, SQL
Reposted 10 Days Ago
Remote
USA
Mid level
Software • Analytics
The role involves automating and managing AWS infrastructure, ensuring reliability and scalability of stateful systems, and optimizing deployment processes. You'll also handle incident responses and improve operational tooling.
Top Skills: AWS, Kubernetes, Terraform, Terragrunt
Reposted 10 Days Ago
In-Office
Aliso Viejo, CA, USA
146K-219K Annually
Senior level
Gaming
The role involves ensuring production quality, owning system reliability, and participating in decision-making. Responsibilities include incident response and lifecycle management in cloud gaming technologies.
Top Skills: Bash, C++, Elasticsearch, Go, Istio, Java, Kafka, Kong Api Gateway, Kubernetes, Kuma, Linkerd, MongoDB, MySQL, Postgres, Python, Redis, Rust
Reposted 10 Days Ago
In-Office
San Francisco, CA, USA
Senior level
Artificial Intelligence • Software
As a Site Reliability Engineer at Mercor, you will ensure production reliability, develop the SRE function, and collaborate with engineering teams to maintain system performance.
Top Skills: AWS, Kubernetes, Spacelift, Terraform
Reposted 10 Days Ago
Remote
US
136K-177K Annually
Senior level
Big Data • Machine Learning • Software • Analytics
As a Lead Site Reliability Engineer, you will drive the reliability strategy, improve system health, lead incident management, and mentor engineers for a multi-region SaaS platform.
Top Skills: Argocd, C++, Ci/Cd, Cloud Platforms, Datadog, Gitops, Grafana, Infrastructure As Code, Java, JavaScript, Kubernetes, Python
Reposted 10 Days Ago
Remote
2 Locations
Junior
Computer Vision • Information Technology • Machine Learning • Natural Language Processing • Real Estate • Software
The SRE will maintain infrastructure for SaaS products on AWS, support developers, manage platform components, and handle IT tasks.
Top Skills: AWS, Computer Vision, Iac, Large Language Models, Nlp, Terraform
Reposted 10 Days Ago
In-Office
St. Louis, MO, USA
Senior level
Fintech • Analytics
The Site Reliability Engineer will support and automate critical real-time applications, ensuring service availability and quality across cloud and on-premise deployments, while also collaborating with various teams on operational documentation and incident management.
Top Skills: AWS, Azure, Datadog, Docker, Git, Kubernetes, Python, Unix/Linux
Reposted 10 Days Ago
Remote
United States
205K-270K Annually
Senior level
Artificial Intelligence • Other • Sales • Software
The role involves designing and advancing infrastructure for the engineering team, ensuring the reliability of Kubernetes clusters, automating operations, and building machine learning infrastructure.
Top Skills: Argo, AWS, Azure, CloudFormation, Flux, Github Actions, Go, GCP, Kubernetes, Postgres, Python, Terraform
11 Days Ago
In-Office
Arlington, VA, USA
Senior level
Artificial Intelligence • Information Technology • Cybersecurity • Defense
As a Site Reliability Engineer, you'll ensure system reliability in a government environment, manage incidents, and collaborate with engineering teams on operational tasks and improvements while maintaining security compliance.
Top Skills: AWS, Bash, Docker, Docker Compose, Grafana, Linux/Unix, Loki, Mimir, Prometheus, Python, Terraform
11 Days Ago
Remote
United States
66K-88K Annually
Mid level
Cloud • Information Technology
The Site Reliability Engineer I is responsible for supporting Backblaze’s infrastructure stability by addressing customer issues, monitoring system health, and improving operational processes through documentation and automation.
Top Skills: Ansible, Linux, Zabbix
Reposted 11 Days Ago
In-Office
Jefferson Park, NJ, USA
170K-230K Annually
Mid level
Fintech • Financial Services
The role involves developing and delivering software solutions, collaborating cross-functionally, ensuring secure coding practices, managing multi-faceted projects, and mentoring team members.
Top Skills: Frameworks, Programming Languages, Tools
11 Days Ago
In-Office
Houston, TX, USA
Mid level
Other • Energy
The Site Reliability Engineer will build and maintain reliable systems on Google Cloud Platform, automate operations, and improve system performance and reliability.
Top Skills: Airflow, BigQuery, Cloud Monitoring, Dataflow, Datastream, Docker, Github Actions, Gitlab Ci, Go, Google Cloud Platform, Grafana, Iam, Java, Kubernetes, Prometheus, Python, Terraform
11 Days Ago
Remote or Hybrid
United States
165K-190K Annually
Mid level
Artificial Intelligence • Healthtech • Information Technology • Software
As the first Site Reliability Engineer in the US, you'll ensure platform stability and oversee incident responses during PST hours, bridging infrastructure and code, while improving operability and compliance in a medical-device environment.
Top Skills: AWS, Elixir, Kubernetes, Terraform
11 Days Ago
Hybrid
2 Locations
Mid level
AdTech • Big Data • Marketing Tech • Software
Responsible for owning and optimizing the Internal Developer Platform, improving reliability, scalability, and usability while supporting engineering teams and standardizing operational processes through automation and best practices.
Top Skills: Arm, AWS, Azure, Bash, CloudFormation, Consul, Docker, Github Actions, Hashicorp, Jenkins, Kubernetes, Linux, Nomad, Powershell, Python, Splunk, Sumo Logic, Terraform, Vault, Windows
11 Days Ago
Remote
5 Locations
320K-489K Annually
Expert/Leader
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Lead the design and operation of large-scale Kubernetes clusters, ensuring high availability and performance while supporting system lifecycle and reliability improvements.
Top Skills: Containers, Go, Kubernetes, Linux, Networking, Openstack, Perl, Python, Ruby
Reposted 11 Days Ago
Remote or Hybrid
7 Locations
Senior level
Artificial Intelligence • Information Technology • Software
The role involves defining and evolving technical foundations for AI evaluation, optimizing performance, designing resilient systems, and collaborating with various teams for infrastructure improvements.
Top Skills: Node.js, Postgres, Serverless Environments, Typescript
Reposted 11 Days Ago
In-Office
Miami, FL, USA
Senior level
Healthtech
The Senior Software Engineer will enhance system reliability, manage Kubernetes and AWS environments, oversee incident responses, and implement observability measures.
Top Skills: AWS, Cloudwatch, Elb, Github Actions, Kubernetes, Observability Tooling, Terraform, Vpc
Reposted 11 Days Ago
Hybrid
Atlanta, GA, USA
Mid level
Fintech • Payments • Financial Services
Build, operate, and scale AWS-based infrastructure using IaC (Terraform), manage EKS and serverless environments, create CI/CD pipelines, implement observability (OpenTelemetry/Prometheus/New Relic), support Postgres/RDS (Aurora), lead incident response and define SRE practices (SLIs/SLOs/error budgets).
Top Skills: Aurora, AWS, Aws Rds, Azure, CloudFormation, Ecs, Eks, Github Actions, Gitlab, Go, GCP, Java, Kubernetes, New Relic, Opentelemetry, Opentofu, Postgres, Prometheus, Python, Ruby, Serverless, Terraform, Terragrunt