Top Site Reliability Engineer Jobs

Vertafore

Sr. Site Reliability Engineer

Reposted 16 Days Ago

Hybrid

Denver, CO, USA

110K-145K Annually

Senior level

Information Technology • Insurance • Software

Responsible for the reliability and performance of production services, managing SLIs and SLOs, and leading incident response while collaborating with various teams.

Top Skills: .NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows

Vertafore

Sr. Site Reliability Engineer

Reposted 16 Days Ago

Remote or Hybrid

CO, USA

110K-145K Annually

Senior level

Information Technology • Insurance • Software

The Sr. Site Reliability Engineer at Vertafore will own the reliability and performance of production services, design incident response protocols, and enhance system observability while applying software engineering practices.

Top Skills: .NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows

Airwallex

Senior Site Reliability Engineer, Spend

Reposted 17 Days Ago

Hybrid

San Francisco, CA, USA

160K-250K Annually

Senior level

Artificial Intelligence • Fintech • Payments • Business Intelligence • Financial Services • Generative AI

Lead design and delivery of scalable cloud infrastructure for the Spend product. Embed with development teams to drive reliability, performance, observability, incident response, and automation. Own SLOs, runbooks, DevOps metrics, and collaborate with central DevOps and security teams to ensure compliance and resilience. Lead infrastructure projects including new service launches, data centre migrations, and modernising data pipelines.

Top Skills: Analytics Pipelines, AWS, Data Streaming, DevOps, GCP, Incident Response, Kubernetes, Observability, SLOs, SRE

Backblaze

Site Reliability Engineer II

Reposted 11 Days Ago

In-Office or Remote

San Mateo, CA, USA

Junior

Cloud • Information Technology

The Site Reliability Engineer II role involves ensuring the stability and reliability of services, automating operational tasks, and collaborating with teams for system design while promoting reliability practices.

Top Skills: Ansible, AWS, Azure, Bash, Catchpoint, Docker, ELK, GCP, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform

Florence Healthcare

Site Reliability Engineer (SRE)

Reposted 11 Days Ago

In-Office

Atlanta, GA, USA

Mid level

Healthtech • Software

The Site Reliability Engineer (SRE) will enhance platform reliability and scalability through AI-driven automation, collaborate with product engineers, and manage incidents, monitoring, and documentation processes.

Top Skills: AWS, CI/CD, Terraform

OXIO

Site Reliability Engineer

Reposted 12 Days Ago

Remote

USA

Mid level

Other

As a Site Reliability Engineer, you will design cloud platforms, automate operations, maintain infrastructure, and support engineering teams in delivering reliable services.

Top Skills: Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, DNS, Docker, GitLab CI, Go, GCP, Grafana, HTTP, HTTPS, Jenkins, Kubernetes, KVM, Linux, Perl, Prometheus, Python, Ruby, TCP/IP, Terraform, Unix, VMware

AXS

Site Reliability Engineer II

Reposted 12 Days Ago

In-Office

Los Angeles, CA, USA

130K-145K Annually

Mid level

Events

The Site Reliability Engineer II designs and maintains scalable systems, focusing on automation, monitoring, incident response, and collaboration with developers to enhance operational practices and efficiency.

Top Skills: Bash, Cloud Service Operations, Containers, Continuous Delivery, Continuous Integration, Go, Infrastructure as Code, Orchestration Platforms, Python

RingCentral

Site Reliability Engineer

Reposted 12 Days Ago

Hybrid

Denver, CO, USA

95K-136K Annually

Senior level

Artificial Intelligence • Cloud • Events • Productivity • Software • Business Intelligence • Conversational AI

Maintain and improve the uptime, availability, and performance of services through observability, redundancy, failover, and load balancing. Integrate monitoring into the SDLC, lead incident response and on-call, assess capacity and risks, and work with teams to extend observability and automate self-healing.

Top Skills: Alertmanager, Ansible, Argo CD, AWS, Azure, Bash, ELK, GCP, GitLab, GitLab CI, Go, Grafana, Java, JavaScript, Jenkins, Kafka, Kubernetes, Linux, MongoDB, MySQL, Nginx, Postgres, Prometheus, Python, Terraform, VictoriaMetrics, Zabbix

Mistral AI

Site Reliability Engineer - NYC

Reposted 12 Days Ago

Hybrid

New York, NY, USA

Senior level

Artificial Intelligence

Seeking an experienced Site Reliability Engineer to enhance platform reliability, scalability, and performance by balancing operations with long-term software engineering improvements.

Top Skills: AI, Bash, Datadog, Docker, ELK Stack, Flux, Go, Grafana, Kubernetes, Prometheus, Python, Terraform

Baseten

SRE

Reposted 12 Days Ago

Remote or Hybrid

2 Locations

165K-330K Annually

Mid level

Software

As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.

Top Skills: TensorRT

Moonlite AI

Sr. Site Reliability Engineer (SRE)

Reposted 12 Days Ago

In-Office or Remote

2 Locations

165K-225K Annually

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

Build and operate production-grade AI infrastructure using Kubernetes, ensuring high availability, reliability, and performance. Develop custom operators and implement automation for efficient operations and monitoring.

Top Skills: Ansible, Bash, ELK Stack, Enterprise Storage Systems, Grafana, High-Performance Networking, Kubernetes, Linux, NVIDIA GPU Technologies, Prometheus, Python, Terraform

TherapyNotes, LLC

Senior Database Site Reliability Engineer

Reposted 12 Days Ago

Remote

United States

120K-160K Annually

Senior level

Healthtech • Other • Software

As a Senior Database Site Reliability Engineer, you'll design, implement, and maintain PostgreSQL systems, ensure reliability, automate maintenance tasks, and participate in incident response.

Top Skills: Ansible, Bash, Datadog, Grafana, New Relic, Postgres, PowerShell, Prometheus, Python, Terraform

Accela

Principal Site Reliability Engineer

Reposted 12 Days Ago

In-Office or Remote

Basel, KS, USA

160K-185K Annually

Senior level

Software

Technical leader responsible for reliability, scalability, performance, and operational excellence of a cloud SaaS platform. Drive platform modernization to containers/Kubernetes on Azure, define SLOs/SLAs, lead observability, incident response/RCA, automation/tooling, and mentor engineers while ensuring compliance with public-sector standards.

Top Skills: Ansible, Argo CD, Bash, Claude Code, Distributed Tracing, FedRAMP, Flux, Git, GitHub Copilot, HIPAA, Kubernetes, Linux, Logging, Azure, Monitoring, Observability Platforms, OpenTelemetry, PCI-DSS, PowerShell, Python, SOC 2, StateRAMP, Terraform, VM-Based Architectures, Windows

OneStream Software

Site Reliability Engineer

Reposted 12 Days Ago

Remote

USA

114K-148K Annually

Senior level

Software • Financial Services

Ensure platform reliability, performance, and availability by implementing observability, automating infrastructure, and designing scalable architectures. Participate in on-call rotations and post-mortems, partner with Product and Engineering, mentor teammates, and integrate Dynatrace with Azure DevOps and Jira while supporting compliance (SOC, FedRAMP).

Top Skills: .NET, AKS, Alpine, Ansible, AppInsights, ARM Templates, AWS, Azure DevOps, Bash, Bicep, C#, Chef, CloudFormation, Datadog, Debian, Dynatrace, EKS, GCP, Git, GKS, Grafana, Helm, Jira, Kubernetes, Log Analytics, Azure, New Relic, OneStream Software, OpenShift, PowerShell, PowerShell DSC, Prometheus, Puppet, Python, REST APIs, SQL, Terraform, Ubuntu

Alpaca

Staff Site Reliability Engineer, Database

Reposted 12 Days Ago

Remote

USA

Senior level

Fintech • Information Technology

As a Site Reliability Engineer at Alpaca, you will ensure system reliability and performance, troubleshoot issues, and collaborate with teams to design scalable features.

Top Skills: Go, GORM, Linux, pgx, Postgres, Prometheus, sqlc

Chess.com

Site Reliability Engineer

Reposted 12 Days Ago

Remote

USA

Senior level

Gaming • Software

The Site Reliability Engineer will manage infrastructure stability and scalability, lead cloud migrations, and optimize performance across systems while mentoring team members.

Top Skills: Ansible, AWS, Azure, Bash, Chef, CloudFormation, Datadog, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform, Unix/Linux

Blue Cross and Blue Shield of Nebraska

Technical Analyst - SRE

Reposted 12 Days Ago

In-Office

Omaha, NE, USA

Mid level

Healthtech • Insurance

Own enterprise observability and SRE practices: define SLOs and SLA measurement, drive MTTR reduction, lead incident response, maintain service dependency maps and reliability dashboards, and leverage AI/AIOps to automate triage, root cause analysis, and self-healing remediation across vendor and internal platforms.

Top Skills: AI/AIOps, Bash, Chaos Engineering, CI/CD, CMDB, Dashboarding, Data Modeling, Distributed Tracing, Infrastructure-as-Code, ITSM/Ticketing Systems, Log Aggregation, Monitoring Platforms, Observability Platforms, PowerShell, Python, SIEM, Telemetry

Fortive

Software Engineer (.NET & SRE)

Reposted 12 Days Ago

In-Office

3 Locations

Mid level

Hardware • Other • Software • Appliances • Industrial • Manufacturing

Develop and maintain UIs and APIs using Next.js and .NET. Implement AWS services, apply SRE principles, and contribute to CI/CD pipelines.

Top Skills: .NET, AWS, AWS CloudFormation, C#, Docker, EC2, Entity Framework, Grafana, Kubernetes, Lambda, Next.js, Prometheus, RDS, React, S3, Terraform

Pura

Staff SRE

12 Days Ago

In-Office

Pleasant Grove, UT, USA

Expert/Leader

Hardware • Internet of Things

Lead architecture and implementation of enterprise-scale infrastructure and automation for web, mobile, backend, and data teams. Define reliability standards, incident response and DR strategies, optimize performance with advanced observability, and mentor engineering teams while driving SRE best practices across the organization.

Top Skills: AWS, GCP, Go, IAM, Kubernetes, Node.js, Observability, Python, Terraform

Kong

Staff Site Reliability Engineer - Volcano

12 Days Ago

Remote

United States

150K-210K Annually

Senior level

Artificial Intelligence • Cloud • Information Technology • Software • Big Data Analytics

Founding Staff SRE for Volcano: define SLOs/error budgets, architect multi-region Kubernetes infrastructure, build GitOps/CI-CD with ArgoCD/Helm/Terraform, scale managed Postgres/Redis/object storage, implement observability with Datadog/Prometheus/Grafana, lead incident response and SRE culture, and mentor cross-functional teams.

Top Skills: Argo CD, Canary Deployments, CI/CD, CNI, Datadog, GitOps, Grafana, Helm, Ingress, Kubernetes, Object Storage, Postgres, Prometheus, Redis, Service Mesh, Terraform, Terragrunt

Invisible Technologies Inc

Principal Software Engineer (SRE/DevOps) - Remote

12 Days Ago

In-Office or Remote

San Francisco, CA, USA

Expert/Leader

Artificial Intelligence • Information Technology • Software • Automation

Lead technical vision as a principal engineer, either managing teams or driving cross-team initiatives. Design and architect cloud infrastructure, networking, and security; define authentication/authorization patterns; architect and operate Kubernetes deployments; and implement infrastructure-as-code using tools like Terraform, CloudFormation, Ansible, or Puppet.

Top Skills: Ansible, AWS, CloudFormation, GCP, IAM, Kubernetes, Puppet, RBAC, Security Groups, Terraform

Morgan Stanley

Site Reliability Engineer

Reposted 12 Days Ago

In-Office

Alpharetta, GA, USA

Senior level

Fintech • Financial Services

Lead Site Reliability Engineer responsible for production support, automating deployments, monitoring availability and performance, troubleshooting infrastructure and applications, driving reliability improvements, collaborating with development and infrastructure teams, and participating in 24/7 on-call rotation.

Top Skills: AutoSys, AWS, Azure, C#, CI/CD, Containers, DB2, Generative AI Tools, IP Soft, Java, Jenkins, Linux, MQ, Oracle, Perl, Python, Ruby, Shell, Sockeye, Splunk, Sybase, Train, Unix, Virtual Machines, Web Services, Windows

LSEG (London Stock Exchange Group)

Site Reliability Engineer - Electronic Trading Team

Reposted 12 Days Ago

In-Office

2 Locations

Senior level

Fintech • Analytics

As a Site Reliability Engineer, you will ensure the reliability and performance of an FX trading platform, develop automation, improve system health, and manage SLOs while collaborating with development teams.

Top Skills: AWS, Azure, Bash, C#, Java, Kubernetes, Python, SQL

OpenAI

Site Reliability Engineer, Frontier Systems Infrastructure

Reposted 12 Days Ago

In-Office

San Francisco, CA, USA

255K-490K Annually

Mid level

Artificial Intelligence • Machine Learning • Generative AI

As a Site Reliability Engineer, you will manage Kubernetes clusters, automate infrastructure, improve operational metrics, and enhance reliability across data centers.

Top Skills: CloudFormation, Go, GPU, Kubernetes, Linux, Python, Terraform

DMSI

Site Reliability Engineer

Reposted 12 Days Ago

In-Office

Omaha, NE, USA

Mid level

Software

As a Site Reliability Engineer, you'll optimize monitoring and alerting systems, enhance user experience, and support teams with actionable insights and automation.