Top Site Reliability Engineer Jobs
Information Technology • Insurance • Software
Responsible for the reliability and performance of production services, managing SLIs and SLOs, and leading incident responses while collaborating with various teams.
Top Skills:
.NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows
Information Technology • Insurance • Software
The Sr. Site Reliability Engineer at Vertafore will own the reliability and performance of production services, design incident response protocols, and enhance system observability while applying software engineering practices.
Top Skills:
.NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows
Artificial Intelligence • Fintech • Payments • Business Intelligence • Financial Services • Generative AI
Lead design and delivery of scalable cloud infrastructure for the Spend product. Embed with development teams to drive reliability, performance, observability, incident response, and automation. Own SLOs, runbooks, DevOps metrics, and collaborate with central DevOps and security teams to ensure compliance and resilience. Lead infrastructure projects including new service launches, data centre migrations, and modernising data pipelines.
Top Skills:
Analytics Pipelines, AWS, Data Streaming, DevOps, GCP, Incident Response, Kubernetes, Observability, SLOs, SRE
Cloud • Information Technology
The Site Reliability Engineer II role involves ensuring the stability and reliability of services, automating operational tasks, and collaborating with teams for system design while promoting reliability practices.
Top Skills:
Ansible, AWS, Azure, Bash, Catchpoint, Docker, ELK, GCP, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
Healthtech • Software
The Site Reliability Engineer (SRE) will enhance platform reliability and scalability through AI-driven automation, collaborate with product engineers, and manage incidents, monitoring, and documentation processes.
Top Skills:
AWS, CI/CD, Terraform
Other
As a Site Reliability Engineer, you will design cloud platforms, automate operations, maintain infrastructure, and support engineering teams in delivering reliable services.
Top Skills:
Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, DNS, Docker, GitLab CI, Go, GCP, Grafana, HTTP, HTTPS, Jenkins, Kubernetes, KVM, Linux, Perl, Prometheus, Python, Ruby, TCP/IP, Terraform, Unix, VMware
Events
The Site Reliability Engineer II designs and maintains scalable systems, focusing on automation, monitoring, incident response, and collaboration with developers to enhance operational practices and efficiency.
Top Skills:
Bash, Cloud Service Operations, Containers, Continuous Delivery, Continuous Integration, Go, Infrastructure as Code, Orchestration Platforms, Python
Artificial Intelligence • Cloud • Events • Productivity • Software • Business Intelligence • Conversational AI
Maintain and improve uptime, availability, and performance of services via observability, redundancy, failover, and load‑balancing. Integrate monitoring into SDLC, lead incident response/on‑call, assess capacity and risks, and work with teams to extend observability and automate self‑healing.
Top Skills:
Alertmanager, Ansible, ArgoCD, AWS, Azure, Bash, ELK, GCP, GitLab, GitLab CI, Go, Grafana, Java, JavaScript, Jenkins, Kafka, Kubernetes, Linux, MongoDB, MySQL, Nginx, Postgres, Prometheus, Python, Terraform, VictoriaMetrics, Zabbix
Artificial Intelligence
Seeking an experienced Site Reliability Engineer to enhance platform reliability, scalability, and performance by balancing operations with long-term software engineering improvements.
Top Skills:
AI, Bash, Datadog, Docker, ELK Stack, Flux, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Software
As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.
Top Skills:
TensorRT
Artificial Intelligence • Cloud • Information Technology • Software
Build and operate production-grade AI infrastructure using Kubernetes, ensuring high availability, reliability, and performance. Develop custom operators and implement automation for efficient operations and monitoring.
Top Skills:
Ansible, Bash, ELK Stack, Enterprise Storage Systems, Grafana, High-Performance Networking, Kubernetes, Linux, NVIDIA GPU Technologies, Prometheus, Python, Terraform
Healthtech • Other • Software
As a Senior Database Site Reliability Engineer, you'll design, implement, and maintain PostgreSQL systems, ensure reliability, automate maintenance tasks, and participate in incident response.
Top Skills:
Ansible, Bash, Datadog, Grafana, New Relic, Postgres, PowerShell, Prometheus, Python, Terraform
Software
Technical leader responsible for reliability, scalability, performance, and operational excellence of a cloud SaaS platform. Drive platform modernization to containers/Kubernetes on Azure, define SLOs/SLAs, lead observability, incident response/RCA, automation/tooling, and mentor engineers while ensuring compliance with public-sector standards.
Top Skills:
Ansible, Argo CD, Bash, Claude Code, Distributed Tracing, FedRAMP, Flux, Git, GitHub Copilot, HIPAA, Kubernetes, Linux, Logging, Azure, Monitoring, Observability Platforms, OpenTelemetry, PCI-DSS, PowerShell, Python, SOC 2, StateRAMP, Terraform, VM-Based Architectures, Windows
Software • Financial Services
Ensure platform reliability, performance, and availability by implementing observability, automating infrastructure, participating in on-call rotations and post-mortems, partnering with Product and Engineering, designing scalable architectures, mentoring teammates, and integrating Dynatrace with Azure DevOps and Jira while supporting compliance (SOC/FedRAMP).
Top Skills:
.NET, AKS, Alpine, Ansible, AppInsights, ARM Templates, AWS, Azure DevOps, Bash, Bicep, C#, Chef, CloudFormation, Datadog, Debian, Dynatrace, EKS, GCP, Git, GKS, Grafana, Helm, Jira, Kubernetes, Log Analytics, Azure, New Relic, OneStream Software, OpenShift, PowerShell, PowerShell DSC, Prometheus, Puppet, Python, REST APIs, SQL, Terraform, Ubuntu
Fintech • Information Technology
As a Site Reliability Engineer at Alpaca, you will ensure system reliability and performance, troubleshoot issues, and collaborate with teams to design scalable features.
Top Skills:
Go, GORM, Linux, pgx, Postgres, Prometheus, sqlc
Gaming • Software
The Site Reliability Engineer will manage infrastructure stability and scalability, lead cloud migrations, and optimize performance across systems while mentoring team members.
Top Skills:
Ansible, AWS, Azure, Bash, Chef, CloudFormation, Datadog, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform, Unix/Linux
Healthtech • Insurance
Owner of enterprise observability and SRE practices: define SLOs/SLA measurement, drive MTTR reduction, lead incident response, maintain service dependency maps and reliability dashboards, and leverage AI/AIOps to automate triage, root cause analysis, and self-healing remediation across vendor and internal platforms.
Top Skills:
AI/AIOps, Bash, Chaos Engineering, CI/CD, CMDB, Dashboarding, Data Modeling, Distributed Tracing, Infrastructure-as-Code, ITSM/Ticketing Systems, Log Aggregation, Monitoring Platforms, Observability Platforms, PowerShell, Python, SIEM, Telemetry
Hardware • Other • Software • Appliances • Industrial • Manufacturing
Develop and maintain UIs and APIs using Next.js and .NET. Implement AWS services, apply SRE principles, and contribute to CI/CD pipelines.
Top Skills:
.NET, AWS, AWS CloudFormation, C#, Docker, EC2, Entity Framework, Grafana, Kubernetes, Lambda, Next.js, Prometheus, RDS, React, S3, Terraform
Hardware • Internet of Things
Lead architecture and implementation of enterprise-scale infrastructure and automation for web, mobile, backend, and data teams. Define reliability standards, incident response and DR strategies, optimize performance with advanced observability, and mentor engineering teams while driving SRE best practices across the organization.
Top Skills:
AWS, GCP, Go, IAM, Kubernetes, Node.js, Observability, Python, Terraform
Artificial Intelligence • Cloud • Information Technology • Software • Big Data Analytics
Founding Staff SRE for Volcano: define SLOs/error budgets, architect multi-region Kubernetes infrastructure, build GitOps/CI-CD with ArgoCD/Helm/Terraform, scale managed Postgres/Redis/object storage, implement observability with Datadog/Prometheus/Grafana, lead incident response and SRE culture, and mentor cross-functional teams.
Top Skills:
ArgoCD, Canary Deployments, CI/CD, CNI, Datadog, GitOps, Grafana, Helm, Ingress, Kubernetes, Object Storage, Postgres, Prometheus, Redis, Service Mesh, Terraform, Terragrunt
Artificial Intelligence • Information Technology • Software • Automation
Lead technical vision as a principal engineer, either managing teams or driving cross-team initiatives. Design and architect cloud infrastructure, networking, and security; define authentication/authorization patterns; architect and operate Kubernetes deployments; and implement infrastructure-as-code using tools like Terraform, CloudFormation, Ansible, or Puppet.
Top Skills:
Ansible, AWS, CloudFormation, GCP, IAM, Kubernetes, Puppet, RBAC, Security Groups, Terraform
Fintech • Financial Services
Lead Site Reliability Engineer responsible for production support, automating deployments, monitoring availability and performance, troubleshooting infrastructure and applications, driving reliability improvements, collaborating with development and infrastructure teams, and participating in 24/7 on-call rotation.
Top Skills:
AutoSys, AWS, Azure, C#, CI/CD, Containers, Db2, Generative AI Tools, IP Soft, Java, Jenkins, Linux, MQ, Oracle, Perl, Python, Ruby, Shell, Sockeye, Splunk, Sybase, Train, Unix, Virtual Machines, Web Services, Windows
Reposted 12 Days Ago
Fintech • Analytics
As a Site Reliability Engineer, you will ensure the reliability and performance of an FX trading platform, develop automation, improve system health, and manage SLOs while collaborating with development teams.
Top Skills:
AWS, Azure, Bash, C#, Java, Kubernetes, Python, SQL
Artificial Intelligence • Machine Learning • Generative AI
As a Site Reliability Engineer, you will manage Kubernetes clusters, automate infrastructure, improve operational metrics, and enhance reliability across data centers.
Top Skills:
CloudFormation, Go, GPU, Kubernetes, Linux, Python, Terraform
Software
As a Site Reliability Engineer, you'll optimize monitoring and alerting systems, enhance user experience, and support teams with actionable insights and automation.
Top Skills:
Ansible, AWS, Azure, Bash, Datadog, ELK Stack, GCP, Git, Grafana, Jenkins, Nagios, New Relic, PowerShell, Prometheus, Python, Terraform
Popular Job Searches
All Software Engineer Jobs
.NET Developer Jobs
Aerospace Thermal Engineering Jobs
AI Engineer Jobs
Android Developer Jobs
Automation Engineer Jobs
Backend Developer Jobs
Blockchain Developer Jobs
C# Jobs
C++ Jobs
Cloud Architect Jobs
Cloud Engineer Jobs
Design Engineer Jobs
DevOps Engineer Jobs
Director Of Engineering Jobs
Electrical Engineering Jobs
Embedded Software Engineer Jobs
Engineering Jobs
Engineering Manager Jobs
Environmental Engineering Jobs
Field Engineer Jobs
Front End Developer Jobs
Full Stack Developer Jobs
Game Developer Jobs
Golang Jobs
Hardware Engineer Jobs
Industrial Engineering Jobs
iOS Developer Jobs
Java Developer Jobs
Javascript Developer Jobs
Linux Jobs
Manufacturing Engineer Jobs
Mechanical Engineering Jobs
Network Engineer Jobs
PHP Developer Jobs
Process Engineer Jobs
Project Engineer Jobs
Prompt Engineering Jobs
Python Jobs
QA Jobs
Robotics Engineer Jobs
Ruby on Rails Jobs
Salesforce Administrator Jobs
Salesforce Developer Jobs
Scala Jobs
Sharepoint Developer Jobs
Site Reliability Engineer Jobs
Software Engineering Manager Jobs
Solutions Architect Jobs
SQL Developer Jobs
Structural Engineer Jobs
System Engineer Jobs
Test Engineer Jobs
Web Developer Jobs