Top Site Reliability Engineer Jobs
Artificial Intelligence • Information Technology
As a Site Reliability Engineer, you will maintain user-facing services, implement best practices for reliability, and manage production incidents.
Top Skills:
Ansible, Cloud Services, Kubernetes, Programming Languages, Terraform
Artificial Intelligence • Cloud • Information Technology • Mobile • Software • Consulting
The role involves designing and implementing observability solutions using OpenTelemetry, managing infrastructure through IaC, and establishing SRE practices. Strong expertise in cloud and DevOps engineering is required.
Top Skills:
Argo CD, AWS, Azure, Bash, CloudFormation, Docker, GCP, GitHub Actions, GitLab CI, Go, Java, Jenkins, Kubernetes, Node.js, OpenTelemetry, PowerShell, Pulumi, Python, Rust, Terraform
Information Technology
The Senior Site Reliability Engineer will design and optimize Kubernetes clusters, manage infrastructure with IaC tools, and enhance system reliability while collaborating with teams.
Top Skills:
Ansible, Cluster API, CNI, CRI, CSI, Kubernetes, Pulumi, Terraform
Financial Services
As a Site Reliability Engineer, you'll ensure high availability of Commodities Technology applications, automate processes, and contribute to incident analysis and monitoring systems.
Top Skills:
Ansible, AWS, C#, Datadog, Docker, Kubernetes, Linux, PowerShell, Python, Terraform, Windows
Legal Tech
Lead the design and automation of enterprise network infrastructures, managing cloud and on-premises networks with a focus on security and scalability.
Top Skills:
Ansible, AWS, Azure, Bash, BGP, EVPN, FortiAnalyzer, Fortinet, Fortinet SD-WAN, FRRouting, Linux, NSX, NVIDIA Cumulus, OSPF, Palo Alto, Panorama, PowerShell, Python, SolarWinds, SONiC, Terraform, VCF, VMware
Software
As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.
Top Skills:
AWS, Azure, CircleCI, GCP, GitHub Actions, GitLab CI, Go, Kubernetes, Terraform
Information Technology • Security • Software
Manage daily operations of a classified NOC, focusing on Kubernetes services, incident response, system monitoring, and ensuring security and availability.
Top Skills:
AWS GovCloud, Azure Government, C2E, C2S, Docker, Elastic Stack, Fluentd, Flux, Grafana, Helm, JIRA, JWCC, Kubernetes, osTicket, Prometheus, Terraform
Information Technology • Cybersecurity
The Director of SRE will oversee cloud infrastructure scalability, reliability, COGS optimization, and lead a team of SRE professionals while ensuring compliance and security of services.
Top Skills:
AWS, Azure, CI/CD, CloudFormation, Datadog, GCP, Grafana, Kubernetes, Prometheus, Pulumi, Splunk, Terraform
Financial Services
The Senior Cluster Site Reliability Engineer will enhance the research compute cluster's uptime, reliability, and performance through engineering and operational improvements, ensuring high availability for researchers working on machine learning problems.
Top Skills:
Ansible, AWS, Ceph, Docker, ELK, GCP, Grafana, Horovod, HPC, InfiniBand, Kubeflow, Kueue, Loki, Lustre, MLflow, OpenTelemetry, Podman, Prometheus, Python, RDMA, Ruby, S3, Singularity, Slurm, Terraform
Fintech • Information Technology • Payments
The Staff SRE will improve system reliability, lead incident resolution, automate tasks, and support cloud migration efforts while ensuring secure software delivery.
Top Skills:
AWS, Enterprise Monitoring Tools, Microsoft Stack, Middleware Technologies, Orchestration Tools, PowerShell
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript
Artificial Intelligence • Machine Learning • Security • Database • Analytics • Big Data Analytics
As a Site Reliability Engineer, you'll ensure the availability and performance of AI applications, maintain infrastructure, automate tasks, and troubleshoot issues in high-scale environments.
Top Skills:
Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, Docker, Dynatrace, EC2, ELK Stack, GCP, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Lambda, Linux, Prometheus, Python, S3, Terraform, Unix
Fintech • Financial Services
Lead the monitoring, automation, and incident response processes for platform stability, collaborating with cross-functional teams to enhance service reliability and performance optimization.
Top Skills:
Ansible, AWS, Azure, Bash, BigPanda, Dynatrace, GCP, GitHub Actions, Jenkins, LogScale, Monpro, PowerShell, Python, Terraform
Artificial Intelligence • Cloud • Software
The Senior Site Reliability Engineer will automate operations, optimize workflows for teams, manage secure infrastructure, and participate in on-call duties.
Top Skills:
Arista, AWS, Bash, Ceph, Chef, CIFS, Cisco, DNS, Docker, ELK Stack, Fortinet, HP, HTTP, ICMP, iSCSI, Jenkins, Kubernetes, Linux/Debian Family/Ubuntu, Mesosphere, NFS, Node.js, Pivotal Greenplum, Postgres, Python, RabbitMQ, Ruby, S3, Scylla, SSH, SSL, Supermicro, TCP, TLS
Security • Software
The role involves developing and managing Tenable's cloud products, ensuring reliability and availability, automating systems, and collaborating on cloud technologies while meeting FedRAMP compliance.
Top Skills:
AWS, Azure, Docker, GCP, Gradle, Helm, Kubernetes, Node.js, Python, Terraform
Information Technology
As a Site Reliability Engineer, you'll design and operate scalable storage systems and optimize performance for AI research data management.
Top Skills:
Go, Kubernetes, Pulumi, Rust
Artificial Intelligence • HR Tech • Legal Tech • Marketing Tech • Software • Conversational AI • Generative AI
The Site Reliability Engineer will enhance SaaS solutions' stability and scalability by automating workflows, monitoring systems, and responding to incidents.
Top Skills:
Ansible, AWS, Azure, Datadog, Dynatrace, New Relic, Puppet, Terraform
Information Technology • Software
The SRE will manage Verisign's data platform by architecting, deploying, and ensuring the stability and performance of large-scale data systems, while collaborating with multiple teams for customer support and infrastructure improvements.
Top Skills:
Ansible, Docker, Druid, Hadoop, Jenkins, Kafka, Kubernetes, Python, Spark
Information Technology • Software
Build and maintain Verisign's Kubernetes platform, enforce security practices, monitor performance, and provide tier 3 support. Requires extensive experience with Kubernetes and related technologies.
Top Skills:
Git, JIRA, Kubernetes, Linux, Python, Terraform, Unix
Artificial Intelligence • Fintech • Software • Financial Services
Seeking a seasoned SRE to lead reliability for a cloud-native platform, overseeing infrastructure, CI/CD pipelines, observability, and mentoring engineers.
Top Skills:
AWS, ClickHouse, Go, Java, Kafka, Kubernetes, Pulumi, Terraform
eCommerce • Retail • Software
The Director of Site Reliability Engineering will lead cloud deployment strategies, enhance automation and scalability, and mentor the engineering team.
Top Skills:
Ansible, Apache, Chef, Docker, GitHub Actions, Jenkins, Kubernetes, MongoDB, MySQL, Nginx, Terraform
Artificial Intelligence • Software
As a Principal Site Reliability Engineer, you will design hybrid infrastructure, integrate edge devices and cloud resources, optimize performance and costs, and collaborate with cross-functional teams to ensure robust systems.
Top Skills:
AWS, Go, Kubernetes, Linux, Python, Terraform, Terragrunt
Cloud • Greentech • Other • Energy
As a Staff Site Reliability Engineer focused on storage, you'll ensure the reliability and performance of cloud storage systems while optimizing distributed, fault-tolerant architectures for AI workloads.
Top Skills:
Ansible, C, Ceph, Docker, GlusterFS, Go, iSCSI, Java, Kubernetes, NFS, NVMe-oF, OpenEBS, Puppet, Python, SMB, Terraform
Cloud • Greentech • Other • Energy
The role involves ensuring reliability of AI-optimized cloud services, focusing on design, automation, and performance for AI workloads.
Top Skills:
C++, Go, Java, Kubernetes, Python
Gaming • Software • Metaverse
The Senior Distributed Storage SRE Engineer manages distributed storage systems, ensuring stability, designing disaster recovery solutions, and optimizing performance. Responsibilities include incident response, tool development, and resource management.
Top Skills:
Go, Linux, Python, Shell, TCP/IP, Unix



























