Top Site Reliability Engineer Jobs
Fintech • Payments • Financial Services
Lead EarnIn's AI-first reliability engineering, enhancing incident response, automation, and resilience in operations while mentoring engineers.
Top Skills:
AI, AWS, CloudWatch, Datadog, Go, Kubernetes, OpenTelemetry, Python, Terraform
Aerospace • Other
The role involves managing Kubernetes and Linux servers, supporting containerized applications, implementing automation solutions, and mentoring peers in a fast-paced environment.
Top Skills:
Ansible, Docker, Go, Grafana, Helm, InfluxDB, JSON, Kubernetes, Linux, Prometheus, Python, RKE, Terraform, YAML
Aerospace • Other
The Sr. IT Linux Site Reliability Engineer will manage and optimize Kubernetes clusters, automate systems, and collaborate with teams to ensure system resilience and performance.
Top Skills:
Ansible, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform
Cloud • Fintech • HR Tech
Operate, monitor, and maintain Workday's Core Platform to ensure high availability and security. Automate infrastructure and CI/CD pipelines, improve observability and incident response, troubleshoot platform issues, document systems, and collaborate with engineering teams. Support federal contracts requiring U.S. citizenship and potential security clearance; may require onsite presence in the DC/MD/VA area.
Top Skills:
Automated Testing, CI/CD, Concurrency, Distributed Systems, Infrastructure Automation, Java, Kotlin, Load Testing, Multithreading, Observability, Scala
Financial Services
Lead SRE designing and automating infrastructure and operations to ensure availability, scalability, and performance. Own incident response, monitoring/observability, capacity planning, performance optimization, CI/CD automation, and documentation while mentoring teams and driving automation-first improvements.
Top Skills:
Ansible, AutoSys, AWS, Azure, Bash, CI/CD, Configuration Management, Control-M, Docker, Dremio, EKS, Elasticsearch APM, Elasticsearch Observability, GCP, Go, Java, Jenkins, Kubernetes, MongoDB, OpenTelemetry, Python, Qlik Replicate, Snowflake, SQL, Terraform
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills:
Kubernetes, Linux, OpenStack, Python
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills:
AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills:
Argo CD, GitHub Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python
Artificial Intelligence • eCommerce • Retail • Software
Build and maintain CI/CD pipelines, manage and automate cloud infrastructure and configurations, implement monitoring/logging and alerting for reliability, enforce security and compliance practices, and collaborate with development teams to support scaling and operations.
Top Skills:
SOC I, SOC II
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills:
Ansible, CUDA, Go, Helm, Kubernetes, Linux, NCCL, NVIDIA, Python, Rust, Slurm, Terraform
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills:
Ansible, Bash, GCP, GKE, Go, Kubernetes, Pulumi, Python
Fintech • Analytics
The role involves managing application services, driving improvements, handling incidents, and leveraging domain knowledge to enhance service quality and efficiency.
Top Skills:
Datadog, ITRS
Information Technology • Software
The Site Reliability Engineer will support critical services management and deployment, coordinate with teams, and participate in 24x7 on-call rotations.
Top Skills:
Ansible, Docker, Jenkins, Kubernetes, Linux, OpenStack, Python, RHEL, SELinux
Fintech
The Principal Site Reliability Engineer designs and improves software and tools for performance, scalability, and availability, while leading incident management and collaborating with development teams.
Top Skills:
Aurora, AWS, Chef, Docker, DynamoDB, Git, Go, Java, Jenkins, JMS, Kafka, Kubernetes, Maven, Memcached, Oracle, Python, Redis, SQS, Swarm
Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
The Site Reliability Engineer will architect solutions for SaaS applications, maintain cloud infrastructure, implement security best practices, and collaborate with teams for product quality. Responsibilities include incident management, monitoring, and automation of processes.
Top Skills:
AWS, Bash, CloudFormation, CloudWatch, Docker, Dynatrace, Grafana, Jenkins, Kubernetes, MSSQL, MySQL, New Relic, Perl, Postgres, Python, Splunk, Terraform
Cryptocurrency
Own production reliability, availability, and performance for cloud-native systems. Operate and scale Kubernetes (EKS) clusters, manage AWS infrastructure, implement IaC with Terraform and Helm, improve CI/CD, build observability with Prometheus/Grafana/EFK, lead incident response and RCA, participate in on-call rotations, and support security and compliance.
Top Skills:
Airflow, AWS Batch, AWS EC2, AWS Lambda, AWS Organizations, Bash, ClickHouse, CloudWatch, Databricks, Docker, DynamoDB, EFK (Elasticsearch, Fluentd, Kibana), EKS, ElastiCache, EMR, GitLab CI/CD, GitOps, Grafana, Helm, HPA, Kafka, Karpenter, KEDA, Kubernetes, Load Balancing, NAT, Postgres, Prometheus, Python, RDS, Redis, S3, Snowflake, Spark, SQS, Terraform, TLS, VPC, VPN
Fitness • Retail • Sports • Manufacturing
Design, implement, and maintain highly available, scalable infrastructure and CI/CD for applications. Automate deployments, monitor performance, troubleshoot incidents, manage IaC with Terraform, support disaster recovery, and collaborate with dev and ops teams to improve reliability and security.
Top Skills:
Application Insights, Azure, Azure DevOps, Bash, Bitbucket, Docker, GCP, GCP Cloud Monitoring, Git, Grafana, Helm, Jenkins, Kubernetes, PowerShell, Prometheus, Terraform
Internet of Things • Software • Manufacturing
Lead and oversee cloud operations and Site Reliability Engineering for a global IoT ecosystem, architecting strategies for performance, security, and innovation while mentoring a team of professionals in multi-cloud environments.
Top Skills:
Ansible, Azure, CI/CD, Cloud, ELK, Grafana, IoT, Kubernetes, Prometheus, SRE, Terraform
Artificial Intelligence • Machine Learning • Biotech • Generative AI
The Site Reliability Engineer will manage digital infrastructure, ensuring access to compute resources, automating processes, and maintaining resource visibility for researchers.
Top Skills:
Ansible, Docker, Grafana, Kubernetes, Prometheus, Python, Tailscale, Talos Linux
Artificial Intelligence • Machine Learning • Security • Database • Analytics • Big Data Analytics
As a Site Reliability Engineer, you'll ensure the availability and performance of AI applications, maintain infrastructure, automate tasks, and troubleshoot issues in high-scale environments.
Top Skills:
Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, Docker, Dynatrace, EC2, ELK Stack, GCP, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Lambda, Linux, Prometheus, Python, S3, Terraform, Unix
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills:
AWS, Azure, C++, GCP, Go, Kubernetes, OCI
Software
As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.
Top Skills:
AWS, Azure, CircleCI, GCP, GitHub Actions, GitLab CI, Go, Kubernetes, Terraform
25 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills:
ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform
25 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform