Senior Cloud Site Reliability Engineer

Posted 23 Days Ago
Hiring Remotely in USA
Remote
110K-140K
Senior level
Software

Radicle Health is a collection of human services software products designed to foster collaboration and innovation, helping organizations better serve their communities. We believe technology plays a crucial role in the success of the human services sector, but no single system can meet the diverse needs of every agency. That’s why we’ve built Radicle Health as a home for mission-driven products that support organizations in delivering essential services. Under one roof, our teams learn from each other, test ideas faster, and think holistically about the individuals and communities we serve.

About the Job: 

Join the Radicle Health shared SRE team to help build and evolve a unified platform supporting the SaraWorks, Link2Feed, and Foothold Care Management applications. Our environment spans AWS, Azure, and GCP; this role will focus primarily on AWS (commercial and some GovCloud) and GCP initially, while contributing patterns and tooling usable across all clouds. You'll partner closely with individual product pods while shaping shared standards, automation, and platform capabilities.

Who you are: 

  • 5+ years in SRE / DevOps / Platform / Infrastructure Engineering.
  • Eligibility for (or prior possession of) a PIV credential (Tier 1 background investigation).
  • Strong AWS foundations (networking, IAM, compute, storage, managed databases, VPC design); exposure to multi-cloud concepts.
  • Linux systems administration and production troubleshooting proficiency.
  • Production container experience (Docker plus ECS, Fargate, EKS, or Kubernetes).
  • Hands-on IaC (Terraform, Pulumi, or CloudFormation) with willingness to adopt Pulumi.
  • Scripting or programming in at least one of: Python, Bash, TypeScript, Go, Ruby, or similar.
  • CI/CD pipeline design and maintenance (GitLab CI or equivalent).
  • Practical observability (metrics, logs, tracing, and alert strategy design; we're invested in Datadog here).
  • Incident response/on‑call participation with follow‑through on remediation.
  • Clear written and verbal communication; able to tailor depth to audience.
  • Availability during core US Eastern collaboration hours.

Preferred Experience:

  • Pulumi (Pulumi Cloud or self-hosted).
  • AWS GovCloud experience; familiarity with compliance frameworks (HIPAA, FedRAMP, SOC 2).
  • GCP services (GKE, Cloud SQL, IAM, networking) and foundational Azure awareness.
  • Advanced container orchestration (autoscaling strategies, service mesh, workload isolation).
  • Performance tuning & optimization for PostgreSQL or other relational databases.
  • Application ecosystem familiarity: Ruby and/or .NET.
  • Disaster recovery strategy, resilience / chaos engineering practice.
  • AI-assisted DevOps / AIOps tooling (e.g., GitHub Copilot, incident automation, AI-driven runbook generation).
  • Experience applying LLMs or automation to infra workflows (e.g., generating IaC modules, intelligent alert tuning, predictive scaling).
  • Familiarity with AI transformation initiatives: governance, data sensitivity considerations, and secure integration of AI into engineering workflows.

What you’ll be responsible for: 

1. Infrastructure as Code & Cloud Engineering  

  • Design, build, and evolve AWS infrastructure (and GCP, within the initial scope) using IaC (Pulumi preferred; Terraform/CloudFormation experience transferable).  

2. Container & Runtime Platform  

  • Advance containerization (ECS/Fargate, EKS/Kubernetes, or equivalent) and establish secure, observable runtime patterns.  

3. CI/CD & Release Engineering

  • Enhance pipelines (GitLab CI or similar) for reliable builds, automated testing, artifact/version management, and progressive delivery.

4. Collaboration & Enablement  

  • Partner with engineering pods on hands-on implementation, architecture, incident response readiness, and post‑incident improvement.  

5. Observability & Operational Excellence  

  • Implement actionable metrics, tracing, structured logging, and intelligent alerting; refine SLOs and reduce MTTR.  

6. Reliability & Performance  

  • Lead capacity planning, resilience reviews, failover / DR exercises, and performance tuning aligned to SLIs/SLOs.  

7. Security & Compliance  

  • Embed least‑privilege IAM, secrets management, and hardened configurations; support compliance needs (e.g., GovCloud, healthcare).  

8. Automation & Tooling  

  • Eliminate toil via scripting, reusable service templates, policy-as-code, and self‑service operational workflows.  

9. Documentation & Runbooks  

  • Maintain clear architecture diagrams, decision records, playbooks, and onboarding guides.  

10. Incident & On‑Call  

  • Participate in a humane rotation; drive blameless retros and ensure remediation actions are implemented.

What we offer: 

  • Unlimited PTO policy 
  • Competitive medical, dental, and vision healthcare coverage  
  • 401k matching 
  • Paid holidays 
  • Volunteer time off 
  • Paid parental leave 
  • Remote work stipend  
  • Compensation: $110,000 - $140,000
  • Location: Remote 

Salary ranges are dependent on a variety of factors, including qualifications, experience and geographic location. More information about the salary range specific to your working location and other factors will be shared during the hiring process. 

Radicle Health is an Equal Employment Opportunity employer that proudly pursues and hires a diverse workforce. Radicle Health does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender-identity, sexual orientation, disability, age, military or veteran status, or any other basis protected by applicable local, state, or federal laws or prohibited by Company policy. 

Top Skills

AWS
Azure
Bash
CloudFormation
Datadog
Docker
ECS
EKS
Fargate
GCP
GitLab CI
Go
Kubernetes
Pulumi
Python
Ruby
Terraform
TypeScript

The Company
HQ: New York, New York
38 Employees

What We Do

Radicle Health acquires mission-critical human services software companies. Today, Radicle Health's companies are: Foothold Technology, Exym, KCare, and Link2Feed.

We believe technology is at the root of success in the human services sector, but that no single system can meet the needs of every agency. So we’ve built Radicle Health around this guiding principle. Our companies are 100% committed to their products, their customers, and the individuals their customers serve. But under one roof, our teams can learn from each other, can more quickly test ideas, and can think holistically about our communities and the people at the center of those communities.

We believe that human services agencies and the people they serve deserve functional, modern, and easy-to-use software. And we believe we’re the ones to build it.