Senior Site Reliability Engineer

Posted 2 Days Ago
Hiring Remotely in Costa Rica
Remote
Senior level
AdTech • Cloud • Marketing Tech • Productivity • Software • Analytics • Automation
We are Acquia. We are building for the future of the web, and we want you to be a part of it!
The Role
The Senior Site Reliability Engineer will design, implement, and maintain CI/CD pipelines, cloud infrastructure, and monitoring solutions, advocating DevOps practices while ensuring system reliability, scalability, and automation. This role involves collaborating with engineering teams to create a self-service infrastructure and implementing security measures throughout the deployment process.
Summary Generated by Built In

Acquia empowers the world's most ambitious brands to create digital customer experiences that matter. With open source Drupal at its core, the Acquia Digital Experience Platform (DXP) enables marketers, developers, and IT operations teams at thousands of global organizations to rapidly compose and deploy digital products and services that engage customers, enhance conversions, and help businesses stand out.

Headquartered in the U.S., Acquia is positioned as a market leader by the analyst community and is listed as one of the world's top software companies by The Software Report. We are Acquia. We are a global company with employees located in more than 30 countries, and we're building for the future. We want you to be a part of it!

About the role:

As a Senior Site Reliability Engineer, you will be a key player in designing, implementing, and maintaining our CI/CD pipelines, cloud infrastructure, and monitoring solutions. Your expertise in tools like ArgoCD, Kubernetes, and cloud-native architecture will help us achieve operational excellence at scale. You will work closely with engineering teams to ensure they have the right infrastructure in place to deploy rapidly, safely, and reliably.

This is a hands-on role for someone who thrives in an environment where automation is the goal, reliability is the baseline, and scalability is second nature. You won't just be maintaining systems; you'll be innovating and designing new ways to make our infrastructure smarter and our development faster.

Job Responsibilities: 

  • CI/CD Pipeline Mastery: Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools. Ensure deployment pipelines are fully automated and zero-downtime.
  • Infrastructure as Code (IaC): Build and manage scalable, reliable infrastructure using Terraform, Kubernetes, and other IaC tools. Ensure everything is automated, from deployments to monitoring, so that infrastructure becomes a self-service platform.
  • Cloud Expertise: Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance. Implement disaster recovery, fault tolerance, and high availability strategies.
  • Monitoring and Alerting: Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers. Design and implement automated alerts for proactive system health monitoring.
  • DevOps Advocacy: Champion a DevOps culture across teams by promoting best practices, encouraging adoption of new technologies, and driving a continuous learning mindset within the engineering teams. Be the go-to person for CI/CD, infrastructure scaling, and deployment automation.
  • SRE Mindset: Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals.
  • Security-First Approach: Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management.
  • Collaboration with Engineering Teams: Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production.

Skills:

  • BS in Computer Science or a comparable field of study, or equivalent practical experience.
  • Experience working with one or more of: Go, Python, Ruby, PHP, Java, or JavaScript.
  • Experience with Unix/Linux systems administration using the CLI.
  • Fundamental understanding of TCP/UDP networking concepts.
  • Solid oral and written communication skills.
  • CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments.
  • Kubernetes Guru: Strong understanding and experience with Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production.
  • Cloud Mastery: Proficient in at least one major cloud provider (AWS, GCP, or Azure). Experience with multi-cloud or hybrid-cloud architecture is a plus.
  • IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure.
  • Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems.
  • Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening.
  • Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure.
  • Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization.

Preferred Qualifications: 

  • 5-9 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment.
  • Proven experience mentoring junior team members.
  • Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools.
  • Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure.
  • Strong coding skills in Python, Go, or Ruby.
  • Experience with service mesh architectures like Istio or Linkerd is a plus.
  • SRE Certification (or equivalent experience) is a bonus.
  • Certified Kubernetes Administrator (CKA) is preferred.
  • A passion for automation, observability, and reliability.

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Top Skills

ArgoCD
AWS
Azure
Datadog
ELK
GCP
Grafana
Jenkins
Kubernetes
Prometheus
Terraform

The Company
HQ: Boston, MA
1,100 Employees
Hybrid Workplace
Year Founded: 2007

What We Do

Acquia is the open digital experience company. We provide the world's most ambitious brands with products built around Drupal to allow them to embrace innovation and create customer moments that matter. At Acquia, we believe in the power of community and collaboration — giving our customers and partners the freedom to build tomorrow on their terms.

Why Work With Us

At Acquia we value the differences in our life experiences and viewpoints. We believe that cultivating and supporting a diverse team globally is directly tied to our success as an organization, fueling greater innovation, productivity and business outcomes. We make it possible for all Acquians to make a lasting impact.

Acquia Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Typical time on-site: Flexible
HQ: Boston, MA
Ballerup, DK
Paris, FR
Pune, Maharashtra
Reading, GB
Sydney, NSW
Tokyo, Shibuya-ku