Site Reliability Engineer II

Posted 5 Days Ago
Hiring Remotely in Costa Rica
Remote
Mid level
Fintech • Internet of Things • Payments • Software
Our mission is to power the world’s best companies to win in the Subscription Economy.
The Role
Maintain and improve reliability, scalability, and performance of Zuora's SaaS platform. Build automation (IaC, self-healing, remediation), apply AI/ML for predictive monitoring, lead incident response and root cause analysis, instrument telemetry, tune performance, and collaborate cross-functionally to reduce manual work and improve operational excellence.
Summary Generated by Built In
Costa Rica
About Zuora

At Zuora, we help businesses grow smarter and adapt faster. Our platform powers modern business models — from subscriptions and usage-based pricing to AI-driven and outcome-based offerings — helping companies launch new products, automate complex billing, and unlock predictable, recurring revenue.

We’ve led the Subscription Economy for more than a decade. Now we’re evolving again by building the definitive platform for quote to cash and helping companies monetize their products and services with an adaptable, AI-ready foundation.


This is a location-specific position that requires you to come into the office regularly to be most effective.

*Zuora Costa Rica office (Heredia): hybrid model with 3 days in office and 2 days remote.


The Opportunity

Join Zuora's high-impact Operations team and help power the backbone of our industry-leading SaaS platform. In this role, you'll help ensure the reliability, scalability, and performance of Zuora's global production environment while building the next generation of intelligent operations.

We're looking for an engineer who enjoys solving complex infrastructure challenges, embraces an automation-first mindset, and is excited about applying AI and modern cloud technologies to improve operational excellence.

You'll have the opportunity to:

  • Design and implement intelligent automation for infrastructure lifecycle management, including self-healing, anomaly detection, and automated remediation using Infrastructure as Code (IaC) and AI-driven tooling.
  • Apply AI/ML techniques for predictive monitoring and proactive performance optimization to identify issues before they impact customers.
  • Lead complex incident response efforts and root cause analyses, embedding automation and continuous learning into operational processes.
  • Improve system reliability through dynamic scaling, telemetry instrumentation, and automated performance tuning.
  • Enhance operational runbooks and playbooks by eliminating manual processes through automation.
  • Evaluate and adopt emerging AIOps, cloud-native, and distributed systems technologies to continuously improve our platform.
  • Partner cross-functionally with Product Engineering, Customer Support, Global Services, Deal Desk, and Sales to deliver exceptional customer experiences.

Our technology stack includes Linux, Python, Docker, Kubernetes, AWS, Kafka, ActiveMQ, MySQL, Oracle, Redis, Tomcat, Jenkins, Terraform, GitOps, Ansible, Puppet, Prometheus, Grafana, OpenTelemetry, Debezium, Web Application Firewalls, and Load Balancers.


About You

You're passionate about building reliable systems, automating repetitive work, and continuously improving how infrastructure operates. You enjoy troubleshooting complex technical problems, collaborating across teams, and learning new technologies.

We're looking for someone with:

  • 2–4 years of experience in Linux systems administration and/or Python development in production environments.
  • Strong Linux administration skills, including troubleshooting, service management, performance tuning, and networking fundamentals.
  • Experience developing Python scripts or lightweight applications to automate operational workflows and system management.
  • Hands-on experience with Docker and familiarity with Kubernetes concepts, including deployments, services, and scaling.
  • At least one year of experience supporting SaaS or cloud-native production environments.
  • Working knowledge of messaging platforms and databases such as Kafka, Redis, MySQL, or similar technologies.
  • Experience contributing to CI/CD pipelines and deployment automation.
  • Hands-on experience with monitoring and observability platforms such as Prometheus, Grafana, or similar tools.
  • Experience participating in incident response, post-incident reviews, and root cause analysis.
  • A demonstrated passion for automation and improving operational efficiency.

Nice to have:

  • Experience with Jenkins, Terraform, GitOps, or advanced Infrastructure as Code practices.
  • Exposure to AI/ML technologies for anomaly detection, predictive operations, or intelligent automation.
  • Relevant certifications such as RHCSA, AWS/Azure/GCP certifications, PCAP (Python), Docker Certified Associate (DCA), Certified Kubernetes Administrator (CKA), or SRE-related certifications.

About the Team

Zuora's Operations team is responsible for keeping our global SaaS platform running reliably, securely, and at scale. We combine operational excellence with engineering best practices to build resilient systems that enable our customers to succeed.

Our team believes that the best operations are automated, observable, and continuously improving. We invest heavily in modern cloud infrastructure, AI-driven operations, and engineering innovation to reduce manual work, improve reliability, and empower our engineers to solve meaningful technical challenges.

If you're excited about building intelligent infrastructure, driving automation, and shaping the future of cloud operations, we'd love to meet you.


Benefits

Zuora offers a comprehensive total rewards package designed to support ZEOs’ wellbeing, growth, and flexibility. While specific offerings may vary by country, we typically provide:

  • Competitive compensation, variable bonus and performance-based reward opportunities, and retirement programs
  • Medical, dental, and vision insurance
  • Generous, flexible time off, plus paid holidays, wellness days, and a company-wide year-end break
  • Paid parental leave (including fully paid leave for eligible ZEOs, subject to local policy)
  • Learning & development stipend to support ongoing growth
  • Opportunities to volunteer and give back, including charitable donation matching where available
  • Mental wellbeing resources and support

*Benefits may vary by location; details will be shared during the interview process.


#ZEOLife at Zuora

ZEOs (our employees) are empowered to take ownership, challenge the status quo, and make a real impact. We:

  • Collaborate deeply across teams and regions
  • Learn constantly and iterate often
  • Build an inclusive, high-performance culture where people feel inspired, connected, and valued

Our Commitment to an Inclusive Workplace

Think, be and do you.
At Zuora, different perspectives, experiences, and contributions matter — everyone counts.


Zuora is proud to be an Equal Opportunity Employer committed to creating an inclusive environment for all. We do not discriminate on the basis of, and consider individuals seeking employment with Zuora without regard to, race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.


We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us by sending an email to [email protected] (or local equivalent, where applicable).


Zuora Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Zuora and has not been reviewed or approved by Zuora.

  • Parental & Family Support: Parental leave is described as up to six months of fully paid time for all parents globally, signaling a standout family support policy. Feedback suggests this benefit is a prominent part of the package.
  • Leave & Time Off Breadth: Time off offerings include flexible or unlimited PTO for U.S. salaried roles, wellness days, and a company‑wide winter break. Feedback suggests the breadth of options supports meaningful time away when teams enable it.
  • Equity Value & Accessibility: Equity programs such as RSUs and an ESPP with a discount are highlighted as meaningful components of total rewards. Feedback suggests these ownership programs enhance perceived competitiveness across several roles.


The Company
HQ: Redwood City, CA
1,500 Employees
Year Founded: 2007

What We Do

At Zuora, we do Modern Business. We’re helping people subscribe to new ways of doing business that are better for customers, companies and ultimately the planet. It’s an approach resulting from the shift to the Subscription Economy that puts customers first (building ongoing relationships instead of one-time product sales) and focuses on sustainable growth. Through our leading expertise and multi-product suite, we are transforming all industries and working with the world’s most innovative companies to monetize new business models, nurture subscriber relationships and optimize their digital experiences.

Why Work With Us

As an industry pioneer, our work is constantly evolving and challenging us in new ways that require us to think differently, iterate often and learn constantly. Our people, whom we call “ZEOs,” are empowered to take on a mindset of ownership and work together to make what’s next possible for our customers, community and the world.
