Staff Site Reliability Engineer

Hiring Remotely in United States
Remote
220K-250K Annually
Expert/Leader
Cloud • Software • Database
We're the company behind YugabyteDB, the 100% open source cloud native database for mission critical applications.
The Role
Lead the design, build, and operation of the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance operations, manage incidents and on-call rotations, implement security and encryption processes, and improve reliability using SRE principles and observability.
Summary Generated by Built In

Yugabyte is the company behind YugabyteDB, the AI-ready, multi-modal, distributed PostgreSQL database for cloud-native apps. Trusted by industry leaders including Shopify, Paramount+, GM, Kroger, Fiserv, and NPCI, YugabyteDB has been deployed in over 100 countries and powers more than 5 million clusters worldwide.

Together, our hard-working team of experts and our industry-leading technology are uniquely positioned to meet the demands of modern workloads: geo-distributed, ultra-resilient, and built to scale without limits.

Our Yugabeings (distributed, like our database) span 12+ countries and multiple time zones, sharing expertise from diverse backgrounds and industries.

YugabyteDB Aeon Staff Site Reliability Engineer 

At YugabyteDB, we are on a mission to build an open source, high-performance, distributed, and fault-tolerant PostgreSQL-compatible database for powering global, internet-scale applications. The YugabyteDB Managed team is building a Database as a Service (DBaaS) to run in major cloud providers and to be available globally.

As a Site Reliability Engineer focused on database availability and reliability, you will use your skills to operate and automate the lifecycle of the YugabyteDB DBaaS. You will design and build processes that spin up systems and the infrastructure that manages the databases, using secure, reliable, scalable, and highly observable methodologies. You will use, operate, and configure Kubernetes environments (GKE, EKS, AKS), Java frameworks, shell scripts, Python scripts, Terraform templates, and many other cloud technologies. You will participate in the on-call rotation, covering 12 hours a day for 7 days every 4-5 weeks, and manage incidents on the DBaaS infrastructure while coordinating support for our customers. You will learn how to diagnose problems with our database and infrastructure technology and help deliver reliable service to our customers.

We are looking for a strong Staff SRE who exemplifies collaboration, teamwork, and empathy, and who likes to lead by example. We enjoy working with people who are driven, who thrive in a fast-paced startup environment, and who have a strong desire to build an internet-scale, extensible control plane with a strong emphasis on simplicity and user experience.

Responsibilities

  • Define and drive the technical vision, architecture, and strategy for YugabyteDB’s Database-as-a-Service (DBaaS).
  • Lead the design, development, testing, debugging, troubleshooting, and maintenance of components of the DBaaS cloud infrastructure
  • Manage operational priorities of the DBaaS infrastructure
  • Establish processes for handling and leading response to incidents on databases or infrastructure
  • Automate and manage regular maintenance operations such as upgrades
  • Design and build DBaaS processes for encryption, security key/password management, storage management, etc. 
  • Use the SRE golden signals (latency, traffic, errors, saturation) to analyze and optimize the performance and reliability of the DBaaS system

Requirements

  • Strong software design and implementation skills in building infrastructure frameworks
  • 15+ years of experience as an SRE and 5+ years of technical leadership experience
  • Experience in building and managing large-scale distributed systems
  • Experience building and operating data systems for production applications, including fault-tolerant designs, software lifecycles, and automation of critical operations
  • Strong track record of incident response and management for a managed service that is mission-critical to its customers
  • Experience with:
    • Relational database systems (PostgreSQL preferred)
    • Public cloud infrastructure (AWS, GCP, and/or Azure)
    • Containerization tooling, theory and design (Docker, Kubernetes)
    • Infrastructure as Code (Terraform preferred)
    • Configuration Management Tooling (Ansible preferred)
    • Automation Scripting (Python and Bash preferred)
    • Monitoring systems (Prometheus preferred)
    • Version control systems (git preferred)
    • CI/CD systems (GitHub Actions preferred)
  • Solid understanding of Linux systems operations and troubleshooting
  • Willingness and ability to learn new languages and concepts

We feel strongly about equal pay for equal work, and transparency in compensation is one way to help achieve that. The cash compensation for this role is market competitive, with a range of USD 220,000-USD 250,000, inclusive of variable/incentive pay for some roles. The package also includes equity (when applicable) and benefits including health plans, retirement plans, and unlimited paid time off (PTO). The pay range for this position is a general guideline only and not a guarantee of compensation or salary. The actual pay will vary based on factors including experience, qualifications, and skill level.

Due to the Proclamation, “Restriction on Entry of Certain Nonimmigrant Workers”, which went into effect on September 21, 2025, at this time we are no longer able to sponsor new H-1B visa petitions filed after September 21, 2025 for new hires. We are still able to consider candidates who require H-1B extensions, changes of employer, or other types of work authorization.

Equal Employment Opportunity Statement:

As an equal opportunity employer, Yugabyte is committed to a diverse workforce. Employment decisions regarding recruitment and selection will be made without discrimination based on race, color, religion, national origin, gender, age, sexual orientation, physical or mental disability, genetic information or characteristic, gender identity and expression, veteran status, or other non-job related characteristics or other prohibited grounds specified in applicable federal, state and local laws. 

To review Yugabyte's Privacy Policy please visit Yugabyte Privacy Notice.

Skills Required

  • 15+ years of experience as a Site Reliability Engineer
  • 5+ years of technical leadership experience
  • Experience building and managing large-scale distributed systems
  • Experience building and operating production data systems with fault tolerant designs and automated operations
  • Strong software design and implementation skills for infrastructure frameworks
  • Incident response and management experience for mission-critical managed services
  • Experience with relational database systems (PostgreSQL preferred)
  • Public cloud infrastructure experience (AWS, GCP, and/or Azure)
  • Containerization tooling and design experience (Docker, Kubernetes; GKE/EKS/AKS)
  • Infrastructure as Code experience (Terraform preferred)
  • Configuration management experience (Ansible preferred)
  • Automation scripting experience (Python and Bash preferred)
  • Monitoring systems experience (Prometheus preferred)
  • Version control experience (git preferred) and CI/CD (GitHub Actions preferred)
  • Solid understanding of Linux systems operations and troubleshooting
  • Willingness and ability to learn new languages and concepts
  • Participate in the on-call rotation and manage incidents (12 hours a day over 7 days, every 4-5 weeks)

Yugabyte Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Yugabyte and has not been reviewed or approved by Yugabyte.

  • Fair & Transparent Compensation: Pay is positioned as competitive for U.S. engineering roles, with total compensation described as aligned with mid-stage startup norms. Base salary ranges for mid-level engineers are framed as strong relative to the Bay Area market.
  • Healthcare Strength: Health coverage is characterized as comprehensive, spanning medical, dental, and vision with high employer premium coverage in the U.S. Mental health support and low-deductible plan design are also highlighted as meaningful parts of the package.
  • Leave & Time Off Breadth: Time-off offerings are described as broad, including unlimited PTO with typical usage levels cited and additional holiday coverage. Parental leave policies are called out as a notable component of the overall benefits mix.

The Company
HQ: Sunnyvale, CA
400 Employees
Year Founded: 2016

What We Do

At Yugabyte, we're building a diverse, multi-generational company with the vision to make YugabyteDB the distributed SQL database of choice for mission-critical cloud native applications. After a decade spent modernizing their IT infrastructure and applications, companies are entering a new phase of database modernization initiatives. We aim to be their database partner on this journey.

Why Work With Us

It’s easy to claim that your company is a best place to work, but at Yugabyte we know that we are. Why? Because our Yugabeings tell us so. Working with and learning from smart people all over the world, a sense of ownership, mentoring opportunities, and pride in a job well done: that's what being a Yugabeing is all about.
