Site Reliability Engineer

San Francisco, CA, USA
Hybrid
190K-220K Annually
Senior level
Artificial Intelligence • Big Data • Software
Airbyte, The Open Data Movement Platform.

Airbyte is the open‑source standard for data movement. We've enabled data teams to move data from applications, APIs, unstructured sources and databases to data warehouses, lakes, and AI applications. With tens of thousands of connectors built and hundreds of thousands of companies adopting Airbyte, we've proven the economics of data integration at scale. And now Airbyte is building the frontier agentic data infrastructure, purpose-built for AI agents that need fast, accurate access to data across hundreds of sources. Our mission: make data available and actionable, everywhere.

We've raised $181M from the world's top investors (Benchmark, Accel, Altimeter, Coatue, Y Combinator, etc.) and we believe in product-led growth, where we build something awesome that all our users love. We’ve raised enough capital to explore boldly, but we still choose to move quickly, stay scrappy, and experiment constantly as we find the right paths in an AI-native landscape.

The Role:

You'll be the infrastructure and reliability engineer on the Data Replication team, a full-stack product team running over 3 million sync jobs a week and powering thousands of data use cases across multiple regions and clouds. You’ll build and maintain the infrastructure, set reliability standards, drive down incidents, and make it easier and safer for engineers to ship through tooling. You're equally comfortable in a Terraform file, a Kubernetes cluster, and a postmortem doc.


We expect engineers here to actively use AI as a force multiplier - agentic tools to automate toil, augment incident response, and build smarter internal tooling. If you're not already doing this, you should be excited to start. We care as much about how you work as what you build. Trust, directness, and craftsmanship matter here.

What You’ll Do:
  • Own the infrastructure underpinning the Data Replication platform - Kubernetes clusters, CI/CD pipelines, secrets management, networking, and cloud resource configuration across AWS and GCP.

  • Partner with product engineers to reliably integrate product features with infrastructure.

  • Maintain and enhance observability, alerting, and anomaly detection with an eye towards LLM automation.

  • Maintain and enhance AI-augmented release and internal tooling: canary deployments, progressive rollouts, automated release qualification, and rollback automation.

  • Set the infrastructure bar for the team - build self-serve tooling, write runbooks, and coach engineers to own more of their stack.

What You’ll Need:
  • 7+ years in infrastructure, platform engineering, SRE, or DevOps.

  • Hands-on ownership of Kubernetes, Helm, and Terraform in production environments.

  • Deep experience with observability stacks (Prometheus, Grafana, Datadog) and on-call operations.

  • Experience with CI/CD pipeline ownership and developer tooling.

  • Ability & willingness to read backend code to understand how systems break and instrument them correctly.

  • Fluency with AI tools (LLMs and agentic frameworks) to automate work, debug faster, and reduce toil.

  • A startup-ready mindset: comfortable with ambiguity, moving fast, and owning problems end-to-end.

Nice To Have:
  • Experience with data pipelines, replication systems, or ETL/ELT platforms.

  • Experience with control plane / data plane architectures or internal developer platforms.

  • Experience with Airbyte, CDKs, or connector-based architectures.

Location:
  • Onsite 5 days/week in San Francisco, CA

If you find this role exciting, we encourage you to apply even if you think you don’t meet all of the requirements!

Airbyte is an equal opportunity employer that does not discriminate on the basis of actual or perceived race, creed, color, religion, national origin, ancestry, age, physical or mental disability, pregnancy, genetic information, sex, sexual orientation, gender identity or expression, marital status, familial status, domestic violence victim status, veteran or military status, or any other legally recognized protected basis under federal, state or local laws. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Airbyte is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. Please let us know if you need assistance or accommodations due to a disability.


The Company
HQ: San Francisco, CA
120 Employees
Year Founded: 2020

What We Do

Airbyte specializes in open-source data integration that centralizes data from diverse sources into destinations such as data warehouses and lakes. With over 400 connectors and a self-serve, extensible framework, Airbyte lets organizations move both structured and unstructured data seamlessly for uses like AI, analytics, and business intelligence. Airbyte can be deployed in the cloud, in hybrid environments, or on-premises, with an emphasis on data security, compliance, and governance, making it well suited to complex, scalable data needs across industries.

Why Work With Us

Airbyte is extremely transparent both internally and externally. Our company handbook, culture & values, strategy, and roadmap are open to all. https://handbook.airbyte.com/

Similar Jobs

Zscaler

Site Reliability Engineer

Cloud • Information Technology • Security • Software • Cybersecurity
Easy Apply
Hybrid
San Jose, CA, USA
8697 Employees
119K-170K Annually

Navan

Site Reliability Engineer

Fintech • Information Technology • Payments • Productivity • Software • Travel • Automation
Easy Apply
Hybrid
Palo Alto, CA, USA
3300 Employees
86K-192K Annually

Sprinter Health

Site Reliability Engineer

Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth
Remote or Hybrid
2 Locations
500 Employees
160K-235K Annually

Zscaler

Site Reliability Engineer

Cloud • Information Technology • Security • Software • Cybersecurity
Easy Apply
Remote or Hybrid
San Jose, CA, USA
8697 Employees
193K-275K Annually

Similar Companies Hiring

Fairly Even
Hardware • Other • Robotics • Sales • Software • Hospitality
New York, NY
30 Employees
Bellagent
Artificial Intelligence • Machine Learning • Business Intelligence • Generative AI
Chicago, IL
20 Employees
Kepler
Fintech • Software
New York, NY
6 Employees