Senior Site Reliability Engineer, APAC

Posted 2 Days Ago
3 Locations
Remote
166K-260K Annually
Senior level
Computer Vision • Machine Learning • Software
The Role
Lead observability, incident management, and reliability for Ditto's edge-to-cloud infrastructure. Build monitoring (Prometheus, Grafana, Datadog), define SLOs, automate recovery and tooling, author runbooks, collaborate with product teams, and participate in on-call rotations to ensure scalable, enterprise-grade system resilience.
Summary Generated by Built In

About Ditto:

Ditto is redefining how data moves at the edge. Our mission is to make it seamless for developers to build resilient, real-time applications, regardless of network conditions. Whether you're in a stadium, airplane, or remote military base, Ditto's peer-to-peer sync engine ensures devices stay connected and data stays consistent, even without internet. With more than $145 million in funding and trusted by organizations like Chick-fil-A, Delta Airlines, and the U.S. military, Ditto powers mission-critical experiences across aviation, retail, travel, hospitality, defense, and more. As a globally distributed, fast-growing startup, we’re committed to building a diverse and inclusive team that reflects the wide range of perspectives needed to solve the world’s hardest connectivity problems.

About the position

Ditto is at an inflection point. As we scale to meet the demands of our enterprise customers, we need experienced Site Reliability Engineers to ensure our infrastructure delivers enterprise-grade reliability.

This is a unique opportunity to join a specialized team focused on observability, system reliability, and operational excellence for our cutting-edge, edge-to-cloud database technology.

As a Site Reliability Engineer, you will play a crucial role in ensuring the reliability, performance, and scalability of Ditto's cloud infrastructure. You'll collaborate with product engineering teams to improve system resilience, lead and develop incident management processes, and build observability solutions for our unique distributed architecture.

As a Site Reliability Engineer, you will:

  • Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana

  • Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions

  • Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes

  • Work with teams to implement and maintain SLOs, monitoring, and alerting strategies that ensure reliability at scale

  • Design and implement automation and support tooling to improve system resilience, maintain operational safety and reduce operational overhead

  • Lead the development and maintenance of runbooks, alert definitions, and incident response procedures

  • Participate in on-call rotations to provide 24/7 support for critical production systems

What you'll need:

  • 6+ years of experience in Site Reliability Engineering or similar DevOps roles focused on system reliability and incident management

  • Strong experience with modern monitoring stacks including Prometheus, Grafana, and Datadog

  • Experience in at least one systems programming language, such as Python, Go, Rust, C/C++, or Java

  • Expertise with Infrastructure as Code tools, like Terraform and Helm

  • Expertise with at least one major cloud service provider (AWS, GCP, Azure)

  • Strong communication skills, with the ability to lead incident response and effectively collaborate across teams

  • Willingness and experience engaging with on-call rotations and emergency response procedures

  • A high degree of agency and a bias towards action: you identify problems and work autonomously to solve them

  • Excellent problem-solving skills and a methodical approach to troubleshooting complex issues

Nice to have:

  • Experience building multi-tenant, multi-cloud SaaS/DBaaS Platforms

  • 4+ years of hands-on experience architecting applications for cloud platforms and managing cloud-based infrastructure

  • Knowledge of edge computing or mesh networking

  • Experience instrumenting advanced observability practices (tracing, profiling) in distributed systems

  • Experience working with globally distributed teams

  • Proven experience in project management

The Benefits of Building with Us

We offer competitive salaries and meaningful equity. We believe everyone on the team should have a stake in what we’re building. Benefits vary by region to make sure you're covered in the ways that matter most. In the US, that includes health, dental, vision, life, and disability insurance, plus a 401(k) and flexible spending accounts.

Regardless of where you live, everyone at Ditto can utilize flexible time off. And while we work remotely, our Atlanta and San Francisco offices are open if you ever want a place to work or meet up with teammates.

Apply Anyway

At Ditto, we know game-changers don’t always come wrapped in a “perfect” resume. Years of experience? Every single bullet point checked? Meh. That’s not what drives us.

What does matter?

  • Grit.

  • Curiosity.

  • Adaptability.

  • And a genuine spark for what we’re building.

So if you’re fired up about our mission but not sure you tick every box, hit that apply button anyway. Use your application to show us how you’ll make an impact here.

We’re always on the lookout for exceptional humans who want to grow, stretch, and build something meaningful with us.

Equal Opportunity Employer

Ditto is proud to be an equal-opportunity employer. We do not discriminate in hiring or any employment decision based on race, color, religion, national origin, age, sex (including pregnancy, childbirth, or related medical conditions), marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or other applicable legally protected characteristics. Ditto is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, please let us know.

Skills Required

  • 6+ years of Site Reliability Engineering or similar DevOps experience
  • Strong experience with Prometheus, Grafana and Datadog
  • Experience in at least one systems programming language (Python, Go, Rust, C/C++, or Java)
  • Expertise with Infrastructure as Code tools such as Terraform and Helm
  • Expertise with at least one major cloud provider (AWS, GCP, or Azure)
  • Proven incident management and on-call experience, including coordinating responses and follow-up actions
  • Ability to design and implement monitoring, SLOs, alerting, and runbooks
  • Strong communication skills and ability to collaborate across teams
  • Willingness and experience engaging with on-call rotations and emergency response
  • Experience building multi-tenant, multi-cloud SaaS/DBaaS platforms
  • 4+ years architecting applications for cloud platforms and managing cloud infrastructure
  • Knowledge of edge computing or mesh networking
  • Experience with advanced observability practices (tracing, profiling) in distributed systems
  • Experience working with globally distributed teams
  • Project management experience

Ditto Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Ditto and has not been reviewed or approved by Ditto.

  • Fair & Transparent Compensation: Postings publish explicit, location-based salary ranges for roles, and mirrored ranges plus third-party submissions indicate market-aligned compensation for U.S. tech roles. Structured compensation practices are signaled by clearly defined bands across markets.
  • Healthcare Strength: Public job descriptions consistently include medical, dental, vision, and life/disability coverage for U.S. employees. This breadth of core health coverage is repeatedly referenced across recent postings.
  • Leave & Time Off Breadth: Listings describe flexible or unlimited PTO within a remote-first setup. Time-off flexibility appears to be a standard part of the package.

The Company
HQ: Oakland, CA
67 Employees
Year Founded: 2011

What We Do

We are redefining the eyewear shopping experience to make it simple, personal, and a little bit magical. With our industry-leading eyewear recommendation and virtual try-on technology platform, we are fundamentally changing the way eyewear is bought and sold globally for over 50 million customers each year. Computer vision and machine learning power our technology. We license this platform to eyewear retailers, who embed it into their web, mobile, and in-store experiences to fundamentally shift how they sell eyewear. Our technology is used by over 10M users a month around the world through some of the world’s best forward-looking eyewear retailers. We provide a unique opportunity to work alongside a talented team of software engineers, business leaders, creatives, physicists, and researchers to bring state-of-the-art computer vision and machine learning technologies to market at scale. Come be a part of the fun at Ditto and join our team today!
