Site Reliability Engineer (EU/UK Based - Remote)

duvo.ai

Reposted 10 Days Ago

Hiring Remotely in London, Greater London, England, GBR

In-Office or Remote

250,000 CZK per month

Mid level

Artificial Intelligence • Software • Automation

The Role

The Site Reliability Engineer will manage platform reliability and infrastructure, ensuring security and observability while automating deployments. Responsibilities include incident response, monitoring, and improving reliability practices.

Summary Generated by Built In

Who we are

Enterprise teams still copy data between systems all day. Work gets stuck in emails, legacy UIs, and handoffs. That chaos is costly, slow, and risky.

We're a fast-moving team on a mission to end it for good. Traction is strong and we're solving real problems for real customers—but to win, we need exceptional talent. We stay humble, do the work, and let results speak.

What we are building

We're building the AI operations platform for retail and CPG enterprises—a horizontal platform where AI agents execute end-to-end work across UIs and APIs with governance built in.

Where copilots stop, Duvo finishes the job. Business users specify the outcome; agents plan, act, request approvals on exceptions, and learn with every run. We start with a retail wedge (category management, supply chain, finance ops) where ROI is obvious, then expand to adjacent functions and sectors.

Velocity is our moat: ship fast, iterate faster, compound learning.

The role

You will own the reliability, security, and infrastructure that let our platform run AI agents for enterprise customers. This isn't traditional web app SRE: our agents execute arbitrary code in sandboxes, make unpredictable external API calls, and run for hours. Keeping this reliable, secure, and observable is the job.

You'll join a newly formed SRE team as one of its first members. Infrastructure is currently owned collectively by product engineers. You'll take ownership, inherit real infrastructure (25+ Terraform modules, a full OpenTelemetry pipeline, Prometheus/Grafana monitoring), and build the reliability practice from scratch.

Your unit of ownership: platform reliability, infrastructure, observability, and incident response. You own sandbox infrastructure and capacity; the AI Platform Engineer owns sandbox behavior and runtime logic.

We're a growing product team scaling into multiple initiatives, each with a lead, engineers, a design engineer, and an AI-focused engineer.

What we're looking for

These are non-negotiables—the things we'll specifically evaluate you on:

Distributed systems experience. You've designed and operated systems that scale. You understand failure modes, capacity planning, and the tradeoffs between consistency, availability, and latency in real production environments.
Security mindset. You'll handle enterprise data flowing through sandboxed environments, manage KMS encryption, configure Cloud Armor WAF rules, and ensure network isolation between tenant workloads. Security is a default consideration, not an afterthought.
Observability and incident response. You build monitoring and alerting that catches problems before customers do. When incidents happen, you lead structured responses, find root causes, and drive lasting fixes — not just restarts.
Infrastructure as code and automation. You automate everything you can. You've worked with IaC tools, CI/CD pipelines, and container orchestration in production. Manual runbooks make you uncomfortable.
Shipping and ownership. You don't just maintain systems — you improve them. You take ownership of reliability projects from proposal to production, and you measure the results.
Judgment on where to invest. You'll decide what to automate first and where to invest in reliability vs. shipping speed, and you'll make incident calls with incomplete information.

You might also

Have experience with GCP, Kubernetes, or similar cloud-native infrastructure.
Have worked with sandboxed execution environments or multi-tenant isolation.
Be comfortable with AI/ML production systems — understanding the unique reliability challenges of LLM-based applications.
Have a product engineering background — you've built features and understand the developer experience you're supporting.

This is not for you if

You want a traditional ops role where you follow runbooks — we're building the reliability practice, not maintaining one.
You want to build AI features — see AI Platform Engineer.

Our tech stack

GCP (Cloud Run, GKE, GCS)
Terraform, Docker
Prometheus, Grafana, Loki, OpenTelemetry
TypeScript and Python services (you'll read and occasionally modify application code, but deep language expertise isn't required)
Postgres, Redis

How we work

These are real tradeoffs we've made, not aspirations:

Initiative-driven. We organize around customer problems, not org charts. Problems surface through product feedback, competitive analysis, and direct customer conversations — then we prioritize, build, and ship weekly.
Customer-obsessed. We solve real problems, not hypothetical ones. Features that don't move customer metrics get cut.
Iterative by default. We ship small, learn fast, and never get attached to yesterday's code. This means things break sometimes — we fix forward.
AI-first leverage. We use AI to move faster and focus human time where it matters most. If a tool can do it, a person shouldn't.
Direct feedback. We give each other actionable feedback immediately. This can feel uncomfortable — we think that's worth it.
Autonomy with accountability. We trust people to make decisions and hold them to outcomes, not process.

What we offer

Unlimited AI budget. We don't just allow AI tools — we strongly encourage them. Want to try a new tool? Buy it. Want to automate part of your workflow? Do it.
Autonomy to do your best work. Want to meet someone to learn from? Set it up. Want a mentor? Go get one. Want to fly out to talk to an important customer? Just ask.
A real AI product with real customers. You're not building demos or internal tools. Enterprise customers use what you ship, and their feedback drives what you build next.
A sharp, motivated team that values ownership and candor.
Compensation: 250,000 CZK per month with a meaningful equity component. You can trade salary for additional equity if you prefer more upside.

How we hire

We respect your time and aim to move fast:

Discovery call with a senior team member (online, 30 min). We'll talk about you, how you think, and whether there's mutual fit.
Remote task (async, time-boxed, ~1 hour). Build a small product end-to-end. Not LeetCode.
Technical interview (online, ~1 hour). Meet the team. We'll go deeper on your experience, system design, product thinking, and collaboration. No trick questions — we want to see how you think and build.
On-site trial (2 days). Ship something small to production with us and see how we work together. Fully compensated.

Skills Required

Experience with distributed systems design and operations
Strong security mindset and experience managing data in sandbox environments
Ability to build observability and incident response practices
Proficiency in Infrastructure as Code and automation tools
Ownership of reliability and infrastructure improvement projects

The Company

HQ: Prague, Capital City of Prague

32 Employees

What We Do

Duvo automates the hard, long-running processes that break across messy enterprise stacks — ERPs, portals, phone calls, spreadsheets, and systems with no API. The stuff nobody else wants to touch. Not drafts. Not suggestions. Closed cases: systems updated, write-backs verified, evidence attached. End to end, with enterprise reliability. Cloud browsing for systems nobody else can reach. Voice for last-mile closure. Governed execution with human-in-the-loop approvals and full audit trails. No token bills. No usage surprises. One predictable subscription — so you can build a business case on day one. No technical skills needed. Enterprise-grade security. Runs 24/7. Backed by Index Ventures, Northzone, and Credo Ventures.