Founding Site Reliability Engineer (SRE)

London, Greater London, England
In-Office
Expert/Leader
Artificial Intelligence • Edtech • Software
The Role
As the founding Site Reliability Engineer, you'll ensure performance and reliability, define SLIs/SLOs, conduct load testing, automate operations, and mentor engineers.

Gizmo is an AI startup on a mission to make learning so easy that anyone can learn anything. We're building Duolingo for anything: a platform that uses gamification and social mechanics to make learning fun.

With over 1 million monthly active users and $4M in annual recurring revenue, we’re already one of the fastest-growing startups in the UK. Backed by leading investors, we recently raised $22M in Series A funding to accelerate our vision of helping 1 billion people learn.

Role Overview
You will be our founding SRE. Reporting to the CTO, you will own capacity, performance and reliability for Gizmo’s full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You’ll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit.

Key Responsibilities

  • Define SLIs/SLOs for latency, availability and error rate; codify error budgets and partner with product teams on trade-offs.
  • Perform load testing, capacity modelling and up-front scalability design for PostgreSQL, OpenSearch, Redis, Hasura and Cloudflare Workers; produce data-driven scaling plans.
  • Extend metrics, structured logging and tracing; establish alert rules that page only on user-visible impact; build actionable runbooks.
  • Join the on-call rotation, lead blameless post-mortems, drive remediation work to closure and track MTTR/MTBF improvements.
  • Automate repetitive ops on Kubernetes and CI/CD; keep toil below 50% of your time by pushing fixes into code.
  • Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns in an SRE playbook.

Requirements
  • Hands-on scale experience: you have run relational stores at 100k+ TPS or 1M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL).
  • You have software engineering experience.
  • Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs.
  • Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts.
  • Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer.
  • Self-starter with a maker mindset. We’re looking for ex-founders or individuals with start-up experience. 
  • Start-up bias for action: you prioritise high-leverage fixes, ship iteratively and own outcomes end-to-end.
  • Collaborative and feedback-driven; you welcome post-mortem culture and continuous improvement.
  • Driven by impact - you prioritise work that moves the needle!

Nice-to-haves: experience with Hasura internals, Cloudflare Workers edge optimisation, or operating OpenSearch at scale.


Benefits
  • Highly competitive salary.
  • You'll own a piece of what you're building - equity included.
  • Hybrid working model with 4 days in our East London office, ideally located between Shoreditch High Street, Old Street, and Liverpool Street stations.
  • The opportunity to become one of the earliest employees in one of the UK’s fastest-growing startups.
  • Private health insurance.

Top Skills

CI/CD
Grafana
Hasura
Kubernetes
OpenSearch
OpenTelemetry
Postgres
Prometheus
Redis

The Company
HQ: London
105 Employees
Year Founded: 2019

What We Do

Gizmo is on a mission to help students remember everything they learn in a simple and fun way. We use AI to transform notes and study materials into fun quizzes that are based on cognitive science: quiz for just 10 minutes a day, and you will remember everything you learn! Save All is growing rapidly with teachers and students across the world who love the efficiency of the tool: just paste in your notes, and Save All turns them into fun, active recall quizzes.
