Senior Site Reliability Engineer

Reposted 12 Days Ago
Singapore, SGP
In-Office
Senior level
Gaming • Information Technology • Software
The Role
The Senior Site Reliability Engineer will enhance system reliability and performance while managing AWS infrastructure, Kubernetes, and automation tools for a growing client base.
Summary Generated by Built In
About k-ID

k-ID is the global leader in privacy-first compliance and age verification infrastructure. Recognized as one of TIME’s Best Inventions of 2025, named a Tech Pioneer by the World Economic Forum and a winner of Fast Company’s Next Big Things in Tech, we are building the Age Layer for the internet—the fundamental infrastructure that allows digital platforms to verify age and manage compliance globally without friction.
Our core platform, anchored by the Compliance Development Kit (CDK) and AgeKit, is the trusted engine for the world’s largest game publishers and digital ecosystems. We replace fragmented, manual compliance with a unified API that handles age verification, parental consent, and regulatory logic across 200+ markets. Backed by top-tier venture capital firms like a16z and Lightspeed, k-ID is entering a phase of growth to define the standard for global digital safety.

About the role

We are hiring a Senior Site Reliability Engineer to help make k-ID reliable at scale.

This role sits in the middle of our production backbone. You will own and improve the systems that keep our platform available, observable, secure, and resilient as traffic grows and our client base expands globally. You will work across infrastructure, tooling, deployment workflows, incident response, and systems design to make sure we can scale without breaking.

This is not a ticket-closing operations role. We want someone who can look at a system, find the weak points, and harden it. Someone who cares about failure modes, blast radius, deployment safety, recovery time, cost discipline, and the realities of running production systems under pressure. You should be comfortable writing code, automating away toil, and partnering closely with engineers to improve reliability through better architecture and better operating practices.

Responsibilities
  • Own the reliability, availability, and performance of the systems behind k-ID’s platform and public APIs

  • Design and improve scalable infrastructure on AWS and Kubernetes that can support high growth, uneven traffic, and global production workloads

  • Build and maintain strong observability across logs, metrics, tracing, alerting, and service health so issues are caught early and investigated quickly

  • Improve deployment safety through better CI and CD workflows, release controls, rollback paths, and environment consistency

  • Drive incident response and production readiness practices, including runbooks, on-call hygiene, postmortems, capacity planning, and resilience testing

  • Reduce operational toil by automating repetitive work and improving internal tooling for developers and operators

  • Partner with engineering teams to embed reliability and operability into service design from the start, not after something fails in production

  • Strengthen platform security and infrastructure hygiene across access controls, secrets handling, system hardening, and production safeguards

  • Continuously improve system performance, resource efficiency, and cost awareness without compromising reliability

Qualifications
  • 5+ years of experience in infrastructure, platform engineering, site reliability engineering, or software engineering with meaningful production ownership

  • Strong experience running production systems in AWS

  • Strong hands-on experience with Kubernetes and container-based workloads

  • Experience with infrastructure as code, preferably Terraform

  • Experience designing and operating observability stacks using tools such as Prometheus, Alertmanager, Grafana, OpenTelemetry, or equivalent systems

  • Strong understanding of distributed systems, failure modes, service reliability, and production debugging

  • Experience building or improving CI and CD systems and release workflows in modern engineering environments

  • Ability to write code and automation in one or more languages such as Go, Python, or TypeScript

  • Good judgment during incidents and a practical mindset around tradeoffs, risk, and recovery

  • Clear written and verbal communication skills with the ability to work effectively in a remote team

  • Startup experience is a plus, especially in environments where systems and processes are still being built



The Company
48 Employees
Year Founded: 2023

What We Do

k-ID is a first-of-its-kind global compliance engine that makes it easy for game developers and parents to ensure the safety and privacy of kids and teens online, providing age-appropriate and market-specific feature access in more than 200 markets around the world.
