Lead Site Reliability Engineer - SG

Reposted 12 Days Ago
Singapore, SGP
In-Office
Senior level
Gaming • Information Technology • Software
The Role
The Lead Site Reliability Engineer will ensure the operational health of k-ID's systems by leading the NOC, improving reliability processes, and collaborating with engineering teams for better performance and incident management.
Summary Generated by Built In
About k-ID

k-ID is the global leader in privacy-first compliance and age verification infrastructure. Recognized as one of TIME’s Best Inventions of 2025, named a Tech Pioneer by the World Economic Forum and a winner of Fast Company’s Next Big Things in Tech, we are building the Age Layer for the internet—the fundamental infrastructure that allows digital platforms to verify age and manage compliance globally without friction.
Our core platform, anchored by the Compliance Development Kit (CDK) and AgeKit, is the trusted engine for the world’s largest game publishers and digital ecosystems. We replace fragmented, manual compliance with a unified API that handles age verification, parental consent, and regulatory logic across 200+ markets. Backed by top-tier venture capital firms like a16z and Lightspeed, k-ID is entering a phase of growth to define the standard for global digital safety.

About The Role

We are hiring a Lead Site Reliability Engineer and NOC Lead to own production reliability and operational excellence across the platform.

This is a senior role for someone who can lead from the front. You will be responsible for the reliability, availability, observability, and operational maturity of k-ID’s systems, while also leading the Network Operations Center (NOC) function. That means you are not just responding to incidents: you are building the systems, processes, tooling, and team standards that make incidents less frequent, less severe, and faster to resolve when they do happen.

This role is more senior than our senior NOC hires. We want someone who can set the operating model for the NOC, raise the technical bar for incident management, partner deeply with engineering leadership, and drive the long-term reliability roadmap for the business. You should be comfortable switching between hands-on technical work, operational leadership, incident command, and team development.

Location & Language
  • Location: Singapore

  • Languages: Proficiency in English

Key Responsibilities
  • Own the reliability and operational health of k-ID’s production systems and critical services

  • Lead the NOC function, including shift structure, escalation paths, incident handling standards, readiness processes, and operational reporting

  • Act as the senior escalation point for major incidents and serve as incident commander for high-severity events when needed

  • Design and improve monitoring, alerting, and operational tooling so the NOC can detect issues early and respond effectively

  • Drive root cause analysis and post-incident review practices that produce real corrective action rather than superficial summaries

  • Partner with engineering teams to improve system resilience, deployment safety, service ownership, and production readiness

  • Identify systemic risks across infrastructure, services, dependencies, and operational processes, then drive plans to reduce them

  • Improve platform performance, availability, and recovery time through architecture changes, better automation, and stronger operating discipline

  • Build and maintain runbooks, readiness checklists, service health standards, and escalation playbooks across the organization

  • Help define service level objectives, operational metrics, and reliability targets that align with business needs

  • Support and mentor senior NOC engineers and other operations team members, helping raise technical depth and decision quality across the function

  • Contribute hands-on to infrastructure and reliability engineering work where needed, especially in high-leverage areas

Qualifications
  • 7 or more years of experience in site reliability engineering, infrastructure engineering, platform engineering, or software engineering with significant production ownership

  • Strong experience operating production systems in AWS

  • Strong hands-on experience with Kubernetes, containerized services, and modern infrastructure tooling

  • Experience building and improving observability across metrics, logs, tracing, alerting, and service health

  • Deep understanding of distributed systems, service failure modes, traffic management, capacity planning, and recovery design

  • Experience designing or running incident response programs, on-call operations, escalation frameworks, and post-incident review processes

  • Experience leading or managing NOC, production operations, or support functions in a high-availability environment

  • Strong experience with infrastructure as code such as Terraform

  • Experience improving CI/CD workflows, release safety, rollback practices, and change management

  • Ability to write code or automation in one or more languages such as Go, Python, or TypeScript

  • Strong written and verbal communication skills, especially in high-pressure operational settings

  • Experience working in fast-moving startup environments is strongly preferred

Benefits

Competitive Salary
  • A competitive startup salary aligned with experience and market benchmarks.

  • Employee Stock Ownership Plan so you participate directly in the long-term upside of the company.

Health and Wellbeing
  • Comprehensive family health coverage, including medical, dental, and vision benefits

  • Mental health and wellness support

Professional Development
  • Hands-on exposure to key clients in a scaling global tech company

  • Opportunities for continuous learning through real ownership rather than formal training alone.

  • Direct collaboration with the Founders and the tech leadership team

Culture and Ways of Working
  • A collaborative, inclusive, and low-politics work environment.

  • Flexible, trust-based working culture shaped by a US startup operating model.

  • A mission-driven company focused on improving online experiences for kids and teens globally.

The Company
48 Employees
Year Founded: 2023

What We Do

k-ID is a first-of-its-kind global compliance engine that makes it easy for game developers and parents to ensure the safety and privacy of kids and teens online, providing age-appropriate and market-specific feature access in more than 200 markets around the world.
