DevOps / SRE Engineer - AI Platform

Krung Thep Maha Nakhon, Phra Nakhon, Bangkok, THA
In-Office
Senior level
Marketing Tech • Retail • Software
The Role
Manage ARIP's infrastructure using Terraform, build CI/CD pipelines, implement observability and cost measures, and lead incident response efforts.
Summary Generated by Built In

The DevOps / SRE Engineer owns the operational substrate of an AI-native retail decisioning platform: infrastructure, CI / CD, observability, cost meter, and incident response for a system that runs production agents taking real business actions. The role builds on the existing enterprise Terraform standard, CI / CD spine, and FinOps tagging policy rather than building parallel infrastructure.

Remote candidates outside of Thailand are welcome to apply.

Key Responsibilities:
    • Adopt the enterprise Terraform standard and module library for all platform infrastructure; author platform-specific modules where needed (agent runtime, vector DB, knowledge graph); run drift detection weekly. 
    • Build platform-specific CI / CD pipelines on the enterprise spine (service deploys, agent deploys, eval-gate enforcement), so that no agent reaches production without a passing eval.
    • Operate rollback orchestration with sub-15-minute recovery; run quarterly game days.
    • Own the platform observability stack — OpenTelemetry, Langfuse for LLM traces, custom dashboards for per-agent cost. 
    • Implement the per-agent cost meter end-to-end — token counts, vector queries, model inference, downstream LLM Gateway costs; surface cost data to the enterprise GenAI cost dashboard. 
    • Stand up the platform on-call rotation; author runbooks for every production agent and service; lead incident response with measurable corrective actions. 
    • Implement platform cost-tagging policy consistent with the enterprise standard (team, domain, environment, project, agent, suite, persona); report monthly to Cost Review. 
    • Drive cost optimisation — right-sizing, caching, model routing decisions, reserved compute. 

Requirements
    • Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline. 
    • 5+ years SRE / DevOps with production ownership. 
    • Terraform at scale — modules, state, drift, environment promotion. 
    • CI / CD for data + ML / AI services (GitLab CI / CD or comparable). 
    • Cloud platform (Azure preferred; AWS / GCP transferable). 
    • Observability — OpenTelemetry, Langfuse (or comparable LLM traces), custom dashboards. 
    • FinOps — tagging policies, attribution, optimisation. 
    • Incident response — on-call, post-mortems, runbook authorship. 

Preferred Qualifications

  • AI / agent platform SRE experience; cost-meter / chargeback systems built or operated. 
  • Multi-cloud production experience; open-source contributions to IaC / observability tooling. 
  • AI / ML / agent system observability instrumentation (LLM cost, agent cost, eval scores). 
  • Vendor certifications such as HashiCorp Terraform Associate / Professional, Azure Solutions Architect Associate, or Databricks Data Engineer Professional. 

Skills Required

  • 5+ years SRE / DevOps experience
  • Expertise in Terraform at enterprise scale
  • Experience with CI/CD for ML/AI services
  • Knowledge of OpenTelemetry for observability
  • Senior-level incident response experience
  • Experience from recognized companies in AI or data-intensive platforms

The Company
HQ: Khet Suan Luang, Bangkok
103 Employees

What We Do

Makro PRO is an exciting new digital venture by the iconic Makro. Our proud purpose is to build a technology platform that will help make business possible for restaurant owners, hotels, and independent retailers, and open the door for sellers. Makro PRO brings together the best talent from multinational companies to transform the B2B marketplace ecosystem. We welcome bold, energetic, and thoughtful people who share our belief in collaboration, diversity, excellence, and putting customers at the heart of our work.
