- Design and build AI Harness capabilities for SRE / DevOps scenarios, including fault detection, change analysis, capacity risk identification, automated inspection, drill evaluation, and recovery recommendations.
- Drive the development of an automated RCA (Root Cause Analysis) system, combining logs, metrics, distributed tracing, events, changes, topology, and other data to achieve root cause analysis, impact scope assessment, and post-incident review support.
- Build AIOps platform capabilities, including intelligent alert noise reduction, anomaly detection, event correlation, trend prediction, fault attribution, and automated closed-loop remediation.
- Collaborate with R&D, SRE, platform, data, and business teams to embed AI capabilities into Code Review, CI/CD, GitOps, DevOps, incident response, and stability governance processes.
- Bachelor's degree or above in Computer Science or a related field, with 8+ years of experience in R&D, architecture, or platform engineering; experience building AI applications, SRE, AIOps, or DevOps platforms is preferred.
- Strong software architecture skills, familiar with microservices architecture, distributed systems, high-availability design, service governance, observability, and platform engineering.
- Familiar with LLM application development, with an understanding of core technologies such as LLMs, RAG, embeddings, vector databases, Agents, Function Calling / Tool Calling, and Prompt Engineering. Understands the production challenges of AI applications, including hallucination control, result evaluation, permission boundaries, data security, cost control, observability, and failure fallback mechanisms.
- Experience delivering AI Agent or intelligent assistant products, able to design complex task decomposition, multi-tool invocation, multi-turn reasoning, context management, and human-machine collaboration workflows.
- Familiar with RCA or AIOps capability development, including log analysis, metric anomaly detection, distributed tracing, event correlation, alert noise reduction, topology analysis, and root cause localization.
- Proficient in at least one mainstream development language, such as Java, Python, Go, or TypeScript, with strong engineering implementation and system design skills.
- Familiar with cloud-native technology stacks and common middleware, such as Kubernetes, Docker, Kafka, Redis, MySQL, Elasticsearch, Prometheus, Grafana, OpenTelemetry, etc.
- Strong complex problem analysis skills and holistic architectural thinking, able to drive problem-solving from business, platform, process, and organizational collaboration perspectives.
- Ability to communicate in both Chinese and English is preferred, as the role requires collaborating with cross-region stakeholders.
- Competitive total compensation package
- L&D programs and education subsidies for employees' growth and development
- Various team building programs and company events
- Wellness and meal allowances
- Comprehensive healthcare schemes for employees and dependents
- More that we would love to tell you about along the way!
Skills Required
- Bachelor's degree in Computer Science or related field
- 8+ years of experience in R&D, architecture, or platform engineering
- Experience in building AI applications, SRE, AIOps, or DevOps platforms
- Strong software architecture skills
- Familiarity with LLM application development
- Experience with RCA or AIOps capability development
- Proficient in at least one mainstream development language
- Familiarity with cloud-native technology stacks and common middleware
- Strong complex problem analysis skills
- Ability to communicate in Chinese and English
OKX Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified in responses generated by popular LLMs to common candidate questions about OKX. It has not been reviewed or approved by OKX.
- Fair & Transparent Compensation: Pay is considered competitive or above market, especially in engineering, product, and legal roles across major hubs. This positioning is consistently cited as a major attraction for candidates.
- Healthcare Strength: Role descriptions indicate comprehensive medical, dental, vision, life, and disability coverage, with employer-paid premiums in some cases. Health coverage is highlighted alongside core benefits like PTO and parental leave.
- Wellbeing & Lifestyle Benefits: Allowances for education and fitness, meal perks and snacks, team-building budgets, and structured learning programs are described across locations. These extras enhance the total rewards package beyond base pay.
OKX Insights
What We Do
Founded in 2017, OKX is one of the world’s leading cryptocurrency spot and derivatives exchanges. OKX innovatively adopted blockchain technology to reshape the financial ecosystem by offering some of the most diverse and sophisticated products, solutions, and trading tools on the market. Trusted by more than 20 million users in over 180 regions globally, OKX strives to provide an engaging platform that empowers every individual to explore the world of crypto. In addition to its world-class DeFi exchange, OKX serves its users with OKX Insights, a research arm that is at the cutting edge of the latest trends in the cryptocurrency industry. With its extensive range of crypto products and services, and unwavering commitment to innovation, OKX’s vision is a world of financial access backed by blockchain and the power of decentralized finance.






