Infrastructure Stability Architect - Stability Engineering Platform

Posted 9 Hours Ago
Singapore
7+ Years Experience
Financial Services
The Role
Design and lead stability architecture for large-scale distributed systems, optimize stability strategies, drive infrastructure intelligence and automation, collaborate with teams, and develop technical standards. Full-time position at OKX in Singapore.
Summary Generated by Built In

OKX will be prioritising applicants who have a current right to work in Singapore and do not require OKX's sponsorship of a visa.

 

Who We Are

At OKX, we believe the future will be reshaped by technology. Founded in 2017, we are revolutionising world systems through our cutting-edge digital asset exchange, Web3 portal and blockchain ecosystems. We reshape the financial ecosystem by offering some of the most diverse and sophisticated products, solutions, and trading tools on the market. Trusted by more than 50 million users in over 180 countries globally, OKX empowers every individual to explore the world of Web3. With our extensive range of products and services, and unwavering commitment to innovation, OKX envisions a world of financial access backed by blockchain and the power of decentralized finance.

 

We are innovative in the way we think, work, and in the products we create. We are also socially responsible, actively participating in, and encouraging employees to take part in, various public welfare activities. With more than 3,000 employees around the world, we believe embracing diversity and inclusion will spark the creation of long-term value for the industry. Come Build the Future with Us now!



About the Opportunity

With the vision of making service stability one of the core competitive strengths of the company's products, the service stability engineering team has built end-to-end, link-level risk management capabilities that continuously and automatically identify and analyze potential stability risks. The team has moved from "passive governance" to "active governance", shifting stability work left so that issues are prevented before they occur and user experience improves.

 

What You’ll Be Doing

  • Design and lead the stability architecture for large-scale distributed systems, including big data platforms, data warehouses, and core middleware infrastructure.
  • Develop and optimize comprehensive stability strategies covering capacity planning, performance optimization, fault prevention, and disaster recovery.
  • Spearhead chaos engineering practices, designing complex fault injection scenarios to validate system resilience and self-healing capabilities.
  • Build and refine comprehensive monitoring and alerting systems for rapid fault detection, localization, and recovery.
  • Lead root cause analysis for major incidents and formulate long-term improvement plans to continuously enhance system availability and reliability.
  • Drive infrastructure intelligence and automation, designing and implementing AIOps solutions. 
  • Collaborate closely with product, development, and operations teams to integrate stability requirements throughout the product lifecycle.
  • Lead the development of stability-related technical standards and best practices, promoting their adoption across the organization. 

 

What We Look For In You 

  • Bachelor's degree or above in Computer Science or a related major, with more than 10 years of architecture design experience in large-scale internet or computing platforms.
  • Expert knowledge of distributed system architectures, with deep understanding and rich practical experience in big data, cloud-native, and micro-service technologies.
  • In-depth understanding of various infrastructure components (e.g. Kubernetes, Kafka, databases) and ability to perform advanced tuning.
  • Strong systems thinking capability, able to analyze and solve complex stability issues from a holistic perspective.
  • Extensive experience in handling large-scale system failures, with the ability to quickly locate and resolve challenging problems.
  • Mastery of Linux systems and network technologies, and familiarity with the architecture and services of mainstream cloud platforms (e.g. Alibaba Cloud, AWS).
  • Excellent technical leadership skills, able to guide teams and drive cross-department collaboration.
  • Proficiency in speaking, reading and writing in both English and Mandarin to collaborate effectively with global and cross-functional team members.
  • Passion for continuous learning, able to quickly grasp new technologies and apply them in practical work scenarios.


Perks & Benefits

  • Competitive total compensation package

  • L&D programs and Education subsidy for employees' growth and development

  • Various team building programs and company events

  • Wellness and meal allowances

  • Comprehensive healthcare schemes for employees and dependants

  • More that we would love to tell you about along the process!

Top Skills

Database
Kafka
Kubernetes
The Company
1,073 Employees
Remote Workplace
Year Founded: 2017

What We Do

Founded in 2017, OKX is one of the world’s leading cryptocurrency spot and derivatives exchanges. OKX innovatively adopted blockchain technology to reshape the financial ecosystem by offering some of the most diverse and sophisticated products, solutions, and trading tools on the market. Trusted by more than 20 million users in over 180 regions globally, OKX strives to provide an engaging platform that empowers every individual to explore the world of crypto.

In addition to its world-class DeFi exchange, OKX serves its users with OKX Insights, a research arm that is at the cutting edge of the latest trends in the cryptocurrency industry. With its extensive range of crypto products and services, and unwavering commitment to innovation, OKX’s vision is a world of financial access backed by blockchain and the power of decentralized finance.
