Staff/Senior Staff Engineer, Kubernetes

Posted 14 Hours Ago
Singapore, SGP
In-Office
Senior level
Fintech • Financial Services • Cryptocurrency
The Role
Own large-scale Kubernetes cluster lifecycle and multi-cloud (Alibaba Cloud & AWS) operations. Lead cloud-native architecture, stability, security hardening, automation (Shell/Python), CI/CD and IaC integration, incident response, monitoring, and team enablement for high-availability production environments.
Summary Generated by Built In
OKX will be prioritising applicants who have a current right to work in Singapore and do not require visa sponsorship from OKX.
Who We Are
At OKX, we believe that the future will be reshaped by crypto and that crypto will ultimately contribute to every individual's freedom.
 
OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions of people access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves.
 
Across our multiple offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er.
OKX is part of OKG, a group that brings the value of blockchain to users around the world through our leading products OKX, OKX Wallet, OKLink and more.

What You’ll Be Doing 
  • K8s cluster lifecycle management: Own the build, scaling, version upgrades, daily operations, fault diagnosis, and performance tuning of large-scale production Kubernetes clusters; ensure 24/7 high availability and stable operations; support continuous business iteration.
  • Alibaba Cloud & AWS multi-cloud operations (core responsibility): Operate, govern, and optimize Alibaba Cloud and AWS resources across dual-cloud environments, covering container services, networking, storage, IAM, load balancing, databases, and object storage; manage configuration changes, cost optimization, and disaster recovery to achieve unified multi-cloud governance.
  • Cloud-native architecture and optimization: Lead the operational rollout of containerization and microservices; optimize Pod scheduling, resource quotas, network policies, image management, and log monitoring systems; resolve challenges around cluster resource fragmentation, business adaptation, and network interoperability.
  • Stability and security: Build comprehensive K8s cluster monitoring, alerting, logging, and distributed tracing systems; define operations runbooks, change processes, and incident response plans; strengthen cluster security controls, disable high-risk permissions, harden container runtime environments, and ensure infrastructure and business data security.
  • Automated operations and DevOps: Develop operations automation scripts using Shell/Python; integrate Jenkins, GitLab CI, and ArgoCD to build automated release, inspection, and backup systems; implement Infrastructure as Code (IaC) principles to improve efficiency and reduce human error.
  • Incident management and post-mortem optimization: Lead online incident response, conduct root cause analysis, produce post-mortem reports, and continuously optimize cluster architecture, resource allocation, monitoring strategy, and long-term stability assurance mechanisms.
  • Technical knowledge sharing and team empowerment: Track cloud-native and public cloud technology developments; document operations best practices and technical specifications; help the team improve its multi-cloud K8s operations capabilities.

What We Look For In You 
  • Bachelor's degree or above in a computer-related field; 4+ years of hands-on experience operating production-level Kubernetes clusters; proficient in K8s core principles and components including Pod, Deployment, StatefulSet, Service, Ingress, CRD, controllers, scheduling strategies, network models, and storage mounting; able to independently resolve complex cluster failures and performance bottlenecks.
  • Proficient in Alibaba Cloud and AWS dual-cloud operations, with independent experience in dual-cloud production environments:
      • Alibaba Cloud: proficient in ACK Container Service, ECS, SLB, VPC, RAM, RDS, OSS, CloudMonitor, security groups, and snapshot backups.
      • AWS: proficient in EKS, EC2, S3, VPC, IAM, TGW, load balancing, CloudWatch, and security policies; practical experience in overseas cloud deployment, operations, and disaster recovery.
  • Proficient in Linux system administration; familiar with system optimization, permission control, process management, log analysis, and online troubleshooting.
  • Familiar with mainstream container runtimes (containerd/Docker); understand K8s networking (CNI plugins such as Calico/Flannel), storage (CSI), and multi-cluster management; familiar with Istio/Envoy service mesh, east-west traffic governance, gray (canary) releases, and network interoperability.
  • Strong Shell and Python automation skills; experienced with CI/CD pipelines (Jenkins, GitLab CI, ArgoCD); familiar with IaC tools (Terraform, Ansible, Helm); experienced with observability stacks (Prometheus, Grafana, ELK/EFK, Jaeger, SkyWalking).
  • Preferred: experience in large-scale public cloud environments (100+ nodes); multi-cloud cost optimization; K8s security hardening (OPA/Gatekeeper, Pod Security Standards, Falco); Kubernetes CKA/CKS certification; experience with AI/LLM workload scheduling (GPU scheduling, distributed training).

Perks & Benefits 
  • Competitive total compensation package
  • L&D programs and education subsidy for employees' growth and development
  • Various team building programs and company events
  • Wellness and meal allowances
  • Comprehensive healthcare schemes for employees and dependants
  • And more that we'd love to tell you about along the way!
Notice:
All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.
Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.

Skills Required

  • Bachelor's degree or above in a computer-related field
  • 4+ years hands-on experience operating production-level Kubernetes clusters
  • Deep knowledge of Kubernetes core components (Pod, Deployment, StatefulSet, Service, Ingress, CRD, controllers, scheduling, networking, storage)
  • Proficient in Alibaba Cloud container and infrastructure services (ACK, ECS, SLB, VPC, RAM, RDS, OSS, CloudMonitor)
  • Proficient in AWS infrastructure and EKS (EKS, EC2, S3, VPC, IAM, TGW, load balancing, CloudWatch)
  • Proficiency in Linux system administration and online troubleshooting
  • Familiarity with container runtimes and Kubernetes networking/storage (containerd, Docker, CNI plugins such as Calico/Flannel, CSI)
  • Familiar with Istio/Envoy service mesh and east-west traffic governance
  • Strong Shell and Python automation skills
  • Experience with CI/CD tools and GitOps (Jenkins, GitLab CI, ArgoCD)
  • Familiar with IaC and orchestration tools (Terraform, Ansible, Helm)
  • Experience with observability stacks (Prometheus, Grafana, ELK/EFK, Jaeger, SkyWalking)
  • Ability to lead incident response, perform root cause analysis, and produce post-mortems
  • Experience operating dual-cloud production environments and disaster recovery
  • Experience in large-scale public cloud environments (100+ nodes)
  • Experience with multi-cloud cost optimization
  • Experience with Kubernetes security hardening (OPA/Gatekeeper, Pod Security Standards, Falco)
  • CKA/CKS certifications
  • Experience scheduling AI/LLM workloads (GPU scheduling, distributed training)

OKX Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about OKX and has not been reviewed or approved by OKX.

  • Fair & Transparent Compensation: Pay is considered competitive or above market, especially in engineering, product, and legal roles across major hubs. This positioning is consistently cited as a major attraction for candidates.
  • Healthcare Strength: Role descriptions indicate comprehensive medical, dental, vision, life, and disability coverage, with employer-paid premiums in some cases. Health coverage is highlighted alongside core benefits like PTO and parental leave.
  • Wellbeing & Lifestyle Benefits: Allowances for education and fitness, meal perks and snacks, team-building budgets, and structured learning programs are described across locations. These extras enhance the total rewards package beyond base pay.

The Company
Dublin, Dublin
1,073 Employees
Year Founded: 2017

What We Do

Founded in 2017, OKX is one of the world’s leading cryptocurrency spot and derivatives exchanges. OKX innovatively adopted blockchain technology to reshape the financial ecosystem by offering some of the most diverse and sophisticated products, solutions, and trading tools on the market. Trusted by more than 20 million users in over 180 regions globally, OKX strives to provide an engaging platform that empowers every individual to explore the world of crypto. In addition to its world-class DeFi exchange, OKX serves its users with OKX Insights, a research arm that is at the cutting edge of the latest trends in the cryptocurrency industry. With its extensive range of crypto products and services, and unwavering commitment to innovation, OKX’s vision is a world of financial access backed by blockchain and the power of decentralized finance.