Senior DevOps Engineer

Posted 21 Days Ago
Petaling Jaya, Petaling, Selangor
In-Office
100K-120K Annually
Senior level
Artificial Intelligence • Gaming
The Role
The Senior DevOps Engineer will maintain and optimize production systems, respond to faults, ensure high availability, and drive collaborative problem-solving for AI and cloud gaming customers.
Summary Generated by Built In

Aethir is the leading DePIN, enterprise-grade, AI-focused GPU-as-a-Service provider in the market. By leveraging a highly distributed cloud computing infrastructure, we help GPU providers serve AI and gaming customers at scale. Our mission is to deliver powerful AI chips for enterprise clients while supporting cloud gaming for hundreds of thousands of users worldwide—all under a decentralized cloud architecture that brings compute power directly to the community.

We are looking for a Senior DevOps Engineer (Site Reliability Engineer) to join our new headquarters in Kuala Lumpur, Malaysia. In this role, you will be responsible for maintaining, optimizing, and scaling our production systems to ensure high availability, reliability, and performance across our decentralized compute network. You’ll play a key part in supporting mission-critical infrastructure for our AI and cloud gaming customers globally.

Key Responsibilities:

  • Monitor, Review, and Respond to Faults: Monitor and review the production system, respond to faults, troubleshoot and resolve them, and optimize the system afterward.
  • System Architecture and Performance: Continuously review the system architecture, process logic, performance, stability, and other technical indicators to ensure they remain sound.
  • Coordination with Business Team: Work with the business team to drive resolution of operations and maintenance issues.
  • Production Failure Response: Respond promptly to production failures, acting as the overall coordinator for resolution.
  • Collaborative Problem-Solving: Organize relevant R&D, operations and maintenance, and product teams to collaboratively investigate and resolve problems.
  • Failure Response Time: Own failure response and resolution times, ensuring issues are resolved promptly.
  • Case Studies and Optimization: Conduct case studies on production issues and follow up with optimizations to improve system performance and stability.
  • Documentation: Maintain comprehensive documentation of system architecture, processes, and troubleshooting procedures.
  • Continuous Improvement: Identify areas for improvement in the operations and maintenance processes and implement necessary changes.

Requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Experience in operations and maintenance development, preferably in a cloud computing or AI-focused environment.
  • Strong understanding of system architecture, performance monitoring, and troubleshooting methodologies.
  • Excellent communication and collaboration skills.
  • Ability to work in a fast-paced startup environment.
  • Proficiency in Kubernetes (K8S), CI/CD, and Docker.
  • Expertise in either AWS (VPC, S3, EC2, etc.) or Python.
  • Ability to take responsibility for building the operations and maintenance infrastructure platform and handling core business operations.
  • Management experience is a plus, but not required.
  • Prior experience working in structured environments such as Huawei, ZTE, or banking institutions is preferred.
  • Fluency in Mandarin (written and spoken) is mandatory.

Benefits
  • Hypergrowth Startup Environment
  • Fantastic Career Progression Opportunities
  • Work within a Global and Local Team
  • Collaborative and innovative work environment with opportunities to contribute to cutting-edge projects.

Top Skills

AWS
CI/CD
Docker
Kubernetes
Python

The Company
13 Employees
Year Founded: 2021

What We Do

Aethir builds Decentralized Cloud Infrastructure (DCI) for Gaming and AI companies.
