AIOps Engineer

Posted 5 Hours Ago
Shah Alam, Petaling, Selangor, MYS
In-Office
Mid level
Gaming • Hardware
The Role
The AIOps Engineer will enhance the reliability and performance of payment platform infrastructure by implementing automation solutions and improving observability practices, working closely with engineering teams.

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a team spread across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you on a path of accelerated growth, both personally and professionally.

Job Responsibilities:

We are seeking an experienced AIOps Engineer to enhance the reliability, performance, and operational intelligence of mission-critical payment platform infrastructure and services.
This role focuses on designing and implementing intelligent automation solutions, improving observability practices, and leveraging AI-assisted operational capabilities to proactively detect system risks and optimize platform performance. The AIOps Engineer will work closely with DevOps, SRE, and Software Engineering teams to reduce operational overhead, improve incident response efficiency, and support scalable transaction processing environments.

Key Responsibilities:

Operational Intelligence & Observability Engineering
  • Design and improve monitoring strategies covering infrastructure, applications, transaction flows, and distributed system dependencies.

  • Build dashboards and alerting frameworks that provide actionable operational insights.

  • Analyze logs, metrics, traces, and telemetry data to identify performance degradation patterns and reliability risks.

  • Define and track service reliability indicators such as SLIs and SLOs.

Automation & Intelligent Workflow Implementation
  • Develop automation scripts and workflows to reduce repetitive operational tasks and improve system recovery speed.

  • Implement intelligent alert enrichment and automated incident triage mechanisms.

  • Improve signal-to-noise ratio by tuning anomaly detection thresholds and alert correlation logic.

  • Support creation of self-healing mechanisms for common infrastructure or application failure scenarios.

Incident Response & Reliability Improvement
  • Participate in incident investigations and lead technical diagnosis for complex operational issues.

  • Identify systemic reliability weaknesses and recommend engineering improvements.

  • Reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) through improved tooling and automation.

  • Contribute to resilience engineering initiatives such as failover validation or controlled fault simulations.

Performance & Capacity Intelligence
  • Analyze operational data to forecast infrastructure capacity requirements and scaling thresholds.

  • Provide recommendations for performance optimization across compute, database, messaging, and networking components.

  • Support cost efficiency initiatives through workload behavior analysis.

AI-Enabled Operational Innovation
  • Implement AI use cases such as:

    • anomaly detection for infrastructure metrics

    • log clustering and automated incident summarization

    • predictive scaling signals based on workload patterns

    • deployment risk analysis using historical operational data

    • operational insight dashboards for proactive decision-making

  • Evaluate emerging AIOps tools and integrate suitable capabilities into monitoring and automation platforms.

  • Collaborate with R&D and platform teams to validate intelligent automation solutions in production environments.

Security & Compliance Awareness
  • Ensure operational tooling and automation workflows align with secure engineering practices and regulatory expectations.

  • Support audit readiness by maintaining visibility and traceability of monitoring configurations and operational actions.

  • Contribute to detection use cases related to abnormal infrastructure or system behavior.

Requirements
  • Bachelor’s Degree in Computer Science, Engineering, Data Science, or related field.

  • 3–5 years of experience in DevOps, SRE, Platform Engineering, or operational analytics roles.

  • Strong understanding of distributed systems, cloud infrastructure, and reliability engineering concepts.

  • Experience working with monitoring, logging, or observability platforms.

  • Hands-on scripting or programming experience (e.g., Python, Go, Bash).

  • Experience analyzing operational datasets such as logs, metrics, or event streams.

  • Strong analytical thinking and troubleshooting capabilities.

Preferred Qualifications
  • Experience in Payment Gateway, FinTech, Banking, or high-transaction platforms.

  • Exposure to cloud platforms such as AWS, GCP, or Azure.

  • Familiarity with container platforms and orchestration environments.

  • Experience with messaging or event streaming systems.

  • Knowledge of AI/ML tooling for anomaly detection or predictive analytics.

  • Experience supporting regulated production environments (e.g., PCI DSS).

Razer is proud to be an Equal Opportunity Employer. We believe that diverse teams drive better ideas, better products, and a stronger culture. We are committed to providing an inclusive, respectful, and fair workplace for every employee across all the countries we operate in. We do not discriminate on the basis of race, ethnicity, colour, nationality, ancestry, religion, age, sex, sexual orientation, gender identity or expression, disability, marital status, or any other characteristic protected under local laws. Where needed, we provide reasonable accommodations - including for disability or religious practices - to ensure every team member can perform and contribute at their best.

Are you game?

The Company
1,383 Employees
Year Founded: 2005

What We Do

Razer™ is the world’s leading lifestyle brand for gamers. The triple-headed snake trademark of Razer is one of the most recognized logos in the global gaming and esports communities. With a fan base that spans every continent, the company has designed and built the world’s largest gamer-focused ecosystem of hardware, software and services. Razer’s award-winning hardware includes high-performance gaming peripherals and Blade gaming laptops. Razer’s software platform, with over 70 million users, includes Razer Synapse (an Internet of Things platform), Razer Chroma™ (a proprietary RGB lighting technology system), and Razer Cortex (a game optimizer and launcher). In services, Razer Gold is one of the world’s largest virtual credit services for gamers, and Razer Fintech is one of the largest online-to-offline digital payment networks in SE Asia. Founded in 2005 and dual-headquartered in Irvine and Singapore, Razer has 18 offices worldwide and is recognized as the leading brand for gamers in the USA, Europe and China. Razer is listed on the Hong Kong Stock Exchange (Stock Code: 1337).
