Reliability Operations Engineer (Malaysia)

Penang, Daerah Timor Laut, Penang, MYS
In-Office
Mid level
Robotics
Serve Robotics develops advanced, AI-powered sidewalk delivery robots that make delivery sustainable and economical.
The Role
The Reliability Operations Engineer manages operational reliability for robotic systems, handles escalations, performs technical investigations, updates runbooks, and enhances troubleshooting workflows.
Summary Generated by Built In

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles, Miami, Dallas, Atlanta and Chicago while doing commercial deliveries. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

The Reliability Operations Engineer supports the operational reliability of robotic and cloud systems by handling Tier 2 escalations, following and improving runbooks, and performing technical investigations during your region’s daytime hours. This role works closely with senior team members, product engineering, and SREs to investigate issues, refine operational workflows, and strengthen system health. This position contributes to incident response by providing triage and clear communication, ensuring timely escalation and effective coordination across teams.

Responsibilities

  • Lead incident investigations during your region’s daytime hours, providing timely updates, escalating appropriately, and supporting senior engineers leading the response.

  • Respond to escalations from Tier 1 support using established runbooks, metrics, logs, and diagnostics to remediate issues or escalate to Tier 3 when needed.

  • Update runbooks and operational documentation based on new issues, discoveries, and feedback, ensuring clarity and consistency across all procedures.

  • Run existing automations and collaborate with senior team members to enhance tooling and scripts that streamline troubleshooting and remediation tasks.

  • Use observability tools such as Grafana/Prometheus, GCP Monitoring, and OpenTelemetry to interpret metrics, logs, and traces, helping identify anomalies and validate system performance.

  • Provide concise, accurate updates during incidents, ensuring information reaches the correct engineering and SRE contacts and supporting structured incident coordination.

  • Participate in discussions around root causes, share operational insights, and contribute to process improvements that enhance system stability and supportability.

  • Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.

  • Proactively strengthen workflows, adopt best practices, and build the foundation of the Reliability Operations function as it evolves.

Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent hands-on experience.

  • 2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.

  • Experience participating in Tier 1 or Tier 2 investigations, including log review, basic triage, and structured escalation.

  • Exposure to operational environments supporting distributed or cloud-based systems.

  • Participation in incident response workflows and/or on-call rotations.

  • Proficiency with Linux, including navigating systems, reviewing logs, and performing basic diagnostics.

  • Experience using and contributing to runbooks and operational workflows.

  • Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.

  • Familiarity with cloud platforms, preferably Google Cloud Platform (GCP).

  • Ability to follow documented remediation steps, with good judgment around when to escalate.

  • Understanding of CI/CD pipelines and how application deployments affect runtime behavior.

  • Experience using Jira or similar ticketing systems.

  • Clear and effective communicator, especially when providing updates during time-sensitive operational issues.

  • Calm, organized approach to troubleshooting and prioritization.

  • Collaborative mindset, working effectively with senior operations engineers, product teams, and SREs.

  • Strong sense of ownership and accountability for operational responsibilities.

What Makes You Stand Out

  • Prior experience participating in high-severity incident response or supporting operational incidents.

  • Exposure to robot fleets, IoT systems, or other distributed physical device environments.

  • Ability to write or modify lightweight scripts and automations to improve operational workflows.

  • Familiarity with incident management platforms such as PagerDuty, OpsGenie, Jira Service Management, or Grafana IRM.

  • Experience contributing to the creation or improvement of operational runbooks and support documentation.

  • Strong networking fundamentals; familiarity with Tailscale or similar zero-trust networking tools is a plus.

  • Demonstrated ability to learn quickly and contribute to improving operational maturity within a team.

Additional Information

  • As part of maintaining continuous operational coverage, this role also participates in a rotating weekend on-call schedule shared across the Reliability Operations team.

Serve Robotics Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Serve Robotics and has not been reviewed or approved by Serve Robotics.

  • Equity Value & Accessibility: Equity is positioned as a meaningful part of total compensation through company equity incentive plans and active stock‑based awards. Filings and investor materials indicate broad use of options/RSUs consistent with growth‑stage tech compensation.
  • Healthcare Strength: Core medical, dental, and vision coverage is provided alongside life and AD&D plus short‑ and long‑term disability insurance. Flexible spending accounts are also available to support healthcare needs.
  • Leave & Time Off Breadth: Vacation and paid holidays are included as standard components of the package. Disclosures reference flexible or paid time off frameworks that can vary by department.

The Company
HQ: Los Angeles, CA
402 Employees
Year Founded: 2021

What We Do

Serve Robotics (NASDAQ:SERV) develops advanced, AI-powered, low-emissions sidewalk delivery robots that endeavor to make delivery sustainable and economical. Spun off from Uber in 2021 as an independent company, Serve has completed tens of thousands of deliveries for enterprise partners such as Uber Eats and 7-Eleven. The company has scalable multi-year contracts, including a signed agreement to deploy up to 2,000 delivery robots on the Uber Eats platform across multiple U.S. markets.
