Senior/Lead DevOps Engineer

Hiring Remotely in Vietnam
Remote or Hybrid
Senior level
Artificial Intelligence • Big Data • eCommerce • Retail
Scaling Business with Digital and Deep Tech
The Role
Join a global SRE team to ensure platform availability and 24/7 incident response. Manage AWS/GCP infrastructure, Kubernetes clusters, CI/CD and GitOps workflows, observability (monitoring/logging), and automation. Participate in on-call rotations, configure AI tooling and MCP servers, write post-incident reviews, and collaborate with international engineering teams.
Summary Generated by Built In
ABOUT US:
At Gradion, we are the strategic partner for ambitious businesses, helping them achieve breakthrough growth through Digital Innovation and Deep Tech.
With a global vision and an AI-first approach, we enable clients to reshape strategies, optimize systems, and adopt cutting-edge technologies to create sustainable value.
 
From AI and data to cybersecurity, robotics, and large-scale enterprise platforms, Gradion designs practical solutions that lay the foundation for the next generation of billion-dollar companies.
 
OUR FACTS & FIGURES:
- 23+ years of expertise - Gradion builds digital platforms & deep-tech solutions.
- 3 continents: Asia, Europe and Africa.
- 300+ specialists across 7 countries: Vietnam, Singapore, Thailand, Saudi Arabia, Germany, Egypt, and Australia.
- 100+ enterprise clients, including several unicorns (e.g., Alaiko, HomeToGo, Roadsurfer).
- Vietnam’s Best IT Company - recognized by ITViec for 8 consecutive years, including 2 consecutive years of ranking #1 (2024 and 2025).
- ISO 27001.

About the Role

  • Gradion is expanding its SRE team for a client with a long-term managed services contract running through 2028. You will be part of a global, follow-the-sun SRE function, responsible for platform stability, cloud infrastructure, and 24/7 incident response across European and global client time zones.

  • This role suits engineers who are technically solid, self-directed, and comfortable operating in a fast-moving, internationally distributed environment. You will go through a structured onboarding alongside an internal SRE team before taking on independent operational responsibility.

What You Will Do

  • Own platform availability: monitor, triage, and resolve incidents within defined SLA windows

  • Manage cloud infrastructure on AWS and/or GCP - provisioning, scaling, and day-to-day operations

  • Maintain and improve CI/CD pipelines and GitOps workflows

  • Operate observability systems: monitoring, logging, and alerting at production scale

  • Participate in on-call rotation as part of the global follow-the-sun coverage model

  • Configure, deploy, and manage AI tooling and MCP servers in production environments

  • Contribute to infrastructure automation, scripting, and internal tooling

  • Write clear post-incident reviews and contribute to the monthly operational report

  • Collaborate closely with engineering teams across multiple time zones

What You Bring

  • 4+ years in a DevOps / SRE / Platform Engineering role within an international team

  • Solid Kubernetes knowledge - cluster operations, troubleshooting, and configuration

  • Hands-on cloud experience with AWS and/or GCP

  • Good understanding of networking fundamentals - DNS, load balancing, firewalls, VPC

  • Scripting and automation skills (Python, Bash, or similar)

  • Experience with CI/CD tools and GitOps-based delivery

  • Working knowledge of monitoring and observability systems (Prometheus, ELK, or equivalent)

  • Good English - daily communication with European stakeholders is a core requirement

  • Self-directed and proactive - you ask the right questions and drive issues to resolution without waiting to be told

Nice to Have

  • Experience configuring and managing MCP servers and AI tooling in production

  • Exposure to AI enablement workflows or LLM infrastructure

  • Background supporting eCommerce or SaaS platforms

  • Familiarity with the Frontastic / commercetools Frontend ecosystem

Why You’ll Love Working Here
 
🏆 Join Vietnam’s Best IT Company – Gradion Vietnam (formerly NFQ Vietnam) was recognized by ITViec for 7 consecutive years, including 2 successive years as the Winner. Work with some of the best minds in the industry and be part of a company that’s redefining how businesses scale through technology.
🌍 Career Growth & Leadership Development – Work closely with our leadership team, gain mentorship from experienced executives, and have direct exposure to high-level strategic decisions. Your growth is limitless: as long as you’re ready to step up, opportunities will always be there for you.
💰 Competitive Compensation – We believe great talent deserves great rewards. Expect an attractive salary, performance-based bonuses, and a benefits package that reflects your impact. We value talent over salary budgets - exceptional contributions deserve exceptional rewards.
And many more benefits to explore! Most importantly, we offer a healthy work-life balance and an environment where you can thrive, professionally and personally. Benefits include:
- A laptop is provided.
- Community Tech activities.
- A fun & dynamic environment and freedom to be creative.
- Modern office with a flexible, relaxing zone.
- Performance bonus (up to 2 months' salary).
- Performance reviews twice a year.
- Extra Premium Healthcare & Annual Health-check.
- 15 days of annual leave.
 
Working time: Monday - Friday (9 AM - 6 PM)
Location:
- Ho Chi Minh office: Podium Floor, Sapphire 2 tower, 92 Nguyen Huu Canh Street, Thanh My Tay Ward, Ho Chi Minh City, Vietnam.      
- Da Nang office: 23rd Floor, G8 Golden Building, 65 Hai Phong, Hai Chau Ward, Da Nang City, Vietnam.
- Remote Work: Candidates based in Hanoi or Can Tho are welcome to work remotely.
 
In accordance with the General Data Protection Regulation (GDPR), Singapore's Personal Data Protection Act (PDPA), Vietnam's Decree 13/2023/ND-CP, and other applicable local data protection laws in the jurisdictions where we operate (including but not limited to Vietnam, Thailand, Egypt, Singapore, Germany, and Saudi Arabia), Gradion applies its “Personal Data Protection Policy” to all candidates.
By submitting your application to Gradion, you agree to allow us to process your provided information in accordance with the Personal Data Protection Policy that you have carefully read, understood, and agreed to in its entirety at Link.


The Company
100 Employees

What We Do

At Gradion, we advise businesses on Digital & Deep Tech. With over 23 years of experience and a global team of 300+ experts across 7 countries, our impact spans e-commerce, retail, mobility, travel & logistics, and intralogistics, where we’ve delivered bold, measurable, transformative technology solutions. Built on an AI-first foundation, our journey is defined by customer stories of bold transformation in Business, Technology and People. We’ve partnered with visionary companies to architect the strategies, systems, and technologies that propelled them to extraordinary success. From large-scale platforms to AI, data, cybersecurity, and robotics, Gradion delivers tailored, future-ready solutions, empowering the next generation of billion-dollar companies.
