Staff Cloud Backend Engineer - Observability and Site Reliability

Reposted 12 Days Ago
Bengaluru, Bengaluru Urban, Karnataka, IND
In-Office
120K-160K Annually
Senior level
eCommerce
Building the future of eCommerce. We push the boundaries of what’s possible to solve problems!
The Role
As a Staff Backend Engineer, design and maintain observability platforms, implement SRE best practices, optimize performance, and ensure security compliance in datacentre operations.
Summary Generated by Built In

Company Introduction  

We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” Born out of an obsession to make shopping, eating, and living easier than ever, we’re collectively disrupting the multi-billion-dollar e-commerce industry from the ground up. We are one of the fastest-growing e-commerce companies and have established an unparalleled reputation as a dominant and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels our continued growth and lets us launch new services at the pace we have kept since our inception. We are all entrepreneurs surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve problems and break traditional trade-offs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world. 
  

Role Overview

As a Staff Data Centre Observability and Site Reliability Engineer, you will own the design and operation of scalable observability platforms to ensure the reliability, performance, and availability of datacentre services. You will apply SRE best practices, automation, and performance optimization to deliver resilient infrastructure. This role partners closely with engineering teams and vendors to drive operational excellence while maintaining security and compliance standards.

What You Will Do

Observability and Monitoring:
• Design, implement, and maintain observability solutions for datacentre infrastructure.
• Develop, deploy, and maintain the operational and reliability components of a large-scale Observability and Telemetry collection platform, emphasizing performance at scale, real-time monitoring, logging, and alerting.
• Participate in and enhance the entire lifecycle of services, from inception and design to deployment, operation, and refinement.
• Develop and optimize monitoring systems to ensure high availability and performance.
• Create and manage dashboards, alerts, and reports to provide visibility into system health and performance.
Site Reliability Engineering (SRE):
• Implement SRE best practices to improve the reliability, scalability, and performance of datacentre services.
• Develop and maintain automation scripts for infrastructure provisioning, monitoring, and management.
• Conduct root cause analysis and post-mortem reviews to prevent recurrence of incidents.
Performance Optimization:
• Analyze and optimize the performance of datacentre systems and applications.
• Implement best practices for resource utilization and efficiency.
Collaboration:
• Work closely with other engineering teams to understand and meet their observability and reliability requirements.
• Collaborate with hardware and software vendors to evaluate and integrate new technologies.
Security and Compliance:
• Ensure that observability and reliability solutions comply with security policies and industry standards.
• Implement and maintain security measures to protect data and infrastructure.
Troubleshooting and Support:
• Provide support for observability and reliability-related issues, including debugging and resolving hardware and software problems.
• Develop and maintain documentation for troubleshooting procedures and best practices.
Continuous Improvement:
• Stay updated with the latest advancements in observability and SRE technologies and integrate them into the infrastructure.
• Continuously improve the reliability, scalability, and performance of datacentre services.


Basic Qualifications

  • Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field. 
  • Experience: 8–12 years of progressive software engineering experience, with a heavy emphasis on distributed systems, cloud-native architectures, or platform operations. 
  • Programming: Strong proficiency in Go or Python, with a deep understanding of networked systems and performance optimization. 
  • Orchestration: Expert-level knowledge of Kubernetes internals (scheduling, controllers) and containerization ecosystems. 
  • Traffic Management: Proven experience with load balancing, service mesh, and request routing at scale. 
  • Operational Excellence: A strong "ownership" mindset with a track record of maintaining mission-critical, high-availability systems in production. 

Preferred Qualifications

  • AI/ML Domain Knowledge: Prior experience building infrastructure specifically for LLM inference or large-scale training clusters. 
  • Low-Level Optimization: Familiarity with inference optimization, including mixed precision, kernel tuning, or custom hardware accelerators.
  • Public/Private Cloud: Experience managing hybrid-cloud or multi-AZ deployments across AWS, Azure, or GCP. 
  • Compliance: Experience operating in regulated environments with strict security and compliance requirements. 

Type of work model 

  • Hybrid

Details to consider 

  • Those eligible for employment protection (recipients of veteran’s benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws. 
     

Privacy Notice  

  • Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below. https://privacy.coupang.com/en/land/jobs/ 

Skills Required

  • 8-12 years of experience in software engineering
  • Strong proficiency in Go or Python
  • Expert-level knowledge of Kubernetes
  • Experience with load balancing and service mesh
  • Bachelor's or Master's degree in Computer Science or related field

Coupang Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Coupang and has not been reviewed or approved by Coupang.

  • Healthcare Strength: Benefits are described as robust, including medical, dental, vision, disability, life, mental health, and transgender healthcare. Some locations also offer on-site or virtual health screenings that enhance preventive care access.
  • Retirement Support: Financial programs include a 401(k) with company matching and an employee stock purchase plan alongside performance bonuses. Feedback suggests these elements provide a structured foundation for long-term savings and wealth building.
  • Leave & Time Off Breadth: Paid time off is described as generous with accrual increasing with tenure, complemented by company-paid holidays and parental leave. Flexible work arrangements in some roles further support time away when needed.

The Company
Mountain View, CA
70,000 Employees
Year Founded: 2010
