Tech Infra Engineer

Reposted 19 Days Ago
2 Locations
In-Office
Expert/Leader
eCommerce
Building the future of eCommerce. We push the boundaries of what’s possible to solve problems!
The Role
As a Staff Systems Engineer, you will develop scalable orchestration solutions using Kubernetes technology, ensuring reliable and automated application lifecycle management across cloud environments.
Summary Generated by Built In

As a Staff Systems Engineer in Developer Platform, you will partner with leaders of multiple platform teams. You will work closely with product to define and implement simple solutions to complex orchestration problems, building a highly scalable, reliable, and efficient platform for our customers. You will engineer and develop Kubernetes controllers, operators, and node-level daemons for the application runtime; drive performance tuning and scaling; and design multi-cluster control-plane capabilities that scale to millions of pods across thousands of clusters.

What You Will Do

  • Engineer and develop a unified application platform for hybrid (multi-cluster, multi-region, multi-cloud) application management using Kubernetes controllers and feedback-driven control systems to meet SLOs.
  • Deliver end-to-end automation for application lifecycle (deployments, rollouts, failovers, policy enforcement) to minimize manual work for users.
  • Drive fleet-wide optimization for cost, performance, and latency through data-informed controls and capacity management, improving $/RPS and tail latency.
  • Build resilient, multi-tenant control planes and workflows that safely scale to millions of pods across thousands of clusters.
  • Ensure reliability, security, and governance with clear guardrails, safe defaults, and automated remediation.
  • Partner with product and customers to turn complex orchestration problems into simple, reusable platform primitives and great developer experiences.
  • Champion observability and continuous improvement with measurable, outcome-focused metrics.

Basic Qualifications

  • Bachelor’s degree in Computer Science, Electrical Engineering, Math, or a closely related field (or equivalent experience)
  • 10+ years in backend software development and operations
  • Recent experience (within the last 3 years) designing and operating large-scale distributed systems
  • Fluency in one or more of Go, C/C++, Python, or Java
  • Proven track record of delivering mission-critical systems
  • Experience with cloud computing on AWS, Azure, or GCP

Preferred Qualifications

  • Kubernetes API machinery and semantics: server-side apply (SSA), strategic merge patch (SMP), server-side dry-run, watches/informers/listers, rate-limited workqueues, finalizers, owner references, leader election, API Priority and Fairness
  • Controllers/operators and node daemons in Go: client-go/controller-runtime, reconciliation patterns, backoff and retry, idempotency, partitioned/sharded controllers, HA and failover
  • CRDs and webhooks: versioning, conversion functions/webhooks, validating/mutating admission webhooks, policy frameworks and best practices
  • Pod/runtime semantics: sidecars, init/ephemeral containers, probes (readiness/liveness/startup), lifecycle hooks, termination behavior, PDBs, QoS classes, ResourceQuota/LimitRange, topology spread, affinity/anti-affinity
  • Scaling systems: HPA (resource/custom/external metrics), VPA, cluster autoscaler; multi-dimensional scaling, health-aware/autopilot-style policies; external metrics adapters and SLO-driven scaling
  • Federated and multi-cluster: placement/propagation, failover, drift detection, reconciliation strategies; consistent hashing and partitioning for scale
  • Distributed systems: CRDTs and eventual consistency paradigms; Raft/memberlist/gossip; deep familiarity with etcd, Kafka, Redis and their operational characteristics (compaction, backpressure, retention, failover)
  • Observability and data: Prometheus (cardinality control, recording rules), tracing; experience with vector databases for search and diagnostics; strong time-series forecasting (classical + ML) and statistical modeling for proactive optimization
  • Languages and interfaces: Go (primary), Java/Python as needed; gRPC/protobuf; JSON/YAML/Jsonnet
  • Leadership: ability to handle multiple competing priorities in a fast-paced environment and lead the delivery of large-scale services for complex business offerings

Privacy Notice

  • Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice: https://www.coupang.jobs/privacy-policy/

Top Skills

AWS
Azure
C/C++
GCP
Go
Java
JSON
Jsonnet
Kafka
Kubernetes
Prometheus
Python
Redis
YAML

The Company
Mountain View, CA
70,000 Employees
Year Founded: 2010

What We Do

We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” Born out of an obsession to make shopping, eating, and living easier than ever, we’re collectively disrupting the multi-billion-dollar e-commerce industry from the ground up. We are one of the fastest-growing e-commerce companies and have established an unparalleled reputation as a dominant and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels our continued growth and lets us launch new services at the speed we have maintained since our inception. We are all entrepreneurial, surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.

