Staff Software Engineer, Infrastructure

Posted 2 Days Ago
2 Locations
Remote
238K-382K Annually
Senior level
Information Technology
Docker helps developers bring their ideas to reality by conquering the complexity of app development.
The Role
Lead design and delivery of self-service platform capabilities: provisioning, deployment, observability, and multi-region networking. Drive RFCs and cross-team alignment, build APIs (primarily in Go), implement Terraform/GitOps flows, improve EKS reliability and Grafana-based SLOs, and reduce operational toil via safe, auditable automation. Own on-call rotation and platform adoption outcomes.
Summary Generated by Built In

Docker has been one of the most loved brands in developer tooling, trusted by more than 20 million monthly users and over 20 billion container image pulls. From solo founders to the world's largest companies, developers rely on Docker to build, share, and run their applications across our suite of products including Docker Desktop, Docker Hub, and Docker Scout.
We are a globally distributed, remote-first team building the tools that define how software gets built and delivered. As AI agents redefine software development, Docker is at the center of that shift, providing the sandboxed environments, verified images, and secure infrastructure that make autonomous workflows trustworthy by default.

Docker is shipping a wave of new products this year, with R&D initiatives likely to lead to more, and we're investing heavily in the platform underneath all of it. That platform supports hundreds of engineers across many development teams and carries high-scale production traffic and data transfer every day. It has grown faster than its foundations, and this year is about closing that gap.

Today, much of that work still leans on a handful of experts unblocking the same provisioning and operational workflows by hand. The top priority for this role is moving that work from expert-driven support to paved roads: self-service systems with clear ownership, safe defaults, strong guardrails, and adoption we can measure. The goal is a platform teams trust enough to stop thinking about it, one that just works, so they can focus on their own products instead of ours.

The concrete version sits on this year's roadmap: spinning up a new global region or application environment should take hours, not days. Right now it takes days. Getting there means building the foundations underneath it. We need a real multi-region, cross-account network architecture and a testing and continuous-deployment flow teams can trust, then a self-service layer on top.

We're the container company building our own internal platform, so the bar for "the easy path is also the safe path" is high. You'd be joining a team of four, growing to seven this year (this is one of those hires), and we're looking for a Staff engineer to set technical direction and lead it through real production adoption.

Responsibilities

This is a Staff-level role, so success is measured by leverage rather than just your own commits. On a team this size you'll stay hands-on in the codebase while also setting direction, aligning teams on pragmatic standards, and carrying platform investments through to adoption. Concretely, you will:

  • Take ambiguous infrastructure problems and turn them into proposals the org can rally around, then drive them through RFCs and architecture reviews across teams.

  • Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations, with contracts and docs teams actually use.

  • Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and good testing, including building the continuous-deployment flow we're missing today.

  • Evolve the multi-tenant EKS foundations toward better reliability, security, scale, and cost: Envoy Gateway ingress, traffic routing, and the multi-region, cross-account connectivity we need.

  • Improve SLOs, alerting, and incident follow-up on Grafana Cloud so production gets safer and less dependent on heroics.

We judge this work by outcomes the consuming teams feel: how fast they can provision and ship, how much they can do without us, and how reliably it all runs.

AI-assisted operations

We're actively investing in AI-assisted and agentic workflows to cut operational toil. We care that they stay safe, auditable, and human-reviewed. You'll help shape where these earn their place and where they don't. Early targets include:

  • Alert enrichment and incident context-gathering: assembling the relevant signals, history, and runbook so the on-call engineer starts with context instead of a blank page.

  • Runbook-assisted diagnosis and remediation recommendations, with a human in the loop on anything that changes production.

  • Onboarding and readiness assistants that answer the questions our experts answer today.

If you've built operational automation and have a healthy skepticism about where automation belongs, this is a place to put both to work.

On-call

Operational ownership is part of the job. You'll join the rotation after onboarding and shadowing. As a Staff engineer, you'll also improve the health of on-call itself, with better alerts, stronger runbooks, less toil, and blameless postmortems aimed at prevention.

Qualifications
  • 8+ years of professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering.

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

  • Strong software engineering in Go or a similar language: design, testing, debugging, review, long-term maintainability.

  • A track record designing, shipping, and operating cloud services or infrastructure platforms in production. We hire for skill and impact, not years.

  • Deep expertise in at least one of: Kubernetes, networking, cloud platforms, reliability engineering, or developer platforms, plus solid Linux, networking, and production-ops fundamentals.

  • Experience setting technical direction and leading work that needs cross-team alignment.

  • Clear written and verbal communication in a remote environment (RFCs, design docs, incident writeups).

Nice to have: EKS and ingress/CNI/service-mesh experience; observability with OpenTelemetry/Prometheus/Grafana; CI/CD and progressive delivery (GitHub Actions, Argo CD, canaries); experience leading migrations or adoption programs across teams.

You don't need every item here. We value deep expertise in one area, strong systems judgment, and curiosity across the rest.

What to Expect

First 30 days
  • Build context, meet partner teams, ship your first change, shadow on-call.

First 90 days
  • Own a strategic platform problem with a clear plan and metrics; lead an improvement from design to production.

One Year Outlook
  • Lead a major cross-team initiative (for example, self-service provisioning of new regions and environments, or the multi-region networking and CD foundations behind it) and establish durable patterns that change how Docker engineers build and operate services.

Docker considers visa sponsorship on a case-by-case basis based on business needs.

Perks

  • Freedom & flexibility; fit your work around your life

  • Designated quarterly Whaleness Days plus an end-of-year Whaleness break

  • Home office setup; we want you comfortable while you work

  • 16 weeks of paid parental leave (after 6 months of employment)

  • Technology stipend equivalent to $100 USD net/month

  • PTO plan that encourages you to take time to do the things you enjoy

  • Training stipend for conferences, courses and classes

  • Equity; we are a growing start-up and want all employees to have a share in the success of the company

  • Docker Swag

  • Medical benefits, retirement and holidays vary by country

  • Remote-first culture, with offices in Seattle and Paris

Docker embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better our company will be.

Skills Required

  • 8+ years professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • Strong software engineering in Go or a similar language (design, testing, debugging, maintainability)
  • Track record designing, shipping, and operating cloud services or infrastructure platforms in production
  • Deep expertise in at least one: Kubernetes, networking, cloud platforms, reliability engineering, or developer platforms
  • Solid Linux, networking, and production-ops fundamentals
  • Experience setting technical direction and leading cross-team alignment
  • Clear written and verbal communication in a remote environment (RFCs, design docs, incident writeups)
  • Operational ownership including joining on-call rotation and participating in incident response and blameless postmortems
  • Experience with Terraform and implementing GitOps delivery patterns (Argo CD)
  • EKS and ingress/CNI/service-mesh experience
  • Observability with OpenTelemetry, Prometheus, and Grafana
  • CI/CD and progressive delivery experience (GitHub Actions, Argo CD, canaries)
  • Experience leading migrations or adoption programs across teams

Docker, Inc Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Docker, Inc and has not been reviewed or approved by Docker, Inc.

  • Healthcare Strength: Healthcare coverage is described as comprehensive, including employer-paid medical, dental, and vision for employees and dependents in the U.S. Additional resources such as telehealth, mental-health support, and an HRA for deductibles are highlighted.
  • Flexible Benefits: Remote-first support includes a home office setup budget, monthly technology and coworking stipends, and async/time-zone flexibility. These elements indicate adaptability to distributed work.
  • Leave & Time Off Breadth: Time off programs include flexible PTO, companywide wellness days, and a year-end recharge period. Paid parental leave is also offered following an eligibility period.

The Company
Palo Alto, CA
498 Employees
Year Founded: 2013

What We Do

At Docker, we simplify the lives of developers who are making world-changing apps. We simplify and accelerate workflows with an integrated development pipeline and application components. Actively used by millions of developers around the world, Docker Desktop and Docker Hub provide unmatched simplicity, agility and choice.

Why Work With Us

We are a people-first organization that gives every employee an opportunity to grow and learn. We provide regular development opportunities that help all employees achieve their goals.
