Senior Platform Engineer / SRE

Posted 2 Days Ago
2 Locations
In-Office or Remote
Senior level
HR Tech • Information Technology • Professional Services • Software

THE ROLE

We're looking for a Senior Platform Engineer / SRE who can take on complex infrastructure work, drive IaC and GitOps architecture, and set the standard for how we automate and operate systems at scale. You'll tackle hard problems (multi-tenant isolation, self-service infrastructure, reliability engineering) and have the scope to solve them properly.

This is not a ticket-processing role. Seniors here identify problems before being asked, make architectural calls, mentor engineers, and raise the ceiling on what the platform can do.

WHAT YOU'LL WORK ON

• Lead IaC architecture, including Terraform module design, state management, and multi-account patterns, and set the standards the rest of the team builds against

• Drive GitOps at scale — ArgoCD configuration, progressive delivery patterns, promotion workflows, and deployment reliability across multiple environments and tenants

• Architect and operate multi-tenant Kubernetes infrastructure on AWS EKS — tenant isolation, workload placement, cluster topology, and long-term scalability strategy

• Build self-service infrastructure automation — provisioning pipelines, configuration management, and platform capabilities that engineering teams can consume without manual intervention

• Use agentic coding tools for infrastructure work, such as scaffolding new environments, generating and reviewing IaC, accelerating automation, and establishing patterns for the team

• Own reliability — SLO definitions, error budgets, incident response quality, and the feedback loop that turns incidents into platform improvements

• Set observability standards — trace coverage, alert quality, on-call ergonomics, and runbook culture

• Partner with security on zero-trust architecture, secrets management at scale, and infrastructure hardening

• Contribute to the technical roadmap and help the team prioritize the right work

• Mentor mid-level engineers — code review, design feedback, on-call shadowing

WHAT WE'RE LOOKING FOR

• 6+ years in platform engineering, SRE, or infrastructure — with meaningful time operating production systems at scale

• Deep IaC expertise — you design Terraform architectures, not just write modules; you've managed complex state and multi-account configurations in production

• Strong GitOps background — you understand declarative infrastructure management at depth and have opinions on how to do it well

• Deep Kubernetes knowledge — you've operated clusters in production, dealt with real failure modes, and understand the system at the control plane level

• Strong AWS background — networking, compute, IAM, storage, multi-account design

• Experience with multi-tenant infrastructure — isolation patterns, noisy neighbor mitigation, and tenant lifecycle management

• Automation-first thinking at a senior level — you design systems that eliminate entire categories of manual work, not just individual tasks

• Active user of agentic coding tools — you know how to direct them effectively, review their output critically, and use them to multiply your output

• Reliability engineering track record — SLOs defined and measured, post-mortems run, measurable improvements driven

• Strong communicator — you can make architectural decisions legible to engineers and leadership alike

NICE TO HAVE

• Experience with Karpenter and node lifecycle management in production

• Background in FinOps — cost attribution, reserved capacity planning, workload right-sizing

• Familiarity with data infrastructure — object storage, CDC pipelines, or lakehouse patterns

• Experience supporting AI/ML inference workloads or GPU-based compute in production

• Prior experience scaling platform infrastructure at a startup moving toward enterprise-grade requirements

WHAT YOU WON'T FIND HERE

A platform team that maintains the status quo. We're actively building — new scale requirements, new architectural domains, new automation capabilities. Senior engineers here shape how the platform evolves, and the tools available to do it are better than they've ever been.

Type: Full-Time, remote

Work hours aligned with EST or PST


The Company
30 Employees
Year Founded: 2021

What We Do

Wizdaa is an IT recruitment services company that specializes in sourcing and placing top-tier remote developers from Latin America with startups, primarily in U.S. time zones. From its headquarters in Miami, it sources and vets elite-level developers to ensure seamless, real-time collaboration for clients. The company provides end-to-end solutions including onboarding, payroll, and tax management, helping startups build world-class development teams efficiently.
