Senior Cloud Infrastructure Engineer

Posted 22 Days Ago
Hiring Remotely in United States
Remote
200K-250K Annually
Senior level
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
The Role
Design, build, and operate Restate Cloud and BYOC deployments across multi-tenant SaaS and on-prem environments. Implement IaC and cloud orchestration for Kubernetes-based stateful workloads, ensure reliability and observability (SLOs, metrics, traces, logs, runbooks), automate fleet scaling, and participate in on-call rotations supporting production operations.
Summary Generated by Built In
Build the infrastructure platform powering durable execution

The next generation of applications will be long-running, integration-heavy, and increasingly autonomous.

As systems become more agentic and failure-prone, durable execution transforms reliability from an application concern into a platform capability.

Restate sits at the center of that shift.

Restate (restate.dev) turns AI agents, workflows, and backend services into durable processes, allowing developers to focus on business logic rather than retries, state recovery, and failure handling.

We're looking for a Senior to Staff-level Cloud Infrastructure Engineer to own the infrastructure platform that powers Restate across open source, self-hosted deployments, multi-tenant SaaS, and Bring Your Own Cloud environments.

This is a high-ownership role at the intersection of cloud infrastructure, distributed systems, and platform engineering.

What you'll own
  • The infrastructure and control plane powering Restate Cloud.

  • The systems enabling BYOC and self-hosted deployments.

  • Reliability, observability, and operational excellence across the fleet.

  • Automation and tooling that allow the platform to scale efficiently.

  • Production operations and participation in the cloud on-call rotation.

You will operate across networking, storage, Kubernetes, cloud APIs, observability, and infrastructure automation while owning systems end-to-end from design through production.

Why this role is interesting

Durable execution is becoming foundational infrastructure

As AI systems become increasingly long-running and stateful, durable runtimes are emerging as a core infrastructure primitive.

You'll help build the platform enabling that transition.

Build infrastructure from first principles

Restate reimagines durable execution as a lightweight, self-contained runtime:

  • Single Rust binary deployment

  • Custom storage layer

  • Low-latency orchestration engine

  • Integrated observability

  • No external database dependency

Work on systems where reliability matters

Restate powers workloads running inside Fortune 500 enterprises, financial institutions, and AI-native startups building production-grade agents and workflows.

The systems you build operate in environments where correctness and operational simplicity are mission-critical.

Work with exceptional engineers

You'll work directly with engineers who built foundational distributed systems at global scale, including creators of Apache Flink and leaders from Meta's messaging infrastructure teams.

What we're looking for

Must have
  • Experience operating large-scale SaaS or platform infrastructure in production.

  • Deep understanding of cloud infrastructure and Kubernetes-based systems.

  • Experience with infrastructure-as-code and cloud automation.

  • Strong software engineering skills in Rust, Go, or C++.

  • Comfort owning systems from design through operations.

  • Ability to thrive in ambiguity and operate with significant autonomy.

Nice to have
  • Kubernetes operator development.

  • Cluster API, Crossplane, or Terraform experience.

  • Experience with multi-tenant control planes.

  • Experience operating infrastructure in enterprise or compliance-sensitive environments.

  • Familiarity with durable execution systems or workflow runtimes.

This role may not be a fit if
  • You prefer working on runtime internals rather than cloud infrastructure.

  • You prefer architecture and review work over hands-on ownership.

  • You are uncomfortable operating production infrastructure or participating in on-call rotations.

Location
  • Fully remote within the United States.

  • East Coast candidates are preferred to improve on-call coverage.

  • Minimal travel requirements.

Skills Required

  • Strong cloud infrastructure background with deep understanding of major cloud provider architectures
  • Experience with infrastructure-as-code and cloud orchestration, particularly Kubernetes-based stateful workloads
  • Software engineering skills in a systems language (Rust, Go, C++)
  • Willingness and ability to learn Rust on the job
  • Prior experience operating production SaaS or platform infrastructure
  • Comfortable taking ownership end-to-end from design through production operations; hands-on mentality
  • Willingness to participate in the cloud on-call rotation
  • US-based (fully remote)
Nice to have
  • Prior experience with Restate or durable execution specifically
  • Deep enterprise procurement/compliance navigation
  • Kubernetes operator development and IaC systems like Cluster API, Crossplane, or Terraform

The Company
6 Employees
Year Founded: 2022

What We Do

Restate provides a lightweight runtime that enables developers to build innately resilient distributed applications. By turning AI agents, workflows, and backend services into durable processes, Restate removes the complexity of managing failure mechanics, allowing developers to focus on their business logic rather than the underlying infrastructure's resilience.
