Freeplay.AI

Site Reliability Engineer (SRE)

Sorry, this job was removed at 12:11 p.m. (UTC) on Tuesday, May 05, 2026

Boulder, CO, USA

In-Office

Artificial Intelligence • Software

The Role

The Opportunity

We're hiring an experienced Site Reliability Engineer to own the reliability of the Freeplay platform and drive success for our most advanced enterprise customers. In this role, you will bridge the gap between core infrastructure engineering and high-stakes customer deployments. You won’t just be maintaining our internal SaaS environment; you will be the technical expert guiding Fortune 100 engineering teams as they deploy Freeplay into their own private clouds.

This is an exciting chance to join a fast-growing startup with a front-row seat to how AI products are being built at some of the largest and most innovative companies in the world. You’ll be hands-on with customers, learning about cutting-edge AI architectures while ensuring our platform runs flawlessly in their diverse and complex environments.

What's Freeplay?

Freeplay is the end-to-end platform for software teams to ship great AI products. We give product development teams the power to test, evaluate, monitor & optimize AI in production. Our customers use Freeplay to build better LLM features, chatbots, and agents. Today we serve leading software companies from growing startups to Fortune 100 companies.

Your Mission

Build the infrastructure that powers Freeplay and ensure successful deployments for our enterprise customers.

Partner with Enterprise Customers: Act as a key technical contact for our "Bring Your Own Cloud" (BYOC) deployments. You will jump on calls with customer engineering teams to guide them through installation, debug configuration issues in their VPCs, and ensure they run Freeplay successfully.
Own the Multi-Cloud Architecture: Help manage and improve our internal production infrastructure across AWS, GCP, and Azure, ensuring high availability and seamless networking.
Solve the "Shipped Software" Challenge: Drive the engineering efforts to package and distribute Freeplay using tools like Helm, Replicated, and KOTS. You will help ensure our software is portable, installing as reliably in a customer's cloud environment as it does in our SaaS.
Master Infrastructure as Code: Drive our Terraform strategy, building modular, reusable, and secure infrastructure definitions that treat operations with the same rigor as application code.
Champion Observability: Implement and tune our monitoring stack (Datadog) to provide deep visibility into system health, and help customers implement similar observability for their private instances.
Scale Data & Messaging: Manage the stateful components of our stack, including PostgreSQL, Elasticsearch, and NATS JetStream, ensuring data integrity and performance under load.

About You

Flexible experience level. We are open to candidates ranging from Mid-Level (3+ years) to Senior/Staff (7+ years). We will tailor the scope and responsibilities to your expertise.
Customer-facing confidence. You are comfortable interacting directly with external engineering teams. You can troubleshoot a failed deployment while on a Zoom call with a client and explain complex architectural requirements clearly.
Production Kubernetes fluency. You are confident managing EKS/GKE/AKS clusters, debugging complex pod failures, managing ingress controllers, and handling autoscaling in production.
Deep Terraform expertise. You have experience structuring IaC for scale and have managed multi-environment setups.
Database operational experience. You aren't just an infrastructure plumber; you understand how to manage and tune databases (Postgres) and search indices (Elasticsearch) at scale.
Security-first thinking. You are familiar with cloud security best practices, including VPC networking, IAM/Workload Identity, and secrets management, and you can explain these concepts to security-conscious enterprise clients.

Bonus Points

Experience in a Solutions Engineering or Field Engineering capacity.
Experience with Replicated / KOTS or similar tools for packaging enterprise software for on-premise/VPC deployments.
Experience operating message queues like NATS JetStream or Kafka.
Background in AI/ML infrastructure or high-throughput data systems.

Compensation & Benefits

Competitive salary commensurate with experience, plus equity package.
Medical, dental, and vision insurance.
Premium hardware setup (MacBook, monitor, peripherals).
Four weeks of Paid Time Off per year (and we encourage you to take it!).

Location

We prefer candidates able to work full-time on-site in Boulder, CO, but we're open to exceptional remote candidates who can visit Boulder every 6 weeks for team collaboration.

The Company

HQ: Boulder, Colorado

14 Employees

Year Founded: 2022

What We Do

A better way to build with LLMs. Bridge the gap between domain experts & developers. Prompt engineering, testing & evaluation tools for your whole team. Now in private beta.