Site Reliability Engineer (SRE)

Sorry, this job was removed at 12:11 p.m. (UTC) on Tuesday, May 05, 2026
Boulder, CO, USA
In-Office
Artificial Intelligence • Software
The Role

The Opportunity

We're hiring an experienced Site Reliability Engineer to own the reliability of the Freeplay platform and drive success for our most advanced enterprise customers. In this role, you will bridge the gap between core infrastructure engineering and high-stakes customer deployments. You won’t just be maintaining our internal SaaS environment; you will be the technical expert guiding Fortune 100 engineering teams as they deploy Freeplay into their own private clouds.

This is an exciting chance to join a fast-growing startup with a front-row seat to how AI products are being built at some of the largest and most innovative companies in the world. You’ll be hands-on with customers, learning about cutting-edge AI architectures while ensuring our platform runs flawlessly in their diverse and complex environments.

What's Freeplay?

Freeplay is the end-to-end platform for software teams to ship great AI products. We give product development teams the power to test, evaluate, monitor & optimize AI in production. Our customers use Freeplay to build better LLM features, chatbots, and agents. Today we serve leading software companies from growing startups to Fortune 100 companies.

Your Mission

Build the infrastructure that powers Freeplay and ensure successful deployments for our enterprise customers.

  • Partner with Enterprise Customers: Act as a key technical contact for our "Bring Your Own Cloud" (BYOC) deployments. You will jump on calls with customer engineering teams to guide them through installation, debug configuration issues in their VPCs, and ensure they are successful running Freeplay.

  • Own the Multi-Cloud Architecture: Help manage and improve our internal production infrastructure across AWS, GCP, and Azure, ensuring high availability and seamless networking.

  • Solve the "Shipped Software" Challenge: Drive the engineering efforts to package and distribute Freeplay using tools like Helm, Replicated, and KOTS. You will help ensure our software is portable, installing as reliably in a customer's cloud environment as it does in our SaaS.

  • Master Infrastructure as Code: Drive our Terraform strategy, building modular, reusable, and secure infrastructure definitions that treat operations with the same rigor as application code.

  • Champion Observability: Implement and tune our monitoring stack (Datadog) to provide deep visibility into system health, and help customers implement similar observability for their private instances.

  • Scale Data & Messaging: Manage the stateful components of our stack, including PostgreSQL, Elasticsearch, and NATS JetStream, ensuring data integrity and performance under load.

About You

  • Experience. We are open to candidates ranging from mid-level (3+ years) to senior/staff (7+ years). We will tailor the scope and responsibilities to your expertise.

  • Customer-facing confidence. You are comfortable interacting directly with external engineering teams. You can troubleshoot a failed deployment while on a Zoom call with a client and explain complex architectural requirements clearly.

  • Production Kubernetes fluency. You are confident managing EKS/GKE/AKS clusters, debugging complex pod failures, managing ingress controllers, and handling autoscaling in production.

  • Deep Terraform expertise. You have experience structuring IaC for scale and have managed multi-environment setups.

  • Database operational experience. You aren't just an infrastructure plumber; you understand how to manage and tune databases (Postgres) and search indices (Elasticsearch) at scale.

  • Security-first thinking. You are familiar with cloud security best practices, including VPC networking, IAM/Workload Identity, and secrets management, and you can explain these concepts to security-conscious enterprise clients.

Bonus Points

  • Experience in a Solutions Engineering or Field Engineering capacity.

  • Experience with Replicated/KOTS or similar tools for packaging enterprise software for on-premises or VPC deployments.

  • Experience operating messaging systems such as NATS JetStream or Kafka.

  • Background in AI/ML infrastructure or high-throughput data systems.

Compensation & Benefits

  • Competitive salary commensurate with experience, plus equity package.

  • Medical, dental, and vision insurance.

  • Premium hardware setup (MacBook, monitor, peripherals).

  • Four weeks of paid time off per year (and we encourage you to take it!).

Location

We prefer candidates able to work full-time on-site in Boulder, CO, but we're open to exceptional remote candidates who can visit Boulder every 6 weeks for team collaboration.

Similar Jobs

NBCUniversal

Site Reliability Engineer

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development
Remote or Hybrid
Centennial, CO, USA
68000 Employees
110K-145K Annually

Domino Data Lab

Site Reliability Engineer

Artificial Intelligence • Machine Learning
Remote or Hybrid
US
200 Employees
200K-230K Annually

General Dynamics Information Technology

Site Reliability Engineer

Aerospace • Information Technology • Professional Services • Security • Software
In-Office
5 Locations
21625 Employees
128K-173K Annually

Binance

Senior Site Reliability Engineer

Blockchain • Fintech • Software • Cryptocurrency • Metaverse
In-Office or Remote
45 Locations
7696 Employees

The Company
HQ: Boulder, Colorado
14 Employees
Year Founded: 2022

What We Do

A better way to build with LLMs. Bridge the gap between domain experts & developers. Prompt engineering, testing & evaluation tools for your whole team. Now in private beta.

Similar Companies Hiring

Hanover Park
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
42 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
