Senior Site Reliability Engineer

Hiring Remotely in US
Remote
Senior level
Artificial Intelligence
The Role
Own operational excellence for cloud infrastructure: run incident management, improve reliability through automation, own a platform domain (e.g., Kubernetes, Temporal, observability), manage vendor and cost relationships, and deliver measurable reductions in incidents and costs within 12 months.
Summary Generated by Built In

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US.

As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

Remote (US East Coast preferred, for timezone coverage)

About the team

Cloud Infrastructure owns the platform every Synthesia product runs on — AWS, Kubernetes, MongoDB, Temporal, our observability stack, and the vendor and cost relationships underneath them. We're a small, high-leverage team scaling toward a domain-ownership model: small groups that both build and operate the systems they're accountable for.

The role

We're hiring a dedicated SRE to take real ownership of operational excellence across Cloud Infrastructure. Today, too much critical operational knowledge — vendor relationships, cost management, and incident response — lives with one or two people. Your mission is to take genuine ownership of those domains, make them resilient to any single person, and raise the bar on how reliably we run. This is not simply a ticket-queue or keep-the-lights-on role. You'll own domains end to end: understand them deeply, operate them well, and build the automation and tooling that make them boring. We deliberately pair operational and engineering work so the role grows rather than narrows.

What you'll own

  • Incident management & operational excellence — take custody of the incident process: on-call quality, response, post-mortems, and driving down incident count, time-to-detect, and time-to-resolve.

  • Automation & reliability engineering — automate low-frequency, high-consequence operations (the certificate-renewal class of problem — rare, easy to forget, outage-causing when missed), not just the high-frequency toil. You decide what to automate based on risk and blast radius, not just time saved.

  • A platform domain — over time, deep ownership of a domain such as Temporal, observability, or Kubernetes operations, partnering with the engineers building in it.

  • Vendor & third-party management — own key external relationships and integrations (e.g. LLM API providers, third-party services), today managed manually and informally. Bring structure, automation, and bus-factor resilience.

  • FinOps — own cloud and platform cost visibility and efficiency, and the mechanics of how usage maps to billing.

What success looks like (first 12 months)

  • Critical operational knowledge is documented and shared — no single point of failure for vendor, cost, or incident response.

  • Measurable reliability gains: fewer SEV1–SEV3 incidents per quarter, faster customer-impact resolution, and a much higher share of incidents caught by monitoring before customers feel them.

  • High-risk manual processes are automated and self-documenting.

What we're looking for

  • Strong production operations experience on AWS and Kubernetes; comfortable with MongoDB and scripting/automation in Python.

  • An operations-and-reliability mindset — you take pride in systems that run quietly — paired with the instinct to engineer the problem away rather than absorb it manually.

  • Sound judgement on incidents and risk; calm and clear under pressure.

  • Influences through relationships and evidence, not escalation; comfortable owning a domain and partnering across teams.

  • Bonus: vendor/cost management exposure, Temporal, observability tooling.

How we think about this role

We don't letterbox engineers. You'll have a clear primary mission (operational excellence) but real domain ownership and the mandate to build — not a fixed lane. We expect the shape of the role to evolve as the team grows.

Skills Required

  • Production operations experience on AWS
  • Production operations experience on Kubernetes
  • Experience with MongoDB
  • Scripting and automation experience in Python
  • Incident management and on-call experience, post-mortems, incident response
  • Automation and reliability engineering skills (automating high-risk manual processes)
  • Domain ownership experience partnering across teams
  • Calm, clear judgement under pressure and ability to influence through relationships
  • Vendor and cloud cost management / FinOps exposure
  • Experience with Temporal and observability tooling

Synthesia Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Synthesia and has not been reviewed or approved by Synthesia.

  • Leave & Time Off Breadth: Leave benefits are positioned as generous, including substantial annual leave plus public holidays and an additional long-tenure sabbatical with a cash award. Flexible working hours and hybrid/remote arrangements further strengthen perceived time-off and flexibility value.
  • Healthcare Strength: Health coverage is described as robust, including private medical insurance with mental health support and dental/vision coverage. Added features like cashback options and gym discounts extend the package beyond basic medical coverage.
  • Equity Value & Accessibility: Equity is framed as a meaningful part of total rewards through a generous stock options plan and a recent employee liquidity event tied to a major funding round. This can materially improve the perceived value and accessibility of long-term incentives versus options that remain purely paper value.


The Company
HQ: London
428 Employees
Year Founded: 2017

What We Do

Synthesia is the #1 rated AI video communications platform. Thousands of companies use it to create videos in 140 languages, saving up to 80% of their time and budget. 👉 Trusted by Zoom, Xerox, Teleperformance, Amazon and more
