Principal ML Platform Engineer

Posted Yesterday
27 Locations
Remote
Expert/Leader
Artificial Intelligence
The Role
Design, build, and operate ML platform systems to train, evaluate, deploy, and serve generative models. Improve reliability, scalability, observability, and cost-efficiency of GPU and cloud workloads. Build internal tools and agent-friendly workflows, collaborate with researchers and engineers, and drive platform architecture and automation.
Summary Generated by Built In

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US.

As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

We’re looking for a Principal Engineer to join the ML Platform team at Synthesia.

Our team builds and operates the systems that allow researchers and product teams to train, serve, and deploy generative models reliably and efficiently. This includes research infrastructure, production serving systems, internal tooling, and the platform interfaces that connect them. A growing part of our mission is making these systems more automation-friendly and agent-oriented, so that workflows can increasingly be operated through reliable tooling rather than manual effort.

We’re looking for a strong generalist with a systems mindset:

  • Someone who is comfortable working across infrastructure, backend systems, and tooling, and who has seen ML systems in practice.

  • This is not a pure ML Engineer role. We’re especially interested in people who think deeply about reliability, scalability, performance, and resource efficiency in complex production environments.

This is a hands-on IC role with significant ownership. You’ll help shape how our ML platform evolves as we scale the number of models, workloads, tools, and teams relying on it.

What you’ll do
  • Design and improve the platform systems that support model training, evaluation, and production serving.

  • Build infrastructure and tooling that make ML workloads more reliable, scalable, and cost-efficient.

  • Develop internal tools and workflows that are easy for both humans and agents to operate.

  • Work on the architecture behind how models are deployed, served, and operated across research and product environments.

  • Improve how we schedule, monitor, and debug workloads running on GPUs and cloud infrastructure.

  • Develop internal tools, abstractions, and agentic systems that reduce operational overhead for researchers and engineers.

  • Drive improvements across observability, automation, reliability, and developer experience.

  • Collaborate closely with researchers and product engineers to understand pain points and turn them into robust platform capabilities.

  • Contribute to technical direction and make pragmatic architectural tradeoffs as the platform grows.

You’ll thrive in this role if you have
  • Strong experience building or operating production systems with a focus on reliability, scalability, and maintainability.

  • A systems mindset: you naturally think in terms of bottlenecks, failure modes, interfaces, resource usage, and long-term operability.

  • Solid hands-on experience with cloud infrastructure, Linux, and infrastructure automation.

  • Experience with Kubernetes and operating distributed workloads in production.

  • Strong coding skills, ideally in Python or similar languages used for backend systems and tooling.

  • Strong judgment around where automation adds leverage, and where human control and reliability matter most.

  • Experience building internal platforms, developer tooling, or infrastructure abstractions used by other engineers.

  • Comfort working in ambiguous environments and taking ownership of open-ended technical problems.

  • A pragmatic approach: you care about solving the right problem well, not over-engineering.

Particularly relevant experience
  • Operating ML infrastructure or model serving systems in production.

  • Supporting research or data-intensive workloads.

  • Working with GPU-based systems or other performance-sensitive infrastructure.

  • Experience with observability and debugging in distributed systems.

  • Familiarity with Terraform, Datadog, GitHub Actions, or similar tools.

Bonus points for
  • Experience building agentic or LLM-powered internal tools.

  • Experience with workflow orchestration systems such as Temporal.

  • Experience working at the boundary between research and production engineering.

  • Familiarity with performance optimization, scheduling, or resource allocation problems.

  • Experience building lightweight product or developer-facing tools.

Skills Required

  • Experience building or operating production systems focused on reliability, scalability, and maintainability.
  • Systems mindset: identify bottlenecks, failure modes, interfaces, and resource usage.
  • Hands-on experience with cloud infrastructure, Linux, and infrastructure automation.
  • Experience with Kubernetes and operating distributed workloads in production.
  • Strong coding skills in Python or similar backend languages.
  • Experience building internal platforms, developer tooling, or infrastructure abstractions.
  • Operating ML infrastructure or model serving systems in production.
  • Supporting research or data-intensive workloads.
  • Experience with GPU-based systems or performance-sensitive infrastructure.
  • Experience with observability and debugging in distributed systems.
  • Familiarity with Terraform, Datadog, and GitHub Actions.
  • Experience building agentic or LLM-powered internal tools, and workflow orchestration (e.g., Temporal).

Synthesia Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Synthesia and has not been reviewed or approved by Synthesia.

  • Leave & Time Off Breadth: Leave benefits are positioned as generous, including substantial annual leave plus public holidays and an additional long-tenure sabbatical with a cash award. Flexible working hours and hybrid/remote arrangements further strengthen perceived time-off and flexibility value.
  • Healthcare Strength: Health coverage is described as robust, including private medical insurance with mental health support and dental/vision coverage. Added features like cashback options and gym discounts extend the package beyond basic medical coverage.
  • Equity Value & Accessibility: Equity is framed as a meaningful part of total rewards through a generous stock options plan and a recent employee liquidity event tied to a major funding round. This can materially improve the perceived value and accessibility of long-term incentives versus options that remain purely paper value.

The Company
HQ: London
428 Employees
Year Founded: 2017

What We Do

Synthesia is the #1 rated AI video communications platform. Thousands of companies use it to create videos in 140 languages, saving up to 80% of their time and budget. 👉 Trusted by Zoom, Xerox, Teleperformance, Amazon and more
