Staff Software Engineer, ML Performance & Systems

Reposted 3 Days Ago
San Francisco, CA, USA
In-Office
180K-250K Annually
Mid level
Cloud • Digital Media • Information Technology
Generative media platform for developers.
The Role
Design and implement model serving architectures, develop monitoring tools, and optimize performance for generative media models, working with Applied ML teams.
Summary Generated by Built In

fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.

As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.

About this role: 

Help fal maintain its frontier position on model performance for generative media models. Design and implement novel approaches to model serving architecture on top of our in-house inference engine, focusing on maximizing throughput while minimizing latency and resource usage. Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities. Work closely with our Applied ML team and customers (frontier labs in the media space) to make sure their workloads benefit from our accelerator.

Requirements:

  • Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.

  • Deep understanding of the cutting-edge ML infrastructure stack (anything from PyTorch, TensorRT, and TransformerEngine to Nsight), including model compilation, quantization, and serving architectures. Ideally, you closely follow developments in all of these systems as they happen.

  • A fundamental understanding of the underlying hardware (NVIDIA-based systems at the moment) and, when necessary, the ability to go deeper into the stack to fix bottlenecks (for example, custom GEMM kernels with CUTLASS for common shapes).

  • Proficiency in Triton, or a willingness to learn it backed by comparable experience in lower-level accelerator programming.

  • New frontier: multi-dimensional model parallelism (combining multiple parallelism techniques, such as tensor parallelism (TP) with context or sequence parallelism).

  • Familiarity with the internals of Ring Attention, FA3, and FusedMLP implementations.

What we offer at fal:
  • Interesting and challenging work

  • Competitive salary and equity

  • A lot of learning and growth opportunities

  • Relocation assistance to San Francisco

  • Health, dental, and vision insurance (US)

  • Regular team events and offsites

Compensation:
  • $180,000 - $250,000 + equity + comprehensive benefits package

Location:
  • We are currently hiring in downtown San Francisco.


The Company
73 Employees

What We Do

Generative Media Cloud
