Staff Software Engineer, ML Performance & Systems

San Francisco, CA
In-Office
180K-250K
Mid level
Cloud • Digital Media • Information Technology
Generative media platform for developers.
The Role
Design and implement model serving architectures, develop monitoring tools, and optimize performance for generative media models in collaboration with Applied ML teams.

Help fal maintain its frontier position in model performance for generative media models. Design and implement novel approaches to model serving architecture on top of our in-house inference engine, focusing on maximizing throughput while minimizing latency and resource usage. Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities. Work closely with our Applied ML team and customers (frontier labs in the media space) to make sure their workloads benefit from our accelerator.

Key Responsibilities:

  • Help fal maintain its frontier position in model performance for generative media models.

  • Design and implement novel approaches to model serving architecture on top of our in-house inference engine, focusing on maximizing throughput while minimizing latency and resource usage.

  • Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities.

  • Work closely with our Applied ML team and customers (frontier labs in the media space) to make sure their workloads benefit from our accelerator.

Requirements:

  • Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.

  • Deep understanding of the cutting-edge ML infrastructure stack (anything from PyTorch, TensorRT, and TransformerEngine to Nsight), including model compilation, quantization, and serving architectures. Ideally, you closely follow developments in all these systems as they happen.

  • A fundamental understanding of the underlying hardware (NVIDIA-based systems at the moment) and, when necessary, the ability to go deeper into the stack to fix bottlenecks (for example, custom GEMM kernels with CUTLASS for common shapes).

  • Proficiency in Triton, or a willingness to learn it backed by comparable experience in lower-level accelerator programming.

  • New frontier: multi-dimensional model parallelism (combining multiple parallelism techniques, such as tensor parallelism (TP) with context parallelism or sequence parallelism).

  • Familiarity with the internals of Ring Attention, FA3 (FlashAttention-3), and FusedMLP implementations.

What we offer at fal:
  • Interesting and challenging work

  • Competitive salary and equity

  • Employee-friendly equity terms (early exercise, extended exercise)

  • A lot of learning and growth opportunities

  • We offer visa sponsorship and will help you relocate to San Francisco.

  • Health, dental, and vision insurance (US)

  • Regular team events and offsites

Compensation:
  • $180,000 - $250,000 + equity + comprehensive benefits package

Location:
  • We are currently hiring in downtown San Francisco.

Top Skills

Nsight
PyTorch
TensorRT
TransformerEngine
Triton

The Company
73 Employees

What We Do

Generative Media Cloud
