Staff Software Engineer, ML Performance & Systems

Reposted 19 Days Ago
San Francisco, CA
In-Office
180K-250K Annually
Mid level
Cloud • Digital Media • Information Technology
Generative media platform for developers.
The Role
Design and implement model serving architectures, develop monitoring tools, and optimize performance for generative media models in collaboration with Applied ML teams.

Key Responsibilities:

  • Help fal maintain its frontier position on model performance for generative media models.

  • Design and implement novel approaches to model serving architecture on top of our in-house inference engine, focusing on maximizing throughput while minimizing latency and resource usage.

  • Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities.

  • Work closely with our Applied ML team and customers (frontier labs in the media space) to make sure their workloads benefit from our accelerator.

Requirements:

  • Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.

  • Deep understanding of the cutting-edge ML infrastructure stack (anything from PyTorch, TensorRT, and TransformerEngine to Nsight), including model compilation, quantization, and serving architectures. Ideally, you closely follow developments in all these systems as they happen.

  • A fundamental understanding of the underlying hardware (NVIDIA-based systems at the moment) and, when necessary, the ability to go deeper into the stack to fix bottlenecks (for example, custom GEMM kernels with CUTLASS for common shapes).

  • Proficiency in Triton, or a willingness to learn it backed by comparable experience in lower-level accelerator programming.

  • New frontier: multi-dimensional model parallelism (combining multiple parallelism techniques, such as tensor parallelism (TP) with context parallelism / sequence parallelism).

  • Familiarity with the internals of Ring Attention, FA3 (FlashAttention-3), and FusedMLP implementations.

What we offer at fal:
  • Interesting and challenging work

  • Competitive salary and equity

  • A lot of learning and growth opportunities

  • We offer visa sponsorship and will help you relocate to San Francisco.

  • Health, dental, and vision insurance (US)

  • Regular team events and offsites

Compensation:
  • $180,000 - $250,000 + equity + comprehensive benefits package

Location:
  • We are currently hiring in downtown San Francisco.

Top Skills

Nsight
PyTorch
TensorRT
TransformerEngine
Triton

The Company
73 Employees

What We Do

Generative Media Cloud
