Data Engineer, Generative AI / Large Models

Posted 3 Days Ago
San Francisco, CA
In-Office
180K-300K Annually
Mid level
Artificial Intelligence • Software
The Role
The Data Engineer will develop scalable infrastructure for large datasets, manage data transfers, optimize workflows, and implement ML models for data processing.

At Black Forest Labs, we’re on a mission to advance the state of the art in generative deep learning for media, building powerful, creative, and open models that push what’s possible.

Born from foundational research, we continuously create advanced infrastructure to transform ideas into images and videos.

Our team pioneered Latent Diffusion, Stable Diffusion, and FLUX.1 – milestones in the evolution of generative AI. Today, these foundations power millions of creations worldwide, from individual artists to enterprise applications.

We are looking for a Data Engineer to help create large-scale datasets that power the next generation of generative models.

Role and Responsibilities: 

  • Develop and maintain scalable infrastructure for large-scale image and video data acquisition
  • Manage and coordinate data transfers from various licensing partners
  • Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation
  • Implement scalable and efficient tools to visualize, cluster, and deeply understand the data
  • Optimize and parallelize data processing workflows to handle billion-scale datasets efficiently
  • Ensure data quality, diversity, and proper annotation (including captioning) for training readiness
  • Convert training data from alternative sources, such as user preferences, into a trainable format
  • Work closely in the model development loop to update data as necessitated by the training trajectory

What we look for:

  • Proficiency in Python and experience with various file systems for data-intensive manipulation and analysis
  • Familiarity with cloud computing platforms (AWS, GCP, or Azure) and Slurm/HPC environments for distributed data processing
  • Experience with image and video processing libraries (e.g., OpenCV, FFmpeg)
  • Demonstrated ability to optimize and parallelize data processing workflows across CPUs and GPUs
  • Familiarity with data annotation and captioning processes for ML training datasets
  • Knowledge of machine learning techniques for data cleaning and preprocessing

Nice to have:

  • Background or keen interest in developing large-scale data acquisition systems
  • Experience with natural language processing for image/video captioning
  • Experience with data deduplication techniques at scale
  • Experience with big data processing frameworks (e.g., Apache Spark, Hadoop)
  • Experience shipping a SOTA model
  • Understanding of ethical considerations in data collection and usage

Base Annual Salary: $180,000 - $300,000 USD 

Top Skills

Spark
AWS
Azure
FFmpeg
GCP
Hadoop
HPC
OpenCV
Python
Slurm

The Company
27 Employees

What We Do

A new era of creation
