Engineer, Supercomputing & Distributed Systems

San Francisco, CA, USA
In-Office
Entry level
Artificial Intelligence • Software • Design • Generative AI
The Role
The role involves designing and managing distributed systems infrastructure for AI workloads, optimizing data pipelines, and collaborating on machine learning projects.

About Krea

At Krea, we are building next-generation AI creative tools.

We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it.

We believe AI is a new medium that allows us to express ourselves through various formats—text, images, video, sound, and even 3D. We're building better, smarter, and more controllable tools to harness this medium.

Supercomputing / AI Infra at Krea

We build and operate the infrastructure for Krea's research and inference: distributed training, 1000+ GPU Kubernetes clusters, petabyte-scale data pipelines, and more. We build a lot of this from scratch, including custom distributed datastores, job orchestration systems, and streaming pipelines that replace tools like Kafka and Ray for modern AI workloads at scale.

Example projects:

Distributed data systems

  • Design multi-stage pipelines that turn petabytes of raw data into clean, annotated datasets

  • Run classification models on billions of images

  • Deploy and combine LLMs to caption massive multimedia data

GPU infrastructure

  • Manage distributed training and inference on 1000+ GPU Kubernetes clusters

  • Solve orchestration and scaling for large-scale GPU job processing

  • Scale workloads and research between clusters in multiple datacenters

Distributed training

  • Profile and optimize dataloaders streaming thousands of images per second

  • Profile and debug InfiniBand networking on huge training runs

  • Build fault-tolerance systems for large-scale pretraining

  • Collaborate with researchers on evolving RL infrastructure

Applied ML pipelines

  • Find clean scenes in millions of videos using distributed shot-boundary detection

  • Customize and train models to filter billions of images for questions like "is this a screenshot?"

  • Build the systems that bridge raw cluster capacity and research output

Who we're looking for:

Systems people. If you've read a blog post about InfiniBand debugging or building a custom distributed database and thought "I want to do that" — this is that team.

You'll work heavily with Python, Kubernetes, PyTorch, and data tools like DuckDB and Arrow. It's OK if you don't have K8s or ML experience; the main thing we hire for is an intuition for distributed systems and a strong mental model of how systems interact and behave under different conditions.

Strong candidates may have experience with…

  • Python, PyArrow, DuckDB, SQL, massive relational databases, PyTorch, Pandas, NumPy…

  • Kubernetes

  • Designing and implementing large-scale ETL systems

  • Fundamental knowledge of containerization, operating systems, file systems, and networking

  • Distributed systems design

  • Distributed training systems (NCCL, InfiniBand, RDMA)

  • Streaming and event processing systems (Kafka, Pulsar, or similar)

  • PyTorch internals, custom dataloaders, and training infrastructure

The Company
1,019 Employees
Year Founded: 2007

What We Do

Krea is a generative AI creative platform offering AI tools for creatives to generate, edit, and enhance images, video, and 3D content. It uses artificial intelligence to generate visuals tailored to unique styles, concepts, or products.
