Data Engineer – ML Training Infrastructure

2 Locations
In-Office
Mid level
Artificial Intelligence • Information Technology • Software • Generative AI
The Role
The Data Engineer will architect and build data infrastructure, develop cloud-based processing pipelines, and ensure code quality for ML projects.
Summary Generated by Built In

SpAItial is pioneering the development of a frontier 3D foundation model, pushing the boundaries of AI, computer vision, and spatial computing. Our mission is to redefine how industries, from robotics and AR/VR to gaming and movies, generate and interact with 3D content.

We’re seeking a Data Engineer to build the pipelines and infrastructure that fuel our large-scale model training. As the first engineer focused on data, you’ll shape the backbone of how we handle terabytes of multimodal training data (images, video, and 3D). This role is ideal for someone who thrives at the intersection of data systems and machine learning—designing reliable, scalable, and efficient ways to get high-quality data into cutting-edge training runs.

Responsibilities

  • Architect and manage data infrastructure for large-scale ML training datasets (e.g., Apache Iceberg, Parquet, Spark).

  • Build and operate ingestion pipelines for multimodal data (e.g., images, videos, 3D), including metadata generation and quality signals.

  • Design data loaders, caching, and serving strategies optimized for ML training.

  • Develop tools for dataset inspection, experiment tracking, and evaluation workflows.

  • Partner closely with ML researchers to ensure infrastructure scales with training demands.

  • Uphold code quality and best practices in testing, CI/CD, and reproducibility.

Key Qualifications

  • 3+ years of professional software/data engineering experience with production systems.

  • Proven experience in large-scale data processing for ML training (not just analytics/BI).

  • Hands-on experience with distributed data frameworks (e.g., Spark, Beam, Cloud SQL) and modern data formats (Parquet, Iceberg).

  • Proficiency in cloud platforms (AWS, GCP, or Azure).

  • Strong Python development skills, including testing and code quality.

  • Experience building and maintaining CI/CD pipelines.

Preferred Qualifications

  • Familiarity with ML frameworks (e.g., PyTorch, TensorFlow).

  • Experience preparing multimodal datasets (images, video, 3D) for ML pipelines.

  • Background in computer vision or 3D reconstruction (e.g., Structure-from-Motion).

  • Interest in AI-assisted developer tools (Cursor, Windsurf, etc.).

At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.

Top Skills

Airflow
Apache Iceberg
AWS
Azure
GCP
Parquet
PySpark
Python

The Company
HQ: London
14 Employees
Year Founded: 2024

What We Do

SpAItial is pioneering Spatial Foundation Models (SFMs), a groundbreaking AI paradigm designed to generate and reason about the appearance and physics of real and imagined environments. SFMs possess an intrinsic understanding of space-time, enabling transformative shifts in applications at the intersection of virtual and physical worlds.
Unlike existing generative AI technologies such as LLMs, image, or video models, SFMs operate natively in physical space. This mimics human understanding and significantly advances their cognitive capabilities. SFMs promise to revolutionize various applications across industries, from creating immersive virtual worlds for gaming and entertainment, to advancing CAD engineering and construction, to powering next-generation VR/AR experiences, and enabling sophisticated, physically intelligent robotics.
