SpAItial is pioneering the development of a frontier 3D foundation model, pushing the boundaries of AI, computer vision, and spatial computing. Our mission is to redefine how industries, from robotics and AR/VR to gaming and movies, generate and interact with 3D content.
We’re seeking a Data Engineer to build the pipelines and infrastructure that fuel our large-scale model training. As the first engineer focused on data, you’ll shape the backbone of how we handle terabytes of multimodal training data (images, video, and 3D). This role is ideal for someone who thrives at the intersection of data systems and machine learning—designing reliable, scalable, and efficient ways to get high-quality data into cutting-edge training runs.
Responsibilities
Architect and manage data infrastructure for large-scale ML training datasets (e.g., Apache Iceberg, Parquet, Spark).
Build and operate ingestion pipelines for multimodal data (e.g., images, videos, 3D), including metadata generation and quality signals.
Design data loaders, caching, and serving strategies optimized for ML training.
Develop tools for dataset inspection, experiment tracking, and evaluation workflows.
Partner closely with ML researchers to ensure infrastructure scales with training demands.
Uphold code quality and best practices in testing, CI/CD, and reproducibility.
Key Qualifications
3+ years professional software/data engineering experience with production systems.
Proven experience in large-scale data processing for ML training (not just analytics/BI).
Hands-on with distributed data frameworks (e.g., Spark, Beam, Cloud SQL) and modern data formats (Parquet, Iceberg).
Proficiency in cloud platforms (AWS, GCP, or Azure).
Strong Python development skills, including testing and code quality.
Experience building and maintaining CI/CD pipelines.
Preferred Qualifications
Familiarity with ML frameworks (e.g., PyTorch, TensorFlow).
Experience preparing multimodal datasets (images, video, 3D) for ML pipelines.
Background in computer vision or 3D reconstruction (e.g., Structure-from-Motion).
Interest in AI-assisted developer tools (Cursor, Windsurf, etc.).
At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.
What We Do
SpAItial is pioneering Spatial Foundation Models (SFMs), a groundbreaking AI paradigm designed to generate and reason about the appearance and physics of real and imagined environments. SFMs possess an intrinsic understanding of space-time, enabling transformative shifts in applications at the intersection of virtual and physical worlds.
Unlike existing generative AI technologies such as LLMs and image or video models, SFMs operate natively in physical space. This significantly advances their cognitive capabilities, which mimic human understanding. SFMs promise to revolutionize applications across industries, from creating immersive virtual worlds for gaming and entertainment, to advancing CAD engineering and construction, to powering next-generation VR/AR experiences and enabling sophisticated, physically intelligent robotics.