Machine Learning Data Engineer, Replica Pipelines

Posted 15 Days Ago
Karlsruhe, Baden-Württemberg
Hybrid
Mid level
Artificial Intelligence • Computer Vision • Machine Learning
The Role
The role involves building and scaling data pipelines, ensuring efficient data flow for ML model training and evaluation, and collaborating with ML engineers on data needs.
Parallel Domain is building the world’s most advanced simulation and digital twin platform for autonomy, robotics, and computer vision. Our Replica product creates large-scale, photorealistic digital twins of real-world environments used for testing, validation, and development of autonomous systems.

About the role:

  • We are hiring a Machine Learning Data Engineer responsible for building and scaling the data pipelines that support Replica and ML model development. You will ensure that data flows efficiently from raw customer inputs into validated, structured formats suitable for training, evaluation, and production systems.

What you'll do:

  • Own data ingestion: Build reliable pipelines to normalize and validate customer and synthetic data.
  • Define data standards: Create schemas, validation checks, and quality metrics for Replica datasets.
  • Build curation tooling: Implement tools for dataset filtering, versioning, and annotation support.
  • Enable ML workflows: Generate high-quality data feeds for training and evaluation across ML models.

What you’ll bring:

  • Data engineering experience: Proven experience building scalable data pipelines and tooling.
  • ML-aware engineering: Understanding of how data is used in model training and evaluation.
  • 3D foundations: Practical experience with 3D concepts, geometry, and the linear algebra principles underpinning computer vision (e.g., projections, transformations).
  • Technical skills: Strong Python proficiency and comfort with large datasets.
  • Collaborative mindset: Experience working closely with ML engineers on data needs.

What will help you stand out:

  • Advanced degree: MS or PhD in ML, computer vision, robotics, or related field.
  • Cloud/infra experience: Familiarity with cloud storage and distributed processing frameworks.
  • Robotics data knowledge: Experience handling camera, lidar, or radar data.
  • Visualization tools experience: Familiarity with data visualization systems like Foxglove, Rerun, or Voxel51.
  • MLOps tooling exposure: Experience with dataset versioning, preprocessing automation, or training pipeline orchestration.

What we offer:

  • Competitive compensation: Salary dependent on your skills, qualifications, experience, and location.
  • Impactful work: The chance to contribute to the advancement of autonomous systems and AI.
  • Collaborative culture: A dynamic and supportive work environment where your ideas are valued.
  • Professional growth: Opportunities to learn and develop your skills in a cutting-edge field.

If you're passionate about machine learning, 3D reconstruction, generative AI, and the future of autonomous systems, we'd love to hear from you. Apply today and help us revolutionize the world of AI!

This position is available in Vancouver, BC and Karlsruhe, DE.

Top Skills

Cloud Storage
Data Visualization Systems
Dataset Versioning
Distributed Processing Frameworks
Machine Learning
Python

The Company
HQ: Palo Alto, CA
67 Employees
Year Founded: 2017

What We Do

Training and testing autonomous systems in the real world is a slow, expensive and cumbersome process. Parallel Domain is the smartest way to prepare both your machines and human operators for the real world, while minimizing the time and miles spent there. Connect to the Parallel Domain API and tap into the power of synthetic data to accelerate your autonomous system development.

Parallel Domain works with perception, machine learning, data operations, and simulation teams at autonomous systems companies, from autonomous vehicles to delivery drones. Our platform generates synthetic labeled data sets, simulation worlds, and controllable sensor feeds so they can develop, train, and test their algorithms safely before putting these systems into the real world.

#syntheticdata #autonomy #AI #computervision #AV #ADAS #machinelearning
