Responsibilities
- Design and implement scalable, production-grade pipelines for data ingestion, transformation, storage, and retrieval from vehicle fleets and simulation environments.
- Build internal tools and services for data labeling, curation, indexing, and cataloging across large and diverse datasets.
- Collaborate with ML researchers, autonomy engineers, and data scientists to design schemas and APIs that power model training, evaluation, and debugging.
- Develop and maintain feature stores, metadata systems, and versioning infrastructure for structured and unstructured data.
- Support the generation and integration of synthetic datasets with real-world logs to enable hybrid training and simulation workflows.
- Optimize pipelines for cost, latency, and traceability, ensuring reproducibility and consistency across environments.
- Partner with simulation and cloud platform teams to automate workflows for closed-loop testing, scenario mining, and performance analytics.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 8+ years of experience building data-intensive software systems, ideally in robotics, autonomous driving, or large-scale ML environments.
- Proficiency in Python and SQL, and familiarity with C++.
- Experience designing ETL pipelines using modern frameworks (e.g., Apache Spark, Flyte, Union).
- Strong knowledge of cloud-native architectures, including AWS services (e.g., S3) or their Google Cloud Platform equivalents.
- Familiarity with sensor data types (camera, lidar, radar, GPS/IMU) and common data serialization formats (e.g., protobuf, ROS 2 bag, MCAP).
- Deep understanding of data quality, observability, and lineage in high-volume systems.
- Track record of building reliable and performant infrastructure that supports both ad-hoc exploration and repeatable production workflows.
Bonus Qualifications
- Experience in AD/ADAS, robotics, or autonomous systems — especially handling perception or planning datasets.
- Familiarity with ML pipeline orchestration frameworks (e.g., Kubeflow, SageMaker).
- Experience working with temporal or spatial data, including geospatial indexing and time-series alignment.
- Exposure to synthetic data generation, simulation logging, or scenario replay pipelines.
- Strong software engineering fundamentals, including CI/CD, testing, code review, and service deployment best practices.
- Experience collaborating with cross-functional, distributed teams across research and production orgs.
What We Do
Toyota Research Institute (TRI) envisions a future where Toyota products, enabled by TRI technology, dramatically improve quality of life for individuals and society. To achieve its Vision, TRI’s Mission is to create new tools and capabilities focused on improving the human condition through research in Energy & Materials, Human-centered AI, Human Interactive Driving, and Robotics.
We’re on a mission to improve the quality of human life. To lead this transformative shift, we are looking for the world's best talent: people who enjoy solving tough problems while having fun doing it.
Why Work With Us
TRI is fueled by a diverse and inclusive community of people with unique backgrounds, education and life experiences. We are dedicated to fostering an innovative and collaborative environment by living the values that are an essential part of our culture.