- Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
- Modern Lakehouse Architecture: Evolve our data storage solutions based on Apache Iceberg and Lance to implement efficient semantic indexing, metadata management, and data versioning.
- Training Throughput Optimization: Deeply optimize data loading and pre-fetching strategies to ensure maximum throughput for large-scale training on 10,000+ GPU clusters.
- Infrastructure Evolution: Support the seamless transition of foundation model data into actionable training sets, bridging the gap between raw vehicle logs and model-ready tokens.
- Engineering Excellence: BS/MS/PhD in Computer Science or a related field, with a proven track record of building large-scale distributed systems.
- Work Experience: 5-8 + years of industry experience.
- Programming Mastery: Proficient in Python, C++, or Java, with a deep understanding of high-performance concurrent programming and systems design.
- Distributed Frameworks: Hands-on experience with at least one distributed processing framework, such as Ray and Spark.
- Lakehouse Expertise: Familiarity with Data Lakehouse concepts and practical experience with technologies like Iceberg and Lance.
- Experience building data warehouses for Trillion-token datasets or PB-scale multi-modal data.
- Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
- Practical experience with Vector Databases, automated labeling toolchains, or data-centric AI workflows.
- Knowledge of storage formats optimized for AI (e.g., Parquet, Lance) and high-performance file systems.
- A fun, supportive and engaging environment.
- Infrastructures and computational resources to support your work.
- Opportunity to work on cutting edge technologies with the top talents in the field.
- Opportunity to make significant impact on the transportation revolution by the means of advancing autonomous driving.
- Competitive compensation package.
- Snacks, lunches, dinners, and fun activities.
What We Do
Xpeng Motors is a leading Chinese electric vehicle and technology company that designs and manufactures intelligent automobiles that are seamlessly integrated with the Internet and utilize the latest advances in artificial intelligence. Focusing on China’s young and tech-savvy consumer base, XPENG Motors strives to offer smart mobility solutions with technology innovation and cutting-edge R&D. The company’s initial backers include its CEO & Chairman He Xiaopeng, the founder of UCWeb Inc. and a former Alibaba executive. It was co-founded in 2014 by Henry Xia and He Tao, former senior executives at Guangzhou Auto with expertise in innovative automotive technology and R&D. It has received funding from prominent Chinese and international investors including Alibaba Group, Foxconn Group and IDG Capital. Currently with 3,000 employees, the company is headquartered in Guangzhou and has design, R&D, manufacturing and sales & marketing divisions in Silicon Valley, San Diego, Beijing, Shanghai, Zhaoqing (Guangdong Province) and Zhengzhou (Henan Province).






