- Research and implement training, fine-tuning, and inference optimization strategies for multi-modal large models (image-text, image-audio, etc.), continuously improving model performance, efficiency, and generalization ability.
- Design and optimize computer vision models and algorithms (e.g., detection, classification, segmentation, feature extraction) to support real-world applications.
- Collaborate with cross-functional teams (product, engineering, data) to translate research into scalable, reliable, and production-ready solutions.
- Use C++ to implement and optimize models and systems, including deployment, performance tuning, and integration, ensuring low latency and high throughput.
- Stay up to date with advances in computer vision and multi-modal AI, and apply new methods to improve model performance and product impact.
- Contribute to technical discussions, code reviews, and knowledge sharing to improve code quality and engineering best practices.
- Master’s or Ph.D. in Computer Science or a related field, with strong expertise in computer vision and machine learning.
- 1-3 years of experience in multi-modal large model training, fine-tuning, and optimization (e.g., CLIP, Flamingo, BLIP, or self-developed multi-modal models), with a deep understanding of multi-modal fusion mechanisms.
- Strong foundation in computer vision, including object detection, image classification, feature matching, and image enhancement.
- Strong C++ development skills, with proficiency in STL, multi-threading, memory management, and performance optimization; experience in production-level implementation and deployment is required.
- Familiar with deep learning frameworks (e.g., PyTorch, TensorFlow) and computer vision libraries (e.g., OpenCV, OpenMMLab).
- Strong problem-solving skills, self-motivation, and a passion for technological innovation; able to work both independently and in a team.
- Experience deploying algorithms on edge devices, publications at top computer vision conferences (CVPR, ICCV, ECCV), or contributions to open-source projects in related fields is a plus.
- A fun, supportive and engaging environment.
- Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.
- Opportunity to work on cutting-edge technologies with the top talent in the field.
- Competitive compensation package.
- Snacks, lunches and fun activities.
Skills Required
- Master's or Ph.D. in Computer Science or a related field
- 1-3 years of experience in multi-modal large model training and optimization
- Strong foundation in computer vision including object detection and image classification
- Strong C++ development skills with proficiency in STL and multi-threading
- Familiar with deep learning frameworks like PyTorch and TensorFlow
What We Do
XPENG Motors is a leading Chinese electric vehicle and technology company that designs and manufactures intelligent automobiles that are seamlessly integrated with the Internet and use the latest advances in artificial intelligence. Focusing on China’s young and tech-savvy consumer base, XPENG Motors strives to offer smart mobility solutions through technological innovation and cutting-edge R&D. The company’s initial backers include its CEO & Chairman He Xiaopeng, the founder of UCWeb Inc. and a former Alibaba executive. It was co-founded in 2014 by Henry Xia and He Tao, former senior executives at Guangzhou Auto with expertise in innovative automotive technology and R&D. It has received funding from prominent Chinese and international investors including Alibaba Group, Foxconn Group and IDG Capital. The company currently has 3,000 employees, is headquartered in Guangzhou, and has design, R&D, manufacturing, and sales & marketing divisions in Silicon Valley, San Diego, Beijing, Shanghai, Zhaoqing (Guangdong Province) and Zhengzhou (Henan Province).