GPGPU Software Architect / Principal Engineer

2 Locations
In-Office
242K-409K Annually
Senior level
Automotive
The Role
The role involves developing a software stack for a GPGPU architecture, with a focus on CUDA compatibility, performance modeling, and cross-functional collaboration with AI framework teams. It requires extensive experience in GPU software design.
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
 
Our pioneering first-generation NPU, built on a DSA architecture, has successfully entered mass production. We are now validating our second-generation architecture and are making the strategic decision to transition to a general-purpose GPU (GPGPU) architecture.
 
We're completely overhauling our software stack and embracing the CUDA ecosystem. Our goal is to achieve over 90% compatibility with cuBLAS/cuDNN on Linux across PCIe and CXL connections while delivering at least 1.3 times the performance of existing solutions on Transformer and Stable Diffusion workloads.
 
Job Responsibilities:
 
Software Technical Strategy
  • Develop and refine a comprehensive 3-year roadmap for a software stack compatible with CUDA, encompassing Runtime, Driver, Compiler, Profiler, Debugger, and AI acceleration libraries
  • Define binding specifications that link our upcoming GPU ISA to CUDA APIs, ensuring forward compatibility with CUDA 12.x features
  • Evaluate and integrate the latest technological advancements: CUDA Graph, Transformer Engine, virtual memory management, CUDA dynamic parallelism, CUTLASS 3.x, TMA, Blackwell FP4, among others
Architecture & Design
  • Create a modular, layered Runtime architecture: CUDA → HAL → Kernel → Hardware, applicable across emulators and actual silicon
  • Define the task launch protocol, including Queue, Stream, Event, and Graph, as well as the memory model
  • Design a dual-mode (JIT & offline) compiler supporting LTO, PGO, Auto-Tuning, and efficient PTX→ISA microcode caching
  • Develop GPU virtualization schemes (MIG) that work across processes and containers
Performance & Observability
  • Implement an end-to-end performance model: Python API → CUDA Runtime → Driver → ISA → Micro-architecture → Board-level interconnect
  • Build an observability platform: Nsys-compatible traces, real-time Metric-QPS dashboards, and an AI Advisor that identifies bottlenecks automatically
  • Manage internal AI benchmarks as the single source of truth, including MLPerf Inference, Stable Diffusion XL, and a 70B LLM
Cross-functional Collaboration
  • Co-design, with our hardware architecture team, an ISA that is compatible with CUDA Compute Capability 12.x
  • Collaborate with AI framework teams (PyTorch, TensorFlow, JAX, ONNX Runtime) to build fully reusable kernel libraries
  • Partner with Cloud and K8s teams to co-develop Device Plugins, GPU Operators, and RDMA Network Policies

Minimum Requirements:
  • 10+ years in systems software, with at least 5 years designing CUDA compute stacks
  • Led end-to-end development of at least one generation of a GPU Runtime or AI acceleration library
  • Comprehensive mastery of PTX/SASS, CUDA Driver API, and cuBLAS/cuDNN internals; experience with LLVM NVPTX backend
  • Profound understanding of GPU micro-architecture, including SM architecture, Warp Scheduler, Shared-Memory conflicts, and Tensor Core pipelines
  • Proficiency with PCIe/CXL/RDMA topologies, NUMA settings, and GPUDirect RDMA/Storage

The base salary range for this full-time position is $241,800 - $409,200 in addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
 
We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status or marital status or any other prescribed category set forth in federal or state regulations.
 

Top Skills

AI Acceleration Libraries
cuBLAS
CUDA
CUDA Graph
cuDNN
cuFFT
LLVM
PTX
SASS

The Company
Palo Alto, CA
993 Employees
Year Founded: 2014

What We Do

XPENG Motors is a leading Chinese electric vehicle and technology company that designs and manufactures intelligent automobiles that are seamlessly integrated with the Internet and utilize the latest advances in artificial intelligence. Focusing on China’s young and tech-savvy consumer base, XPENG Motors strives to offer smart mobility solutions with technology innovation and cutting-edge R&D. The company’s initial backers include its CEO & Chairman He Xiaopeng, the founder of UCWeb Inc. and a former Alibaba executive. It was co-founded in 2014 by Henry Xia and He Tao, former senior executives at Guangzhou Auto with expertise in innovative automotive technology and R&D. It has received funding from prominent Chinese and international investors including Alibaba Group, Foxconn Group and IDG Capital. Currently with 3,000 employees, the company is headquartered in Guangzhou and has design, R&D, manufacturing and sales & marketing divisions in Silicon Valley, San Diego, Beijing, Shanghai, Zhaoqing (Guangdong Province) and Zhengzhou (Henan Province).
