NVIDIA

Software Engineer, AI and DL Kernel Libraries

Shanghai, China

In-Office

Mid level

Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse

The Role

Design, implement, and optimize GPU-accelerated deep learning kernels, kernel libraries, and inference runtime components. Build production-quality software (e.g., cuDNN, FlashInfer) and JIT/codegen systems, profile and tune performance for LLM inference, and collaborate across compilers, GPU architecture, and open-source ecosystems to ship scalable AI inference solutions.

Summary Generated by Built In

We're looking for outstanding AI systems software engineers to develop groundbreaking technologies across the inference systems software stack. Our team builds core AI systems software that accelerates high-impact workloads on NVIDIA GPUs, from deep learning primitives and kernel libraries to LLM inference runtimes, serving abstractions, and code generation technologies. As a member of the team, you will help design, build, optimize, and ship production-quality software that powers NVIDIA's AI software stack.

This role spans both foundational library engineering and next-generation inference systems work, with opportunities to contribute across the stack from low-level kernels and performance primitives to serving runtimes and developer-facing abstractions. You may work on GPU-accelerated deep learning primitives, efficient attention kernel implementations, LLM serving components, just-in-time compilation systems, software abstractions, and performance-critical runtime infrastructure for large language models, agents, and other advanced AI workloads. You will collaborate with world-class engineers across deep learning software, compilers, GPU architecture, and open-source inference ecosystems, and your work will directly impact NVIDIA's AI platform and the performance of real-world workloads at scale.

What you'll be doing:

Develop production-quality software that ships as part of NVIDIA's AI software stack, including cuDNN, FlashInfer, and optimized support for large language model inference workloads.
Innovate and develop new AI systems technologies for efficient inference, with a focus on performance, scalability, maintainability, and usability.
Design, implement, and optimize kernels for high-impact AI workloads across LLM inference, generative AI, computer vision, autonomous driving, and recommender systems.
Design and implement extensible software abstractions for deep learning libraries, LLM serving engines, and runtime systems.
Build and improve just-in-time compilation, code generation, and runtime technologies for performance-critical GPU workloads.
Analyze workload performance, tune current software, and propose improvements to future software and hardware-software interfaces.
Collaborate closely with engineers across deep learning frameworks, libraries, kernels, compilers, and GPU architecture teams at NVIDIA.
Contribute to open-source communities and ecosystem integrations where relevant, including projects such as FlashInfer, vLLM, and SGLang.

What we need to see:

Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
3+ years of relevant industry, research, or systems software development experience in machine learning, deep learning systems, compilers, or GPU software. More experience is expected for senior-level candidates.
Strong programming skills in C/C++ and Python, with hands-on experience developing high-performance software.
Solid experience with CUDA development and GPU programming fundamentals.
Strong experience developing or using deep learning frameworks such as PyTorch, JAX, TensorFlow, or ONNX.
Good understanding of linear algebra, performance analysis, profiling, and code optimization.
Experience designing software abstractions, APIs, or higher-level system architecture for performance-sensitive systems.
Familiarity with modern machine learning and inference system trends, especially around LLMs and generative AI.
For senior candidates, strong experience in GPU kernel development and performance optimization, especially using CUDA C/C++, cuTile, Triton, or similar technologies, is expected.

Ways to stand out from the crowd:

Hands-on experience with inference engines and runtimes such as vLLM, SGLang, MLC, TensorRT-LLM, or similar systems.
Background in domain-specific compiler, code generation, or library solutions for LLM inference and training.
Expertise in machine learning compilers or IR systems such as MLIR, Apache TVM, TensorIR, or related technologies.
Practical experience with GPU performance modeling, computer architecture, or accelerator-oriented software design.
Open-source project ownership or meaningful contributions in deep learning systems, compilers, kernels, or inference infrastructure.

NVIDIA Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes drawn from responses that popular LLMs gave to common candidate questions about NVIDIA. It has not been reviewed or approved by NVIDIA.

Equity Value & Accessibility — Equity awards and a discounted ESPP are highlighted as core parts of total compensation, enabling employees to share in the company’s success. Stock-based compensation and the two-year lookback ESPP are consistently described as especially valuable.
Healthcare Strength — Health coverage is portrayed as robust, with comprehensive medical, dental, and vision options alongside mental health support and on-site care resources. Employer HSA contributions and wellness perks reinforce the depth of the offering.
Retirement Support — Retirement programs are depicted as strong, featuring a meaningful 401(k) match with Roth options and support for Mega Backdoor Roth contributions. These elements position long-term savings as a notable advantage of the total rewards package.

The Company

HQ: Santa Clara, CA

21,960 Employees

Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”