Senior Software Engineer, NCCL

Posted Yesterday
Shanghai, Shanghai Municipality, Shanghai, CHN
In-Office
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Role
Design and maintain optimized communication runtimes for Deep Learning and HPC; implement system software for GPU interactions; contribute to parallel programming interface specifications.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.

We are looking for a highly motivated senior software engineer for an exciting role in our communication libraries and network software team. The position will be part of a fast-paced crew that develops and maintains software for complex heterogeneous computing systems that power disruptive products in High Performance Computing and Deep Learning.

What you will be doing:

  • Design, implement and maintain highly optimized communication runtimes for Deep Learning frameworks (e.g. NCCL for TensorFlow/PyTorch) and HPC programming interfaces (e.g. UCX for MPI/OpenSHMEM) on GPU clusters.

  • Participate in and contribute to parallel programming interface specifications such as MPI and OpenSHMEM.

  • Design, implement and maintain system software that enables interactions among GPUs and between GPUs and other system components.

  • Create proofs of concept to evaluate and motivate extensions to programming models, new runtime designs and new hardware features.

What we need to see:

  • M.S./Ph.D. degree in CS/CE or equivalent experience.

  • 5+ years of relevant experience.

  • Excellent C/C++ programming and debugging skills.

  • Strong experience with Linux.

  • Expert understanding of computer system architecture and operating systems.

  • Experience with parallel programming interfaces and communication runtimes.

  • Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.

Ways to stand out from the crowd:

  • Deep knowledge of high-performance networks such as InfiniBand and RoCE.

  • Experience with HPC applications.

  • Experience with Deep Learning frameworks such as PyTorch, TensorFlow, JAX/XLA and vLLM/SGLang.

  • Experience with AI/DL communication patterns such as Expert Parallelism (EP), Tensor Parallelism (TP), Data Parallelism (DP) and Pipeline Parallelism (PP), and how these patterns can be implemented with NCCL.

  • Experience with CUDA kernel optimization and profiling.

  • Experience with large-scale model training and production inference software stacks.

  • Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment.

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most forward-thinking and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.

Top Skills

C/C++
CUDA
Linux
MPI
NCCL
OpenSHMEM

The Company
HQ: Santa Clara, CA
21,960 Employees
Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”
