Senior Network Performance Exploration Engineer

Posted 2 Days Ago
2 Locations
In-Office
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Role
The role involves optimizing network performance for AI infrastructure, analyzing application performance, and conducting benchmarking to enhance system efficiency.
Summary Generated by Built In

Our technology has no boundaries! NVIDIA is building the world’s most groundbreaking and state-of-the-art accelerated computing platforms. Because of our work, scientists, researchers, and engineers can advance their ideas. We pioneered a supercharged form of computing loved by the fastest-paced computer users in the world—scientists, designers, artists, and gamers.

We seek a highly motivated Network Performance Exploration Engineer to join our team of experts and help shape the foundational infrastructure for the AI revolution. Our next-generation networking systems are at the forefront of connecting and powering the world's most advanced AI clusters. As a key member of our architecture team, you will be responsible for exploring and identifying critical network optimization opportunities across our entire hardware and software stack, analyzing how system-level changes impact application-level performance.

What You’ll Be Doing:

  • Explore and validate end-to-end application performance, defining comprehensive test plans and critical metrics to identify optimization opportunities in both hardware and software.

  • Establish and maintain a comprehensive database of benchmark results, tracking performance across releases to drive data-informed decisions.

  • Conduct deep-dive analysis into communication libraries (like NCCL), system software, and hardware configurations to investigate performance characteristics, validate architectural theories, and identify bottlenecks.

  • Provide critical performance data to correlate and enhance simulation tools, ensuring our models accurately predict real-world hardware behavior.

  • Analyze application-level traffic patterns (e.g., LLMs) on our advanced networking fabrics to identify hardware and software optimization opportunities and tune system parameters.

  • Lead Proof-of-Concept (POC) projects to prototype and evaluate potential hardware and software optimizations and their impact on application performance.

What We Need To See:

  • B.Sc. or M.Sc. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.

  • 5+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.

  • Hands-on programming skills in Python and/or C/C++ for system analysis, automation, and customizing benchmarks.

  • Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.

  • Proven experience in performance analysis, benchmarking, and identifying system bottlenecks.

  • Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to dive deep into complex software and hardware interactions.

  • Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple cross-functional teams.

Ways To Stand Out From The Crowd:

  • Deep understanding of and hands-on experience with communication libraries such as NCCL, UCX, or MPI.

  • Direct experience debugging or modifying the source code of a major communication library.

  • Expertise in the architecture and system-level requirements of large-scale, distributed Deep Learning workloads (e.g., LLMs).

  • Expertise in high-performance network protocols (Ethernet, InfiniBand, RoCE) and interconnect technologies like NVLink.

  • Familiarity with the PyTorch ecosystem, especially for distributed workloads.

NVIDIA has some of the most forward-thinking and hardworking people in the world working for us, and due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

We are committed to fostering a diverse work environment and are proud to be an equal-opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

C/C++
Ethernet
InfiniBand
MPI
NCCL
NVLink
Python
PyTorch
RoCE
UCX

The Company
HQ: Santa Clara, CA
21,960 Employees
Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”
