Senior Software Research Architect, AI Networking

Posted 6 Days Ago
Tel Aviv
In-Office
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Role
The Senior Software Architect will lead the development of AI networking solutions, optimizing systems for generative AI workloads in distributed settings. Responsibilities include analyzing architectures, designing systems, collaborating with teams, and publishing research.
Summary Generated by Built In

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. Being an NVIDIAN means being part of a diverse and supportive environment that encourages everyone to perform at their peak. Come join the team and discover how you can make a lasting impact on the world.

NVIDIA is in search of a Senior Software Architect: a creative, forward-thinking, and practical researcher to improve the framework for LLM training and inference at scale. As part of our dynamic E2E Architecture group, you will design and optimize systems driving generative AI workloads, working at the intersection of software and hardware on some of the most advanced GPU clusters worldwide. You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions from inter-node communication and compute scheduling to system-level optimization. This is an opportunity to collaborate with best-in-class engineers and researchers and shape the future of generative AI. Your work will make a lasting impact by bringing generative AI technologies into real-world use and improving global computing capabilities.

What You’ll Be Doing: 

  • Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.  

  • Analyze current deployments, develop prototypes, and recommend architectural improvements. 

  • Stay abreast of the latest research; become the team’s authority in emerging networking techniques and technologies. 

  • Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.

  • Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1). 

  • Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features. 

  • Publish patents and present research at leading conferences. 

What We Need to See: 

  • M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or a related field, or a B.Sc. with research experience and publications.

  • 5+ years of relevant experience.

  • Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing). 

  • Strong software engineering skills in C++ and/or Python. 

  • Excellent system-level design and problem-solving abilities. 

  • Outstanding communication and collaboration skills across technical domains.  

Ways to Stand Out from the Crowd: 

  • Proven passion for solving sophisticated technical problems and delivering impactful solutions. 

  • Record of publications in top-tier conferences. 

  • Experience in designing and building large-scale AI training clusters. 

  • Postdoctoral research experience.

  • Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows. 

Top Skills

C++
NCCL
NSX
Python
RDMA

The Company
HQ: Santa Clara, CA
21,960 Employees
Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”
