Senior Solutions Architect - AI Infrastructure

3 Locations
In-Office or Remote
184K-357K Annually
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Role
Lead GPU and NVLink-based cluster design and validation for large-scale AI and HPC deployments. Advise cloud partners on architectures, perform performance modeling, debug deployment issues, support NPI rollouts, and relay field feedback to engineering.
Summary Generated by Built In

NVIDIA is building the world’s most groundbreaking and innovative accelerated computing platforms for AI and HPC.  Because of our work, scientists, researchers, and engineers can push the boundaries of what’s possible.  We pioneered a supercharged form of computing that powers everything from breakthrough AI research to the world’s fastest supercomputers.

We are seeking a highly motivated Senior Solutions Architect to join the NVIDIA Cloud Partners team with a focus on GPU, NVLink, and infrastructure design. In this role, you will be at the forefront of assisting with designs and architectures for some of the largest next-generation GPU-based clusters, enabling the world’s most advanced AI supercomputers and enterprise AI infrastructure in the field. As a Solutions Architect, you will serve as a key technical expert on NVIDIA’s groundbreaking GPU and NVLink technology designs and all of our software solutions, acting as a direct bridge between engineering and the field teams that support customers with the most demanding requirements. You will work on end-to-end cluster design and architecture, performance modeling, validation, and NPI cluster deployments. Your expertise will directly influence how the world’s leading AI companies, cloud providers, hyperscalers, research institutions, and enterprises build their infrastructure.

What you’ll be doing:

  • Partner with NVIDIA Cloud Partners on GPU cluster design and networking, conveying architecture and best-practice information for building next-generation architectures.

  • Guide NVIDIA Cloud Partners in cluster design, weighing design principles against complex, situational limitations to build the most performant and supportable GPU clusters possible.

  • Work closely with NVIDIA Cloud Partners to ensure successful first deployments with new products, including new network architectures and topologies.

  • Relay customer and field perspectives on cluster design and workflows to the engineering teams designing internal clusters.

  • Work hands-on with NVIDIA Cloud Partners to debug issues relating to cluster design, configuration, and performance, drawing on internal engineering expertise and knowledge of known bugs.

  • Support NPI customer deployments with new GPU/Networking architectures.

What we need to see:

  • BS, MS, or PhD in Computer Science, Electrical Engineering, Computer Engineering, Physics, or related field (or equivalent experience).

  • 8+ years of experience in cluster design, validation, and issue resolution, specifically on GPU and HPC clusters.

  • Proven expertise in designing large-scale distributed systems, AI clusters, or HPC infrastructure.

  • Ability to translate sophisticated engineering concepts into customer-ready documentation, diagrams, and reference material.

  • Expertise in driving customer/partner issues to closure with product and engineering teams.

  • Ability to handle cross-functional communications across customers and product, support, and engineering teams.

Ways to stand out from the crowd:

  • Experience leading large-scale AI Factory or HPC cluster bring-ups or builds.

  • Hands-on experience with NVIDIA products including, but not limited to, GPUs, NVLink, and NVIDIA Networking, specifically debugging issues that occur during deployment on NVLink and related products.

  • Knowledge of NCCL, MPI, IMEX, NMX, and collectives in distributed training as they pertain to cluster designs.

  • External customer-facing skill set and background.

  • Effective time management and capability to balance multiple tasks and customers while thinking creatively to debug and solve problems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 6, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Skills Required

  • BS, MS, or PhD in Computer Science, Electrical Engineering, Computer Engineering, Physics, or related field (or equivalent experience).
  • 8+ years of experience in cluster design, validation, and issue resolution on GPU and HPC clusters.
  • Proven expertise in designing large-scale distributed systems, AI clusters, or HPC infrastructure.
  • Ability to translate sophisticated engineering concepts into customer-ready documentation, diagrams, and reference material.
  • Expertise in driving customer/partner issues to closure with product and engineering teams.
  • Ability to coordinate cross-functional communications across customers, product, support, and engineering teams.
  • Experience leading large-scale AI Factory or HPC cluster bring-ups or builds.
  • Hands-on experience with NVIDIA products (GPUs, NVLink, NVIDIA Networking) and debugging deployment issues.
  • Knowledge of NCCL, MPI, IMEX, NMX, and collectives in distributed training relevant to cluster design.
  • External customer-facing experience and strong time management for handling multiple tasks/customers.

NVIDIA Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about NVIDIA and has not been reviewed or approved by NVIDIA.

  • Equity Value & Accessibility: Equity awards and a discounted ESPP are highlighted as core parts of total compensation, enabling employees to share in the company’s success. Stock-based compensation and the two-year lookback ESPP are consistently described as especially valuable.
  • Healthcare Strength: Health coverage is portrayed as robust, with comprehensive medical, dental, and vision options alongside mental health support and on-site care resources. Employer HSA contributions and wellness perks reinforce the depth of the offering.
  • Retirement Support: Retirement programs are depicted as strong, featuring a meaningful 401(k) match with Roth options and support for Mega Backdoor Roth contributions. These elements position long-term savings as a notable advantage of the total rewards package.


The Company
HQ: Santa Clara, CA
21,960 Employees
Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”
