We are looking for a Software Solutions Engineer to support NVIDIA AI Enterprise customers and deployments across cloud and datacenter environments. This is a dual role: (1) support, triage, and resolve complex customer software issues end-to-end, and (2) build software features, automation, diagnostics, reproducible test cases, and deployment tooling to improve product readiness and scale support across enterprise environments.
You will work across compute and cloud-native technologies in CSP environments, including container platforms/orchestrators, enterprise system software, and GPU-accelerated AI frameworks and inference services used to run production AI workloads at scale. In this customer-facing role, you will work closely with customers and internal engineering teams to understand issues, explain root causes, drive resolution, and collaborate on fixes and improvements. Success in this role requires strong debugging skills, crisp communication, and ownership of technically deep escalations from inception to closure.
What you'll be doing:
Develop and maintain product-facing features and deployment assets for AI Enterprise supportability (e.g., scripts, configuration guidance, Kubernetes manifests/Helm charts, and reproducible test cases)
Develop and maintain Python-based tooling/automation (validators, log collectors, repro harnesses) to improve NVIDIA AI Enterprise deployment reliability across NGC and container orchestrators (e.g., Kubernetes)
Contribute code-level fixes, patches, or pull requests (as appropriate) in collaboration with engineering to address customer-impacting issues and improve product readiness
Support enterprise customers deploying NVIDIA AI Enterprise in datacenter and CSP environments, including Kubernetes-based and containerized production AI platforms
Take ownership of customer issues from inception to resolution: reproduce in lab/cloud, collect diagnostics, provide mitigations, and partner with engineering on fixes
Create high-quality bug reports and RFEs with clear repro steps, environment details (CSP/Kubernetes/GPU), impact analysis, and supporting artifacts
Develop customer-facing and internal documentation (KBs, runbooks, deployment guidance) to improve time-to-value and reduce recurring issues
Be on call one weekend per month in the event a customer has a Sev1 outage and requires engineering assistance
What we need to see:
BS in Computer Science, Electrical Engineering, Computer Engineering, or related field (or equivalent experience)
5+ years of system software development and troubleshooting experience, ideally including some customer-facing work
Strong computer science fundamentals and programming/scripting skills (Python required; Bash; Go/C++ a plus) to automate investigations and build diagnostics/repro tools
Strong troubleshooting fundamentals (networking, concurrency, OS concepts) and a structured approach to isolating issues across application, platform, and infrastructure layers
Deep understanding of at least two of the following: data centers/servers, distributed systems, virtualization, deep learning frameworks, containers (Docker/Kubernetes), hybrid cloud (AWS/Azure/GCP), and CI/CD for reliable deployments
Familiarity with GPU-accelerated AI/ML stacks and production model deployment/serving (e.g., NGC containers, CUDA/tooling concepts, inference serving such as Triton or similar)
Deep Linux knowledge and comfort troubleshooting in production Linux environments; working knowledge of Windows is a plus
Professional-level communication and interpersonal skills, with a passion for solving problems
Ways to stand out from the crowd:
Hands-on experience deploying and operating NVIDIA AI Enterprise components in production across on-prem or CSP environments
Hands-on experience using AI coding assistants/tools (e.g., Cursor, Claude Code, Codex, or similar) to accelerate debugging, automation, and test creation
Experience operating Kubernetes-based platforms in production (cluster operations, upgrades, control-plane/data-plane failure modes)
Strong performance debugging skills for GPU and cloud workloads (profiling, latency/throughput tuning) and familiarity with observability/tracing tools
Skills Required
- BS in Computer Science, Electrical Engineering, or related field
- 5+ years of system software development and troubleshooting experience
- Strong programming/scripting skills (Python required; Bash; Go/C++ a plus)
- Deep understanding of data centers/servers, distributed systems, virtualization, deep learning frameworks
- Deep Linux knowledge and comfort troubleshooting in production Linux environments
NVIDIA Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about NVIDIA and has not been reviewed or approved by NVIDIA.
- Equity Value & Accessibility: Equity awards and a discounted ESPP are highlighted as core parts of total compensation, enabling employees to share in the company’s success. Stock-based compensation and the two-year lookback ESPP are consistently described as especially valuable.
- Healthcare Strength: Health coverage is portrayed as robust, with comprehensive medical, dental, and vision options alongside mental health support and on-site care resources. Employer HSA contributions and wellness perks reinforce the depth of the offering.
- Retirement Support: Retirement programs are depicted as strong, featuring a meaningful 401(k) match with Roth options and support for Mega Backdoor Roth contributions. These elements position long-term savings as a notable advantage of the total rewards package.
NVIDIA Insights
What We Do
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”