NVIDIA

Senior Software Development Engineer in Test

Posted 3 Days Ago

Santa Clara, CA, USA

In-Office

168K-270K Annually

Senior level

Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse

The Role

Lead design, implementation, and automation of large-scale cloud and data center test infrastructure. Develop CI/CD pipelines, validate performance, scalability, and reliability, debug clusters (network, storage, security), manage Kubernetes and cloud environments, leverage AI tools to accelerate testing, and coordinate cross-team bring-up and issue resolution.

Summary Generated by Built In

We are seeking a highly skilled and hard-working Senior Test Developer / Test Engineer to join our multifaceted Enterprise Software QA team. This role offers an outstanding opportunity to leave your mark on the design, construction, optimization, and testing of large-scale infrastructure for foundational NVIDIA unified cloud services and data center offerings. If you are a dedicated engineer with strong expertise in cloud infrastructure and distributed systems and want to apply your skills using AI tools, this role could be a great fit. You will thrive in an exciting, innovative environment.

What you'll be doing:

Work with development teams on test planning, execution, reviews, and failure analysis across all layers of the cloud infrastructure software stack, and assess overall quality and risk. Work with customer PMs on software issues, including technical feedback from OEMs and CSPs. Develop key benchmarks to track execution, and deploy process improvements to increase efficiency.
Leverage AI skills to accelerate test scoping, test planning, execution, and automation workflows.
Lead NVIDIA Cloud and Data Center bring-up activities, including validation, reporting, working with engineering to debug issues, occasionally providing design input, and adding test coverage across different areas.
Design, develop and maintain CI/CD pipelines for continuous testing in cloud environments when needed.
Perform performance, scalability, and reliability testing of cloud services.
Implement and maintain test environments in cloud platforms such as AWS, Azure, or Google Cloud.
Monitor the infrastructure and alert on significant events to ensure the highest level of system performance and reliability.
Work with partner teams to ensure clusters are available for testing, and take the lead in resolving all issues.
Work with teams to ensure the quality of delivered cloud products, focusing on critical areas such as security, storage, workloads, and performance on the latest SW and FW components.

What we need to see:

A Master's or Ph.D. in Computer Science or a related field, or equivalent experience.
Experience with AI development tools for creating and automating test cases, measuring code coverage, and triaging.
8+ years of hands-on experience in cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.
2+ years of strong experience with cloud infrastructure platforms such as AWS, Azure, Google Cloud, and OCI.
Hands-on experience with network, storage, security, and cluster configuration and debugging, and with cloud infrastructure management tools such as Terraform and Ansible.
Expertise in administering, operating, and configuring Kubernetes.
Experience with CI/CD tools such as GitLab and Jenkins, and with the GitOps model.
Proficiency with monitoring tools such as Prometheus, Grafana, CloudWatch, and Thanos.
Proficiency in debugging issues involving networks, DHCP, DNS, HTTP, Linux, and containers.

Ways to Stand Out from the Crowd:

Familiarity with NVIDIA Base Command Manager for managing and monitoring high-performance computing (HPC) clusters.
Experience writing automation for web applications using tools such as Selenium or Playwright.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 270,250 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 6, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Skills Required

Master's or Ph.D. in Computer Science or related field, or equivalent experience.
Experience with AI development tools for creating and automating test cases, code coverage, and triaging.
8+ years of hands-on experience in cluster management and related tools, including Docker, Slurm, Kubernetes, and Ansible.
2+ years of strong experience with cloud infrastructure platforms such as AWS, Azure, Google Cloud, and OCI.
Hands-on experience with network, storage, security, cluster configuration and debugging; cloud infrastructure management tools like Terraform and Ansible.
Expertise in administering, operating, and configuring Kubernetes.
Experience with CI/CD tools such as GitLab and Jenkins and familiarity with the GitOps model.
Proficiency with monitoring tools such as Prometheus, Grafana, CloudWatch, and Thanos.
Proficiency in debugging network, DHCP, DNS, HTTP, Linux, and container issues.
Familiarity with Base Command Manager for HPC management and monitoring.
Experience writing web automation using Selenium or Playwright.

NVIDIA Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about NVIDIA and has not been reviewed or approved by NVIDIA.

Equity Value & Accessibility — Equity awards and a discounted ESPP are highlighted as core parts of total compensation, enabling employees to share in the company’s success. Stock-based compensation and the two-year lookback ESPP are consistently described as especially valuable.
Healthcare Strength — Health coverage is portrayed as robust, with comprehensive medical, dental, and vision options alongside mental health support and on-site care resources. Employer HSA contributions and wellness perks reinforce the depth of the offering.
Retirement Support — Retirement programs are depicted as strong, featuring a meaningful 401(k) match with Roth options and support for Mega Backdoor Roth contributions. These elements position long-term savings as a notable advantage of the total rewards package.

The Company

HQ: Santa Clara, CA

21,960 Employees

Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”