Capacity Operations Manager

Hiring Remotely in Santa Clara, CA, USA
In-Office or Remote
136K-276K Annually
Senior level
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Role
Manage GPU capacity and HPC clusters across cloud platforms, enhance data models and dashboards, identify performance issues, and optimize resource usage through collaboration with teams.
Summary Generated by Built In

Our technology is limitless! NVIDIA is developing the world’s most innovative and groundbreaking computing platforms. Because of our work, scientists, researchers, and engineers are able to advance their ideas. At its essence, our visual computing technology offers not only an outstanding computing experience but also energy efficiency! We led the way in a supercharged style of computing embraced by the fastest-moving computer users globally: scientists, designers, artists, and gamers. However, it’s more than just technology! It’s our people, some of the brightest in the world, and our company that make NVIDIA one of the most fun, inventive, and dynamic workplaces! At the core of NVIDIA are values such as innovation, excellence, determination, and collaboration that inspire us to achieve our best.
 

What you will be doing:

  • Coordinate the development of High Performance Computing (HPC) clusters, collaborating closely with internal and external engineering teams.

  • Manage and optimize GPU capacity and other compute resources across multiple cloud service platforms to meet rising demand and ensure efficient deployment.

  • Design, improve, and manage data models, reporting platforms, data automation solutions, dashboards, and performance metrics that support NVIDIA Infrastructure governance programs and strategic capacity decisions.

  • Assess the technical and business requirements for GPU capacity and other compute resources from different internal and external groups.

  • Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with relevant infrastructure teams to resolve them.

  • Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.

  • Develop and enhance tooling for our cloud infrastructure and analytics platform to optimize resource usage and performance for NVIDIA and its customers. This includes designing and building tools that automate workflows and, potentially, applying AI techniques to extract useful signals and insights from generated data.

  • Partner with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals and develop infrastructure and service-level benchmarks that reflect customer satisfaction.

What we need to see:

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent experience.

  • 8+ years of overall experience in cloud computing, specifically in managing or using GPU capacity for high performance computing. A proven record of large-scale computing operations and planning is a plus.

  • Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets. Experience with command line interfaces and shell scripting languages.

  • Comprehensive knowledge of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies. Practical experience with Cloud Service Providers including AWS, Azure, GCP, and OCI is essential.

  • Demonstrated experience applying AI tools and techniques to extract useful signals and insights from data, specifically to improve resource usage and automation.

  • Deep knowledge and active use of statistical modeling and machine learning approaches for boosting operational efficiency and supporting strategic capacity decisions.

  • Understanding of analytics, statistical modeling, and machine learning methodologies.

  • Strong communication and relationship-building skills, with the ability to work well across different departments and contribute to strategic decisions.

  • Self-starter who is self-motivated, focused, and self-sufficient, with a willingness to take on new challenges and adapt quickly in a dynamic environment.

  • Ability to operate effectively amidst uncertainty and rapidly changing business conditions, with an agile approach and a commitment to ongoing improvement.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence. NVIDIA is widely considered one of the technology world’s most desirable employers. Some of the world's most forward-thinking and hardworking people are working for us. If you're creative and self-motivated, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 218,500 USD for Level 4, and 176,000 USD - 276,000 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 24, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


The Company
HQ: Santa Clara, CA
21,960 Employees
Year Founded: 1993

What We Do

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”

