NVIDIA is seeking a technical leader to define, craft, implement, and guide firmware architecture for reliability, availability, serviceability, and power management across next-generation NVIDIA Networking products and platforms. You will take a strong hands-on role, working with hardware, firmware, software, validation, customer engineering, and external partners to build robust, diagnosable, power-efficient systems for large-scale deployments.
NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as the AI computing company. We are looking to grow our teams with the smartest people in the world. If you're creative and autonomous, we want to hear from you!
What you'll be doing
Define platform-level firmware architecture for RAS and power management across SoCs, accelerators, DPUs, servers, embedded systems, and data center platforms.
Own error detection, classification, containment, recovery, escalation, and reporting architecture.
Define firmware architecture for power sequencing, power states, reset flows, thermal and power fault handling, idle management, and recovery from power-related failures.
Create firmware specifications for hardware error handling, health monitoring, crash capture, telemetry, diagnostics, debug data, and field serviceability.
Define interfaces and contracts between firmware, hardware, operating systems, BMCs, management controllers, platform software, and cloud/service infrastructure.
Drive architecture reviews, tradeoff discussions, failure-mode analysis, validation strategy, and long-term RAS and power management roadmap planning.
Establish standards for error logs, event schemas, telemetry flows, recovery policies, service diagnostics, and production debug infrastructure.
Guide engineering teams through implementation, validation, silicon bring-up, platform integration, and production deployment of RAS and power management features.
Analyze customer and field failures, identify architectural gaps, and feed lessons learned into future platform designs.
BSc, MS, or PhD in Electrical Engineering, Computer Science, Computer Engineering, or equivalent experience.
7+ years of relevant experience in firmware, platform architecture, embedded systems, or low-level systems software.
Deep understanding of RAS principles, fault modeling, error containment, recovery policies, diagnosability, and serviceability requirements.
Experience architecting firmware for complex hardware platforms such as SoCs, accelerators, DPUs, servers, networking devices, or embedded systems.
Strong knowledge of power management concepts, including power sequencing, reset architecture, thermal and power fault handling, power state transitions, and platform recovery flows.
Familiarity with boot firmware, UEFI/BIOS, BMC, embedded controllers, RTOS, embedded Linux, or platform management stacks.
Strong understanding of hardware/software interfaces, registers, interrupts, telemetry paths, debug infrastructure, and firmware-to-hardware contracts.
Programming and debugging fundamentals across languages such as C/C++, Python/Perl scripting, Verilog, assembly, or RISC-V assembly.
Ability to lead cross-functional architecture discussions and drive alignment across hardware, firmware, software, validation, product, and customer-facing teams.
Excellent communication skills, strong technical leadership, and a real passion for working collaboratively.
Experience with PCIe AER, CXL RAS, memory RAS, ECC/parity, accelerator RAS, networking RAS, high-availability systems, or large-scale data center platforms.
Knowledge of ACPI, SMBIOS, UEFI, PLDM, MCTP, Redfish, IPMI, or cloud telemetry systems.
Experience with power/thermal fault handling, dynamic power management, platform power sequencing, low-power states, or autonomous recovery mechanisms.
Background in silicon bring-up, platform validation, production diagnostics, or customer failure analysis.
Prior technical leadership experience as a firmware architect, principal engineer, platform lead, or domain owner.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Skills Required
- BSc, MS, or PhD in Electrical Engineering, Computer Science, Computer Engineering, or equivalent experience
- 7+ years of relevant experience in firmware, platform architecture, embedded systems, or low-level systems software
- Deep understanding of RAS principles and fault modeling
- Experience architecting firmware for complex hardware platforms
- Strong knowledge of power management concepts
- Familiarity with boot firmware, UEFI/BIOS, BMC, RTOS
- Strong understanding of hardware/software interfaces
- Programming and debugging fundamentals across multiple languages
- Ability to lead cross-functional architecture discussions
- Excellent communication skills and technical leadership
NVIDIA Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes drawn from responses that popular LLMs generated to common candidate questions about NVIDIA. It has not been reviewed or approved by NVIDIA.
- Equity Value & Accessibility: Equity awards and a discounted ESPP are highlighted as core parts of total compensation, enabling employees to share in the company’s success. Stock-based compensation and the two-year lookback ESPP are consistently described as especially valuable.
- Healthcare Strength: Health coverage is portrayed as robust, with comprehensive medical, dental, and vision options alongside mental health support and on-site care resources. Employer HSA contributions and wellness perks reinforce the depth of the offering.
- Retirement Support: Retirement programs are depicted as strong, featuring a meaningful 401(k) match with Roth options and support for Mega Backdoor Roth contributions. These elements position long-term savings as a notable advantage of the total rewards package.