Manager, HPC Support Engineering

Reposted 3 Days Ago
Hiring Remotely in USA
Remote
160K-240K Annually
Senior level
Software
The Role
Lead a team of HPC Support Engineers, overseeing technical support for GPU clusters, managing customer escalations, and driving support excellence. Collaborate with other teams to improve product support and ensure high-quality service for enterprise clients.
Summary Generated by Built In

Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU.


If you'd like to build the world's best deep learning cloud, join us. 


About the role

We are looking for a hands-on and customer-focused HPC Support Engineering Manager to lead our Tier III Support Engineering team supporting customers on Lambda’s Private Cloud GPU clusters.

You’ll be responsible for guiding a team of HPC Support Engineers, ensuring escalations are handled with speed and consistency, and driving a high standard of technical excellence and customer experience. This role requires both strong technical depth in HPC and the ability to lead, mentor, and collaborate across Support, Product, Engineering, and Sales. You’ll also play a critical role in shaping the supportability of Lambda’s products by representing customer experience in internal discussions.

This position reports to the Manager of Support Operations and includes participation in an on-call rotation.

What You'll Do

  • Lead, coach, and mentor a team of HPC Support Engineers, fostering both technical growth and customer-first execution.

  • Ensure the highest quality of support for Lambda’s customers, who depend on our products for mission-critical workloads.

  • Own customer escalations and incidents, engaging directly with enterprise customers during high-visibility situations.

  • Partner with Product and Engineering teams to influence design decisions and ensure future offerings are supportable and reliable.

  • Stay current on the latest HPC and NVIDIA technologies, applying that knowledge to improve customer outcomes.

  • Develop and refine support processes, documentation, and workflows to ensure consistency and best practices.

  • Monitor and report on team performance, driving improvements in responsiveness, resolution quality, and customer satisfaction.

  • Manage team schedules, including on-call responsibilities, to ensure 24/7 coverage for critical issues.

  • Lead by example — actively participating in troubleshooting and case resolution when needed.

You

  • Proven experience leading technical support or engineering teams, with a track record of building high-performing groups that deliver strong customer outcomes.

  • Skilled at managing escalations, providing clear direction under pressure, and serving as the point of leadership in critical customer situations.

  • Strong knowledge of HPC clusters, including GPU/InfiniBand systems, networking, and node-level troubleshooting.

  • Advanced Linux administration and diagnostic skills.

  • Skilled at motivating teams, setting direction, and developing engineers into strong technical contributors.

  • Strong analytical and problem-solving skills with a proactive mindset.

  • Action-oriented, accountable, and able to align team priorities with company and customer goals.

Nice to have

  • Advanced degree in Computer Science, Engineering, or related field.

  • Certifications in HPC, networking, or related technologies.

  • Experience with Slurm, Kubernetes, InfiniBand, and other high-performance interconnects (RoCE, NVLink/NVSwitch).

  • Background supporting Private Cloud environments or other dedicated enterprise clusters.

  • Experience supporting enterprise AI workloads across startups and Fortune 500 companies.

Salary Range Information

This is a salaried exempt role. The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, ~400 employees (2025) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with profitability quarter over quarter and year over year

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Wellness and Commuter stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Top Skills

GPU
HPC
InfiniBand
Kubernetes
Linux
Slurm

The Company
HQ: San Francisco, CA
106 Employees
Year Founded: 2012

What We Do

Lambda provides computation to accelerate human progress. We're a team of Deep Learning engineers building the world's best GPU workstations and servers. Our products power engineers and researchers at the forefront of human knowledge. Customers include Microsoft, MIT, Los Alamos National Lab, Disney, Tencent, Kaiser Permanente, Stanford, Harvard, Caltech, and the Department of Defense.
