High Performance Compute (HPC) Software Engineer – HPC SW Systems

Ann Arbor, MI, USA
In-Office
106K-180K Annually
Mid level
Hardware
The Role
Design, develop, and optimize HPC software and systems on large-scale Linux clusters, focusing on performance, power efficiency, and hardware integration. Collaborate across teams to define requirements and document architectures.

Company Overview

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places an above-average focus on innovation, and we invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Job Description/Preferred Qualifications

Key Responsibilities

HPC Software Engineering

· Design, develop, and optimize HPC software running on large-scale Linux clusters, including distributed and parallel workloads (MPI, multithreading, GPU-accelerated pipelines, containerized workloads).

· Optimize application performance and power utilization across CPU, memory, storage, and network subsystems, with attention to throughput, latency, and scaling behavior.

· Develop and maintain system-level tooling for cluster bring-up, diagnostics, monitoring (including component power usage), and health checks.

· Work closely with algorithms, systems and application teams to understand and translate workload characteristics into power-efficient HPC software solutions.

HPC Systems & Hardware Awareness

· Collaborate with hardware and systems teams to define HPC node, storage, and interconnect requirements based on software and algorithm needs.

· Understand and influence CPU/GPU selection, memory sizing, PCIe layout, NUMA behavior, and network topology to ensure optimal software performance.

· Participate in HW/SW co-debug activities, including performance bottlenecks, stability issues, and failure analysis.

Rack & Infrastructure Engineering

· Understand rack-level integration of HPC systems, focusing on power, cooling, cabling, networking, and physical layout considerations.

· Understand data-center and lab constraints such as power budgets, thermal limits, network drops, and serviceability.

· Contribute to best practices and design reviews for new platforms and refresh cycles.

Cross-Functional Collaboration

· Act as a technical bridge between software, hardware, and systems teams.

· Provide clear technical documentation covering software and system architecture, deployment flows, and performance assumptions.

Required Qualifications

· Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.

· Strong experience developing HPC or systems software on Linux.

· Proficiency in Java and/or C++ and/or other system-level or performance-oriented languages.

· Hands-on experience with parallel computing (MPI, OpenMP, multithreading). Candidates with GPU computing experience (CUDA, ROCm, or equivalent) are preferred.

· Solid understanding of HPC hardware fundamentals: CPUs, memory hierarchies, storage, networking (Ethernet / InfiniBand).

· Practical experience working with clusters, servers, or rack-scale systems in lab or production environments.

· Strong debugging skills across software, OS, and hardware boundaries.

Preferred Qualifications

· Experience with containerized HPC environments (Docker, Singularity/Apptainer, Kubernetes in HPC contexts).

· Familiarity with high-speed interconnects, storage architectures, and performance benchmarking.

· Exposure to rack integration, including cabling, power distribution, cooling, and system bring-up.

· Experience in semiconductor, manufacturing, or high-reliability systems environments.

· Ability to reason about system reliability, MTBF/MTBA, and failure modes in large compute installations.

What Makes This Role Unique at KLA

· Work on mission-critical HPC platforms that directly impact semiconductor manufacturing capability.

· Influence both software architecture and physical system design, not just code in isolation.

· Collaborate with world-class experts across algorithms, hardware, systems, and operations.

· See your work deployed at scale in real production tools—not just in the data center.

Minimum Qualifications

Doctorate (Academic) Degree and 0 years related work experience; Master's Level Degree and related work experience of 3 years; Bachelor's Level Degree and related work experience of 5 years

Base Pay Range: $105,900.00 - $180,000.00 Annually

Primary Location: USA-MI-Ann Arbor-KLA

KLA’s total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits including but not limited to: medical, dental, vision, life, and other voluntary benefits, 401(K) including company matching, employee stock purchase program (ESPP), student debt assistance, tuition reimbursement program, development and career growth opportunities and programs, financial planning benefits, wellness benefits including an employee assistance program (EAP), paid time off and paid company holidays, and family care and bonding leave.

Interns are eligible for some of the benefits listed. Our pay ranges are determined by role, level, and location. The range displayed reflects the pay for this position in the primary location identified in this posting. Actual pay depends on several factors, including state minimum wage rates, location, job-related skills, experience, and relevant education level or training. We are committed to complying with all applicable federal and state minimum wage requirements. If applicable, your recruiter can share more about the specific pay range for your preferred location during the hiring process.

KLA is proud to be an Equal Opportunity Employer. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us at [email protected] or at +1-408-352-2808 to request accommodation.

Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons who are currently posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched KLA’s Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews in person or on video conferencing with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to [email protected] to confirm the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.

KLA Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about KLA and has not been reviewed or approved by KLA.

  • Retirement Support: Retirement offerings include a 401(k) plan with company matching and financial planning support. Student debt assistance and related financial benefits reinforce long-term savings and security.
  • Equity Value & Accessibility: Ownership programs include an Employee Stock Purchase Plan and broad-based RSU participation that extend equity beyond a narrow group. These elements complement competitive pay and bonuses to strengthen total rewards.
  • Leave & Time Off Breadth: Time-off programs span paid time off, paid company holidays, and paid volunteer time. Family care and bonding leave and back-up care services add flexibility during life events.

The Company
HQ: Milpitas, CA
10,001 Employees

What We Do

KLA develops industry-leading equipment and services that enable innovation throughout the electronics industry. We provide advanced process control and process-enabling solutions for manufacturing wafers and reticles. In close collaboration with leading customers across the globe, our expert teams of physicists, engineers, data scientists and problem-solvers design solutions that move the world forward.
