At Graphcore, we’re building the future of AI compute. We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
Job Summary
Responsible for system-level reliability of AI servers with liquid cooling and HVDC architectures, owning reliability validation, shock & vibration robustness, and failure analysis from board to rack level to ensure safe transport, deployment, and long-term datacenter operation.
Key Responsibilities and Skills
- Plan and execute reliability validation across board, server, and rack levels.
- Define and run environmental, accelerated, and mechanical tests, including thermal/power cycling, humidity, corrosion, shock & vibration, and HALT/HASS.
- Lead shock & vibration validation for transportation, handling, seismic, and operational conditions.
- Assess reliability risks for liquid cooling systems (leakage, fatigue, pump life, corrosion, coolant stability).
- Evaluate HVDC mechanical and electrical robustness (busbars, connectors, power interfaces).
- Perform reliability prediction and life data analysis (Weibull, MTBF).
- Lead cross-functional design reviews and drive risk mitigation.
- Conduct failure analysis and RCA using standard FA methodologies.
- Define and maintain reliability and S&V test specifications (JEDEC, Telcordia GR-63, JESD22, MIL-STD-810, ISTA, ASHRAE, UL, IEC).
- Implement Ongoing Reliability Testing (ORT) for production quality.
- Document results and support customer audits and certifications.
Qualifications
- Bachelor’s or Master’s degree in Mechanical, Electrical, Reliability, Materials, or related Engineering.
- 10+ years of reliability engineering experience in AI servers, datacenter systems, HPC, or complex electronics.
- Hands-on experience with environmental, shock, and vibration testing.
- Strong knowledge of reliability methodologies and statistical analysis.
- Practical experience with liquid cooling and HVDC systems.
- Proven failure analysis and RCA capability.
- Strong communication skills in English; Mandarin a plus.
Preferred Experience
- AI server architecture and large-scale liquid cooling systems.
- FEA/modal analysis and test correlation.
- Datacenter, telecom, and transportation standards knowledge.
- Reliability certification (e.g., ASQ CRE).
Benefits
In addition to a competitive salary, Graphcore offers a competitive benefits package. We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.
Graphcore Compensation & Benefits Highlights
- Healthcare Strength: Health coverage includes medical and dental insurance, with US plans through Cigna and Kaiser, HDHP options with employer-funded HSA contributions, a health cash plan, EAP access, and dedicated mental-health support. These provisions extend to family options in some regions, reinforcing broad medical and wellbeing support.
- Retirement Support: Retirement programs include a UK pension match up to 5% and a US 401(k) with a 100% company match up to 6% (with a true-up). This pairing signals strong, predictable long-term savings support across key locations.
- Leave & Time Off Breadth: Time-off policies feature “unlimited” holiday in the UK and flexible, generous PTO with paid US holidays. Paid family leave for birthing parents and bonding further broadens time-away support.
Graphcore Insights
Why Work With Us
Our team is at the forefront of the machine intelligence revolution, enabling innovators from all industries to build AI-native products to expand human potential. What we do at Graphcore really makes a difference.
Graphcore Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
At Graphcore, we value wellbeing and flexibility to support a healthy work/life balance. Our hybrid approach encourages office-based colleagues to work onsite three days a week, with flexibility built on trust and transparency for everyone.





