Server CPU Systems Engineer

Graphcore

Posted 23 Days Ago

Austin, TX, USA

Hybrid

Expert/Leader

Artificial Intelligence • Semiconductor

Joining Graphcore gives you a seat at the top table, shaping the future of Artificial Intelligence.

The Role

Lead server and blade rack bring-up, install and configure servers, manage inventory via DCIM, run post-silicon validation and debug for CPU/GPU/HBM/IO, develop lab validation tools and scripts, coordinate data center projects and vendors, and drive technical improvements in system validation.

Summary Generated by Built In

About us

We are looking for a disciplined and dynamic Systems Engineer with a focus on server CPU-based systems to join our growing compute rack validation team. The candidate we are seeking should have demonstrated work experience in leading server rack and blade hardware systems deployment, hardware installation, and inventory management activities in the Austin, TX area. As a diligent leader in Systems Engineering, you will drive multiple aspects of post-silicon validation throughout the life cycle of the program. In this high-visibility position, you will be part of a technical team chartered to innovate and improve system bring-up and enablement capabilities, as well as silicon and system validation, to deliver the highest-quality, industry-leading technologies to market. Your technical leadership skills and your expertise in systems engineering, hardware bring-up, validation, and debug will be necessary for product development, definition, root cause analysis, and resolution. Your agility and collaborative approach will be essential to working within System Validation and with other engineering teams (System Architects, SoC and Rack FW, etc.).

The ideal candidate will drive key areas of at-scale system validation, including bring-up of ARM-based server and rack-level systems (nodes and rack-level systems). The candidate will be immersed in challenging system enablement work, ramping up post-silicon capabilities in engineering lab environments, and executing and triaging validation tests. The candidate will be a leading contributor to state-of-the-art HW bring-up and lab capabilities for Graphcore's system engineering. The candidate should be able to work in a global environment while maintaining a synergetic culture.

Primary Responsibilities:

Install, configure, commission (and decommission if needed) blade servers, chassis, switches, and supporting infrastructure.
Lead rack and stack activities, including mounting equipment, cable management, and labelling. Execute hardware upgrades, replacements, and troubleshooting of server and network components.
Maintain accurate asset records within DCIM platforms and inventory management systems.
Conduct physical audits and reconcile inventory discrepancies.
Track hardware movements, deployments, and decommissions through established change management processes.
Document installation procedures, rack layouts, cabling diagrams, and inventory updates.
Support data center migration, expansion, and refresh projects.
Collaborate with engineering, operations, logistics, and project management teams.
Adhere to all data center safety, security, and operational standards.
Develop, set up, and scale key methodologies for at-scale test execution, lab HW and system SW capabilities, and the system visibility and debug tools necessary for successful system (HW/SW/FW) bring-up and validation at blade and rack level for the AI compute rack.
Work independently in a production-ready environment, with a commitment to maintaining accurate inventory and asset records.
Triage issues found during server rack validation bring-up, Post-Silicon Validation, and production phases of the program. Ensure issues are solved on time with quality.
Lead test execution of key domains within AI compute solutions, such as CPU, GPU, memory, HBM, and IO.
Drive technical innovation to improve capabilities across system validation, including tools, script development, technical and procedural methodology enhancement, and various internal and cross-functional technical initiatives.

Qualifications:

Strong analytical/problem-solving skills and pronounced attention to detail.
Experience in blade server installation and maintenance (Cisco UCS, HPE Synergy, Dell MX, or similar).
Rack and stack deployments in enterprise or hyperscale environments.
Copper and fiber cabling installation and management.
DCIM and asset management platforms.
Strong understanding of server, storage, and networking hardware.
Experience performing inventory audits and maintaining asset accuracy.
Ability to read rack elevation diagrams, cabling schematics, and deployment documentation.
Familiarity with ticketing and change management systems.
Exposure to Linux (Ubuntu) OS bootable images and system firmware basics for image building, provisioning, and firmware flashing.
Exposure to automation testing to enable execution of hardware acceptance tests, best-known-config testing, etc.
Exposure to Python script development and execution.
Proven experience in understanding, defining, and enabling storage (storage rack) and networking (network rack, DNS, DHCP, etc.) capabilities in a lab environment to add end-to-end validation and debug capabilities for rack and blade validation.
Excellent communication and coordination skills.
Detail-oriented, highly organized, and able to prioritize and juggle multiple work streams to tight deadlines.
Technical leadership: capable of championing new tools, methods, and capabilities to drive platform validation improvements in schedule, quality, or coverage.
Experience working with data center technical staff, 3rd party vendors, ODMs, etc., throughout the life cycle of server system product development.
Must be a self-starter, able to independently drive tasks to completion.

Preferred Qualifications:

Master's or PhD in Electrical Engineering, Computer Engineering, or a related field.
10+ years of work experience on complex systems engineering challenges, validating and debugging HW-FW-SW issues in a server compute rack or data center blade environment.
Experience designing and deploying modern AI/ML rack-scale systems.
Knowledge of industry standards and best practices for hardware development.
Familiarity with emerging technologies in AI and Data Center infrastructure.
Comfortable meeting, engaging and collaborating with ODM partners and staffing vendors across the globe.

USA Benefits
In addition to a competitive salary, Graphcore offers flexible working and a comprehensive benefits package designed to support your health, wellbeing and financial future. Our benefits include medical, dental and vision coverage, Flexible Spending Accounts (FSAs), Health Savings Accounts (HSAs), disability and life insurance, a 401(k) retirement plan, commuter benefits, wellness services and an Employee Assistance Programme (EAP).

We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.

Graphcore Compensation & Benefits Highlights

Healthcare Strength — Healthcare is presented with day-one U.S. medical coverage, plus dental/vision, mental health support, and concierge services. Wellbeing extras such as an EAP and pet insurance broaden the health-focused package.
Retirement Support — Retirement programs include a company-matched 401(k) in the U.S. and matched pensions in the U.K. Employer-provided life insurance and disability/income protection further reinforce long-term financial security.
Leave & Time Off Breadth — Time off is framed as flexible or unlimited in multiple locations, with paid holidays and generous parental leave highlighted. Flexible hours are emphasized to accommodate personal needs alongside a hybrid model.

The Company

HQ: Bristol

762 Employees

Year Founded: 2016

What We Do

At Graphcore, we’re building the future of AI compute. We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.

Why Work With Us

Our team is at the forefront of the machine intelligence revolution, enabling innovators from all industries to build AI-native products to expand human potential. What we do at Graphcore really makes a difference.

Graphcore Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

At Graphcore, we value wellbeing and flexibility to support a healthy work/life balance. Our hybrid approach encourages office-based colleagues to work onsite three days a week, with flexibility built on trust and transparency for everyone.

Typical time on-site: 3 days a week

Austin Office

Bengaluru Office

Cambridge Office

Gdańsk Office

Hsinchu Office

London Office