Lead HPC Hardware Engineer

Reposted 17 Days Ago
Dallas, TX
In-Office
Senior level
Big Data • Fintech • Information Technology • Machine Learning • Financial Services
The Role
As Lead HPC Hardware Engineer, manage and optimize a large compute infrastructure of GPUs and CPUs, collaborating with various engineering teams.
Summary Generated by Built In

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm, with offices in London and Dallas.

We are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub where we work on the latest technologies in a cutting-edge environment.

The role

As G-Research’s Lead HPC Hardware Engineer, you will play a critical role in managing, scaling and optimizing a large compute infrastructure composed of numerous GPU and CPU nodes.

In this role, you will work closely with Infrastructure Engineers, Data Centre Operations, AI Engineers, Security Experts and Software Engineers to deliver a robust compute platform that supports high-performance computing needs.

Your expertise will be pivotal in ensuring that our compute infrastructure operates efficiently, while also planning for its growth and maintenance.

Our approach is centred on automation, hardware optimisation and infrastructure best practices. You will help drive improvements, mentor junior engineers and ensure our infrastructure is both secure and scalable.

Key responsibilities of the role include:

  • Designing, configuring and managing a high-performance compute infrastructure
  • Growing and optimizing our infrastructure to meet business demands
  • Ensuring the efficient operation of the OpenStack-powered environment, with a primary focus on OpenStack Ironic
  • Monitoring hardware performance, identifying areas for improvement and implementing solutions
  • Developing and maintaining hardware management procedures to increase server uptime and minimise failures
  • Performing diagnostics, tuning and capacity planning to ensure smooth scale-out
  • Performing analysis of existing hardware lifecycle processes and providing recommendations for improvement and optimization
  • Collaborating with various teams to integrate hardware improvements aligned to organizational goals
  • Implementing best practices for security hardening of the platform and associated systems
  • Mentoring junior engineers and fostering a culture of continuous learning and improvement

Who are we looking for?

The ideal candidate will have the following skills and experience:

  • Demonstrable experience managing large-scale HPC infrastructure
  • Strong understanding of server hardware architecture, including processors, memory, storage, networking and power systems
  • Deep understanding of bare-metal provisioning and infrastructure automation
  • Proven ability to troubleshoot hardware issues, including diagnostics and repairs for both GPU and CPU nodes in production environments
  • Experience with hardware monitoring and management tools, and familiarity with hardware automation techniques and tools such as Ansible, Puppet and Chef
  • Knowledge of out-of-band management interfaces, such as the Redfish API and IPMI, and of BMC implementations such as iDRAC and iLO
  • Experience with hardware diagnostics, optimization, performance tuning and capacity planning
  • Familiarity with thermal management and optimizing data centre layout for efficiency
  • Knowledge of security best practices for hardware infrastructure
  • Strong problem-solving skills with the ability to work under pressure in a fast-paced environment
  • Excellent communication skills and the ability to work collaboratively with cross-functional teams

The following would be beneficial:

  • Experience with large compute farms or hyperscale data centres
  • Familiarity with high-performance networking, such as InfiniBand and Ethernet
  • Knowledge of server configuration management and software deployment in HPC environments
  • Understanding of Linux-based environments and proficiency in scripting languages such as Python, Bash or PowerShell for automation
  • Experience with OpenStack or similar cloud platforms
  • Experience with NVIDIA-SMI and debugging GPU-related issues
  • Leadership experience including team management, mentoring and developing engineers

Why should you apply?

  • Market-leading compensation plus annual discretionary bonus
  • Lunch provided in the office (via GrubHub)
  • Informal dress code and excellent work/life balance
  • Excellent paid time off allowance of 25 days
  • Sick days, military leave, and family and medical leave
  • Generous 401(k) plan
  • 16 weeks’ fully paid parental leave
  • Medical and Prescription, Dental, and Vision insurance
  • Life and Accidental Death & Dismemberment (AD&D) insurance
  • Employee Assistance and Wellness programs
  • Generous relocation allowance and support
  • Great selection of office snacks, and hot and cold drinks
  • On-site gym and car parking

G-Research is committed to cultivating and preserving an inclusive work environment. We are an ideas-driven business and we place great value on diversity of experience and opinions.

We want to ensure that applicants receive a recruitment experience that enables them to perform at their best. If you have a disability or special need that requires accommodation, please let us know in the relevant section.

Top Skills

Ansible
Bash
BMC
Chef
iDRAC
iLO
IPMI
Ironic
Linux
OpenStack
PowerShell
Puppet
Python
Redfish API
The Company
Dallas, TX
1,039 Employees
Year Founded: 2001

What We Do

We hire the brightest minds in the world to tackle some of the biggest questions in finance. We pair this expertise with machine learning, big data, and some of the most advanced technology available to predict movements in financial markets.

We take pride in our dynamic, flexible and highly stimulating culture where world-beating ideas are prized and rewarded. We employ some of the best people in their field and are keen to nurture their talent in a supportive working environment.
