High-Performance Networking Engineer - Supercomputing

2 Locations
In-Office
180K-440K Annually
Mid-level
Information Technology
The Role
Design and optimize low-latency, high-bandwidth networking solutions for supercomputing clusters using NVIDIA technologies. Collaborate with researchers and troubleshoot performance issues.
Summary Generated by Built In
About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

As a High-Performance Networking Engineer on xAI’s Supercomputing team, you will design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters run AI training and inference workloads that demand cutting-edge performance and scalability.

Focus
  • Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
  • Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
  • Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
  • Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
  • Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.
Ideal Experience
  • Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
  • Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
  • Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and kernel modules (e.g., nv_peer_mem, now succeeded by nvidia-peermem).
  • Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
  • Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
  • Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).
Tech Stack
  • NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
  • RDMA technologies and protocols (e.g., GPUDirect RDMA, RoCEv2)
  • Kubernetes
  • Rust and C/C++
  • MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)
Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.

xAI is an equal opportunity employer.

California Consumer Privacy Act (CCPA) Notice

Top Skills

C
C++
GPUDirect RDMA
InfiniBand
Kubernetes
Mellanox NICs
MPI
NCCL
NVIDIA GPUs
RDMA
RoCE
Rust
The Company
HQ: San Francisco, CA
96 Employees

What We Do

Understand the Universe
