Software Engineer - Network (C++)

Posted 2 Days Ago
2 Locations
In-Office
180K-440K Annually
Junior
Information Technology
The Role
Design, implement, and operate core networking software for a large-scale GPU datacenter fabric. Develop routing and traffic-engineering algorithms, real-time switch software, prototypes and experiments, deployment and CI tooling, monitoring, and testing to maximize performance and reliability of AI training infrastructure.
Summary Generated by Built In
ABOUT xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE:

At xAI, we design, build, and operate Colossus from the ground up. This includes the massive GPU clusters, high-speed interconnect fabric, and the software that makes it all work at unprecedented scale. Colossus powers Grok and our frontier AI models with a custom, high-performance datacenter network that delivers ultra-low latency and massive bandwidth across hundreds of thousands of GPUs.

As a Software Engineer on the Colossus Networking team, you will develop the core networking software that maximizes the performance and reliability of our datacenter fabric. Your work will directly impact training efficiency, model convergence, and the speed at which we can push the frontier of AI.

Our engineers own the full lifecycle of their software — from design and implementation to deployment, monitoring, and iteration based on real-world performance at scale. You will solve hard problems in distributed systems, high-performance networking, and real-time control of one of the largest AI supercomputers on Earth.

RESPONSIBILITIES:
  • Develop routing and traffic-engineering algorithms for the Colossus high-performance datacenter network.
  • Develop highly reliable, real-time software designed to run on the switches that form the backbone of our low-latency, high-bandwidth AI training fabric.
  • Participate in and lead architecture, design, and code reviews.
  • Develop prototypes and run experiments to validate key design decisions at both small and full-cluster scale.
  • Build tools for software development, deployment, data analysis, visualization, and testing across virtualized environments, hardware-in-the-loop setups, and live production clusters.
  • Deploy reliable software updates through continuous integration and release systems with rigorous testing and monitoring.

BASIC QUALIFICATIONS:
  • Bachelor’s degree in computer science, engineering, math, or a related technical discipline; OR 2+ years of professional software development experience in lieu of a degree.
  • Strong development experience in C or C++.

PREFERRED SKILLS AND EXPERIENCE:
  • Strong professional experience writing high-performance C/C++ in production environments.
  • Experience developing, debugging, and deploying software that runs at scale in real-world systems.
  • Deep knowledge of networking protocols (UDP, TCP/IP, RDMA, etc.), distributed systems, and large-scale datacenter fabrics.
  • Background in real-time systems, high-performance computing, low-latency networking, or resource-constrained environments.
  • Creative problem-solving ability with exceptional analytical skills and strong engineering fundamentals.
  • Excellent written and verbal communication skills.
  • Ability to thrive in a fast-paced, dynamic environment with evolving requirements.
  • Experience with security considerations in large-scale distributed systems.

ADDITIONAL REQUIREMENTS:
  • Must be willing to work extended hours and weekends as needed.

COMPENSATION AND BENEFITS

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

The Company
HQ: Palo Alto, CA
96 Employees

What We Do

Understand the Universe
