Elon Musk has officially gone all in on artificial intelligence. Musk and his AI startup xAI have been turning heads with their rapid progress, even earning praise from Nvidia CEO Jensen Huang, to which Musk replied in a post on X, “I resisted AI for too long … Now it is game on.” At the center of xAI’s strategy to achieve AI supremacy is Colossus — xAI’s supercomputer designed specifically to meet the demands of AI workloads.
What Is Colossus?
Colossus is currently the world’s largest supercomputer, designed to handle the computational demands of AI tasks. xAI built it for the explicit purpose of training and improving its Grok language models.
Colossus holds the key to faster AI development and higher-quality outputs, making its completion all the more imperative to xAI amid increasing competition. At the same time, the supercomputer’s energy needs and accelerated construction have raised questions around what exactly the public cost would be for Musk to fulfill his AI ambitions.
What Is xAI’s Supercomputer?
Built by xAI, Colossus is currently the world’s largest supercomputer, located in an industrial park in Tennessee’s South Memphis neighborhood. According to the company, Colossus was constructed in just 122 days, well ahead of its estimated 24-month timeline. It took only another 92 days for xAI to double Colossus’ number of graphics processing units (GPUs) from 100,000 to 200,000, with the goal of eventually reaching 1 million GPUs. Because it is built as an interconnected network of GPU clusters, Colossus is more accurately described as an AI supercluster.
Instead of relying on fans, Colossus uses liquid-cooled rack technology, specifically Supermicro’s 4U liquid-cooled systems. The cluster is networked with Nvidia’s Spectrum-X Ethernet platform, which helped Colossus initially scale to 100,000 GPUs. Meanwhile, Tesla Megapacks (lithium-ion batteries built for energy storage) manage power demands and outages, keeping the supercomputer stable.
With these components working in tandem, Colossus aims to meet the computational needs of AI-related tasks.
A Quick Overview of xAI
xAI is an AI startup founded by Musk in 2023 to directly challenge OpenAI and what Musk perceived as “woke” political bias among chatbots like ChatGPT. In response, xAI created a chatbot called Grok, which forgoes political correctness in favor of more colorful answers. A co-founder of OpenAI, Musk also criticized OpenAI’s leadership for attempting to shift the company from its nonprofit roots toward a for-profit structure, further fanning the flames beneath his rivalry with OpenAI and its CEO Sam Altman.
What Does Colossus Do?
Colossus is intended to handle AI workloads, with an infrastructure that performs computations in parallel and processes the massive volumes of data involved in training AI models. This makes training faster and more efficient while improving model performance and accuracy.
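The core idea behind spreading training across many GPUs is data parallelism: each processor computes gradients on its own slice of the data, and the results are averaged so every step covers the full batch in a fraction of the time. The toy sketch below illustrates the concept in plain Python with a one-parameter model; it is purely illustrative, not how xAI's software actually works, and real clusters use distributed deep learning frameworks rather than hand-rolled loops.

```python
# Toy sketch of data-parallel training, the idea behind splitting work
# across many GPUs. Each "worker" computes a gradient on its own shard
# of the batch, and the gradients are averaged (an "all-reduce" in
# distributed-computing terms). Illustrative only.

def gradient(w, shard):
    # Gradient of mean squared error for a 1-parameter model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_step(w, batch, num_workers, lr=0.01):
    size = len(batch) // num_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    # On a real cluster, each shard's gradient is computed on its own GPU
    # simultaneously; here we simply loop, then average the results.
    grads = [gradient(w, s) for s in shards]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

batch = [(x, 3.0 * x) for x in range(1, 9)]  # data generated by y = 3x
w = 0.0
for _ in range(200):
    w = parallel_step(w, batch, num_workers=4)
print(round(w, 2))  # converges toward 3.0
```

Because each worker only touches its own shard, adding workers shrinks the wall-clock time per training step, which is why GPU count is the headline number for systems like Colossus.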
As a result, xAI wants to use Colossus to train, test and refine its Grok family of large language models. But Musk’s other companies stand to benefit from Colossus’ capabilities as well, with the supercomputer potentially being used to enhance AI-based products like Tesla’s autonomous vehicles and Optimus robot.
Colossus vs. Other Supercomputers
Colossus is currently the largest supercomputer in the world at 200,000 GPUs, and xAI continues to invest in the supercomputer as it competes with computing giants like Oracle. For reference, Oracle’s zettascale AI supercomputer has 131,072 Nvidia GPUs, Lawrence Livermore National Lab’s El Capitan has 44,544 GPUs and Oak Ridge National Laboratory’s Frontier supercomputer has 37,632 GPUs.
However, the title of fastest supercomputer goes to El Capitan, which can reach a peak performance of more than 2.79 exaflops. An exaflop is a measure of computing speed equal to 1 quintillion (10^18) floating-point operations per second. Under its current setup, Colossus could theoretically reach 497.9 exaflops — but this hasn’t been proven yet.
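To put the exaflop figures above in perspective, a quick back-of-the-envelope calculation shows how fast that rate really is. The workload size below is an arbitrary example chosen for illustration, not a real benchmark.

```python
# What an exaflop means in practice: 10**18 floating-point operations
# per second. The workload below is a made-up example for scale.

EXAFLOP = 10**18  # one quintillion floating-point operations per second

# El Capitan's reported peak of roughly 2.79 exaflops:
el_capitan_flops = 2.79 * EXAFLOP

# Suppose a job requires 10**21 floating-point operations in total.
# At El Capitan's peak rate, it would finish in:
seconds = 10**21 / el_capitan_flops
print(f"{seconds:.0f} seconds")  # about 358 seconds, roughly 6 minutes
```

The same job on a single machine doing a billion operations per second would take about 32,000 years, which is why exascale systems matter for training large AI models.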
Plans to expand Colossus will become crucial as more competitors enter the field, especially with advances in quantum computing. Google’s Sycamore processor has already outperformed the fastest supercomputers on certain narrow benchmark tasks, and its Willow quantum chip promises even better performance with improved error correction. Microsoft has also developed its own quantum processor, known as Majorana 1, and it’s working with OpenAI on a data center project that would house an AI supercomputer called Stargate.
Concerns Over Colossus
The computational capabilities of Colossus come at a price that includes environmental harm and national security concerns.
Environmental Impact
Musk’s plans to expand the Colossus site in Memphis have raised serious questions about power usage and its implications for the city. The site’s energy consumption contributes to air pollution in a neighborhood already plagued by poor air quality, and it places a greater burden on Memphis’ electrical and water systems. In a worst-case scenario, residents could struggle to keep the lights on while contending with a deteriorating environment as Colossus taxes the grid and leaves tainted air in its wake.
Ethical Issues
Another problem relates to the AI tools Colossus will support, particularly Grok. The chatbot has been known to deliver controversial responses to questions, at one point even producing content endorsing Nazi beliefs. As Colossus boosts Grok’s performance, it’s unclear what kinds of outputs the chatbot will produce. These concerns become more urgent in light of Tesla’s plans to incorporate Grok into the Optimus robot, which would multiply human-machine interactions.
National Security Risks
On a broader level, supercomputers could become weaponized as part of the AI race between the U.S. and China. The House Select Committee wrote a letter to Secretary of Commerce Howard Lutnick expressing concern over foreign entities — especially the Chinese government — accessing university supercomputers to outpace the U.S. in the AI competition. With its focus on AI workloads, Colossus could very well be a target for those looking to more quickly train and deploy advanced AI technologies.
Expansion Plans for Colossus
Any arguments against Colossus haven’t slowed the project down. In fact, Musk is ramping up expansion efforts. An additional 168 Tesla Megapacks have arrived in Memphis to help power the growing supercomputer. To further accelerate the project, Musk is also having an overseas power plant shipped to Memphis, reasoning that constructing one on-site would take much longer.
It remains to be seen what kind of impact this accelerated build-up will have on the local ecosystem and surrounding communities. But one thing’s for sure: The shadow of Colossus will loom large over the city of Memphis, the direction of Musk’s tech empire and the future of AI as a whole for years to come.
Frequently Asked Questions
What is the xAI supercomputer being used for?
xAI plans on using its Colossus supercomputer to train and refine the performance of its Grok large language models. In addition, Colossus could also improve products at Elon Musk’s other companies, including Tesla’s autonomous vehicles and Optimus robot.
How much does xAI’s Colossus cost?
The first phase of Colossus alone cost xAI as much as $4 billion, and the supercomputer will rack up an even higher price tag as xAI begins phase 2 in earnest.
Does Elon own xAI?
Yes, Elon Musk is the primary owner of xAI. He founded xAI in 2023 to compete with OpenAI and push back against what he saw as “woke” tendencies among popular chatbots.
How powerful is Colossus compared to other AI systems?
Colossus is one of the most powerful supercomputers in the world and is the largest by far, with 200,000 GPUs. But it’s still not the fastest supercomputer — that title belongs to El Capitan, which can achieve a peak performance of more than 2.79 exaflops. While Colossus could theoretically achieve 497.9 exaflops under its current setup, this hasn’t been proven. For reference, an exaflop is a measure of computing speed equal to 1 quintillion floating-point operations per second.