With Grok 4, Elon Musk Promises a Smarter, More Capable AI. Does it Deliver?

Grok 4 is one of xAI’s most intelligent models yet, touting real-time search, agentic capabilities and unmatched performance. A more powerful version, Grok 4 Heavy, is also out, available only through a $300/month SuperGrok Heavy subscription.

Written by Ellen Glover
Grok on a smartphone in front of a screen with the xAI logo
Image: Shutterstock
UPDATED BY
Brennan Whitfield | Sep 24, 2025
REVIEWED BY
Ellen Glover | Sep 24, 2025
Summary: Elon Musk’s xAI has launched Grok 4, one of its most powerful models yet, touting real-time search, tool use and top-tier performance. But its overall intelligence — and apparent political slant — are continuing to raise questions.

Just four months after releasing Grok 3, Elon Musk’s artificial intelligence startup xAI has debuted a successor: Grok 4, one of its most advanced AI models yet. Positioned as a major step forward in both reasoning and problem-solving, Grok 4 introduces features like native tool use, real-time search and scaled reinforcement learning — capabilities that xAI claims put it on par with (or even ahead of) models developed by competitors like OpenAI, Google and Anthropic.

What Is Grok 4?

Grok 4 is an AI model developed by Elon Musk’s startup xAI. Using advanced reinforcement learning and real-time search capabilities, the model is designed to tackle complex topics with up-to-date information from the web and social media platform X, aiming to generate accurate, contextually relevant responses across a wide range of subjects. 

Looking ahead, the company says it will continue to scale Grok 4’s reinforcement learning and expand its multimodal capabilities, integrating vision and audio to enable more “intuitive interactions.” xAI also plans to push beyond narrow, verifiable domains and into more dynamic real-world applications that the model can learn from and adapt to in real time. Whether Grok 4 will live up to these ambitious goals remains to be seen, but its rapid release underscores xAI’s growing presence in the generative AI landscape and indicates that the race to build the next best model is far from over.

Grok 4 is available through an API on xAI and serves as the engine behind the Grok chatbot for Premium+ and SuperGrok subscribers. A more powerful version of the model called Grok 4 Heavy is also being offered under a “SuperGrok Heavy” subscription tier, priced at $300 a month.


What Is Grok 4?

Grok 4 is a foundation model developed by xAI. Released in July 2025, the model builds on the work of its predecessors, Grok 3 and Grok 3 Reasoning, which focused on next-token prediction and reasoning — with “reasoning” being the ability to break down problems into steps and refine its outputs before providing a final answer. With Grok 4, xAI significantly scaled up its use of reinforcement learning, using its own internal 200,000-GPU supercomputer called Colossus to train the model on a larger and more diverse data set with greater efficiency.

Another major improvement is Grok 4’s ability to use external tools like search engines and code interpreters. As it tackles a complex programming task or searches for up-to-date information on a given subject, the model can create its own search queries and pull live information from the web to inform its responses. 

Grok 4 is particularly integrated with X, a social media platform now owned by xAI. It can find information from “deep within” the social media site, according to xAI, using advanced keyword and semantic filters to identify specific posts. It can also analyze media like images and video to improve the relevance and accuracy of its answers.


Grok 4 Features

Large-Scale Reinforcement Learning

For the most part, language models have been trained using next-token prediction, meaning they learn to guess the next word or phrase in a sequence based on the context that came before it. While this approach helps LLMs generate fluent and coherent sentences, it doesn’t always allow them to fully understand a subject or apply what they know across broader applications. Grok 4 attempts to address this by incorporating large-scale reinforcement learning into its training process, allowing it to “think” through problems and refine its answers rather than simply predicting the next most likely word. 
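To make the idea of next-token prediction concrete, here is a toy sketch in Python. It is not how Grok or any production LLM is built: it is a simple word-pair (bigram) counter, and the function names `train_bigram` and `predict_next` are made up for illustration. It only shows the basic principle of guessing the next word from what came before.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows each word in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed next word, or None if the word is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat" (seen twice after "the", versus "mat" once)
```

A model like this can only repeat patterns it has already seen, which is roughly the limitation the paragraph above describes. Reinforcement learning is one way to push a model beyond that kind of pattern matching.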

Native Tool Use

Grok 4 can autonomously decide when and how to use various external tools to help inform its answers. It can browse the web, sift through X posts and analyze images and videos to respond to questions that other models might struggle with because they require up-to-date or niche information. What’s more, this feature is built right into the model through its training process rather than being bolted on as a post-processing step.

Voice Mode

Grok 4 accepts audio and visual inputs in addition to text. With the model’s new “Voice Mode,” users can carry on a natural, spoken conversation with Grok like they would with a human, and can even use their phone camera to let Grok “see” its surroundings and analyze them in real time. xAI also unveiled a new voice for Grok 4 that can whisper, laugh and even sing.

Grok 4 Heavy

Grok 4 Heavy is a more advanced version of the Grok 4 model, and it is only available through xAI’s new SuperGrok Heavy subscription tier. Grok 4 Heavy is designed to “consider multiple hypotheses” in parallel to complete complex reasoning tasks, according to xAI — basically creating a sort of “study group” of AI agents, as Musk put it during a livestream on X.

“All those agents do work independently, and then they compare their work,” Musk said. “Often only one of the agents actually figures out the trick or figures out the solution. But once they share the trick or figure out what the real nature of the problem is, they share that solution with the other agents. And then they essentially compare notes and yield an answer.”

This kind of multi-agent coordination is intended to enable Grok 4 Heavy to perform better on more advanced, open-ended tasks, especially in situations where a single line of reasoning might otherwise miss a subtle point or pattern. In fact, xAI says the model is the first ever to score 50 percent on Humanity’s Last Exam, a benchmark designed to gauge how close a given model is to achieving expert-level reasoning capabilities, particularly in fields where human expertise is typically required.
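xAI has not published how Grok 4 Heavy coordinates its agents, so the following Python sketch is purely illustrative. It mirrors the “study group” pattern Musk describes using a toy math problem: several hypothetical agents work independently, each exploring a different slice of the problem, and then the group adopts whichever answer passes a shared check. All names here (`make_agent`, `verify`, `study_group`) are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

TARGET = 1369  # toy problem: find a positive integer x with x * x == TARGET

def make_agent(start, stop):
    """Build a hypothetical 'agent' that only explores one slice of the search space."""
    def agent():
        for x in range(start, stop):
            if x * x == TARGET:
                return x
        return None  # this agent did not find the trick
    return agent

def verify(candidate):
    """Shared check that can be applied to any proposed answer."""
    return candidate is not None and candidate * candidate == TARGET

def study_group(agents):
    # Step 1: agents work independently (here, in parallel threads).
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        proposals = list(pool.map(lambda agent: agent(), agents))
    # Step 2: agents compare notes; a verified proposal is adopted by the group.
    verified = [p for p in proposals if verify(p)]
    return verified[0] if verified else None

agents = [make_agent(1, 20), make_agent(20, 40), make_agent(40, 60)]
print(study_group(agents))  # prints 37; only the second agent finds it
```

As in Musk’s description, only one agent actually solves the problem, but the whole group ends up with the answer once the results are compared.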

API

Available to Premium+ and SuperGrok subscribers, the Grok 4 API offers:

  • A 256,000-token context window for handling long documents and extended reasoning. 
  • Real-time access to data via xAI’s new live search API, which pulls information from X, the web and various unnamed news sources.
  • Multimodal input support, specifically vision, voice and text.
  • Enterprise-grade security and compliance with GDPR, CCPA and SOC 2 Type II certifications.

All told, these features may make Grok 4 a solid option for developers looking to build applications that require long-context understanding, access to up-to-date information and robust privacy standards.
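For developers wondering what a 256,000-token context window means in practice, the Python sketch below estimates whether a document is likely to fit. It rests on a loud assumption: that one token is roughly four characters of English text. That is a common rule of thumb, not xAI’s actual tokenizer, so real counts will differ. The helper names and the 4,000-token reply allowance are likewise hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # figure quoted for the Grok 4 API in the list above
CHARS_PER_TOKEN = 4              # ASSUMPTION: rough heuristic for English, not xAI's tokenizer

def estimate_tokens(text):
    """Very rough token estimate; a real tokenizer would give different counts."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(text, reserved_for_output=4_000):
    """Check whether a prompt likely fits while leaving room for the model's reply."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS

short_doc = "word " * 1_000    # 5,000 characters, about 1,250 estimated tokens
long_doc = "word " * 300_000   # 1,500,000 characters, about 375,000 estimated tokens
print(fits_in_context(short_doc), fits_in_context(long_doc))  # prints: True False
```

Under that heuristic, the window holds on the order of a million characters, which is why the feature is pitched at long documents and extended reasoning.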


How to Access Grok 4

As of August 2025, Grok 4 is free for a limited time for all Grok and X users. It can be accessed through the Grok chatbot or on X, either through the Grok tab or by tagging the Grok account. Ordinarily, only X Premium+ and SuperGrok subscribers can access Grok 4 through these options. Developers can work with Grok 4 directly through the xAI API, and those with a SuperGrok Heavy subscription can also access Grok 4 Heavy, a more powerful version of the model.

 

How Does Grok 4 Compare to Other AI Models?

xAI claims Grok 4 is the “most intelligent model in the world,” citing its performance on a handful of academic, reasoning and problem-solving benchmarks. And the numbers shared by the company appear to back this statement:

  • Grok 4 scored 15.9 percent on ARC-AGI-2, a test that evaluates abstract reasoning and pattern recognition. This was nearly double what the next-best model achieved.
  • On competitive coding and math benchmarks (LiveCodeBench, AIME’25 and HMMT), both Grok 4 and Grok 4 Heavy beat out most competitors.
  • Both Grok 4 and Grok 4 Heavy scored the highest on the GPQA (Graduate-Level Google-Proof Q&A) benchmark, which evaluates a model’s question-answering capabilities, with a particular focus on scientific reasoning and knowledge.
  • On the USAMO 2025 benchmark, which evaluates mathematical capabilities using high school math Olympiad problems, Grok 4 Heavy led the pack with a score of 61.9 percent — well ahead of the other models, including the standard Grok 4.

Perhaps most notably, Grok 4 Heavy outperformed not only all the other models but also human participants in Vending-Bench, a simulated environment that evaluates a model’s ability to manage a simple vending machine business over time. The test is designed to assess multi-step planning and economic reasoning — areas that other models typically struggle with.

“It’s smarter than almost all graduate students in all disciplines simultaneously,” Musk said of Grok 4 in the livestream. A few minutes later he said it was “post-graduate, Ph.D. level in everything,” then “better than Ph.D. level.”

Still, there are some important caveats to keep in mind. For one, xAI has not shared Grok 4’s performance on several other widely used industry benchmarks, such as MMLU and HumanEval, making a comprehensive comparison against other top AI models impossible. And the only other models Grok 4 was compared to were OpenAI’s o3, Anthropic’s Claude Opus 4 and Google’s Gemini 2.5 Pro, leaving out many others. Independent leaderboards like LMArena also show Grok 4 trailing behind several of its competitors in both text and image understanding.

More broadly, industry experts caution against using high benchmark scores as a definitive measure of real-world intelligence. After all, xAI is not the first company to say its latest product is smarter than human experts. Google DeepMind CEO Demis Hassabis made similar statements back in 2023 when Gemini Ultra was announced. While both of these models yield impressive results, Hassabis’ claims were an exaggeration then and Musk’s claims are likely an exaggeration now, especially considering that Grok 4 is susceptible to the same issues as any other generative AI product, namely hallucinations and bias.


Grok 4 Controversies

In the hours following the release of Grok 4, the model began exhibiting some troubling behavior. It consistently appeared to use Musk’s own social media posts as sources of truth when asked about the Israel-Palestine conflict, abortion, immigration in the United States and other controversial topics, suggesting the model may have been trained or tuned to consider the founder’s personal politics. In another especially alarming instance, Grok referred to itself as “Hitler” on the X profile it powers.

This isn’t the first time Grok has generated controversy. Just a few days before the Grok 4 rollout, the chatbot’s automated X account fired off several antisemitic replies to users and even claimed to be “MechaHitler.” A couple of months earlier, it referred to a “white genocide” in Musk’s native South Africa, even when responding to posts that had absolutely nothing to do with the subject. Despite the company’s stated mission to build a “maximally truth-seeking AI,” xAI has had to repeatedly delete offensive content and issue correction statements.

These issues are especially striking given the motivations behind Grok’s creation. Musk founded xAI in response to what he perceived as political bias in other AI systems — specifically OpenAI’s ChatGPT, which he has criticized for being overly “woke” and left-leaning. But Grok’s recent behavior suggests it may have swung too far the other way rather than offering a balanced perspective.

In the wake of the latest controversy, xAI appears to have updated Grok 4’s internal instructions, removing prompts that might encourage politically incorrect responses. There are also a few new lines directing the model to source information from a diverse range of perspectives when addressing sensitive or controversial topics.

 

Notable Grok 4 Developments

Since the release of Grok in 2023, xAI has iterated quickly, pushing its products to the bleeding edge of what is possible with AI. The timeline below highlights some of the most consequential releases and platform moves shaping Grok’s capabilities and reach.

Grok 4 Fast Release (September 2025)

xAI released Grok 4 Fast, a cost‑efficient reasoning model built on the Grok 4 architecture. Grok 4 Fast achieved near‑Grok 4 benchmark scores while using 40 percent fewer thinking tokens on average, translating to a 98 percent reduction in cost for equivalent performance. It supports web and X search via native tool use, uses a 2‑million‑token context window and combines both reasoning and non‑reasoning modes into one model. All users can access Grok 4 Fast (Fast and Auto modes) in the Grok app, as well as on X for a limited time. Developers can access Grok 4 Fast through the xAI API, OpenRouter and Vercel AI Gateway.
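The two percentages quoted above do not follow from each other directly, so a quick back-of-the-envelope calculation helps. The Python sketch below assumes, as a simplification, that cost is just the number of tokens multiplied by a single per-token price. Real API pricing distinguishes input and output tokens, so treat the result as a rough implication of xAI’s figures rather than a published price.

```python
token_reduction = 0.40  # "40 percent fewer thinking tokens" (from the paragraph above)
cost_reduction = 0.98   # "98 percent reduction in cost" (from the paragraph above)

# ASSUMPTION: cost = tokens * per-token price, with one blended price.
token_ratio = 1 - token_reduction  # Grok 4 Fast uses 60 percent of the tokens
cost_ratio = 1 - cost_reduction    # and costs 2 percent as much overall

# Whatever the token savings do not explain must come from a lower per-token price.
implied_price_ratio = cost_ratio / token_ratio
print(f"{implied_price_ratio:.3f}")  # prints 0.033
```

In other words, under this simplified model most of the claimed savings would come from a much lower per-token price (roughly one-thirtieth of the baseline), not from the shorter reasoning alone.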

Expanded Free Access of Grok 4 (August 2025)

In early August 2025, xAI made Grok 4 free to all Grok users worldwide for a limited period, with generous daily usage limits. However, premium features like Grok 4 Heavy remained gated behind subscriptions.

Antisemitism Controversy (July 2025)

Days before Grok 4’s release, the previous version of the model made antisemitic remarks on X, referring to itself as “MechaHitler.” The incident triggered significant backlash, including a partial ban in Turkey, and delayed Grok 4’s integration into Microsoft’s Azure AI Foundry. Following the controversy, xAI pledged to tighten moderation and remove the offending system prompts.

AI Companions Release (July 2025)

xAI introduced animated companion avatars within the Grok app, including anime-style characters and NSFW modes. The feature reflected xAI’s push toward more personalized and entertainment-oriented AI experiences.

Grok U.S. Government Contract Announcement (July 2025)

In July 2025, xAI announced it secured a $200 million U.S. Department of Defense contract, marketed as “Grok for Government.” The program is aimed at adapting Grok’s capabilities for defense and intelligence use cases, signaling a move into high-value government AI markets.

Grok 4 Release (July 2025)

xAI unveiled Grok 4, which features native tool use and real-time search capabilities, as well as a new “Heavy” variant, which is designed to “consider multiple hypotheses” in parallel to complete complex reasoning tasks, according to Musk. The rollout included access for SuperGrok and Premium+ users, plus API availability, signaling a push for developer adoption. xAI claimed Grok 4 outperformed several other leading models across various industry benchmarks, and touted its agentic capabilities.

Grok Announced to Be Coming to Tesla Vehicles (July 2025)

Elon Musk said Grok will be made available in Tesla cars, extending the chatbot’s reach beyond phones and the web into in-vehicle experiences. If broadly deployed, this move would put Grok usage in front of millions of drivers and tie xAI more closely to Musk’s automotive ecosystem.

Grok 3 Release (February 2025)

xAI debuted Grok 3, introducing step-by-step reasoning and DeepSearch web browsing as core capabilities. The model was trained on the Colossus supercluster and was the primary model powering the Grok chatbot, with access expanding via X subscriptions and SuperGrok. Grok 3 set the stage for Grok 4’s agentic features and higher-end tiers.

Grok‑1.5V Multimodal Preview (April 2024)

xAI previewed Grok‑1.5V, adding image understanding for documents, charts and photos alongside text generation. This broadened Grok’s input modalities and foreshadowed the multimodal, agent‑like direction later emphasized with Grok 3 and Grok 4.

Grok is Announced (November 2023)

xAI introduced Grok as a chatbot inspired by The Hitchhiker’s Guide to the Galaxy, designed to answer “almost anything” using a cheeky, rebellious tone. The announcement established xAI’s overall brand, which has evolved rapidly with every new model rollout.

Frequently Asked Questions

Is Grok 4 available?

Yes, Grok 4 is now available to Premium+ and SuperGrok subscribers of both X and the Grok chatbot. Grok 4 Heavy is only available to SuperGrok Heavy subscribers.

Is Grok 4 free?

In early August 2025, Grok 4 was offered free to users for a limited time, with restrictions on usage. It is also available to those with a subscription to xAI’s Premium+ and SuperGrok plans, which start at $40/month and $30/month respectively, or through the company’s API, which has varying pricing tiers. The Grok 4 Heavy version is only available with a SuperGrok Heavy subscription, which costs $300/month.

Is Grok 4 open source?

No, Grok 4 is not an open source model.

Is Grok 4 better than ChatGPT?

Grok 4 appears to outperform OpenAI’s o3 model on several industry benchmarks related to coding, math, science and abstract reasoning. However, whether Grok is better than ChatGPT or vice versa largely comes down to the specific task and use case.

Abel Rodriguez contributed reporting to this story.
