The massive strides artificial intelligence has made in recent years can mostly be attributed to its ability to comprehend natural language. But that doesn’t mean AI perceives the objects and actions represented by human words. That could all be about to change, though, with the advent of so-called “world models” — models that can instill in AI systems a basic understanding of how the physical world works, eventually enabling them to rival human levels of intelligence.
World models are AI models that understand how the physical world works, including concepts like gravity, inertia and impact dynamics. Trained on multimodal, real-world data, they can predict the consequences of actions and changes in the environment, demonstrating reasoning capabilities and awareness that more closely resemble human intelligence than large language models (LLMs) do.
Yann LeCun, former chief AI scientist at Meta, placed the spotlight on world models at the AI Action Summit in Paris in February 2025, claiming that they’re better suited for “human-level intelligence” than today’s large language models. The rest of the world has also taken notice, with projects underway in countries like China and the United Arab Emirates.
As concerns swirl around whether the billions spent on AI will ever yield returns, world models renew the hope that higher forms of artificial intelligence are within reach. It’s just a question of who can pivot the fastest to pursue more human-like systems while avoiding the familiar pitfalls that have plagued the industry in the past.
What Are World Models?
World models originate from the concept of mental models, first introduced by Scottish psychologist Kenneth Craik in his 1943 book The Nature of Explanation. Craik argued that the human mind creates an internal model of how the world works, using it to explain why events happen, predict events based on possible scenarios and inform a person’s actions accordingly. This process allows us to anticipate, for instance, what would happen if we stepped in front of a semi-truck on a highway — without having to find out for ourselves.
Similarly, a world model is a neural network that has an internal representation of its surroundings, allowing it to understand basic physical and spatial principles. To develop this internal representation, a model must be trained on large volumes of multimodal, real-world data, particularly images and videos. The model analyzes this data to generate a 3D representation of its surroundings, essentially building a virtual world that can be used to train advanced systems on simulations of realistic scenarios.
It’s this kind of life-like training that could give artificial intelligence the understanding required to interact directly with its environment, opening up the possibilities for what AI could do when it takes on unprecedented forms.
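To make the core mechanic concrete, here is a minimal, hypothetical sketch in Python: a small network that learns the dynamics of a simple physical system (a ball falling under gravity) purely from observed transitions. The environment, architecture and hyperparameters are toy choices for illustration, not any production world model.

```python
# Minimal sketch: a neural "world model" that learns simple dynamics
# (a ball falling under gravity) from observed state transitions.
# Hypothetical toy example, not any company's actual architecture.
import torch
import torch.nn as nn

G, DT = -9.8, 0.05  # gravity (m/s^2) and timestep (s)

def step(state):
    """Ground-truth physics: state = (height, velocity)."""
    h, v = state[..., 0], state[..., 1]
    return torch.stack([h + v * DT, v + G * DT], dim=-1)

# Collect transitions from random starting states:
# heights in [0, 10] m, velocities in [-6, 2] m/s.
states = torch.rand(4096, 2) * torch.tensor([10.0, 8.0]) + torch.tensor([0.0, -6.0])
next_states = step(states)

# The world model: predicts the next state from the current one.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(states), next_states)
    loss.backward()
    opt.step()

# The trained model has implicitly "learned gravity": it can roll out
# a plausible trajectory from a state it has never observed.
s = torch.tensor([[8.0, 0.0]])
for _ in range(3):
    s = model(s)
    print(s.detach().numpy())
```

Real world models operate on images and video at vastly larger scale, but the principle is the same: learn the dynamics, then predict consequences without ever acting in the real world.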
Why Are World Models Suddenly All the Rage?
The excitement around world models stems from what they could mean for the evolution of AI, especially as the limitations of large language models become more apparent.
Real-World Knowledge Could Be the Key to AGI and Robotics
For the longest time, the artificial intelligence industry has remained fixated on the AI “body” problem: AI systems are confined to devices that don’t let them interact with their surroundings in real time, cutting them off from the live data they would need to become aware of their environment and adapt to a range of scenarios. In response, companies like OpenAI and Anthropic have invested more heavily in robotics initiatives, believing that AI requires new, physical forms to upgrade its capabilities.
But world models offer a shortcut. By simulating real-world scenarios, they can supply AI systems with the data to weigh various possibilities, adjust their decision-making to the situation at hand and choose the best path forward. This process would not only enable systems to direct robots in intricate settings, but also better inform the decisions of AI agents — AI systems charged with completing complex, multi-step tasks.
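As a rough illustration of how such imagined rollouts can drive decision-making, here is a minimal sketch of “random shooting” planning, a simple form of model predictive control: the agent samples candidate action sequences, scores them inside the world model and executes only the first action of the best one. The environment, reward and parameters below are hypothetical toys.

```python
# Minimal sketch of planning with a world model ("random shooting"):
# the agent imagines candidate action sequences inside the model and
# executes the first action of the best-scoring sequence.
import numpy as np

def world_model(state, action):
    """Stand-in for a learned model: a 1-D cart pushed by a force."""
    pos, vel = state
    vel = vel + 0.1 * action
    return np.array([pos + 0.1 * vel, vel])

def reward(state):
    return -abs(state[0] - 5.0)  # goal: reach position 5

def plan(state, horizon=10, candidates=256, rng=np.random.default_rng(0)):
    seqs = rng.uniform(-1, 1, size=(candidates, horizon))  # candidate forces
    returns = np.zeros(candidates)
    for i, seq in enumerate(seqs):
        s = state.copy()
        for a in seq:               # imagined rollout, no real interaction
            s = world_model(s, a)
            returns[i] += reward(s)
    return seqs[np.argmax(returns)][0]  # execute only the first action

state = np.array([0.0, 0.0])
for t in range(30):
    state = world_model(state, plan(state))  # replan at every step (MPC)
print("final position:", round(state[0], 2))
```

Every decision here is tested in imagination first; the real environment is touched only once per step.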
Still, the ultimate prize that world models could lead to is artificial general intelligence, or AI that can think and learn like humans. World models effectively teach AI the core principles of how the physical world works, rather than feeding it step-by-step instructions for completing specific tasks. AI systems could then tailor this broader, real-world knowledge to different situations, taking what they learn from solving one problem and applying it to another, unfamiliar one — the very definition of AGI.
Large Language Models Could Be Nearing Their Limits
At the same time, LLMs have lost much of their luster. In fact, LeCun has suggested that these models are dumber than house cats, urging users not to “confuse the superhuman knowledge accumulation and retrieval abilities of current LLMs with actual intelligence.” A 2024 study by researchers at MIT, Harvard University and Cornell University backs up this claim, finding that an LLM able to give near-perfect turn-by-turn directions in New York City had nevertheless formed an incoherent internal map of the city, and its performance collapsed when faced with surprise variables like detours.
Piling onto the negative publicity, Hugging Face CEO Clem Delangue has suggested that what looks like an AI bubble may actually be an LLM bubble. After all, language models lack the reasoning capabilities and awareness of world models, severely limiting the products they can power.
In other words, LLMs might be book smart, but it takes street smarts to make it in the real world — something that only world models possess. As a result, more companies are turning to world models, potentially signaling the beginning of the end for language models.
Top Players to Watch in the World Model Race
From established tech titans to incoming AI startups, here are some of the biggest names to keep in mind as the race for world models heats up.
Google DeepMind
Google DeepMind entered the world models field in 2024 with the introduction of Genie, a model that can produce “action-controllable virtual worlds.” It followed this up with Genie 2, which can generate 3D games based on a single image input. Doubling down on these efforts in 2025, DeepMind has invested resources in a world models team as it continues to add to its Genie family and realize its vision of converting Gemini into a world model.
If DeepMind can successfully upgrade Gemini, it could become a dominant player in the robotics sector. The company has already designed its Gemini Robotics model to directly control robots, with its more recent Gemini Robotics 1.5 model instilling agentic capabilities in robots to accomplish complicated, multi-step tasks. Hiring Aaron Saunders, the former CTO of robotics company Boston Dynamics, further reinforces DeepMind’s commitment to using world models to develop more intelligent robots.
World Labs
Described as a “leading spatial intelligence company,” World Labs is the brainchild of Fei-Fei Li, better known as the “godmother of AI” for her work on ImageNet — a database of online images that spearheaded the rise of deep learning. World Labs broke onto the scene to the tune of $230 million in funding, and it has so far lived up to the hype.
In November 2025, the company released its Marble world model, which has the ability to create entire 3D worlds from a text prompt, image, video or rough 3D layout. Users can also edit existing worlds, expand them and even combine them with other worlds. Having made substantial progress in just under a year, World Labs is shaping up to be a major player as the world model landscape evolves.
Nvidia
In January 2025, Nvidia launched its Nvidia Cosmos platform to support a suite of world models aimed at achieving physical AI. The company is well on its way toward that goal, releasing new libraries and world foundation models that make it easier to build digital twins and simulations for training AI models and agents. Nvidia has continued to update these tools, now providing developers with synthetic worlds they can use to train physical AI models for autonomous vehicles, robots and more.
Meta
Meta has also made a push into world models with its Video Joint Embedding Predictive Architecture 2 (V-JEPA 2) model. Equipped with physical reasoning capabilities, V-JEPA 2 enables AI to anticipate the outcomes of its actions in the physical world, laying the groundwork for intelligent robots and AI agents. The company could become a more serious contender in the world models space now that it has hired Tim Brooks away from Google DeepMind, where he led a world models team.
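The core trick behind JEPA-style models is to predict the future in an abstract embedding space rather than in raw pixels. Below is a heavily simplified, hypothetical sketch of that idea; the shapes, encoders and training data are stand-ins, and Meta’s actual V-JEPA 2 architecture and training recipe differ substantially.

```python
# Simplified sketch of the JEPA idea: encode observations into embeddings,
# then train a predictor to match the embedding of the future observation,
# computing the loss in latent space instead of pixel space.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 32))         # context encoder
target_enc = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 32))  # target encoder
target_enc.load_state_dict(enc.state_dict())                      # start identical
predictor = nn.Linear(32 + 4, 32)  # predicts future embedding from (embedding, action)

opt = torch.optim.Adam(list(enc.parameters()) + list(predictor.parameters()), lr=1e-3)

frames = torch.rand(64, 16, 16)        # current "video frames" (toy data)
next_frames = torch.rand(64, 16, 16)   # frames observed after the action
actions = torch.rand(64, 4)

for it in range(100):
    z = enc(frames)
    with torch.no_grad():              # no gradients through the target branch
        z_next = target_enc(next_frames)
    pred = predictor(torch.cat([z, actions], dim=-1))
    loss = nn.functional.mse_loss(pred, z_next)  # loss lives in latent space
    opt.zero_grad(); loss.backward(); opt.step()
    # In practice the target encoder tracks the context encoder via a slow
    # exponential moving average, which helps prevent representation collapse.
    with torch.no_grad():
        for p_t, p in zip(target_enc.parameters(), enc.parameters()):
            p_t.mul_(0.99).add_(p, alpha=0.01)
```

Predicting embeddings instead of pixels lets the model ignore unpredictable visual detail and focus on what actually changes in a scene.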
Yann LeCun’s AI Startup
Yann LeCun may be best remembered at Meta for founding the Fundamental AI Research (FAIR) laboratory to achieve “advanced machine intelligence.” However, the company’s recent investments in superintelligence indicate a greater commitment to LLMs, which LeCun has long criticized. With LeCun’s deeper interest in world models, a split seemed inevitable.
In a LinkedIn post, LeCun officially announced his departure from Meta to found his own startup, with the goal of designing systems that “understand the physical world, have persistent memory, can reason, and can plan complex action sequences.” Further details are still to come, although LeCun has confirmed that Meta will be a partner of his startup.
Jeff Bezos’ Project Prometheus
Another big name to enter the ring is Jeff Bezos, who is set to become co-CEO of his new startup, Project Prometheus. Interestingly enough, Bezos’ fellow co-CEO Vik Bajaj used to be a member of the Google X team responsible for launching the Wing drone delivery service and Waymo self-driving car service. Focused on creating “AI for the physical economy,” Project Prometheus is reportedly off to a flying start, with a recent acquisition of agentic AI startup General Agents and more than $6 billion in funding, including from Bezos himself. Still, much about the company’s roadmap, technology and ambitions remains undisclosed.
What’s Taking World Models So Long to Arrive?
Ever since Craik introduced the concept of mental models in the ‘40s, researchers have attempted to apply the idea to artificial intelligence. In the late 1960s, Terry Winograd created an AI system known as SHRDLU, which could follow basic commands and hold conversations about a set of toy blocks. Later, in the early 1990s, Richard Sutton built Dyna, an AI architecture that relied on a world model to predict future outcomes and plan its actions accordingly.
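Dyna’s central idea still underpins modern model-based methods, and a minimal version fits in a few lines. The sketch below is a tabular Dyna-Q agent on a made-up one-dimensional corridor: each real step both updates a value table and records a model of the environment, which is then replayed to “plan” without further real interaction. The environment and hyperparameters are illustrative choices, not Sutton’s original experiments.

```python
# Minimal tabular Dyna-Q in the spirit of Sutton's Dyna: each real step
# updates a Q-table *and* a learned model of the environment, then the
# agent "plans" by replaying simulated transitions from that model.
import random

N, GOAL = 6, 5                    # states 0..5, reward only at state 5
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
model = {}                        # (state, action) -> (reward, next_state)
alpha, gamma, eps, plan_steps = 0.1, 0.95, 0.1, 20

def env_step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return (1.0 if s2 == GOAL else 0.0), s2

for episode in range(50):
    s = 0
    while s != GOAL:
        a = random.choice((-1, 1)) if random.random() < eps else \
            max((-1, 1), key=lambda act: Q[(s, act)])
        r, s2 = env_step(s, a)
        # Direct RL update from real experience.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in (-1, 1)) - Q[(s, a)])
        model[(s, a)] = (r, s2)   # learn the world model
        # Planning: replay imagined transitions sampled from the model.
        for _ in range(plan_steps):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in (-1, 1)) - Q[(ps, pa)])
        s = s2

print(max(Q.items(), key=lambda kv: kv[1]))  # highest-valued state-action pair
```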
So, after more than five decades of work, why haven’t world models gone mainstream yet? For one, world models must be trained on massive volumes of multimodal, real-world data, which has only recently become readily available thanks to the proliferation of cameras and the internet. And even for companies with the resources to compile all of this data, humans must spend thousands of hours preparing it for training, making the process of building world models costly and time-consuming.
In addition, world models are subject to the same issues that have affected LLMs, including biases getting baked into the training data and hallucinations undermining their performance later on. Failing to collect adequate amounts of multimodal, real-time data can worsen these issues, further complicating efforts to develop world models.
Of course, these factors are unlikely to deter organizations from pursuing world models, given their potential to reshape various sectors and revitalize the AI industry.
How World Models Could Impact the Future of AI
Big Tech’s spending splurge on AI has raised the specter of a market bubble, where companies fail to get meaningful returns on their investments and their stocks plunge as a result. Because the main products driven by AI rely on LLMs, the limitations of these models suggest that the industry could hit a ceiling and come crashing down. But world models could change that.
Models that can train AI systems on real-world scenarios may lead to more intelligent agents adept at handling advanced, multi-step tasks, as well as video games that can be personalized to a user’s preferences. Not to mention the drones, wearable devices, robots and autonomous vehicles that could be improved with more human-like intelligence, expanding the opportunities for AI to revolutionize everyday life.
World models may be imperfect and, for the time being, accessible only to companies with sufficient resources, but the promise of AI that truly replicates human intelligence may be impossible to pass up — especially for those eager to see their AI investments finally translate into technologies with lasting value.
Frequently Asked Questions
How are world models different from large language models?
Large language models are built to predict the next word or phrase in a sequence and are trained on vast amounts of mostly static text data to model human language. World models, by contrast, are designed to understand basic physical and spatial principles, which requires training on multimodal, real-world data. World models can then equip AI systems with an understanding of how the physical world works — something LLMs aren’t able to do.
How do world models help AI think like a human?
World models can simulate real-world scenarios, enabling AI systems to learn the general principles of how the world works, as opposed to memorizing step-by-step instructions to accomplish certain tasks. AI tools can then apply this broader knowledge to different situations, adjusting their actions and decisions accordingly. As a result, world models are seen as the key to unlocking artificial general intelligence — AI that thinks and learns like humans.
Why haven’t world models gone mainstream yet?
World models require vast amounts of multimodal, real-world data to learn spatial and physical principles. This kind of data isn’t widely available, and teams that gather enough of it must then spend thousands of hours processing it before it can be used for training. Like any AI model, world models are also susceptible to biases in the training data and to hallucinations. These challenges may discourage companies from developing world models, especially if they have limited resources.
