If you spend enough time using chatbots and content generators, it won’t take long before you get outputs that are irrelevant, nonsensical and, at times, just downright wrong. These instances are known as AI hallucinations, and they are a problem for every organization and individual using generative AI to obtain information and get their work done (and done accurately).
What is an AI hallucination?
An AI hallucination is when a generative AI model generates inaccurate information as if it were correct. AI hallucinations are often caused by limitations or biases in training data and algorithms, and can result in producing content that is wrong or even harmful.
The term is borrowed from human psychology. AI hallucinations occur when an AI model generates false or illogical information that isn’t based on real data or events, but is presented as fact. Because the grammar and structure of these AI-generated sentences are so eloquent, they appear accurate. But they are not.
AI hallucinations are caused by a variety of factors, including biased or low-quality training data, a lack of context provided by the user or insufficient programming in the model that keeps it from correctly interpreting information. They can occur in image recognition systems and AI image generators, but they are most commonly associated with AI text generators.
Why Do AI Hallucinations Happen?
AI hallucinations are a direct result of how large language models (LLMs) work. LLMs are what allow generative AI tools (like ChatGPT and Bard) to process language in a human-like way. Although LLMs are designed to produce fluent and coherent text, they have no understanding of the underlying reality that they are describing. All they do is predict what the next word will be based on probability, not accuracy.
To understand how this happens, it’s important to know how LLMs work. LLMs are fed massive amounts of text data, including books, news articles, blogs and social media posts. That data is then broken down into smaller units, called tokens, which can be as short as a single letter or as long as a word.
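To make the idea of tokens concrete, here is a minimal Python sketch. It is a toy illustration, not how any production model tokenizes text: real LLMs typically use learned subword vocabularies, and the function names here (word_tokens, char_tokens, encode) are made up for this example. It shows the two extremes mentioned above, word-sized and letter-sized tokens, and how tokens are turned into numeric IDs.

```python
import re


def word_tokens(text):
    """Toy word-level tokenizer: words and punctuation marks become tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text):
    """Toy character-level tokenizer: every character becomes a token."""
    return list(text)


def encode(tokens, vocab):
    """Map each token to an integer ID, growing the vocabulary as needed."""
    ids = []
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids


sentence = "The cat sat."
print(word_tokens(sentence))
print(char_tokens("cat"))
print(encode(word_tokens("the cat saw the cat"), {}))
```

Either way, the model only ever sees the IDs. Nothing in those numbers says what a cat is.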
LLMs use neural networks to figure out how these words and letters work together. Neural networks are made up of processing units, called nodes, that are connected to each other via weights. Those weights are set by giving the model some text and having it try to fill in the word that comes next, and then comparing its output to what was actually in the text. This happens over and over again, with the model adjusting its internal parameters each time to get better at making those predictions.
But the model never actually learns the meaning of the words themselves.
Emily M. Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics laboratory, explained it like this: “If you see the word ‘cat,’ that immediately evokes experiences of cats and things about cats. For the large language model, it is a sequence of characters C-A-T,” she told Built In. “Then eventually it has information about what other words and what other sequences of characters it co-occurs with.”
As the model processes more and more text, it begins recognizing patterns in the language, such as grammar rules and word associations, learning which words are likely to follow others in a sentence. Over time, the model develops a statistical approximation of semantic understanding, associating words and phrases with the contexts in which they tend to appear. This is what allows LLMs to write cover letters, create recipes, offer advice and perform all the other tasks people are using generative AI for. Still, LLMs cannot fully grasp the underlying reality of what they’re talking about.
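The next-word prediction described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it counts which word follows which in a tiny made-up corpus, whereas real LLMs learn neural network weights over enormous data sets. The corpus and the function names (train_bigram, next_word_probs, most_likely_next) are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


def most_likely_next(counts, word):
    """Pick the most probable next word, or None if the word is unseen."""
    probs = next_word_probs(counts, word)
    return max(probs, key=probs.get) if probs else None


# Made-up training text for the sketch.
corpus = [
    "the telescope took the first image",
    "the telescope took the first picture",
    "the telescope took the first image of a planet",
]
model = train_bigram(corpus)
print(next_word_probs(model, "first"))
print(most_likely_next(model, "first"))
```

The sketch picks whichever continuation was most frequent in its training text. At no point does it check whether the resulting sentence is true, which is the gap that hallucinations fall into.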
“[Generative AI] is not really intelligence, it’s pattern matching,” Shane Orlick, the president of AI content generator Jasper, told Built In. “It’s designed to have an answer, even if the answer is not factually correct.”
In fact, Bender thinks even assigning the term “hallucination” to what these systems are doing is too generous, since it implies that artificial intelligence is capable of perception. It isn’t.
Still, if their training data is inaccurate or biased, or if the model is too complex and not given enough guardrails, LLMs have a tendency to get things wrong. But their “verbosity” and “confidence” can make it difficult to spot exactly where or how a model has screwed up, said Christopher Riesbeck, an associate professor and co-director of Northwestern University’s Center for Computer Science and Learning Sciences.
“They’re always generating something that’s statistically plausible,” Riesbeck told Built In. “It’s only when you look closely that you may say, ‘Wait a minute, that doesn’t make any sense.’”
Types of AI Hallucinations (With Examples)
Some AI hallucinations are more obvious than others. They can range from minor factual inconsistencies to completely fabricated information. Here are a few types of hallucinations one can encounter when using generative AI, along with some real-world examples.
1. Factual Inaccuracies
Factual inaccuracies are among the most common forms of AI hallucinations, where a model generates text that appears true but isn’t. The basic gist of the statement might be based on reality and sound plausible, but the particulars are wrong.
In February 2023, Google’s chatbot, Bard, claimed that the James Webb Space Telescope took the first image of a planet outside the solar system. This is incorrect: the first images of an exoplanet were taken in 2004, according to NASA, and the James Webb Space Telescope was not launched until 2021.
Similarly, in a launch demo of Microsoft Bing AI, the chatbot (which uses the same LLM as ChatGPT) analyzed earnings statements from Gap and Lululemon, reportedly providing an incorrect summary of their facts and figures.
2. Fabricated Information
AI text generators and chatbots have been known to put out information that is completely fabricated and not based on any kind of fact. For example, ChatGPT can generate URLs, code libraries and even people that do not exist, and it can reference made-up news articles, books and research papers. All of this can be detrimental to someone who is using the tool for research (a common, yet ill-advised, use of ChatGPT).
In June 2023, it was reported that a New York attorney used ChatGPT to craft a motion that turned out to be full of phony judicial opinions and legal citations. The attorney, who was later sanctioned and fined, claimed he “did not comprehend that ChatGPT could fabricate cases.”
“It has been developed to produce output that is plausible and pleasing to the user,” Bender explained. “So when a lawyer comes in and says ‘Show me some case law that supports this point,’ the system is developed to come up with a sequence of words that looks like case law that supports the point.”
3. Harmful Misinformation
Generative AI can also generate false information about real people, compiling bits and pieces of information — some of it true, some of it not — and concocting stories that some users may take as truth.
When asked to provide cases of sexual harassment in the legal profession, ChatGPT fabricated a story about a real law professor, alleging that he harassed students on a school trip. That trip never happened, and he has never actually been accused of sexual harassment in real life. But he had done some kind of work to address and stop sexual harassment, and that’s why his name came up.
In another incident, ChatGPT falsely claimed that a mayor in Australia was found guilty in a bribery case from the 1990s and early 2000s. In reality, he was a whistleblower in the case.
This kind of misinformation has the potential to be damaging to the people involved, and through no fault of their own. The problem has even come to the attention of the U.S. Federal Trade Commission, which is now investigating OpenAI to see if its false statements have caused reputational harm to consumers.
4. Weird or Creepy Answers
Some AI hallucinations are just plain weird or creepy. By nature, AI models aim to generalize and be creative with their output. This creativity can sometimes lead to some wacky outputs, which isn’t necessarily a problem if accuracy isn’t the goal.
In fact, Orlick said that the creativity that sometimes comes out of an AI hallucination can actually be a “bonus,” depending on what an AI product is being used for. Jasper is mainly used by marketers, who always need to be coming up with creative and imaginative ideas. If Jasper comes up with some outside-the-box copy or concept in that process, that could actually be useful to a marketing team.
“Coming up with ideas, coming up with different ways of looking at a problem, that’s really great,” Orlick said. “But when it actually comes to writing the content, it has to be accurate. That’s where hallucinations kind of cross that line and are bad.”
Why Are AI Hallucinations a Problem?
AI hallucinations are a part of a growing list of ethical concerns about generative AI. These tools can create mountains of fluent, yet factually inaccurate content in a matter of seconds — far faster than any human could on their own. And this leads to several problems.
1. Spread of Misinformation
For one, AI hallucinations can bleed into AI-generated news articles if there is no fact-checking mechanism in place, which can lead to a mass spread of misinformation, potentially affecting people’s livelihoods, government elections and even society’s grasp on what is true. And they can be harnessed by internet scammers and hostile nations alike to spread disinformation and cause trouble.
2. User Harm
AI hallucinations can also be flat-out dangerous — not just in the context of reputational harm, but also bodily harm.
For instance, AI-generated books on mushroom foraging have been popping up on Amazon. This has led some to wonder if any falsehoods within those books could cause someone to get sick or even die. If one of these books gives bad advice for how to distinguish between a deadly destroying angel fungus and a perfectly safe button mushroom, “that’s an instance where a sequence of words that look like information, could be immediately fatal,” Bender said.
3. Loss of Trust
The echoes of AI hallucinations carry far beyond just one text or the individual who reads it. Fill the internet up with enough of this misinformation, and you have a self-perpetuating cycle of inaccurate content that Bender calls the “pollution of the information ecosystem.”
“These systems that produce non-information — which is sometimes problematic and sometimes not — and produce it in a way that looks authoritative and looks like it is produced by humans, is now sort of mixing in with our actual, legitimate information sources in a way that’s really hard to detect and mitigate,” she explained. “On the one hand, it makes it harder for us to trust the things we should be able to trust. And on the other hand, it would be really hard to fix without marking the synthetic non-information at the source.”
In the end, hallucinations stand to erode people’s trust, not only in our “legitimate information sources,” as Bender put it, but also in generative AI itself. If people don’t believe a tool’s outputs are factual or based on real data, they may avoid using it. That could be bad news for the throngs of companies that are innovating with and adopting this technology.
“If we don’t solve hallucinations, I think it’s definitely going to hurt adoption,” Orlick said.
How to Prevent AI Hallucinations
All of the leaders in the generative AI space are working to help solve the problem of AI hallucinations.
Google connected Bard to the internet so that its responses are based on both its training data and information it finds on the web. OpenAI did the same for ChatGPT. OpenAI has also worked to refine ChatGPT with feedback from human testers, using a technique called reinforcement learning from human feedback. The company also proposed a strategy that rewards models for each correct step of reasoning on the way to an answer, instead of rewarding only the final answer. The approach is called process supervision, and it could lead to more explainable AI, according to the company. But some experts doubt it will be an effective way of fighting fabrications.
Northwestern’s Riesbeck said generative AI models are “always hallucinating.” Just by their very nature, they are always “making up stuff.” So removing the possibility of these models ever generating false information could be difficult, if not impossible. But there are some steps both companies and users can take to counteract hallucinations and limit their harm.
1. Use Diverse and Representative Training Data
The companies making and customizing these models need to ensure that the training data used is diverse and representative of the real world. This can help reduce the potential that the outputs are inaccurate due to bias. They should also regularly update and expand their training data sets as time goes on to account for evolving events and cultural shifts.
“As companies come out with more powerful models, you’ve got to train on better content, richer content, more accurate content,” Jasper’s Orlick said.
2. Ground the Model With Relevant Data
Artificial intelligence is only as good as the data it is fed. You wouldn’t expect a human to give factual legal or medical advice without first knowing a lot about the law and medicine, and the same goes for AI. Companies can ground their generative AI models with industry-specific data, enhancing the models’ understanding so that they can generate answers based on context instead of just hallucinating.
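One common way to do this is to look up relevant documents first and hand them to the model along with the question. The Python sketch below illustrates that idea under some loud assumptions: the documents are invented, relevance is scored by simple word overlap rather than anything a production system would use, and build_grounded_prompt is a hypothetical helper that only assembles the prompt text. It does not call any real model or API.

```python
import re


def words(text):
    """Lowercase a string and return its set of words, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(question, document):
    """Crude relevance score: number of words the two texts share."""
    return len(words(question) & words(document))


def build_grounded_prompt(question, documents, top_k=1):
    """Pick the top_k most relevant documents and wrap them in a prompt."""
    ranked = sorted(
        documents,
        key=lambda doc: overlap_score(question, doc),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Invented company documents, for illustration only.
documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes five business days.",
]
prompt = build_grounded_prompt(
    "How many days do I have to request refunds?", documents
)
print(prompt)
```

The instruction to admit ignorance matters as much as the context itself, since it gives the model an acceptable alternative to making something up.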
3. Experiment With Temperature
Temperature is a parameter that controls the randomness of an AI model’s output. It essentially determines the degree of creativity or conservatism in its generated content, where a higher temperature increases randomness and a lower temperature makes the output more deterministic. In short: the higher the temperature, the more likely a model is to hallucinate.
Companies can provide users with the ability to adjust the temperature settings to their liking, and set a default temperature that strikes a proper balance between creativity and accuracy.
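The effect of temperature can be seen in a short Python sketch. In the standard formulation, a model's raw scores for each candidate word are divided by the temperature before being converted into probabilities. The scores below are made up, and softmax_with_temperature is a name chosen for this example rather than a function from any particular library.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for three candidate next words.
scores = [2.0, 1.0, 0.1]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a low temperature, nearly all of the probability lands on the top-scoring word, so the output is close to deterministic. At a high temperature the probabilities flatten out, and less likely words get picked more often.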
4. Always Verify
Even with these systems and controls in place, it is ultimately up to the user to verify the answers generated by a model, as that is the most surefire way of detecting AI hallucinations. So whether someone is using AI to write code, carry out research or draft an email, they should always review the content generated before using or sharing it.
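Some verification can be partly automated. As one small example, when a model suggests code that imports libraries, a script can check whether those libraries are actually installed before anyone runs the code. The Python sketch below uses the standard library's importlib.util.find_spec. The list of suggested names is invented, and a missing module only means "look closer": the library could be real but not installed locally, so this supplements human review rather than replacing it.

```python
import importlib.util


def module_is_installed(name):
    """Return True if a top-level module with this name can be found locally."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Hypothetical module names taken from AI-generated code.
suggested = ["json", "totally_made_up_library_xyz"]

missing = [name for name in suggested if not module_is_installed(name)]
print("Needs a closer look:", missing)
```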
Frequently Asked Questions
What are AI hallucinations?
AI hallucinations are instances where a generative AI system produces information that is inaccurate, biased, or otherwise unintended. Because the grammar and structure of this AI-generated content are so eloquent, the statements may appear accurate. But they are not.
What is an example of an AI hallucination?
Examples of AI hallucinations include when a chatbot gives an answer that is factually inaccurate, or when an AI content generator fabricates information but presents it as truth.
Why are AI hallucinations a problem?
AI hallucinations are problematic because they can lead to the rapid generation of false or misleading information, which can harm decision-making processes and lead to the spread of disinformation. They may also lead to content that is offensive or biased, potentially causing harm to users and society.