What Is a Large Language Model (LLM)?

Large language models are the backbone of generative AI, driving advancements in areas like content creation, language translation and conversational AI.

Written by Ellen Glover
Updated by Brennan Whitfield | Jul 16, 2024

A large language model (LLM) is a machine learning model designed to understand and generate natural language. Trained using enormous amounts of data and deep learning techniques, LLMs can grasp the meaning and context of words. This makes LLMs a key component of generative AI tools, which enable chatbots to talk with users and text-generators to assist with writing and summarizing.

Large Language Model Definition

Large language models (LLMs) are machine learning models that leverage deep learning techniques and vast amounts of training data to understand and generate natural language. Their ability to grasp the meaning and context of words and sentences enables LLMs to excel at tasks such as text generation, language translation and content summarization.

What Is a Large Language Model? 

A large language model is a type of foundation model trained on vast amounts of data to understand and generate human language.

It operates by receiving a prompt or question and then using neural networks to repeatedly predict the next logical word, generating an output that makes sense. To do this, LLMs rely on petabytes of data, and typically consist of at least a billion parameters. More parameters generally means a model has a more complex and detailed understanding of language.
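The predict-the-next-word loop can be sketched with a toy model. The example below is only an illustration: it swaps the neural network for simple word-pair counts, and the corpus and the function names `build_bigram_counts` and `generate` are made up for this sketch.

```python
from collections import Counter, defaultdict

def build_bigram_counts(text):
    """Count how often each word follows another in the text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt, max_new_words=5):
    """Repeatedly append the most frequent next word (greedy decoding)."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = counts.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        next_word, _ = candidates.most_common(1)[0]
        words.append(next_word)
    return " ".join(words)

# Tiny made-up corpus standing in for the training data.
corpus = "the cat sat on the mat . the cat sat on the rug ."
counts = build_bigram_counts(corpus)
print(generate(counts, "the", max_new_words=3))  # prints "the cat sat on"
```

A real LLM replaces the count table with billions of learned parameters and looks at the whole preceding context rather than just the last word, but the generation loop has the same shape.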

Large language models are built on neural network-based transformer architectures, which let them model the relationships words have to each other in sentences. In the original transformer design, encoders process input sequences and decoders generate output sequences; both are stacks of layers within the neural network.
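Transformers relate words to each other through a mechanism called attention, which the text above does not name. The sketch below is a simplified version of it: it uses made-up two-dimensional word vectors and leaves out the learned projection matrices that a real transformer applies first.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this word to every word in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New representation: a weighted mix of all word vectors.
        mixed = [sum(w * vec[i] for w, vec in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings, chosen so "bank" sits between the other two words.
embeddings = {"river": [1.0, 0.0], "bank": [0.8, 0.6], "money": [0.0, 1.0]}
outputs, weights = self_attention(list(embeddings.values()))
for word, row in zip(embeddings, weights):
    print(word, [round(w, 2) for w in row])
```

Each printed row shows how much one word draws on every word in the sequence, which is one way a model can encode that "bank" relates to both "river" and "money".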

 

Why Are Large Language Models Important? 

Today’s LLMs are the result of years of natural language processing and artificial intelligence innovation, and are accessible through interfaces like OpenAI’s ChatGPT and Google’s Gemini. They are foundational to generative AI tools and automating language-related tasks, and are revolutionizing the way we live, work and create.


 

How Do Large Language Models Work?

LLMs work by (1) receiving an input like a command or query, (2) applying knowledge gained from extensive training data, and then (3) using neural networks to accurately predict and generate contextually relevant outputs.

1. Gathering Large Amounts of Data 

LLMs have to first be trained on petabytes of text data. Typically, this is unstructured data, which has been scraped from the internet and used with minimal cleaning or labeling. The dataset can include Wikipedia pages, books, social media threads and news articles — adding up to trillions of words that serve as examples for grammar, spelling and semantics.

2. Training the Language Models

Then comes the actual training process, when the model learns to predict the next word in a sentence based on the context provided by the preceding words. 

In training, the transformer model architecture attributes a probability score to a string of words that have been tokenized, meaning they have been broken down into smaller sequences of characters and given a numerical representation. This places weights on certain characters, words and phrases, helping the LLM identify relationships between specific words or concepts, and overall make sense of the broader message. 
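Tokenization can be illustrated with a small sketch. It assumes a tiny hand-written vocabulary and a greedy longest-match rule; production tokenizers learn their vocabularies from data and use more elaborate algorithms, so treat the names and values here as hypothetical.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of text into known sub-word pieces."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            tokens.append("<unk>")  # character not covered by the vocabulary
            i += 1
    return tokens

# Hypothetical vocabulary mapping each piece to its numerical ID.
vocab = {"<unk>": 0, "un": 1, "believ": 2, "able": 3, " ": 4, "I": 5, "will": 6}

tokens = tokenize("unbelievable", vocab)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['un', 'believ', 'able']
print(ids)     # [1, 2, 3]
```

The list of IDs, not the raw characters, is what the model receives and assigns probabilities to.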

“If you type the phrase ‘I will,’ then it will predict something like ‘I will survive,’ ‘I will always love you,’ ‘I will remember you,’” Mikayel Harutyunyan, the CMO of AI company Activeloop, told Built In. “The algorithm basically tries to estimate which [word] would be the best to put in this particular text.”

Training happens through unsupervised learning, where the model autonomously learns the rules and structure of a given language based on its training data. Over time, it gets better at identifying the patterns and relationships within the data on its own. 
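No hand-written labels are needed because the raw text supplies its own answers: every position yields a context and the word that actually came next. A minimal sketch, with an illustrative function name and `context_size` parameter:

```python
def next_word_examples(text, context_size=2):
    """Turn raw text into (context, next word) training pairs."""
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]
        examples.append((context, words[i]))
    return examples

for context, target in next_word_examples("I will always love you"):
    print(context, "->", target)
```

Run on "I will always love you", this yields four pairs, starting with `['I'] -> will` and ending with `['always', 'love'] -> you`. During training the model is scored on how much probability it gave to each target word.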

“You don’t have to teach [LLMs] how to solve the problem, all you have to do is show them enough samples of correct and wrong answers, and the model usually picks it up,” Vinod Iyengar, VP of product for AI company ThirdAI, told Built In.

3. Generating Model Outputs

Eventually, the LLM gets to the point where it can understand the command or query given to it by a user, and generate a coherent and contextually relevant response — a capability that can be used for a wide range of text-generation tasks.

 


Types of Large Language Models

There are many different types of large language models, each with their own distinct capabilities that make them ideal for specific applications.

Zero-Shot Learning Model

Zero-shot learning models are able to understand and perform tasks they have never come across before. They don’t need specific examples or training for each new task. Instead, they apply their generalized understanding of language to figure things out on the spot. VideoPoet is an example of a zero-shot learning model.

“If you have a zero-shot LLM and you provide it with a prompt like, ‘Translate the following English text into French: The weather is beautiful today,’ the model can generate the translation without ever having been trained specifically on translation tasks,” Beerud Sheth, the CEO for conversational AI company Gupshup, told Built In.

Fine-Tuned Model

Fine-tuned models are essentially zero-shot learning models that have been trained using additional, domain-specific data so that they are better at performing a specific job, or more knowledgeable in a particular subject matter. Fine-tuning is a supervised learning process, which means it requires a dataset of labeled examples so that the model can learn the target task more accurately. GPT-3.5 Turbo is one example of a large language model that can be fine-tuned.

If you want a model to generate more accurate medical diagnoses, it needs to be fine-tuned on a large dataset of medical records. Or if you want a model to be able to generate marketing content that is on-brand for a particular company, it needs to be trained using that company’s data.   
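The labeled examples used for fine-tuning are often stored as one JSON object per line. The sketch below assumes a hypothetical layout with `prompt` and `response` fields and made-up marketing examples; actual training tools define their own field names and formats.

```python
import json

# Hypothetical labeled examples: each pairs an input with the desired output.
labeled_examples = [
    ("Write a tagline for our hiking boots.", "Built for the trail ahead."),
    ("Write a tagline for our rain jackets.", "Stay dry, keep moving."),
]

def to_jsonl(examples):
    """Serialize (prompt, response) pairs as JSON Lines text."""
    return "\n".join(
        json.dumps({"prompt": prompt, "response": response})
        for prompt, response in examples
    )

def from_jsonl(text):
    """Parse JSON Lines text back into (prompt, response) pairs."""
    return [
        (record["prompt"], record["response"])
        for record in map(json.loads, text.splitlines())
    ]

dataset_text = to_jsonl(labeled_examples)
print(dataset_text)
```

Each line is one labeled example, which is what makes the process supervised: the desired output is given explicitly rather than inferred from raw text.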

Multimodal Model 

Multimodal models can handle not just text, but also images, videos and even audio by using complex algorithms and neural networks. “They integrate information from different sources to understand and generate content that combines these modalities,” Sheth said. An example of a large multimodal model is GPT-4.

Language Representation Model

Language representation models specialize in assigning representations to sequence data, helping machines understand the context of words or characters in a sentence. These models are commonly used for natural language processing tasks, with some examples being the BERT and RoBERTa language models.


 

Large Language Model Applications

Large language models are applicable across a broad spectrum of use cases in various industries. Below are some of the most prevalent applications of this technology.

Text Generation

LLMs can generate text on virtually any topic, whether that be an Instagram caption, blog post or mystery novel. By extension, these models are also good at what Iyengar calls “style transfer,” meaning they can mimic certain voices and moods — so you could create a pancake recipe in the style of William Shakespeare, for instance. 

Code Generation

LLMs can be a useful tool in helping developers write code, find errors in existing code and even translate between different programming languages. They can also answer coding-related questions in plain language. 

Content Retrieval and Summarization

LLMs excel at summarizing and retrieving key information from lengthy documents. For example, a lawyer can use an LLM to summarize contracts, or extract important information from thousands of pages of evidence in the discovery process.

Conversational AI

LLMs enable AI assistants to carry out conversations with users in a way that is more natural and fluent than older generations of chatbots. Through fine-tuning, they can also be personalized to a particular company or purpose, whether that’s customer support or financial assistance. 

Language Translation

LLMs are good at providing quick and accurate language translations of any form of text. A model can also be fine-tuned to a particular subject matter or geographic region so that it can not only convey literal meanings in its translations, but also jargon, slang and cultural nuances.


 

Advantages of Large Language Models

Large language models have become one of the hottest areas in tech, thanks to their many advantages.

LLMs Are Versatile and Customizable

LLMs are probably best known for their versatility. They can perform all kinds of tasks, from writing business proposals to translating entire documents. Their ability to understand and generate natural language also ensures that they can be fine-tuned and tailored for specific applications and industries. Overall, this adaptability means that any organization or individual can leverage these models and customize them to their unique needs.

LLMs Can Speed Up Time-Consuming Tasks

Typically, LLMs generate real-time responses, completing tasks that would ordinarily take humans hours, days or weeks in a matter of seconds. 

These models can sift through hundreds of pages of documents or extensive datasets and automatically extract valuable insights from them. They can write 100 individually unique marketing emails (including subject lines) in response to a single-sentence prompt. The upshot is that LLMs can automate routine, time-consuming tasks so that humans have more time to pursue more complex and strategic endeavors.

LLMs Are Always Improving

LLMs can keep learning and advancing when they are retrained or fine-tuned on fresh data. As language models are exposed to new information, they can refine their understanding of evolving circumstances and linguistic shifts, improving their performance over time.

LLMs Have Seemingly Endless Applications

Because they are so versatile and capable of constant improvement, LLMs seem to have infinite applications. From writing music lyrics to aiding in drug discovery and development, LLMs are being used in all kinds of ways. And as the technology evolves, the limits of what these models are capable of are continually being pushed, promising innovative solutions across all facets of life.


 

Challenges of Large Language Models

With all of that being said, LLMs certainly aren’t perfect. Like any technology, they come with a fair number of challenges and disadvantages.

LLMs Can Generate Inaccurate Responses

LLMs often struggle with common sense, reasoning and accuracy, which can cause them to generate responses that are incorrect or misleading, a phenomenon known as AI hallucination. Perhaps even more troubling is that it isn’t always obvious when a model gets things wrong. By the nature of their design, LLMs package information in eloquent, grammatically correct statements, making it easy to accept their outputs as truth. But it is important to remember that language models are nothing more than highly sophisticated next-word prediction engines.

“They’re trying to predict which word or which token will be the most correct, statistically speaking,” Activeloop’s Harutyunyan said. “They might come up with something that sounds sound, but is not actually truthful.”

LLMs Tend to Be Biased

When an LLM is fed training data, it inherits whatever biases are present in that data, leading to biased outputs that can have much bigger consequences for the people who use them. After all, data tends to reflect the prejudices we see in the larger world, often encompassing distorted and incomplete depictions of people and their experiences. So if a model is built using that as a foundation, it will inevitably reflect and even magnify those imperfections. This could lead to offensive or inaccurate outputs at best, and incidents of AI-automated discrimination at worst.

LLMs Spark Plagiarism Concerns

Some companies use copyrighted materials as training data, a practice whose legality is still being debated and has not been settled at the federal level. This has sparked a larger debate, and even some lawsuits, among news outlets, authors and various other creatives, who fear that these models generate responses that resemble or even flat-out copy their work. The disputes raise ethical and legal questions about intellectual property rights, plagiarism and the scope of the fair use doctrine. Meanwhile, the U.S. Copyright Office has stated unequivocally that AI-generated work cannot be copyrighted.

LLMs’ Outputs Aren’t Always Explainable

Solving issues like AI hallucinations, bias and plagiarism won’t be easy going forward, considering that it’s very difficult (if not impossible at times) to figure out exactly how or why a language model has generated a particular response. This is true even for AI experts, who understand these algorithms and the complex mathematical patterns they operate on better than anyone.

“With 100 billion parameters all working and interacting with each other, it’s really hard to tell which set of parameters are contributing to a particular response,” ThirdAI’s Iyengar said.

LLMs Face Regulatory Challenges 

Legislation governing large language model use is still being developed in the United States and other countries, making it difficult to reach definitive conclusions in copyright and privacy cases. As a result, rules tend to vary by country, state or local area, and decisions often rely on previous similar cases. Government regulation is also sparse for large language model use in high-stakes industries like healthcare or education, making it potentially risky to deploy AI in these areas.

LLMs Contribute to Environmental Concerns

Training deep learning models requires a significant amount of computational power, often leaving a rather large carbon and water footprint behind.

A 2019 research paper found that training just one model can emit more than 626,000 pounds of carbon dioxide, nearly five times the lifetime emissions of the average American car, including the manufacturing of the car itself. A 2023 paper estimated that training the GPT-3 language model in Microsoft’s data centers could directly consume 700,000 liters of fresh water.
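To put the carbon figure in more familiar units, the arithmetic can be worked through directly. This sketch uses only the 626,000-pound figure and the "nearly five times" comparison quoted above, plus the standard pound-to-kilogram conversion; the variable names are illustrative.

```python
KG_PER_POUND = 0.45359237  # exact definition of the international pound

training_emissions_lb = 626_000  # figure quoted above
car_multiple = 5                 # "nearly five times" a car's lifetime emissions

training_emissions_tonnes = training_emissions_lb * KG_PER_POUND / 1000
implied_car_lifetime_lb = training_emissions_lb / car_multiple

print(f"Training run: about {training_emissions_tonnes:.0f} metric tons of CO2")
print(f"Implied car lifetime: a little over {implied_car_lifetime_lb:,.0f} pounds of CO2")
```

This works out to roughly 284 metric tons for the training run. Because the ratio is slightly under five, the implied lifetime figure for one car is a little over 125,200 pounds.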

Of course, artificial intelligence has proven to be a useful tool in the ongoing fight against climate change, too. And work is being done to reduce LLMs’ water and carbon footprints. But the duality of AI’s effect on our world is forcing researchers, companies and users to reckon with how this technology should be used going forward.


 

Examples of Large Language Models

Some of the most prominent large language models used today include the following:

GPT-4

GPT-4 is a large language model developed by OpenAI, and is the fourth version of the company’s GPT models. The multimodal model powers ChatGPT Plus, and GPT-4 Turbo helps power Microsoft Copilot. Both GPT-4 and GPT-4 Turbo are able to generate new text and answer user questions, though GPT-4 Turbo can also analyze images. The GPT-4o model allows for inputs of text, images, videos and audio, and can output new text, images and audio.

Gemini

Gemini is a family of large multimodal models developed by Google AI, and includes Gemini Ultra, Gemini Pro, Gemini Flash and Gemini Nano. Gemini models can input and interpret text, images, videos and audio, plus generate new text and images. Gemini Pro powers the Gemini chatbot, and it can be integrated into Gmail, Docs and other apps through Gemini Advanced.

Llama 3

Llama 3 is the third generation of Llama large language models developed by Meta. It is an open-source model available in 8B or 70B parameter sizes, and is designed to help users build and experiment with generative AI tools. Llama 3 is text-based, though Meta aims to make it multimodal in the future. Meta AI is one tool that uses Llama 3, which can respond to user questions, create new text or generate images based on text inputs.

Claude

Claude, developed by Anthropic, is a family of large language models comprising Claude Opus, Claude Sonnet and Claude Haiku. The models are multimodal, able to respond to user text, generate new written content or analyze given images. Claude is said to outperform its peers in common AI benchmarks, and excels in areas like nuanced content generation and chatting in non-English languages. Claude Opus, Sonnet and Haiku are available as model options for the Claude AI assistant.

Frequently Asked Questions

A large language model is a type of algorithm that leverages deep learning techniques and vast amounts of training data to understand and generate natural language. Its ability to grasp the meaning and context of words and sentences enables it to excel at tasks such as text generation, language translation and content summarization.

Prominent examples of large language models include GPT-3.5, which powers OpenAI’s ChatGPT, and Claude 2.1, which powers Anthropic’s Claude.

A GPT, or generative pre-trained transformer, is a type of large language model (LLM). Because they are particularly good at handling sequential data, GPTs excel at a wide range of language-related tasks, including text generation, text completion and language translation.
