What Is Natural Language Generation?

NLG models produce text and speech that is so natural, you’d think a human did it.

Written by Ellen Glover
UPDATED BY
Abel Rodriguez | Sep 10, 2025
REVIEWED BY
Ellen Glover | Sep 10, 2025
Summary: Natural language generation (NLG) is a subset of AI that creates written or spoken language from data. It’s used to automate content creation, personalize customer engagement, and produce conversational AI. NLG’s capabilities are expanding, but it still requires human oversight to ensure accuracy and integrity.

Natural language generation is the use of artificial intelligence programming to produce written or spoken language from a data set. It is used not only to create songs, movie scripts and speeches, but also to report the news and practice law.

What Is Natural Language Generation?

Natural language generation, or NLG, is a subfield of artificial intelligence that produces natural written or spoken language. NLG enhances the interactions between humans and machines, automates content creation and distills complex information in understandable ways.

At its best, NLG output can sound so natural that it appears to be produced by a human. This has only been possible for a few years, and “it’s only the tip of the iceberg,” said Jay Alammar, director and engineering fellow at natural language processing company Cohere. “As with every new technology, it takes some time to really understand where this technology excels and where it falls short.”

 

How Does Natural Language Generation Work?

Natural language generation systems transform raw data into natural-sounding language through a multi-step process often called the NLG pipeline. While approaches vary across rules-based systems and large language models (LLMs), most follow a series of core steps. 

  1. Content Analysis: The first step is content analysis, which is where all the data, both structured and unstructured, is analyzed and filtered so that the final text generated addresses the user’s needs. (Structured data is searchable and organized, while unstructured data is in its native form.)
  2. Pattern Recognition: The NLG system then has to make sense of that data, which involves identifying patterns and building context. 
  3. Data Structuring: Next comes data structuring, which involves creating a narrative based on the data being analyzed and the desired result (blog, report, chat response and so on). 
  4. Grammatical Structuring: Through grammatical structuring, the words and sentences are then rearranged so that they make sense in the given language.
  5. Aggregation and Formatting: Finally, before the output is produced, it runs through any templates the programmer has specified, and its presentation is adjusted to match them in a process called language aggregation.

One result of this process is written natural language. This can come in the form of a blog post, a social media post or a report, to name a few. But there are more use cases.
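To make the pipeline concrete, here is a minimal, hypothetical sketch of a rules-based generator in Python. The data schema, the sorting heuristic and the sentence template are all invented for illustration; production NLG systems are far more sophisticated.

```python
# A minimal, hypothetical sketch of a rules-based NLG pipeline.
# The data schema and templates here are invented for illustration.
from statistics import mean

def generate_report(readings: dict) -> str:
    # Steps 1-2. Content analysis / pattern recognition: pull signal from raw data.
    summary = {name: (mean(vals), max(vals)) for name, vals in readings.items()}

    # Step 3. Data structuring: decide the narrative order (highest peak first).
    ordered = sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)

    # Steps 4-5. Grammatical structuring, aggregation and formatting via a template.
    sentences = [
        f"{name} averaged {avg:.1f} and peaked at {peak:.1f}."
        for name, (avg, peak) in ordered
    ]
    return " ".join(sentences)

print(generate_report({"CPU load": [0.4, 0.9, 0.7], "Memory use": [0.5, 0.6]}))
```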

 

Common Uses of Natural Language Generation

Personalizing Customer Engagement Materials 

Qualtrics, the experience management software company, has a program in beta called Automated Call Summaries, which can be used in call centers as a way to take and maintain notes about specific customers and their experiences. Qualtrics summarizes calls using two primary approaches: extractive and abstractive.

“Extractive works well when the original body of text is well-written, is well-formatted, is single speaker. It’s highly grammatical, well organized,” Ellen Loeshelle, Qualtrics director of product management, told Built In, adding that it “completely falls apart” when applied to an automatically transcribed conversation, or when informal language is used. 

So Qualtrics uses a “hybrid” approach: customers build out a structure or format dictating exactly what they want their summaries to say and how they should look, then plant in variables that align with a given conversation.
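For illustration, here is a bare-bones frequency-based extractive summarizer in Python — a common textbook approach, not Qualtrics’ actual implementation. It also shows why extraction depends on well-formed input: it can only surface sentences that already exist in the text.

```python
# A minimal frequency-based extractive summarizer, sketched to show the
# general idea; this is NOT Qualtrics' actual method.
import re
from collections import Counter

def extract_summary(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by the corpus frequency of the words it contains.
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    # Keep the top-scoring sentences, in their original order.
    top = set(scored[:n_sentences])
    return " ".join(s for s in sentences if s in top)
```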

Creating Written Content

NLG is especially useful for producing content such as blogs and news reports, thanks to tools like ChatGPT. ChatGPT can produce essays in response to prompts and even answer questions submitted by human users. GPT-4, an upgraded model behind ChatGPT, can handle about 25,000 words in a single exchange, dwarfing the roughly 3,000-word limit of the original ChatGPT. As a result, the technology serves a range of applications, from producing cover letters for job seekers to creating newsletters for marketing teams.

Powering Conversational AI

NLG can also produce natural spoken language in the form of conversational AI — a common example is AI voice assistants like Amazon’s Alexa, Apple’s Siri or Google Assistant. Interactions with these devices exist solely as conversations, where a user asks a question or makes a statement and the device offers an answer.

Monitoring Industrial IoT Devices

When it comes to interpreting data from Industrial IoT devices, NLG can take complex readings from IoT sensors and translate them into written narratives that are easy to follow. Professionals still need to teach NLG systems what the sensors measure, how to write for certain audiences and other factors. But with proper training, NLG can transform data into automated status reports and maintenance updates on factory machines, wind turbines and other Industrial IoT technologies.
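As a hypothetical sketch, a threshold-based status generator for a wind turbine might look like the following; the field names and limits are invented for illustration only.

```python
# A hypothetical sketch: turning raw IoT sensor readings into a short
# status narrative with simple thresholds. Field names and limits are invented.
def turbine_status(sensor: dict) -> str:
    alerts = []
    if sensor["vibration_mm_s"] > 7.1:   # illustrative alarm threshold
        alerts.append("vibration is above the alarm threshold")
    if sensor["bearing_temp_c"] > 95:    # illustrative temperature limit
        alerts.append("bearing temperature is running hot")
    if not alerts:
        return f"Turbine {sensor['id']}: all readings nominal."
    return f"Turbine {sensor['id']}: " + " and ".join(alerts) + "; inspection recommended."

print(turbine_status({"id": "WT-07", "vibration_mm_s": 8.3, "bearing_temp_c": 88}))
```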

Interpreting Graphs, Tables and Spreadsheets

AI art generators already rely on text-to-image technology to produce visuals, but natural language generation is turning the tables with image-to-text capabilities. By studying thousands of charts and learning what types of data to select and discard, NLG models can learn how to interpret visuals like graphs, tables and spreadsheets. NLG can then explain charts that may be difficult to understand or shed light on insights that human viewers may easily miss. 
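A toy chart-to-text example: given a simple data series, even a rules-based narrator can produce the kind of one-line summary an NLG model would generate from a line chart. The data here is invented.

```python
# A hypothetical chart-to-text sketch: summarize a data series in words,
# the way an NLG model might narrate a line chart. Numbers are invented.
def describe_series(label: str, values: list) -> str:
    change = values[-1] - values[0]
    direction = "rose" if change > 0 else "fell" if change < 0 else "held steady"
    return (f"{label} {direction} from {values[0]} to {values[-1]}, "
            f"peaking at {max(values)}.")

print(describe_series("Quarterly revenue ($M)", [4.1, 4.8, 5.6, 5.2]))
# -> Quarterly revenue ($M) rose from 4.1 to 5.2, peaking at 5.6.
```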

 

Types of Natural Language Generation Algorithms

NLG’s improved abilities to understand human language and respond accordingly are powered by advances in its algorithms. Below are four NLG algorithms to keep in mind.     

Markov Chain

The Markov chain is one of the earliest NLG algorithmic models. Within the context of natural language generation, a Markov chain assesses the relationships between the words in a sentence, considers the probability of what the next word could be based on those relationships and then tries to predict the next word in the sentence. Text suggestions on smartphone keyboards are one common example of Markov chains at work.
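Here is a minimal word-level Markov chain generator in Python to make the idea concrete; the tiny corpus is purely illustrative.

```python
# A minimal word-level Markov chain text generator: count which word
# follows which, then sample from those counts. The corpus is illustrative.
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Build the transition table: word -> list of observed next words.
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(start: str, length: int = 8) -> str:
    words = [start]
    for _ in range(length - 1):
        followers = transitions.get(words[-1])
        if not followers:
            break
        words.append(random.choice(followers))  # sampling is proportional to counts
    return " ".join(words)

print(generate("the"))
```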

Recurrent Neural Network 

Recurrent neural networks mimic how human brains work, remembering previous inputs to produce sentences. As the text unfolds, an RNN assigns a probability to every word in its vocabulary and picks the most likely next word given the words that came before. Although RNNs can remember the context of a conversation, they struggle to remember words used at the beginning of longer sentences. As a result, their lengthier sentences tend to make less sense.
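The sketch below shows a single recurrent step in NumPy with toy dimensions: the hidden state carries context forward, and the output is a probability distribution over the next word. It is a bare illustration of the mechanism, not a trained model.

```python
# A single recurrent step, sketched in NumPy: the hidden state carries
# context from previous words; the output scores every word in a toy vocabulary.
import numpy as np

vocab, hidden = 10, 4                       # toy sizes, illustrative only
Wxh = np.random.randn(hidden, vocab) * 0.1  # input-to-hidden weights
Whh = np.random.randn(hidden, hidden) * 0.1 # hidden-to-hidden (the "memory" path)
Why = np.random.randn(vocab, hidden) * 0.1  # hidden-to-output weights

def rnn_step(x_onehot, h_prev):
    h = np.tanh(Wxh @ x_onehot + Whh @ h_prev)      # update memory of the context
    logits = Why @ h
    probs = np.exp(logits) / np.exp(logits).sum()   # probability of each next word
    return probs, h

probs, h = rnn_step(np.eye(vocab)[3], np.zeros(hidden))  # feed word #3, empty memory
```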

Long Short-Term Memory

Like RNNs, long short-term memory (LSTM) models are good at remembering previous inputs and the contexts of sentences. LSTMs are equipped with the ability to recognize when to hold onto or let go of information, enabling them to remain aware of when a context changes from sentence to sentence. They are also better at retaining information for longer periods of time, serving as an extension of their RNN counterparts.     
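For readers who want the mechanics, these are the standard LSTM gate equations in their common formulation: the forget gate decides what to let go, the input gate decides what to hold onto, and the cell state is the long-term memory that lets context persist across sentences.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate: what to let go}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate: what to hold onto}\\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate memory}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{updated long-term cell state}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \quad h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$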

Transformer

First introduced by Google, the transformer model displays stronger predictive capabilities and is able to handle longer sentences than RNN and LSTM models. While RNNs must be fed one word at a time to predict the next word, a transformer can process all the words in a sentence simultaneously and remember the context to understand the meanings behind each word. This process makes it faster to generate cohesive sentences.      
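A bare-bones NumPy sketch of scaled dot-product self-attention — the core mechanism of the transformer paper “Attention Is All You Need” — shows how every word can attend to every other word in a single pass. Real models add learned query/key/value projections and multiple attention heads; this version omits them for brevity.

```python
# Bare-bones scaled dot-product self-attention, the mechanism behind
# transformers. Shapes are illustrative; projections are omitted.
import numpy as np

def self_attention(X):                  # X: (seq_len, d) token embeddings
    d = X.shape[1]
    Q, K, V = X, X, X                   # real models use learned projections
    scores = Q @ K.T / np.sqrt(d)       # every word scores every other word at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                  # context-aware representation per word

print(self_attention(np.random.randn(5, 8)).shape)   # (5, 8)
```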

 

NLP vs. NLU vs. NLG

To understand how natural language generation fits into the larger artificial intelligence ecosystem, one must first understand natural language processing (NLP) — a subset of computational linguistics that refers to the use of computers to understand both written and spoken human language. If NLG is a building, NLP is the foundation.


Natural Language Processing: NLP converts unstructured data into a structured data format, so machines can not only understand written and spoken language, but formulate a relevant and coherent response. 
Natural Language Understanding: NLU focuses on enabling computers to actually comprehend the intent of written or spoken language using syntactic and semantic analyses.
Natural Language Generation: NLG focuses on producing natural written or spoken language based on a given data set.

Natural Language Processing

Natural language processing (NLP) uses both machine learning and deep learning techniques in order to complete tasks such as language translation and question answering, converting unstructured data into a structured format. It accomplishes this by first identifying named entities through a process called named entity recognition, and then identifying word patterns using methods like tokenization, stemming and lemmatization.
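To show what those preprocessing steps do, here is a deliberately naive tokenizer and suffix-stripping stemmer in plain Python. Real pipelines use trained tools such as NLTK’s Porter stemmer or spaCy’s lemmatizer; this toy version just makes the transformation visible.

```python
# A toy illustration of the preprocessing steps named above. Real systems
# use trained models; this naive version just shows what tokenization and
# stemming do to a sentence.
import re

def tokenize(text: str) -> list:
    return re.findall(r"[A-Za-z]+", text.lower())

def naive_stem(token: str) -> str:
    # Crude suffix stripping; real libraries use Porter stemming or lemmatization.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The translators were answering questions")
print([naive_stem(t) for t in tokens])
# -> ['the', 'translator', 'were', 'answer', 'question']
```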

“I think of natural language processing as very much the foundational technology that makes natural language generation possible.”

In short: Natural language processing is understanding the “pieces” of language, Qualtrics’ Loeshelle said, which is essential to generating language. 

“In order to create language, you have to understand language. You have to understand its component parts, how they work together, what they mean, what it sounds like to be a native speaker,” she explained. “I think of natural language processing as very much the foundational technology that makes natural language generation possible.”

 

Natural Language Understanding

Natural language understanding (NLU) is another branch of the NLP tree. Using syntactic (grammar structure) and semantic (intended meaning) analysis of text and speech, NLU enables computers to actually comprehend human language. NLU also establishes relevant ontology, a data structure that specifies the relationships between words and phrases. 

Humans are able to do all of this intuitively — when we see the word “banana” we all picture an elongated yellow fruit; we know the difference between “there,” “their” and “they’re” when heard in context. But computers require a combination of these analyses to replicate that kind of understanding.

NLU has many practical applications. One is text classification, which analyzes a piece of open-ended text and categorizes it according to pre-set criteria. For instance, if you have an email coming in, a text classification model could automatically forward that email to the correct department. 
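As a sketch of that email-routing idea, a classic bag-of-words classifier can be built with scikit-learn’s standard tools; the tiny training set below is invented for illustration and far too small for real use.

```python
# A hedged sketch of email routing via text classification, using
# scikit-learn's standard tools. The training data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "I was double charged on my last invoice",
    "My package never arrived",
    "How do I reset my password?",
    "Refund request for my recent order",
]
departments = ["billing", "shipping", "support", "billing"]

# TF-IDF turns each email into word-weight features; the classifier
# learns which words signal which department.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(emails, departments)

print(model.predict(["I can't log into my account"]))  # likely ['support']
```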

It can also be applied to search, where it can sift through the internet and find an answer to a user’s query, even if it doesn’t contain the exact words but has a similar meaning. A common example of this is Google’s featured snippets at the top of a search page. 

In some cases, natural language understanding also consists of speech recognition. While speech recognition technology captures spoken language in real-time, transcribes it and returns it as text, natural language understanding goes beyond that — determining a user’s intent through machine learning.  

Natural Language Generation

NLG derives from a natural language processing method called large language modeling, in which a model is trained to predict the next word based on the words that came before it. If a large language model is given a piece of text, it will generate an output that it thinks makes the most sense.
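A minimal example of this next-word prediction in action, using the open-source Hugging Face transformers library with GPT-2 — a small public model, not the model behind ChatGPT:

```python
# A minimal sketch of large language modeling with Hugging Face
# transformers and GPT-2, a small public model (downloads on first run).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Natural language generation is", max_new_tokens=20)
print(result[0]["generated_text"])  # the continuation GPT-2 finds most plausible
```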

“If you train a large enough model on a large enough data set,” Alammar said, “it turns out to have capabilities that can be quite useful.” This includes summarizing texts, paraphrasing texts and even answering questions about the text. It can also generate more data that can be used to train other models — this is referred to as synthetic data generation.

But NLP and NLU are equally vital to a successful NLG model. According to the principles of computational linguistics, a computer needs to be able to both process and understand human language in order to generate natural language.


 

The Future of Natural Language Generation

Natural language generation’s ability to analyze and describe massive amounts of data in a human-like manner at rapid speeds continues not only to dazzle, but also to stoke ongoing fears of AI’s capacity to take human jobs. But NLG software can be quite beneficial to its human counterparts, Alammar said, particularly when it comes to helping writers scale their work.

Instead of replacing humans altogether, natural language generation can “help the creative process,” Alammar said. “[NLG] might not necessarily generate the final draft for you, but it can help you brainstorm,” he continued, likening it to a new tool in a toolbox, joining other longtime writer lifesavers like spell check. “This is an extension of this family of writing aids.” 

Like most other artificial intelligence, NLG still requires quite a bit of human intervention. We’re continuing to figure out all the ways natural language generation can be misused or biased in some way. And we’re finding that, a lot of the time, text produced by NLG can be flat-out wrong, which has a whole other set of implications. 

“It can make mistakes. It can generate text that is totally plausible, but is factually incorrect.”

“It can make mistakes. It can generate text that is totally plausible, but is factually incorrect,” Loeshelle said. And with grade school students and news outlets alike beginning to incorporate NLG in their own work, it’s easy to see how natural language generation could lead to fake news generation. “It could go really wrong really fast. … That’s the part that scares me about generative text or imagery or video or audio — there’s no signature to say that this is real or not real,” she continued. “I think that’s a huge challenge that this space is still tackling. How do we ensure its integrity?”

Of course, this doesn’t change the fact that natural language generation has come a long way in a fairly short amount of time and holds exciting possibilities. 

“Natural language generation is going to give us the ability to provide information to everyone in the format that they want to receive it, at the time that they want to receive it, at hopefully a much lower cost,” Loeshelle said. “It’s going to be pretty awesome.”

 

Why Natural Language Generation Matters

Natural language generation is transforming how people and organizations communicate, automate and scale their operations. NLG-based tools like Google’s AI Overviews and ChatGPT are already used by millions of people every day, and their influence is likely to continue expanding as they improve. This has ballooned the NLG market to a projected valuation of $1.10 billion, with estimates that it will reach $5.71 billion by 2032.

NLG enables machines to mimic human language at scale, making it a powerful driver of efficiency. Businesses use it to automate everything from customer support tasks to content generation, while government agencies are using generative AI to modernize workflows and expand their services. AI companies have secured major contracts with various federal agencies, offering their services for as little as $1 in the hope of gaining wider adoption in the government.

Still, the rise of NLG isn’t without challenges. Many organizations struggle with implementation costs or seeing a return on their investment, two issues that may shape how the technology matures in the coming years. But these hurdles don’t diminish NLG’s importance; they simply highlight the work that still needs to be done to make it reliable and transformative for organizations.

 

Key Developments in Natural Language Generation (NLG)

Natural language generation has evolved significantly in recent years, advancing from early prototypes to sophisticated systems that produce human-like text and audio. Below are some of the field’s major recent milestones, offering a view into the progression of NLG technology.

OpenAI Launches GPT-5 (August 2025)

The launch of OpenAI’s GPT-5 marked a significant step in NLG capabilities. The model improved upon its predecessors by generating more coherent, contextually accurate text across various applications, including content creation, customer service and data summarization. Its ability to understand and replicate human-like conversation set new standards in natural language understanding and generation.

Grok 4 Model Launch (July 2025)

Elon Musk’s Grok 4 expanded on the AI chatbot’s ability to understand nuanced queries and generate highly context-aware responses. The development of Grok 4 illustrated the growing importance of domain-specific training in fine-tuning NLG models for specific industries like healthcare, finance and customer support.

The Launch of Claude 3.7 Sonnet (February 2025)

Anthropic launched Claude 3.7 Sonnet, an advanced model that incorporated quick response capabilities alongside deep reasoning. Its innovative approach combined faster processing with more precise reasoning, positioning it as a strong competitor in the generative AI landscape. This model also focused on improving safety features to minimize the risk of generating harmful or biased content.

DeepSeek Launches DeepSeek R1 (January 2025)

Chinese startup DeepSeek introduced R1, an open-source model that delivers advanced reasoning and problem-solving at a fraction of the cost of competing models. R1 quickly gained attention for outperforming U.S.-developed systems in certain benchmarks, sparking global debate over AI competitiveness and accessibility. The launch was seen as a “Sputnik moment” in AI, with industry leaders both praising its capabilities and raising concerns over data security and regulatory oversight.

Google Begins Rolling Out AI Overviews (May 2024)

Google launched AI Overviews in Search, providing users with AI-generated summaries at the top of results pages. The feature’s introduction marked a major shift in how people accessed information, as it could quickly analyze web content and answer search queries without redirecting users off-site.

Google DeepMind Releases AlphaCode 2 (December 2023) 

AlphaCode 2, launched by DeepMind in 2023, marked a breakthrough in NLG specifically for programming tasks. By generating human-quality code from natural language inputs, it opened the door for automated software development. AlphaCode 2 demonstrated the growing intersection of NLG with other AI fields, like code generation, making it a significant milestone in AI’s practical applications.

Launch of ChatGPT (November 2022)

OpenAI’s release of ChatGPT brought NLG into the mainstream. The platform introduced context-aware dialogue and text generation, and quickly gained mass adoption across various industries through consumer and enterprise applications. 

The Debut of GPT-3 and the Birth of Modern NLG (June 2020)

The release of OpenAI’s GPT-3 was a landmark moment for NLG. This model, with 175 billion parameters, showcased the potential of large-scale transformer models in generating fluent, coherent and contextually relevant text. Its widespread application across content creation, marketing and automation laid the groundwork for the current NLG landscape, setting a high bar for future models. 

Google Introduces Transformer Architectures (August 2017)

Google researchers introduced transformer architectures, a breakthrough that became the foundation for modern large language models. The transformer architecture is a neural network design that enables more efficient training and improved handling of long-range dependencies. It works by using a self-attention mechanism, which helps determine the relevance of each word in an input. The method laid the groundwork for many of the recent advances in NLG. 

Early NLG Research and Basic Text Generators (2000s)

In the early 2000s, NLG systems were primarily rule-based and focused on generating reports from structured data. These systems, such as those used in weather forecasting and finance, provided a foundation for more advanced NLG systems. While their outputs were basic, they demonstrated the feasibility of automatic text generation, sparking future advancements in the field.

Frequently Asked Questions

How does natural language generation work?

NLG works through several steps: content analysis, data structuring, grammatical structuring and language aggregation, ultimately producing human-like text or speech.

What is natural language generation used for?

NLG is used in personalizing customer engagement materials, creating written content like blogs or news reports, powering conversational AI, monitoring Industrial IoT devices and interpreting graphs and tables.

What are the types of NLG algorithms?

There are several types of NLG algorithms, including Markov chain, recurrent neural network (RNN), long short-term memory (LSTM) and transformer models, each serving different purposes in text generation.

How do NLG, NLP and NLU differ?

While NLG focuses on generating human-like language, natural language processing (NLP) helps machines understand language and natural language understanding (NLU) enables comprehension of the intent behind language.

  

Matthew Urwin contributed reporting to this story.
