Believe it or not, a computer could have written this entire article using artificial intelligence, and you may never have noticed the difference. For the record, a computer didn’t write this article. But it could have — and that is absolutely mind-blowing if you think about it.

Consider all of the work that goes into writing an article like this. A person has to understand not only what each and every word means, but also how to string those words together in a way that makes sense. A person can use figurative language to get a specific point across, or rhetorical questions to engage the reader. It can take hours or even days of work to piece together a couple thousand words in a style anyone can understand.

But, in truth, all of this can be easily replicated in a matter of minutes by a computer using a process called natural language generation. 

Otherwise known as NLG, natural language generation makes machines capable of producing written or spoken language in the same style as a human with nothing more than some data. It is being used not only to create songs, movie scripts and speeches, but also to report the news and practice law. It’s official: Natural language is not exclusive to humans anymore.


What Is Natural Language Generation?

Natural language generation is the use of artificial intelligence programming to produce written or spoken language from a data set. Put simply, it is “creating language from scratch,” Ellen Loeshelle, a director of product management at experience management software company Qualtrics, told Built In.

The language produced by NLG software is the result of both structured and unstructured data. Structured data is organized and searchable in a predefined format, such as a database table, while unstructured data, such as free-form text or audio, remains in its raw, native form. Sophisticated software can mine large quantities of this data, identify patterns within it and communicate that information in a way that is easy for people to understand.

NLG is especially useful for producing content such as blogs and news reports, or creating personalized customer engagement materials at scale. Conversational AI technology like chatbots and virtual assistants rely on artificially generated natural language in order to more effectively communicate with their users. And NLG’s ability to interpret complex graphs, tables, spreadsheets and data in a way that is understandable makes it an ideal process for generating automated status reports, maintenance updates and other analytics in the industrial IoT space.

Common Uses of Natural Language Generation

  • Creating personalized customer engagement materials
  • Advanced monitoring for industrial IoT devices
  • Content creation (blogs, news reports, etc.)
  • Interpreting graphs, tables and spreadsheets
  • Conversational AI such as chatbots and virtual assistants

At its best, NLG output can sound so natural that it appears to be produced by a human. This has only been possible for a few years, and “it’s only the tip of the iceberg,” Jay Alammar, a director and engineering fellow at natural language processing company Cohere, told Built In. “As with every new technology, it takes some time to really understand where this technology excels and where it falls short.”

 

How Does Natural Language Generation Work?

There are lots of ways to think about how natural language generation works. One is as a series of steps, which have been outlined in a blog post published by tech consulting firm Mutual Mobile.

The process breaks down into five broad steps:

  1. Content analysis: All of the data, both structured and unstructured, is analyzed and filtered so that the final text addresses the user’s needs.
  2. Data understanding: The NLG system makes sense of that data by identifying patterns and building context.
  3. Data structuring: A narrative is created based on the data being analyzed and the desired result (a blog, report, chat response and so on).
  4. Grammatical structuring: Words and sentences are rearranged so that they make sense in the given language.
  5. Language aggregation: Before the output is produced, it runs through any templates the programmer has specified, and its presentation is adjusted to match them.
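To make these stages concrete, here is a minimal sketch in Python of a template-style pipeline built around a tiny, invented sales data set. The function names mirror the steps above; the data, thresholds and wording are illustrative assumptions, not a real system (and not Mutual Mobile’s).

```python
# A toy end-to-end NLG pipeline. Each function corresponds to one of the
# five steps described above; everything else is made up for illustration.

records = [
    {"region": "North", "sales": 120_000, "change": 0.08},
    {"region": "South", "sales": 95_000, "change": -0.03},
]

def content_analysis(data):
    # Filter to the records the final text should cover.
    return [r for r in data if abs(r["change"]) >= 0.01]

def understand_data(data):
    # Identify a simple pattern: is each region growing or shrinking?
    for r in data:
        r["trend"] = "grew" if r["change"] > 0 else "declined"
    return data

def structure_document(data):
    # Shape the narrative: biggest movers first.
    return sorted(data, key=lambda r: abs(r["change"]), reverse=True)

def realize_sentences(data):
    # Grammatical structuring: turn each record into a sentence.
    return [
        f"Sales in the {r['region']} region {r['trend']} "
        f"{abs(r['change']):.0%} to ${r['sales']:,}."
        for r in data
    ]

def aggregate(sentences):
    # Fit the sentences into the report template.
    return "Weekly summary: " + " ".join(sentences)

data = content_analysis(records)
data = understand_data(data)
data = structure_document(data)
print(aggregate(realize_sentences(data)))
# Weekly summary: Sales in the North region grew 8% to $120,000. ...
```

Real systems replace each of these toy functions with far more sophisticated machinery, but the division of labor is the same.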

One result of these steps is written natural language. This can come in the form of a blog post, a social media post or a report, to name a few.

For instance, Qualtrics has a program in beta right now called Automated Call Summaries, which can be used in call centers as a way to take and maintain notes about specific customers and their experiences, Loeshelle said. Typically, these notes are written up by a human agent, but Qualtrics has figured out a way to use NLG to summarize these calls automatically. This not only gives back some time to the humans, but it also helps ensure that notes on customer calls are clear and consistent.

To accomplish this, Loeshelle said Qualtrics relies on two primary approaches: extractive and abstractive. An extractive approach means a body of text (a book or an article, for example) is taken, and the sentences within that body of text that are representative of the entire piece are stitched together to create a synopsis of what happened. An abstractive approach involves more of the concepts and themes of the body of text, which are then used as data inputs for a piece of generated text.
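To illustrate the extractive side, here is a minimal frequency-based summarizer in Python. It is a generic sketch of the idea (score each sentence by how common its words are across the whole text, then stitch the top sentences together), not Qualtrics’ implementation; the scoring heuristic is an assumption.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Pick the sentences whose words are most frequent overall,
    a crude proxy for 'representative of the whole text'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    # Score each sentence by the average frequency of its words.
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Re-emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```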

“Extractive works well when the original body of text is well written, is well formatted, is single speaker. It’s highly grammatical, well organized,” Loeshelle said, adding that it “completely falls apart” when applied to an automatically transcribed conversation, or when informal language is used. So, Qualtrics uses a “hybrid” approach, in which its customers build out a structure or format dictating exactly what they want their summaries to say and how they want them to look, and variables that align with a given conversation are then slotted in. Loeshelle likens it to a kind of Mad Libs that is both abstractive and “highly controlled” according to what the business needs.
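The “highly controlled” half of that hybrid can be pictured as a template with slots. The template text and variable values below are invented for illustration, not Qualtrics’ actual format; in practice the variables would come from abstractive analysis of the transcribed call.

```python
# A "Mad Libs"-style summary template. The slot values are hard-coded
# here, but would normally be extracted from the call itself.
template = ("Customer called about {topic}. Agent {resolution}. "
            "Follow-up needed: {follow_up}.")

variables = {"topic": "a delayed shipment",
             "resolution": "issued a replacement order",
             "follow_up": "no"}

print(template.format(**variables))
# Customer called about a delayed shipment. Agent issued a
# replacement order. Follow-up needed: no.
```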

A quick explainer on how Google applies natural language generation to its Google Home device. | Source: Google Cloud Tech

NLG can also produce natural spoken language — a common example is AI voice assistants like Amazon’s Alexa, Apple’s Siri or the Google Home Assistant. Interactions with these devices exist solely as conversations, where a user asks a question or makes a statement and the device offers an answer. 

In order to achieve a more natural exchange between a person and their device, two things need to happen, said Justin Zhao, a former Google research engineer, in a video explainer produced by Google. The first is that the content of the conversation has to make sense in the context of the conversation — was the response appropriate given the question asked? Figuring this out requires the use of structured data, or data organized in a standardized format. For example, if a user asks their Google Home virtual assistant what the weather is in Manhattan, Google has to first gather all the structured data about the current weather, and then translate that data into a natural language response that adequately answers the question.
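A toy version of that first translation step might look like the following snippet; the data fields are invented and bear no relation to Google’s actual weather pipeline.

```python
# Hypothetical structured weather data, as an API might return it.
weather = {"location": "Manhattan", "condition": "partly cloudy",
           "temp_f": 68, "high_f": 73, "low_f": 61}

# Translate the structured record into a natural language answer.
answer = (
    f"Right now in {weather['location']} it's {weather['temp_f']} degrees "
    f"and {weather['condition']}, with a high of {weather['high_f']} "
    f"and a low of {weather['low_f']}."
)
print(answer)
```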

That brings us to the second requirement: that the language be used correctly. Is it grammatically correct? Do the verbs agree? This is done at scale using machine learning, which allows the system not only to learn the correct way to use language, but to do it in a way that sounds natural. Recurrent neural networks, or RNNs (neural networks that recognize and remember patterns in sequential data in order to make predictions), are a gold standard here. At each step, an RNN assigns a probability to every word in the vocabulary. As the text unfolds, it conditions on the words that came before and picks the next word from that probability distribution.
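For the mechanics, here is a minimal, untrained recurrent language model in PyTorch. It shows the mechanism just described: at every step the network produces a probability distribution over the whole vocabulary, from which the next word is chosen. The sizes and token IDs are toy values for illustration.

```python
import torch
import torch.nn as nn

# Toy sizes: 1,000-word vocabulary, small embedding and hidden state.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class RNNLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        x = self.embed(tokens)                # (batch, seq, embed_dim)
        output, hidden = self.rnn(x, hidden)  # hidden state carries the "memory"
        logits = self.out(output)             # one score per vocabulary word
        return logits, hidden

model = RNNLanguageModel()
tokens = torch.tensor([[3, 17, 52]])          # a toy three-token prompt

logits, _ = model(tokens)
# Softmax over the final step gives next-word probabilities;
# taking the argmax (or sampling) chooses the next word.
next_word_probs = torch.softmax(logits[0, -1], dim=-1)
next_word = torch.argmax(next_word_probs).item()
```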

“It’s important to keep in mind that language, in general, is extremely sequential,” Zhao said in the video. Order matters. “RNNs are especially good at remembering what it saw earlier, because it enforces a sequential policy over the data. The inputs are decided in a very ordered manner.” 


NLP vs. NLG vs. NLU

To understand how natural language generation fits into the larger artificial intelligence ecosystem, one must first understand natural language processing, or NLP — an umbrella term that refers to the use of computers to understand both written and spoken human language. If NLG is a building, NLP is the foundation.

Natural language processing uses both machine learning and deep learning techniques to complete tasks such as language translation and question answering, converting unstructured data into a structured format. It accomplishes this by breaking text into pieces and normalizing words through methods like tokenization, stemming and lemmatization, and by identifying named entities through a process called named entity recognition.
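As a brief illustration, the open-source spaCy library performs tokenization, lemmatization and named entity recognition in a few lines. It is one of many NLP toolkits that could be used here, and the example sentence is invented.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Manhattan next year.")

# Named entity recognition: labeled spans such as ORG, GPE, DATE.
for ent in doc.ents:
    print(ent.text, ent.label_)      # e.g. "Apple ORG", "Manhattan GPE"

# Tokenization and lemmatization: each token plus its dictionary form.
for token in doc:
    print(token.text, token.lemma_)  # e.g. "opening" -> "open"
```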

In short: Natural language processing is understanding the “pieces” of language, Qualtrics’ Loeshelle said, which is essential to generating language. 

“In order to create language, you have to understand language. You have to understand its component parts, how they work together, what they mean, what it sounds like to be a native speaker,” she explained. “I think of natural language processing as very much the foundational technology that makes natural language generation possible.”

Specifically, NLG builds on the natural language processing technique known as large language modeling, in which a model is trained to predict each word from the words that came before it. If a large language model is given a piece of text, it will generate a continuation that it thinks makes the most sense.

“If you train a large enough model on a large enough data set,” Cohere’s Alammar said, “it turns out to have capabilities that can be quite useful.” This includes summarizing texts, paraphrasing texts and even answering questions about the text. It can also generate more data that can be used to train other models — this is referred to as synthetic data generation.
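A quick way to see this in action is the Hugging Face transformers library, shown below with the small, freely available GPT-2 model standing in for the much larger models Alammar describes. The prompt is arbitrary.

```python
from transformers import pipeline

# Requires: pip install transformers torch
# Downloads the GPT-2 model on first run.
generator = pipeline("text-generation", model="gpt2")

prompt = "Natural language generation is"
result = generator(prompt, max_new_tokens=25, num_return_sequences=1)
print(result[0]["generated_text"])
```

The same library exposes other tasks Alammar mentions, such as a "summarization" pipeline, through the identical interface.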

NLP vs. NLG vs. NLU

  • Natural language processing: NLP converts unstructured data into a structured data format in order to allow machines to not only understand written and spoken language, but also formulate a relevant and coherent response. This is the foundation of both natural language generation and natural language understanding.
  • Natural language generation: NLG focuses on producing natural written or spoken language based on a given data set.
  • Natural language understanding: NLU focuses on enabling computers to actually comprehend the intent of written or spoken language using syntactic and semantic analysis.

Meanwhile, natural language understanding, or NLU, is another branch of the NLP tree. Using syntactic (grammar structure) and semantic (intended meaning) analysis of text and speech, NLU enables computers to actually comprehend human language. NLU also establishes a relevant ontology, a data structure that specifies the relationships between words and phrases.

Humans are able to do all of this intuitively — when we see the word “banana” we all picture an elongated yellow fruit; we know the difference between “there,” “their” and “they’re” when heard in context. But computers require a combination of these analyses to replicate that kind of understanding.

NLU has many practical applications. One is text classification, which analyzes a piece of open-ended text and categorizes it according to pre-set criteria. For instance, if you have an email coming in, a text classification model could automatically forward that email to the correct department. It can also be applied to search, where it can sift through the internet and find an answer to a user’s query, even if it doesn’t contain the exact words but has a similar meaning.
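Circling back to the first of those applications, a bare-bones version of that email-routing classifier can be sketched with scikit-learn; the training emails and department labels below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: a handful of invented emails and their departments.
emails = [
    "My invoice is wrong, please correct the charge",
    "The app crashes every time I log in",
    "I'd like to upgrade my subscription plan",
    "Refund has not arrived after two weeks",
]
departments = ["billing", "support", "sales", "billing"]

# Vectorize the text, then fit a simple Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, departments)

# Route a new email to the most likely department.
print(model.predict(["I was charged twice this month"])[0])  # likely "billing"
```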

A common example of NLU-driven search is Google’s featured snippets at the top of a results page.

In some cases, natural language understanding also involves speech recognition. While speech recognition technology captures spoken language in real time, transcribes it and returns it as text, natural language understanding goes beyond that, determining a user’s intent through machine learning.

Both NLP and NLU are vital to a successful NLG model. In order to generate natural language, a computer needs to be able to both process and understand natural human language.


The Future of Natural Language Generation

These days, in an era where content is king, natural language generation appears to be the way forward. Its ability to analyze and describe massive amounts of data in a human-like manner at rapid speeds continues not only to dazzle, but also to stoke ongoing fears of AI’s capacity to take human jobs.

But there is reason to believe this won’t necessarily be the case. Rather, NLG software can be quite beneficial to its human counterparts, Alammar said, particularly when it comes to helping writers scale their work. For instance, if a company has to write hundreds or thousands of variations of text (think targeted ads), these models can speed up the process.

Instead of replacing humans altogether, natural language generation can “help the creative process,” Alammar said. “[NLG] might not necessarily generate the final draft for you, but it can help you brainstorm,” he continued, likening it to a new tool in a toolbox, joining other longtime writer lifesavers like spell check. “This is an extension of this family of writing aids.” 

After all, like most other artificial intelligence, NLG still requires quite a bit of human intervention; it is not perfect. We’re continuing to figure out all the ways natural language generation can be misused or biased in some way. And we’re finding that, a lot of the time, text produced by NLG can be flat-out wrong, which has a whole other set of implications.

“It can make mistakes. It can generate text that is totally plausible, but is factually incorrect,” Loeshelle said. And with grade school students and news outlets alike beginning to incorporate NLG in their own work, it’s easy to see how natural language generation could lead to fake news generation. “It could go really wrong really fast. … That’s the part that scares me about generative text or imagery or video or audio — there’s no signature to say that this is real or not real,” she continued. “I think that’s a huge challenge that this space is still tackling. How do we ensure its integrity?”

Still, natural language generation has come a long way in a fairly short amount of time. Things that were “absolutely out of the range of possibility” just a short time ago “suddenly seem very doable,” Loeshelle added. 

“Natural language generation is going to give us the ability to provide information to everyone in the format that they want to receive it, at the time that they want to receive it, at hopefully a much lower cost,” Loeshelle said. “It’s going to be pretty awesome.”
