Machine translation is the use of artificial intelligence to automatically translate text and speech from one language to another. Using natural language processing and deep learning techniques, machine translation software analyzes the linguistic elements of the original language, recognizes how the words influence one another and then communicates their full meaning in a new language.
What Is Machine Translation?
Machine translation uses AI to automatically translate text and speech from one language to another. It relies on natural language processing and deep learning to understand the meaning of a given text and translate it into different languages without the need for human translators.
Popular machine translation tools include Google Translate and Microsoft Translator, which can translate both spoken and written language. They build on existing knowledge of natural language processing, including grammar, language understanding and language generation, and quickly produce translations into hundreds of different languages.
Machine translation is far from flawless, and these systems don’t produce translations as rapidly or as fluently as the devices depicted in science fiction stories like The Hitchhiker’s Guide to the Galaxy or Star Trek. Still, this technology has come a long way over the decades and promises to be a major disruptor to language translation going forward.
How Does Machine Translation Work?
Machine translation dates back to the 1950s — when the United States used it to spy on Russia and other countries during the Cold War — making it “the original artificial intelligence application,” according to Maite Taboada, a linguistics professor at Simon Fraser University in British Columbia, Canada.
The methods used then required programming extensive bilingual dictionaries and grammar rules into computers by hand in order to translate one language into another. In the early 2000s, computers began to use machine learning to analyze text and make statistical predictions, determining the likelihood that a particular word or phrase in a source language would be a corresponding word or phrase in a target language.
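To make the statistical idea concrete, here is a minimal sketch rather than any production system's method. It estimates how likely a source word is to correspond to a target word by counting how often the two appear together in a tiny, invented parallel corpus. The corpus, the function name and the simple relative-frequency estimate are all illustrative assumptions; real statistical systems relied on far larger corpora and more elaborate alignment models.

```python
from collections import Counter, defaultdict

# Toy English-Spanish sentence pairs, invented purely for illustration.
PARALLEL_CORPUS = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def cooccurrence_probabilities(corpus):
    """Estimate P(target word | source word) from raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for source_sentence, target_sentence in corpus:
        for source_word in source_sentence.split():
            for target_word in target_sentence.split():
                counts[source_word][target_word] += 1

    probabilities = {}
    for source_word, target_counts in counts.items():
        total = sum(target_counts.values())
        probabilities[source_word] = {
            target_word: count / total
            for target_word, count in target_counts.items()
        }
    return probabilities


probs = cooccurrence_probabilities(PARALLEL_CORPUS)
# Pick the target word that most often appears alongside "house".
best_for_house = max(probs["house"], key=probs["house"].get)
print(best_for_house, probs["house"][best_for_house])
```

Because "casa" shows up in every sentence pair that contains "house," it ends up with the highest estimated probability, which is the kind of statistical prediction described above.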
Today, we rely on neural machine translation, which uses deep learning to learn new languages and then continuously improve on that knowledge using a specific machine learning method called neural networks, where input data passes through several interconnected nodes to generate an output — similar to the way the human brain works.
Neural machine translation software works with massive datasets, and considers the entire input sentence at each step of translation instead of breaking it up into individual words or phrases like other methods. It is more capable of capturing — even understanding — the intent or meaning of a sentence and, as a result, has quickly replaced many of the older statistical models.
A more recent breakthrough in neural machine translation was the creation of transformer neural networks — the “T” in GPT, which powers large language models, or LLMs, like OpenAI’s ChatGPT and Google’s Bard. Transformers learn patterns in language, understand the context of an input text and generate an appropriate output. This makes them particularly good at translating text into different languages.
Using a technique called “self-attention,” transformers can selectively focus on different parts of an input sentence, weigh how relevant those parts are to one another, and identify the important relationships between them so that they can accurately translate the sentence into another language. They are also trained on massive amounts of bilingual text data, which helps them learn the nuances of different languages and improves their ability to generate accurate translations.
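The weighing step in self-attention can be sketched in a few lines. The example below is a simplified illustration, not a real transformer: it uses the input vectors directly as queries, keys and values, whereas actual transformers learn separate projections for each. The three token names and their two-dimensional embeddings are hypothetical values chosen for readability.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Scaled dot-product self-attention.

    Simplification: queries, keys and values are the input vectors
    themselves, with no learned projection matrices.
    """
    scale = math.sqrt(len(vectors[0]))
    outputs, all_weights = [], []
    for query in vectors:
        # Score every token against the current one, then normalize.
        scores = [dot(query, key) / scale for key in vectors]
        weights = softmax(scores)
        # Each output is a relevance-weighted mix of all token vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(len(query))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings: "bank" and "river" point in similar directions.
tokens = ["bank", "river", "money"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(embeddings)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

In this toy setup, "bank" assigns more weight to "river" than to "money" simply because their vectors are more alike, which is the sense in which the model weighs parts of a sentence by relevance.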
“With transformer models you also predict [the next word], just like any large language model. But you predict it in context,” Olga Beregovaya, the VP of AI and machine translation at translation company Smartling, told Built In. “While large language models are trained for a variety of tasks, the latest generation of LLMs equally performs well on translation tasks.”
At its most sophisticated level, machine translation is essentially a form of generative AI, where LLMs are used to automatically produce text. For instance, if a user prompts ChatGPT in English to give them a chocolate éclair recipe in French, the output is an example of machine translation.
The Next Iteration of Machine Translation
Up until this point, neural machine translation without the use of transformer models has been factually accurate, but lacked the fluidity of natural language. And AI-generated text has become quite conversational, but can be wildly wrong about things.
The next iteration of machine translation will likely combine the strengths of LLMs and neural machine translation to generate more natural and precise language translation. In fact, Beregovaya says it’s already happening with GPT-4, OpenAI’s most advanced language model.
“GPT-4 is already producing machine translation copy — often superior in quality, for certain translation directions, to neural machine translation,” she said. “Will there be actual technological convergence? That’s to be seen. But definitely they will learn and harvest from each other.”
Advantages of Using Machine Translation
Modern machine translation tools come with a lot of advantages, particularly in business applications.
Machine translation is essentially a “productivity enhancer,” according to Rick Woyde, the CTO and CMO of translation company Pairaphrase. It can provide consistent, quality translations at scale and at a speed and capacity no team of human translators could accomplish on its own.
And with ongoing improvements in machine learning algorithms and computing technology, machine translation will likely become even faster and more efficient going forward.
Learns on Its Own
Machine translation systems can also continue to learn thanks to unsupervised learning, a form of machine learning that finds structure in unlabeled data instead of relying on labeled examples. With unsupervised learning, a system can identify patterns and relationships in unlabeled data all on its own, allowing it to learn more autonomously.
This is ideal for machine translation. As more content gets produced and fed into these engines, the quality of their translations can improve. The engines can learn new words, phrases, and even languages over time.
Machine translation does a lot of the initial heavy lifting of language translation, minimizing the need for human involvement, which can reduce both cost and time to delivery. For instance, businesses can integrate a machine translation engine into their content management system to automatically translate the information on it into different languages without having to pay a team of people to do it by hand.
“You can do a lot more today with less people,” Woyde said. “The cost comparison is ridiculously in favor of the technology today.”
That’s not to say that machine translation will completely do away with human translators. Rather, their jobs will just change. As a machine translation model is being trained, human translators can make glossaries of specific terms and the correct translations for those terms. They become, in a sense, software engineers that dictate the rules a machine has to follow. Then, once the translation is done, they can go in and make edits or alterations where necessary.
That kind of work is especially important for creating a machine translation model that is more finely tuned to a specific industry or company. For example, the word “clutch” in the automotive industry means something very different than it does in the fashion industry, and a machine translation system may need a human to teach it that.
“With a glossary, you can reduce 50 percent of your mistakes right there,” Woyde said. “That’s kind of where we’re heading. Where you can use smaller amounts of data to improve the translation that you’re getting from a machine. And you can do that at scale.”
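One way to picture a glossary at work is as a check that runs after translation. The sketch below is an illustration under stated assumptions, not how Pairaphrase or any other product implements glossaries: the `GLOSSARIES` table, the domain names, the function name and the Spanish terms are all hypothetical examples. It flags cases where a glossary term appears in the source but the approved translation is missing from the machine output.

```python
# Hypothetical per-domain glossaries mapping an English term to the
# approved Spanish translation for that industry.
GLOSSARIES = {
    "automotive": {"clutch": "embrague"},
    "fashion": {"clutch": "bolso de mano"},
}


def missing_glossary_terms(source_text, translated_text, domain):
    """Return (term, approved translation) pairs that the output ignored."""
    glossary = GLOSSARIES[domain]
    source_words = source_text.lower().split()
    translated = translated_text.lower()
    return [
        (term, approved)
        for term, approved in glossary.items()
        if term in source_words and approved not in translated
    ]


# The engine picked the fashion sense of "clutch" for an automotive text.
issues = missing_glossary_terms(
    "Replace the clutch", "Reemplace el bolso de mano", "automotive"
)
print(issues)
```

A human translator could review the flagged terms, or the same table could be used to correct the output automatically before it is published.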
Machine translation can be a cheap and effective way to improve accessibility. Many major machine translation providers offer hundreds of languages, and they can deliver translations in multiple languages simultaneously, which can be useful in reaching a multilingual audience quickly.
It’s not just about breaking down language barriers either. People who are blind or visually impaired can use machine translation-enabled text-to-speech technology so that a text can be translated and read out loud concurrently, allowing them to access information in a much more convenient way.
By eliminating language barriers and improving user experience, machine translation can boost the accessibility of content, products and services for audiences around the world.
Disadvantages of Using Machine Translation
While machine translation has come a long way, and continues to benefit businesses, it is not perfect. There are still several challenges related to the training of machine translation systems and many cases where this technology is not an ideal solution.
Is Trained on Biased Data
Like any AI model, a machine translation system only knows what is in its training dataset. And because these systems learn from large volumes of text pulled in from the world, they absorb that data whether it is biased or not. As a result, they inherit the same problems and biases that exist in the real world.
This is especially true for languages that must classify their nouns as either masculine or feminine, like French and Spanish. For example, if the words “doctor” and “nurse” are translated from English to Spanish, they have to have a gender tied to them. What genders the machine translation engine decides to use will likely be tied to the predominant gender associated with doctors and nurses in its training data.
“It will predict that nurses are women and doctors are men,” linguistics professor Taboada said. “It sort of reproduces the world as it is, not as we want it to be.”
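A deliberately simplified sketch shows how this kind of skew carries over. It assumes an engine that always picks whichever gendered form it saw most often in training. The counts below are invented for illustration, and real neural systems are far more complex, but the effect is similar in spirit.

```python
from collections import Counter

# Invented counts of how often each gendered Spanish form appeared as
# the translation of an English noun in some hypothetical training set.
TRAINING_COUNTS = {
    "doctor": Counter({"médico": 80, "médica": 20}),
    "nurse": Counter({"enfermera": 90, "enfermero": 10}),
}


def most_frequent_translation(word):
    """Pick the form seen most often, reproducing any skew in the data."""
    return TRAINING_COUNTS[word].most_common(1)[0][0]


for word in ("doctor", "nurse"):
    print(word, "->", most_frequent_translation(word))
```

With these numbers the sketch always chooses the masculine form for "doctor" and the feminine form for "nurse," regardless of who the sentence is actually about.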
Meanwhile, other training data sets may have an outsized amount of data in some languages, and not nearly enough in others, which means the machine translation engine won’t work as accurately for those underrepresented languages. Its algorithms may not be able to differentiate between nuances like dialects, rendering the translations inadequate.
Fails to Grasp the Subtleties of Language
In many instances, machine translation will not generate an accurate output without some editing or assistance from humans. No matter how much data one throws into a machine translation engine, it will struggle with the subtleties of language.
Machine translation tends to get tripped up over different syntax or grammar rules that are specific to particular languages. And if an engine comes across rare or specialized vocabulary that it has not been trained on, such as industry-specific jargon, it may spit out incorrect or incomplete translations if there isn’t a human in the loop to make edits.
And many languages contain idiomatic expressions that don’t make sense when translated literally. For example, having a “frog in one’s throat” doesn’t mean someone has an amphibian in their mouth; it means they are hoarse. A machine translation engine would likely not pick up on that and just translate it literally, which could lead to some pretty awkward outputs in other languages.
This makes machine translation a less than optimal solution for translating more creative works, like novels or even narrative journalism. Machine translation doesn’t have the nuance or contextual know-how to sift through War and Peace, a work of fiction originally written in Russian, and adequately translate it into any other language.
“Machine translation has no brain,” Smartling’s Beregovaya said. “It’s a neural network, but it’s a mathematical model. And the mathematical model is not designed to understand the figures of speech.”
Struggles With Context
Although machine translation engines excel at parsing out entire sentences, they still struggle to understand one sentence’s relationship to the sentences before and after it. So, if a person wanted to translate “Mary is a doctor. The doctor walked into the room” into Spanish, the engine would correctly translate “doctor” to “médica” in the first sentence, but then incorrectly translate it to “médico” in the second sentence, because it does not remember the context of the doctor being a woman named Mary from the previous sentence.
That problem can show up in other forms of context, like tone or culture.
For example, some languages use different pronouns depending on the person being addressed — if a person is addressing their friend in French they would say “tu” for you, but if they’re addressing their boss they would say “vous.” A machine translation engine likely wouldn’t know that intricacy though, because it does not understand how French grammar intertwines with context and culture.
Machine Translation Use Cases
Machine translation typically performs best when the source content is more instructional and straightforward rather than creative, or if the end-goal is to get a point across quickly rather than generate a flawless and nuanced translation.
“It’s [good for] what we would call ‘gisting,’” Pairaphrase’s Woyde said, “where I want to get the gist of the idea.”
For companies with lots of employees spread out across the world, sending out uniform and comprehensive company-wide communications can be difficult to manage. Language skills can vary from office to office, employee to employee, and some may not be proficient in the company’s official language of operations.
Machine translation can help lower or eliminate this language barrier by allowing companies to translate their internal communications at scale. This can be useful in creating tech support tickets, company bulletins, presentations and training materials.
The same can be said for external communications as well, where a company wants to be able to reach a global audience with efficiency. It’s good for translating videos, blog posts, marketing materials and user generated content like product reviews.
For example, Beregovaya says companies like Tripadvisor have been using machine translation to translate all of their user reviews for years, allowing customers to figure out what the best restaurant in Santorini is without having to know Greek.
Highly Regulated Content
For both external and internal communications, machine translation can be done with or without a human translator in the loop, so long as it isn’t imperative that the material is perfectly fluent in the translated language.
Machine translation with humans involved in either the training or post-editing is better suited to content that is too complicated for a machine translation engine to handle on its own, or where the stakes are too high if the engine gets something wrong. This approach is good for translating content in highly regulated fields like law and medicine, such as patents, lawsuits, clinical trial results and drug warnings.
“With a human in the loop you produce 100 percent adequate, usable, fluid, grammatically correct, on-brand translations,” Beregovaya said. “From there, use cases expand indefinitely.”
Machine Translation Tools
Here are a handful of machine translation tools ushering in a new era of tech-enabled language translation.
Google Translate
Arguably the most popular machine translation tool, Google Translate offers free translation services in more than 100 languages. It was among the first engines of its kind to implement neural machine translation, now a standard practice in the industry.
Using neural machine translation, the platform translates text that is typed right into its interface. And it’s integrated with Google Docs to allow users to translate text directly there. Users can also take a picture of something — a street sign or a newspaper, for example — and Google Translate will automatically translate the text in that image to a different language.
Microsoft Translator
Microsoft Translator allows users to translate everything from real-time conversations to menus to Word documents. It also has a Custom Translator feature meant specifically for enterprise businesses, app developers and language service providers to build a neural translation system to fit their own needs. With Custom Translator, users can customize text translation through the Translator service on Azure, and speech translation through the Speech service on Azure.
Microsoft also offers custom translation features made specifically for education, providing tools that can translate and caption lectures and presentations, parent-teacher conferences and study groups.
Pairaphrase
With Pairaphrase, companies can translate anything from scanned PDFs to emails. Once they’ve done one translation, the platform retains that information and uses machine learning to improve its quality over time.
Pairaphrase also offers a data security component, an important distinction at a time when generative AI and other artificial intelligence models are posing new kinds of data privacy risks. The platform allows companies to keep all their proprietary documents, translations, glossaries and so on completely confidential and secure, and never publicly shares them or indexes them in search engines.
Amazon Translate
Amazon Translate uses neural machine translation to enable high quality and fast language translations. The platform is continuously improving to produce more accurate translations over time and is consistently adding new languages.
Amazon Translate can be integrated into a company’s other channels, and can process content in various formats. Its customization and scalability make it easy to use for all kinds of projects, from translating user generated content to adding real-time translation within chat, email, help desk and ticketing applications.
Smartling
Smartling’s machine translation tool is used by hundreds of companies, including Lyft, Shopify and Peloton, to automate and create multilingual websites, marketing campaigns, web and mobile products and customer experiences.
Its cloud-based machine translation management platform offers AI-powered content and workflow management, performance and progress dashboards, and automated content ingestion. Customers can either use one of Smartling’s human translators, with whom they can communicate directly and share style guides and glossaries, or its neural machine translation engine.
Unbabel
Unbabel’s so-called “LangOps” platform combines both human and machine translation to help businesses provide multilingual customer experience services and expand into new markets. This includes real-time chat translations between customer services agents and customers, press releases, email marketing campaigns, and e-books and white papers.
Unbabel is able to integrate directly into a company’s CRM and layer into the digital channels it already uses, including email, chat and social media. The company claims it can help businesses roll out their content up to 65 percent faster and cut costs by more than half compared to just using human translators.