What Is Google Gemini?

Here’s everything you need to know about Google’s latest generative AI model.

Written by Ellen Glover
Published on Mar. 13, 2024

Gemini is a generative AI model developed by Google to power its AI chatbot of the same name. The model comes in three different sizes and is being incorporated into several Google products, including Gmail, Docs and its search engine.

What is Google Gemini?

Gemini is an AI model created by Google to power many of its products, including its chatbot, also named Gemini (formerly Bard), as well as Gmail, Docs and its search engine. Available in three different sizes, Gemini is multimodal and can respond to text, image and audio.

Gemini is multimodal, meaning its capabilities span text, image and audio applications. It can generate natural written language, transcribe speeches, create artwork, analyze videos and more, although not all of these capabilities are yet available to the general public. Like other AI models, Gemini is expected to get better over time as the industry continues to advance.

 

What Is Google Gemini?

Gemini is Google’s multimodal foundation model that the company is integrating across several of its products. Gemini is Google’s answer to OpenAI’s GPT-4, the multimodal large language model (LLM) that powers the paid version of ChatGPT, the success of which kicked off a generative AI arms race, as several tech companies have since scrambled to bring the latest and greatest products to market.

Launched in December of 2023, Gemini is Google’s largest and most capable model to date, according to the company. It was developed by Google’s AI research labs DeepMind and Google Research, and is the culmination of nearly a decade of work.

The model comes in three different versions, which vary in size and complexity:
 

Gemini Ultra

This is the “largest” and “most capable” model for performing highly complex tasks, according to Google. The company says it outperforms GPT-4 on the majority of the most used academic benchmarks in LLM research and development, as well as various multimodal tasks. The model is being incorporated into several of Google’s most popular products, including Gmail, Docs, Slides and Meet. For $19.99 a month, users can access Gemini Ultra through the Gemini Advanced service.

 

Gemini Pro

Gemini Pro is the middle-tier model, designed to understand complex queries and respond to them quickly, making it the best model for “scaling across a wide range of tasks,” as Google put it. A specially trained version of Pro is currently powering the AI chatbot Gemini and is available via the Gemini API in Google AI Studio and Google Cloud Vertex AI.

 

Gemini Nano

A much smaller version of the Pro and Ultra models, Gemini Nano is designed to be efficient enough to perform tasks directly on smart devices, instead of having to connect to external servers. Nano currently powers features on the Pixel 8 Pro like Summarize in the Recorder app and Smart Reply in the Gboard virtual keyboard app.


 

How Does Google Gemini Work?

At a high level, the Gemini model can see patterns in data and generate new, original content based on those patterns.  

To accomplish this, Gemini was trained on a large corpus of data. Like GPT-4 and several other LLMs, Gemini is a “closed-source model,” generative AI expert Ritesh Vajariya told Built In, meaning Google has not disclosed what specific training data was used. But the model’s dataset is believed to include annotated YouTube videos, queries in Google Search, text content from Google Books and scholarly research from Google Scholar. (Google has said that it did not use any personal data from Gmail or other private apps to train Gemini.)

To learn from that data, Gemini relies on several neural network techniques. Specifically, it is built on the Transformer, a neural network architecture Google invented in 2017 that is now used by virtually all LLMs, including the ones that power ChatGPT.

When a user types a prompt or query into Gemini, the transformer generates a distribution of potential words or phrases that could follow that input text, and then selects the one that is most statistically probable. “It starts by looking at the first word, and uses probability to generate the next word, and so on,” AI expert Mark Hinkle told Built In.
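The word-by-word process described above can be illustrated with a toy sketch. This is not Gemini's actual code: the word scores in `TOY_LOGITS` and the function names are made up for illustration, and the toy only looks at the previous word, whereas a real transformer computes its scores from the entire prompt. It simply shows scores being turned into a probability distribution and the most probable next word being picked, over and over.

```python
import math

# Hypothetical toy scores: for each context word, a raw score (logit) for
# each candidate next word. A real model computes these with a transformer.
TOY_LOGITS = {
    "the": {"cat": 2.0, "dog": 1.5, "sat": 0.1},
    "cat": {"sat": 2.5, "ran": 1.0, "the": 0.2},
    "sat": {"down": 2.2, "the": 0.8, "cat": 0.1},
}


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - peak) for word, score in logits.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def next_word(context_word):
    """Return the most probable next word, or None if no scores are known."""
    logits = TOY_LOGITS.get(context_word)
    if logits is None:
        return None
    probs = softmax(logits)
    return max(probs, key=probs.get)


def generate(start, max_words=5):
    """Extend the text one word at a time until no continuation is known."""
    words = [start]
    while len(words) < max_words:
        word = next_word(words[-1])
        if word is None:
            break
        words.append(word)
    return words


print(" ".join(generate("the")))  # prints "the cat sat down"
```

Starting from "the," the sketch picks "cat," then "sat," then "down," and stops because the toy table has no scores for what follows "down."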

Gemini can also process images, videos and audio. It was trained on trillions of pieces of text, images (along with their accompanying text descriptions), videos and audio clips. And it was further fine-tuned using reinforcement learning from human feedback (RLHF), a method that incorporates human feedback into the training process so the model can better align its outputs with user intent.

By training on all these mediums at once, Google claims Gemini can “seamlessly understand and reason about” a variety of inputs, such as reading the text on a photo of a sign, or generating a story based on an illustration.


 

What Can Google Gemini Do?

Gemini is a multimodal model, so it is capable of responding to a range of content types, whether that be text, image, video or audio.
 

Generate Text

Gemini can generate text, whether that’s used to engage in written conversations with users, proof-read essays, write cover letters or translate content into different languages. Gemini can also understand, explain and generate code in some of the most popular programming languages, including Python, Java, C++ and Go.

Like any other LLM, though, Gemini has a tendency to hallucinate, or generate text that is incorrect or illogical. “The results should be used with a lot of care,” Subodha Kumar, a professor of statistics, operations and data science at Temple University’s Fox School of Business, told Built In. “They can come with a lot of errors.”

 

Produce Images

Gemini is able to generate images from text prompts, similar to other AI art generators like DALL-E, Midjourney and Stable Diffusion.

This capability was temporarily halted to undergo retooling after Google was criticized on social media for producing images that depicted specific white people (like the U.S. Founding Fathers) as people of color. Up to this point, image generators have developed a reputation for amplifying and perpetuating biases about certain races and genders. Google’s apparent attempts to avoid this pitfall may have gone too far in the other direction, though, serving as yet another example of how AI tools continue to struggle with the concept of race.

 

Analyze Images and Videos

Gemini can accept image inputs, analyze what is going on in those images and explain that information via text for users. Images can be anything from a photograph to a chart to a pencil sketch. A user can take a photo of a flat tire and ask Gemini how to fix it, for instance, or ask Gemini for help on their physics homework by drawing out the problem.

Gemini can also process and analyze videos, which allows it to generate descriptions of what is going on in a given clip, as well as answer questions about it. Google published a live demo in which Gemini was able to process a 44-minute silent film and identify specific moments within it. In another demo, Gemini appeared to recognize an illustration of a duck, hand puppets, sleight-of-hand tricks and other videos. It’s worth noting, however, that the latter demo was taped and later edited by Google. Instead of responding to actual video prompts, the model was responding to more detailed text and image prompts, and taking a lot longer to do it than was shown in the demonstration.

 

Understand Audio

When fed audio inputs, Gemini can support speech recognition across more than 100 languages, and assist in various language translation tasks — as shown in this Google demonstration. 

 

Streamline Workflows 

Because Gemini is a Google product, it can be integrated into several Google Workspace products, including Gmail, Docs and Drive. 

For example, users can query Gemini (through its chatbot interface) to find a document in their Drive and summarize it, as shown in this demo. It can also automatically generate specific emails, as shown in this other demo. “It becomes a little bit of an assistant in that sense,” Gen Furukawa, an AI expert and entrepreneur, told Built In.


 

Is Gemini Better Than GPT-4?

According to Google, Gemini — specifically Gemini Ultra — outperformed GPT-4 on almost all of the most used academic benchmarks in LLM research and development, such as reading comprehension, code generation and basic mathematics. It also beat out GPT-4 in a range of multimodal tasks, including automatic speech translation, infographic understanding and visual question answering, which enables an AI model to answer questions about a given image.

By these metrics, Gemini appears to perform better than GPT-4. It’s important to note, however, that the scores Google points to are only marginally better than GPT-4’s, which suggests that Gemini Pro (the smaller model size that powers the Gemini chatbot) likely doesn’t come out ahead of GPT-4. And most of Gemini Ultra’s scores do not beat the most advanced version of Anthropic’s Claude 3 model, which has outperformed GPT-4 on every benchmark, setting a new standard for AI performance.

 

Gemini vs. GPT-4

The Gemini and GPT-4 language models share several similarities in their underlying architecture and capabilities. But they also have some significant differences that affect the user experience and functionality of their associated chatbots, Gemini and ChatGPT, respectively.
 

Gemini Is ‘Natively Multimodal,’ GPT-4 Is Not

The Gemini model was designed to be “natively multimodal,” as Google put it, so it was trained and fine-tuned on petabytes of audio, image, video and text data, as well as a large codebase. This means the Gemini chatbot can understand, combine and reason across these different data types seamlessly, without any plug-ins or extra steps. 

GPT-4 is also multimodal — it accepts both images and text as inputs and produces natural language text outputs — but it was not trained on the same variety of data Gemini was. So, if users want to generate their own images, for example, they have to feed prompts into a separate plug-in (the DALL-E 3 text-to-image model, in this case) offered to ChatGPT Plus subscribers.

In short: the GPT-4 model does not support a diverse range of modalities in the same way Gemini does, which means that the Gemini chatbot and ChatGPT perform multimodal tasks in different ways.

 

Gemini Has Real-Time Access to the Internet, GPT-4 Does Not

Gemini has real-time access to Google’s search index, which can “keep feeding” the model information, Hinkle said. So the Gemini chatbot can draw on data pulled from the internet to answer queries, and is fine-tuned to select data chosen from sources that fit specific topics, such as scientific research or coding. 

On its own, GPT-4 has no real-time web access, and only knows information up to its training data cutoff (September 2021 for the original GPT-4 release). But subscribers to ChatGPT Plus get access to a plug-in that allows them to browse Bing, a search engine owned and operated by OpenAI’s biggest partner, Microsoft.

 

Gemini Was Trained on TPUs, GPT-4 Was Trained on GPUs

Google trained Gemini on its in-house AI chips, called tensor processing units (TPUs). Specifically, it was trained on the TPU v4 and v5e, which were explicitly engineered to accelerate the training of large-scale generative AI models. In the future, Gemini will be trained on the v5p, Google’s fastest and most-efficient chip yet. Meanwhile, GPT-4 was trained on Nvidia’s H100 GPUs, one of the most sought-after AI chips today. 

TPUs are designed to handle the computational demands of machine learning with more speed and efficiency than GPUs, making them an essential component of the AI industry’s future.


 

How to Access Google Gemini

Gemini can be accessed in several ways:

For free: You can head to gemini.google.com and use it for free through the Gemini chatbot. Or you can download the Gemini app on your smartphone. Android users can also replace Google Assistant with Gemini. 

Paid version: You can also subscribe to the Gemini Advanced service for $19.99 a month, where you can access updated versions of popular products like Gmail, Docs, Slides and Meet — all of which have Gemini Ultra built into them. 

Gemini is a work in progress, so it might generate answers that are inaccurate, unhelpful or even offensive. And it retains users’ conversations, location, feedback and usage information, according to Google’s privacy policy. So users may want to avoid consulting Gemini for professional advice on sensitive or high-stakes subjects (like health or finance), and refrain from discussing private or personal information with the AI tool.

 

Frequently Asked Questions

Gemini is an AI tool that can answer questions, summarize text and generate content. It also plugs into other Google services like Gmail, Docs and Drive to serve as a productivity booster. And, because Gemini is multimodal, its capabilities span across text, images and audio. So, in addition to generating natural written language, it can transcribe speeches, create artwork, analyze videos and more, according to Google.

According to Google, Gemini Ultra (the model’s most advanced version) outperformed GPT-4 on the majority of the most used academic benchmarks in language model research and development, as well as various multimodal tasks. But the margins were slim, indicating that Gemini Pro (the smaller model size that powers the Gemini chatbot) likely doesn’t come out ahead of GPT-4.

Gemini Pro, Google’s middle-tier model, is available for free at gemini.google.com. There is also a free mobile app. For $19.99 a month, users can access Gemini Ultra, the more powerful model, through the Gemini Advanced service.

Google was not available for an interview at the time of reporting.
