What is Google Gemini? (Models, Capabilities & How to use)

Summary: Google Gemini is a family of multimodal AI models designed to understand text, code, images, audio and video. Available in three sizes, Gemini powers Google’s chatbot, also named Gemini, and integrates with several Google apps and services.

Gemini is a family of generative AI models created by Google and the name of the company’s chatbot. The models come in three different sizes and are being incorporated into several Google products, including Gmail, Docs and its search engine.

What is Google Gemini?

Gemini is a family of AI models created by Google to power many of its products, including its chatbot, also named Gemini, as well as Gmail, Docs and its search engine.

The Gemini models are multimodal, meaning their capabilities span text, image and audio applications. They can generate natural written language, transcribe speeches, create artwork, analyze videos and more. Gemini was developed by Google’s AI research labs DeepMind and Google Research, and is the culmination of nearly a decade of work. Like other AI products, Gemini is expected to get better over time as the industry continues to advance.

What is Gemini 2.0?

Gemini 2.0 is the latest iteration of the Google Gemini family. Made up of three main models — Pro, Flash and Flash-Lite — Gemini 2.0 was designed with agentic AI in mind, meaning it can not only understand and generate content, but also take action, interact with external tools and complete multi-step tasks on the user’s behalf. To enable this, the models combine advanced reasoning, tool use and extended memory. They also introduce a new “calling feature,” which enables the models to interact with external services like Google Search, APIs or execute code to complete tasks it cannot handle internally.

Here’s a rundown of each individual Gemini 2.0 model:

Gemini 2.0 Pro

Gemini Pro 2.0 is an experimental model that features strong coding performance and an ability to handle complex prompts. It features a 2 million token window, making it ideal for processing vast amounts of information, handling its calling feature and executing code.

Gemini 2.0 Flash

Gemini Flash 2.0 is a general-use model designed for high volume, high frequency tasks. It is used across Google’s various AI-enabled products including the Gemini App.

Gemini 2.0 Flash-Lite

Gemini 2.0 Flash-Lite is a lightweight model designed for high-volume text workloads and optimized for cost efficiency. According to Google, it has a context window of 1 million tokens and can generate captions for 40,000 photos for less than a dollar.

Previous Gemini Models

The model comes in four different versions, which vary in size and complexity:

Gemini 1.0 Ultra

Gemini 1.0 Ultra is the largest model for performing highly complex tasks, according to Google. The company says it is the first model to outperform human experts on a benchmark assessment that covers topics like physics, law and ethics. The model is being incorporated into several of Google’s most popular products, including Gmail, Docs, Slides and Meet. For $19.99 a month, users can access Gemini 1.0 Ultra through the Gemini Advanced service.

Gemini 1.5 Pro

Gemini 1.5 Pro is the middle-tier model designed to understand complex queries and respond to them quickly, and it’s suited for “a wide range of tasks” thanks to an expanded context window for improved memory and recall. A specially trained version of Pro powers the AI chatbot Gemini and is available via the Gemini API in Google AI Studio and Google Cloud Vertex AI.

Gemini 1.0 Nano

A much smaller version of the Pro and Ultra models, Gemini 1.0 Nano is designed to be efficient enough to perform tasks directly on smart devices, instead of having to connect to external servers. 1.0 Nano currently powers features on the Pixel 8 Pro like Summarize in the Recorder app and Smart Reply in the Gboard virtual keyboard app.

Gemini 1.5 Flash

The latest member of the Gemini family, Gemini 1.5 Flash is a smaller version of 1.5 Pro and built to perform actions much more quickly than its Gemini counterparts. 1.5 Flash was trained by 1.5 Pro, receiving 1.5 Pro’s skills and knowledge. As a result, this model has the context window to handle hefty tasks while serving as a more cost-efficient alternative to larger models.

What Can Google Gemini Do?

Gemini is a multimodal model, so it is capable of responding to a range of content types, whether that be text, image, video or audio.

Generate Text

Gemini can generate text, whether that’s used to engage in written conversations with users, proofread essays, write cover letters or translate content into different languages. Gemini can also understand, explain and generate code in some of the most popular programming languages, including Python, Java, C++ and Go.

Like any other LLM, though, Gemini has a tendency to hallucinate. “The results should be used with a lot of care,” Subodha Kumar, a professor of statistics, operations and data science at Temple University’s Fox School of Business, told Built In. “They can come with a lot of errors.”

Produce Images

Gemini is able to generate images from text prompts, similar to other AI art generators like Dall-E, Midjourey and Stable Diffusion.

This capability was temporarily halted to undergo retooling after Google was criticized on social media for producing images that depicted specific white figures as people of color. Image generators have developed a reputation for amplifying and perpetuating biases about certain races and genders. Google’s attempts to avoid this pitfall may have gone too far in the other direction, though.

Analyze Images and Videos

Gemini can accept image inputs and then analyze what is going on in those images and explain that information via text. For example, a user can take a photo of a flat tire and ask Gemini how to fix it, or ask Gemini for help on their physics homework by drawing out the problem. Gemini can also process and analyze videos, generate descriptions of what is going on in a given clip and answer questions about it.

Understand Audio

When fed audio inputs, Gemini can support speech recognition across more than 100 languages, and assist in various language translation tasks — as shown in this Google demonstration.

Streamline Workflows

Gemini can be integrated into several Google Workspace products, including Gmail, Docs and Drive. Users can query Gemini (through its chatbot interface) to find a document in their Drive and summarize it, or automatically generate specific emails. “It becomes a little bit of an assistant in that sense,” Gen Furukawa, an AI expert and entrepreneur, told Built In.

Within more specific business contexts, professionals can use Gemini to produce drafts for blog posts, emails and advertisements in Docs; generate images for Slides presentations by inputting a text prompt and selecting a visual style; and even tailor their virtual background in Google Meet with a detailed text prompt.

More on Generative AIA Comparison of the Top AI Models: Features Use Cases and Cost

How to Access Google Gemini

Gemini can be accessed in several ways:

For free: You can head to gemini.google.com and use it for free through the Gemini chatbot. Or you can download the Gemini app on your smartphone. Android users can also replace Google Assistant with Gemini.

Paid version: You can also subscribe to the Gemini AI Pro service for $19.99 a month and Gemini AI Ultra for $249.99 a month.

Developers can also access the models through the Gemini API in Google AI Studio and Vertex AI.

Gemini is a work in progress, so it might generate answers that are inaccurate, unhelpful or even offensive. And it retains users’ conversations, location, feedback and usage information, according to Google’s privacy policy. So users may want to avoid consulting Gemini for professional advice on sensitive or high-stakes subjects (like health or finance), and refrain from discussing private or personal information with the AI tool.

More on Artificial IntelligenceExplore Built In’s AI Coverage

Frequently Asked Questions

What can Google Gemini be used for?

Gemini is an AI tool that can answer questions, summarize text and generate content. It also plugs into other Google services like Gmail, Docs and Drive to serve as a productivity booster. And, because Gemini is multimodal, its capabilities span across text, images and audio. So, in addition to generating natural written language, it can transcribe speeches, create artwork, analyze videos and more, according to Google.

What are the different versions of Gemini?

Gemini is available in three versions: Gemini 2.0 Pro, Gemini 2.0 Flash and Gemini 2.0 Flash-Lite. Each model is optimized for different tasks, ranging from complex reasoning to lightweight, fast responses.

Is Google Gemini free?

A basic version ofGemini is available for free at gemini.google.com. There is a free mobile app too. Users can also subscribe to the Gemini AI Pro service for $19.99 a month and Gemini AI Ultra for $249.99 a month.

Who made Google Gemini?

Google Gemini was made by Google DeepMind and Google Research — AI research labs and subsidiaries under the Google corporate umbrella.

How to access Google Gemini?

To access the free version of Google Gemini, smartphone users can download the Gemini app and Android users can substitute Gemini for Google Assistant. To use Gemini in chatbot form, users can head to gemini.google.com. For those who want to access Gemini Ultra, subscribe to the Gemini Advanced service.

What Is Google Gemini?