What Are Foundation Models?

Foundation models help AI developers build smarter, not harder.

Written by Ellen Glover
Published on Oct. 15, 2024

Foundation models are large, general-purpose neural network architectures that serve as the building blocks for artificial intelligence systems. Trained on massive, diverse datasets, these AI models can perform a wide range of tasks involving natural language processing, computer vision and generative AI.

Foundation Models Definition

Foundation models are AI models that have been pre-trained on large quantities of data and can be fine-tuned for specific tasks across various applications. Specific examples include OpenAI’s GPT and DALL-E models, Meta AI’s Llama models and Stability AI’s Stable Diffusion model.

The versatility of foundation models makes them a key starting point in AI development, as they can be fine-tuned for a variety of specialized tasks. Instead of having to build domain-specific models from the ground up, developers can create whatever AI products they want on top of existing foundation models, making the process faster and more cost-effective. 

As a result, foundation models have become the backbone of the AI industry, powering many of the tools and technologies we interact with every day.

 

What Are Foundation Models?

Foundation models are AI models that function as the base — or “foundation” — of many AI products. The term was introduced in a 2021 paper published by researchers at the Stanford Institute for Human-Centered Artificial Intelligence’s Center for Research on Foundation Models, who defined a foundation model as “any model that is trained on broad data” and that can be adapted to a “wide range of downstream tasks.”

The large size and flexible nature of foundation models make them different from more traditional machine learning models, which are usually designed exclusively for specific, pre-defined tasks. While foundation models are most often associated with large language models (LLMs), which handle text-based tasks, many are becoming multimodal, meaning they can process text, images, video and audio.

Building a foundation model from scratch is expensive, requiring enormous computational power and resources and often taking months of training. But once a model has been built, businesses can feed it additional data and fine-tune it for specific tasks with minimal extra effort, making it a quick and cost-effective basis for their own AI products.

 

Why Are Foundation Models Important? 

Foundation models provide a versatile and powerful base for developing a wide array of AI products. Their significance stems from their malleability and scalability, which allow businesses to tap into the power of artificial intelligence without having to start from scratch. This not only saves time and resources, but also democratizes AI development, empowering more players to leverage the technology in innovative ways.

As foundation models continue to grow more sophisticated, they will inevitably influence the quality of the AI products built on top of them — and, by extension, the industries using them. In many ways, the future is being shaped by advancements in foundation models.


 

How Do Foundation Models Work?

Foundation models leverage deep learning techniques (specifically neural networks) to process and interpret terabytes of data, learning its underlying structure and nuances. This data is extensive — often encompassing text, images, video and audio — which gives the model a broad understanding of various subjects that can be applied in different ways.

The versatile nature of foundation models is made possible through self-supervised learning and transfer learning, where the model learns patterns and relationships from unlabeled data, then transfers that knowledge from one situation to another. It’s similar to how humans learn: If a person learns to drive one car, for example, they can drive most other cars fairly easily — and may even be able to figure out how to operate a bus or a boat, too.

After they’ve been trained, foundation models can be modified with additional data to specialize in specific tasks.

“It’s easier to teach a child who knows how to speak already than to teach a baby how to speak,” Susan Nomecos, senior director of global AI and Web3 strategy at Getty Images, told Built In. “It’s kind of the same thing with these foundation models. They’re trained to understand a lot of different contexts, and be able to perform a lot of different tasks already.”

For instance, a model trained on lots of diverse text data will understand the relationship between words, how to properly structure a sentence and a range of writing styles. If that model is then fine-tuned on healthcare data, it could be used to power a chatbot that helps doctors diagnose patients, fluently answering health-related questions and providing relevant advice.

“Fine-tuning is essentially the ability to personalize a version of a foundation model,” Brennan Woodruff, co-founder and chief business officer at generative AI company GoCharlie, told Built In. “You’re basically giving it more data so that it is more accurate to you and what you need to do.”
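To make this concrete, here’s a minimal sketch of what fine-tuning can look like in practice, using the Hugging Face transformers library. The small model checkpoint and the stand-in dataset below are illustrative assumptions, not a specific production setup.

```python
# A minimal fine-tuning sketch with Hugging Face transformers.
# The model checkpoint and dataset are illustrative stand-ins.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # a small pre-trained language model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Load a labeled dataset (IMDB reviews stand in for domain-specific data).
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# The Trainer runs the fine-tuning loop. A small subset is enough here,
# because the model already learned general language during pre-training.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)),
)
trainer.train()
```

The notable part is how little is involved: pre-training did the heavy lifting, and fine-tuning only nudges the existing weights toward the new domain.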


 

Types of Foundation Models

While foundation models come in many forms, they can typically be organized into one of three categories: natural language processing, computer vision and generative AI.

Natural Language Processing

Natural language processing enables AI models to interpret written and spoken language, bridging the gap between human communication and computer understanding. These models have the ability to converse with users in a human-like way, summarize text inputs, generate their own content and translate languages.

  • Use Case: Foundation models can be fine-tuned to help law professionals research relevant case law, statutes and legal opinions by efficiently sifting through thousands of documents, summarizing their contents and highlighting key passages.
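As a rough illustration of this use case, a pre-trained summarization model can be loaded and applied in a few lines with the Hugging Face transformers pipeline API. The checkpoint name is one publicly available summarization model, and the input text is a made-up placeholder.

```python
# Sketch: summarizing a (placeholder) legal passage with a pre-trained model.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

document = (
    "The court held that the arbitration clause was unenforceable because "
    "it was procedurally and substantively unconscionable, noting that the "
    "plaintiff had no meaningful opportunity to negotiate its terms..."
)
summary = summarizer(document, max_length=60, min_length=10)
print(summary[0]["summary_text"])
```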

Computer Vision 

Computer vision focuses on processing and understanding visual data, allowing AI models to identify and analyze objects, actions and other relevant features. These models are pre-trained on datasets of annotated images and videos, and can be fine-tuned for more specific tasks like facial recognition and machine vision.

  • Use Case: Trained on images from x-rays, MRIs and CT scans, foundation models can learn to identify abnormalities like tumors and fractures in healthcare settings.
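A hedged sketch of the same idea in code: the transformers pipeline API can load a pre-trained image classifier in a couple of lines. The checkpoint below is a general-purpose vision model and the image path is a placeholder; an actual diagnostic tool would require a model fine-tuned on annotated medical scans.

```python
# Sketch: classifying an image with a general-purpose pre-trained vision model.
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

predictions = classifier("scan.png")  # placeholder path to a local image
for p in predictions:
    print(p["label"], round(p["score"], 3))
```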

Generative AI

Some foundation models are built to create completely new content, which involves producing text, images, audio and video that mimic human-made works. These models are good at finding patterns in large quantities of data and then replicating those patterns in new creations. And they can be fine-tuned with additional data to generate content that’s more tailored to the user’s specific needs.

  • Use Case: Foundation models can be used to generate blogs, social media posts, emails and other marketing materials tailored to specific audiences.
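As a simple sketch of generative text, a small pre-trained model can draft copy from a prompt. GPT-2 is used below only because it is freely downloadable; it stands in for the far larger generative foundation models discussed in this article.

```python
# Sketch: drafting marketing copy from a prompt with a small generative model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Introducing our new eco-friendly water bottle:"
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])
```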

 

Examples of Foundation Models

Some of the more prominent examples of foundation models available today include GPT, Claude 3, Llama, Mistral, DALL-E and Stable Diffusion.

GPT

The Generative Pre-trained Transformer (GPT) model was first introduced by OpenAI in 2018. Since then, the company has released several improved versions of the large language model. The latest version powers OpenAI’s popular ChatGPT chatbot and can be accessed via the company’s API.
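For illustration, here is roughly what calling a GPT model looks like through OpenAI’s official Python SDK. The model identifier below is an assumption; check OpenAI’s documentation for the names currently on offer.

```python
# Sketch: one chat completion via OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model identifier; verify against OpenAI's docs
    messages=[{"role": "user", "content": "Explain foundation models in one sentence."}],
)
print(response.choices[0].message.content)
```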

Claude 3

Claude 3 is a family of large language models developed by Anthropic. Companies can now fine-tune Claude 3 Haiku — the fastest and most affordable of the models — to adapt its knowledge and abilities for their specific business needs, making it more effective for specialized tasks.
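A comparable sketch with Anthropic’s official Python SDK; the Claude 3 Haiku model string below is an assumption based on Anthropic’s published naming scheme.

```python
# Sketch: one message via Anthropic's Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads the ANTHROPIC_API_KEY environment variable

message = client.messages.create(
    model="claude-3-haiku-20240307",  # assumed Claude 3 Haiku identifier
    max_tokens=100,
    messages=[{"role": "user", "content": "Summarize what a foundation model is."}],
)
print(message.content[0].text)
```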

Llama

Llama is a family of large language models developed by Meta AI to perform natural language processing tasks. These foundation models come in various parameter sizes, with larger ones designed for more complex tasks and smaller ones offering greater efficiency. The latest version, Llama 3.1, is open source and available for download on Hugging Face and Meta’s website.
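Because the weights are downloadable, Llama can be run locally. Below is a minimal sketch using transformers; the repo id reflects Hugging Face’s naming for Llama 3.1 8B Instruct, and downloading it requires accepting Meta’s license and authenticating with an access token. An 8-billion-parameter model also needs substantial memory, ideally a GPU.

```python
# Sketch: running a downloaded Llama model locally with transformers.
from transformers import pipeline

generate = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

out = generate("What is a foundation model?", max_new_tokens=80)
print(out[0]["generated_text"])
```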

Mistral

The Mistral foundation models are a family of large language models developed by Mistral AI. Some of the Mistral models are open source, while others are accessible exclusively through its API. In 2024, the company released Pixtral, which is capable of processing images and documents in addition to text.

DALL-E

DALL-E is a model that generates images from text inputs, transforming written prompts into visual art. OpenAI released the first iteration of the model in 2021 and has since come out with several more advanced versions, the latest of which is called DALL-E 3. The model is built natively into ChatGPT and is available via OpenAI’s API.
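A short sketch of generating an image with DALL-E 3 through OpenAI’s Python SDK; the model name and size are taken from OpenAI’s documented options, but treat them as assumptions to verify.

```python
# Sketch: generating one image with DALL-E 3 via OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()

image = client.images.generate(
    model="dall-e-3",  # assumed model identifier
    prompt="A watercolor painting of a lighthouse at dawn",
    size="1024x1024",
    n=1,
)
print(image.data[0].url)  # URL of the generated image
```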

Stable Diffusion

Stable Diffusion is a model that can create images from text prompts, enabling users to generate highly detailed visuals with just a few written words. Developed by Stability AI, this model employs a process called diffusion to make its images, meaning it iteratively refines random noise into coherent visuals, capturing intricate details and styles. Stable Diffusion is one of many foundation models available on Amazon Bedrock.
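As a sketch, Stable Diffusion can also be run locally with the Hugging Face diffusers library. The checkpoint name below is one widely used release, and a CUDA-capable GPU is assumed.

```python
# Sketch: text-to-image generation with Stable Diffusion via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # a GPU is assumed; use float32 on CPU instead

image = pipe("A detailed oil painting of a mountain village at sunset").images[0]
image.save("village.png")
```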


 

What Are the Challenges of Foundation Models?

Foundation models serve as a solid starting point in AI development, but they are not without flaws. Because a single model can underpin many products, it becomes a single point of failure: any errors, vulnerabilities and biases within it can spread to all of the AI products built on top of it, amplifying the risks.

Lack of Interpretability

The inner workings and decision-making processes of foundation models are often not well understood — even by the people building them — which makes it hard to determine how and why they arrive at certain conclusions.

“These are little black boxes,” Rajul Rana, chief technology officer at IT services company Orion Innovation, told Built In. “We know roughly how they work but [we] don’t know exactly why they generate certain outputs.”

This lack of interpretability can make it difficult to trust foundation models’ outputs or correct any errors, which can have massive consequences — especially since they are embedded in our everyday lives, from the facial recognition software used to unlock phones to the hiring algorithms companies use to screen job candidates.

“You really run the risk of perpetuating a lot of bias and misinformation,” Nomecos said.

Privacy Risks

Because users can input proprietary or otherwise sensitive data during fine-tuning, using foundation models can pose a privacy risk. If not properly protected, this data could be leaked — either through security vulnerabilities in the system or by the model itself. And since foundation models are often shared across multiple users, there’s a risk of a model inadvertently revealing private information if that data is not carefully managed.

Unreliable Answers

Sometimes foundation models generate outputs that appear correct, but aren’t. They don’t always fully understand the context or meaning behind the information, relying instead on patterns learned during training. So, when they are faced with incomplete, ambiguous or unfamiliar inputs, they may fill in the gaps with incorrect information — a phenomenon known as “hallucination.”

As a result, foundation models “never get to 100 percent accuracy,” Woodruff said. “It’s that degree of unpredictability that makes them hard to work with.”

Frequently Asked Questions

What is an example of a foundation model?

One prominent example of a foundation model is OpenAI’s GPT-4, which was trained on petabytes of text and image data to perform tasks like text generation, image analysis and even web development. The model can also be fine-tuned for more specialized tasks, like powering a customer service chatbot or assisting with legal document analysis.

What are the main characteristics of foundation models?

The main characteristics of foundation models are their large size and versatility. These models are typically trained on massive datasets, allowing them to handle a diverse range of tasks, and they can be fine-tuned for specific applications with minimal extra training.

How are foundation models related to generative AI?

Foundation models are the backbone of generative AI, providing the large-scale architectures needed to create new text, images, video and audio. Because these models are good at learning patterns from vast amounts of data, they can generate outputs that mimic human creations.

What role does deep learning play in foundation models?

Deep learning is a core technology in foundation models, which use neural networks to process large quantities of data and identify patterns within them. Foundation models are capable of performing complex tasks like image generation and language understanding because of deep learning.
