GPT-4o Mini Is Cheaper. Is It As Good?

Our expert takes the newest OpenAI product out for a spin.

Written by James Evans
Published on Aug. 20, 2024

The big advantage of the software business was always that once you had built the software, serving one more customer was effectively free — zero marginal costs.

AI changes this. Builders pay OpenAI (or their LLM provider of choice) for each input and output. That means a few power users can now kneecap the economics of an entire software business.

GPT-4o Mini: 3 Highlights

  1. It’s more than 60 percent cheaper than GPT-3.5 Turbo.
  2. It outperforms GPT-3.5 Turbo on all benchmarks.
  3. It’s multimodal — it accepts image inputs as well as text, with audio and video support planned, so it’s not limited to text.

As new models become more and more capable, they also become more and more expensive to run, which raises costs for API customers. That’s why the large language model ecosystem is changing.

Powerful models like Claude 3.5, Gemini Ultra and GPT-4 can do a lot. But using them for simple tasks is like using a Corvette engine to power a lawnmower: It’s going to work, but it’s a giant waste of money.

This is why OpenAI recently released GPT-4o Mini. The “mini” gives it away: This is a smaller model with less horsepower. It’s also about 30 times cheaper than GPT-4o and runs significantly faster. In other words, it’s a lawnmower engine: more compact, less expensive and better suited to simpler tasks.
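To put rough numbers on that claim, here’s a quick back-of-the-envelope comparison in Python. The per-million-token prices below are approximate list prices as of this writing and will change, so treat them as assumptions rather than a definitive bill.

```python
# Approximate OpenAI list prices (USD per 1M tokens) as of mid-2024.
# These are assumptions for illustration; check the pricing page for current rates.
GPT_4O = {"input": 5.00, "output": 15.00}
GPT_4O_MINI = {"input": 0.15, "output": 0.60}

def monthly_cost(prices, input_millions, output_millions):
    """Cost of a month of traffic, with token counts given in millions."""
    return prices["input"] * input_millions + prices["output"] * output_millions

# Hypothetical workload: 200M input tokens and 50M output tokens per month.
big = monthly_cost(GPT_4O, 200, 50)          # $1,750
small = monthly_cost(GPT_4O_MINI, 200, 50)   # $60
print(f"GPT-4o: ${big:,.0f}, GPT-4o mini: ${small:,.0f}, ratio: {big / small:.0f}x")
```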


GPT-4o Mini Advantages

You might imagine that a small model would be worse at everything than a large model. That’s loosely true for models of the same vintage, but because LLMs are advancing so quickly, GPT-4o Mini is better at most things than flagship models from just a few months ago.

For example, there’s really no reason to use GPT-3.5 Turbo anymore, because GPT-4o Mini:

  • Is more than 60 percent cheaper
  • Outperforms GPT-3.5 Turbo on all benchmarks
  • Is multimodal (it accepts image inputs as well as text, with audio and video support planned, so it’s not limited to text)

OpenAI didn’t invent the fast, cheap model. Many LLM providers now offer a highly capable, expensive flagship model and a faster, cheaper, but less capable one:

  • Anthropic offers Claude Opus (flagship) and Claude Haiku (small)
  • Mistral offers Mistral Large (flagship) and Mistral Nemo (small)
  • Google offers Gemini Ultra (flagship) and Gemini Nano (small)
  • Meta offers Llama 3.1 405B (flagship) and Llama 3.1 8B (small)

Let’s look at the long-term and short-term implications of this small-model trend.
 

Long-Term Implications for Small Models

Lower hardware requirements plus lower costs add up to a bright future for smaller LLMs.

Running on Device

Small models are often touted for their ability to run on-device, without an internet connection. LLMs are huge matrices, and running them (not to mention training them) requires a lot of powerful hardware. That’s why OpenAI, Meta, Anthropic and others run giant data centers filled with GPUs that consume a lot of energy.

But smaller models require far less hardware power, which means they could run locally. This creates a whole host of new applications for on-premise devices or high-security environments. OpenAI isn’t supporting this yet, but Google, Microsoft and Apple are running on-device models for their own products, and Llama 3 has been hacked to run on device.

They also enable software vendors to lower their costs by running the AI models on the user’s device instead of paying for each user interaction via an API. This is why I believe these models will become more and more prevalent. Many of the tasks software companies are offering as AI features don’t require the capabilities of the flagship models.
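As a concrete illustration of what running locally can look like, here’s a minimal sketch that loads a small open-weight model with the Hugging Face transformers library and runs it entirely on the local machine. The specific checkpoint (Llama 3.1 8B Instruct, which is gated and requires license acceptance) and the hardware assumptions are mine, not the article’s; any small local model would work the same way.

```python
# Minimal on-device inference sketch using Hugging Face transformers.
# Assumes the (gated) Llama 3.1 8B Instruct weights are accessible and the machine
# has enough memory or a GPU to hold them; requires the accelerate package for device_map.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",  # use a local GPU if available, otherwise fall back to CPU
)

prompt = "Summarize this support ticket in one sentence: the export button crashes the app."
result = generator(prompt, max_new_tokens=80)
print(result[0]["generated_text"])
```

No tokens leave the machine here, which is exactly the appeal for on-premise devices and high-security environments.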

Decline of the Monolith API

Even in the early days of LLM APIs, vendors offered endpoints that traded off performance vs. latency vs. cost. We’re now seeing a Cambrian explosion of endpoints that rearrange these trade-offs.

Small models offer a compelling alternative when a task isn’t especially complex or strict correctness isn’t critical, much like batch endpoints can be used when latency isn’t a huge concern.

Even if a company wants to keep serving its users with the most capable model to ensure quality, the intermediate steps and analysis are best done with small models like GPT-4o Mini. Those intermediate calls happen so often that routing them all to a flagship model makes costs scale too quickly.
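Here’s a minimal sketch of that kind of routing with the OpenAI Python SDK. The split between “intermediate” and “user-facing” calls, and the choice of model names, are assumptions for illustration; real routing logic would be more involved.

```python
# Toy model router: cheap model for internal/intermediate steps,
# flagship model reserved for the final, user-facing answer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, user_facing: bool = False) -> str:
    model = "gpt-4o" if user_facing else "gpt-4o-mini"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Intermediate analysis goes to the cheap model...
topics = ask("List the main topics in this support chat: ...")
# ...while the reply the user actually sees can still come from the flagship.
answer = ask(f"Write a helpful reply covering these topics: {topics}", user_facing=True)
```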

It’s possible to fine-tune these small models to get close to the performance of large models within a specific domain. The way to do this is to fine-tune a small model on the outputs of a large model for your task of interest (a process called student-teacher distillation). This is a frontier strategy, but it appears to be quite effective.
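A minimal sketch of that workflow against the OpenAI API might look like the following: have the large “teacher” model label a set of task prompts, write the results to a fine-tuning file, then start a fine-tuning job for the small “student” model. The prompts, file name and model snapshot are placeholders, and fine-tuning availability and pricing should be checked against OpenAI’s current docs.

```python
# Student-teacher distillation sketch: the large model labels examples,
# then the small model is fine-tuned on those labels.
import json
from openai import OpenAI

client = OpenAI()
prompts = ["Classify the sentiment of: 'The checkout flow keeps crashing.'"]  # placeholder task data

# 1. Generate training targets with the teacher model.
with open("distillation.jsonl", "w") as f:
    for prompt in prompts:
        teacher = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        example = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": teacher.choices[0].message.content},
            ]
        }
        f.write(json.dumps(example) + "\n")

# 2. Upload the training file and fine-tune the student model on it.
training_file = client.files.create(file=open("distillation.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed snapshot; confirm which models support fine-tuning
)
print(job.id)
```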



Short-Term Implications for Small Models

That was some speculation about where mini models are heading and how they might change the LLM API landscape. But how is GPT-4o Mini (and its colleagues from other AI shops) affecting AI software today?

My company, CommandBar, builds an AI chatbot that other companies embed into their sites to interact with users. So we’re a layer company between foundation model companies and our customers.

With the introduction of GPT-4o Mini, we were able to pretty much immediately switch our non-user-facing queries (like a service that does sentiment analysis on user chats) from GPT-4o to 4o Mini.

We’re also going further than that: For many end-user-facing queries, we’re seeing 4o Mini perform just as well as 4o, so we’re beginning to route certain types of queries to 4o Mini and might eventually make it our default model for all but the most complicated user queries. The net effect on our business is that the OpenAI portion of our LLM bill is likely to decrease by around 50 percent. That’s tens of thousands of dollars saved, immediately.

That’s what it’s like running a layer AI company these days: every so often, OpenAI shows up and hands you a P&L gift.
