It feels like every B2B app is now launching AI features. My startup CommandBar is no exception. We rely on large language model (LLM) APIs to build embedded user assistance agents for 20 million end users. After spending 11 months with OpenAI’s APIs, we’re now exploring other models to see how they’ve caught up. We’re doing this mainly to see where we can lower costs, improve the performance of our AI products for different use cases and maybe gain more control over data privacy.
In the mainstream, the success of ChatGPT has made GPT synonymous with AI. This dominance seems to translate into the B2B world: In a study published by venture capital firm a16z, enterprises that have adopted AI predominantly use OpenAI’s models to power their AI features and apps. OpenAI’s models are in production at more than five times the rate of second-place finisher Google.
Is this dominance here to stay? The data suggests otherwise. In a16z’s study, most companies are already at least testing multiple models, and enterprises seem eager to find OpenAI alternatives that help them save costs, gain more control and avoid vendor lock-in.
Alternatives to OpenAI API for B2B
Most AI models take too much compute to run locally on devices, so companies usually access LLMs via APIs that connect to the data centers where the models actually run.
OpenAI is far from the only provider of LLM APIs nowadays. AI ethics champion Anthropic offers its Claude models via an API, as does Mistral (for its proprietary models). Google’s Gemini is also accessible via API.
Providers like Replicate and Hugging Face also offer API access to various open-source LLMs. The most popular include Mistral’s Mixtral models and Meta’s Llama models.
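To make that concrete, here’s a minimal sketch of what calling three of these providers looks like from Python. It assumes you have the official openai, anthropic and replicate packages installed and API keys in the OPENAI_API_KEY, ANTHROPIC_API_KEY and REPLICATE_API_TOKEN environment variables; the model identifiers were current as of this writing and may change.

```python
import os

import anthropic
import replicate
from openai import OpenAI

prompt = "Explain what an embedding is in one sentence."

# OpenAI's hosted models.
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
gpt = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(gpt.choices[0].message.content)

# Anthropic's Claude: same idea, slightly different message API.
claude_client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
claude = claude_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=256,
    messages=[{"role": "user", "content": prompt}],
)
print(claude.content[0].text)

# An open-source model (Llama 3 70B) behind Replicate's API.
# The replicate client reads REPLICATE_API_TOKEN from the environment.
for chunk in replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input={"prompt": prompt, "max_tokens": 256},
):
    print(chunk, end="")
```

The request shapes differ slightly from provider to provider, but each call boils down to a prompt in and text out, which is what makes testing alternatives relatively painless.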
These APIs are usually priced per million tokens, which can best be described as word fragments. The ranges are large: OpenAI’s GPT-4o costs $15 per million output tokens, while running Llama 3 70B via Replicate costs $2.75 per million output tokens.
For reference, OpenAI states that 100 tokens are approximately 75 words. How many tokens your product requires depends heavily on what you generate. An article-writing tool that produces thousands of words will consume more tokens than an AI feature that automatically renames downloaded files.
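To turn those prices into a per-request cost, you can use that rule of thumb directly. A back-of-the-envelope sketch, with the output prices quoted above (illustrative; verify against each provider’s current pricing page):

```python
# Rough output cost for a feature that generates a 1,500-word article
# per request. 100 tokens ~= 75 words, so tokens ~= words / 0.75.
words_per_request = 1_500
tokens_per_request = words_per_request / 0.75  # ~2,000 output tokens

# Illustrative output prices per 1M tokens (see above; verify before use).
prices_per_1m = {"GPT-4o": 15.00, "Llama 3 70B via Replicate": 2.75}

for model, price in prices_per_1m.items():
    cost = tokens_per_request / 1_000_000 * price
    print(f"{model}: ~${cost:.4f} per article")  # ~$0.03 vs. ~$0.0055
```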
After running our AI products on OpenAI APIs for the longest time, we’re now exploring alternatives.
Which AI API Should You Use?
Our customers embed our product in theirs, meaning they ship us to their users. For us, response quality is paramount, but that’s not the case for every company. Let’s first define what you might optimize for by looking at three core metrics.
Price. This is the most obvious one. You should try to pay the lowest price possible that allows you to maintain the user experience you want to offer.
Quality. How good is the AI output? Different use cases demand different levels of quality. A user assistant like ours should be accurate, but if you’re using an LLM to automatically name documents, you probably don’t need the highest possible quality.
Speed. The bigger the model, the slower the output — and different use cases require different speeds. If you’re building an AI writing tool, waiting an extra 20 seconds for an article to be generated is trivial.
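Price and speed are easy to measure directly; quality usually needs human judgment or evals. Here’s a rough sketch of profiling latency and output cost for a single request, assuming the official openai package. The prices in the dictionary are illustrative, and gpt-3.5-turbo is just a stand-in for whatever cheaper tier you’re comparing against.

```python
import os
import time

from openai import OpenAI

# Illustrative output prices per 1M tokens; verify against current pricing.
PRICE_PER_1M_OUTPUT = {"gpt-4o": 15.00, "gpt-3.5-turbo": 1.50}

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def profile_model(model: str, prompt: str) -> None:
    """Time one completion and estimate what its output cost."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    out_tokens = response.usage.completion_tokens
    cost = out_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT[model]
    print(f"{model}: {elapsed:.2f}s, {out_tokens} output tokens, ~${cost:.6f}")

for model in PRICE_PER_1M_OUTPUT:
    profile_model(model, "Summarize what an AI copilot does in two sentences.")
```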
Without testing different models, it’s hard to say exactly which one to use for which use case. But here are a few things we’ve found. In terms of quality, many models are extremely close to each other now.
OpenAI used to be the undisputed leader, but is no longer alone at the top. Many models approximate GPT-4o’s level of quality now, especially for less complex queries.
But we’ve found a few standouts:
Anthropic’s Claude 3 seems to have the most natural writing style. Many users now dislike ChatGPT’s style.
Meta’s Llama and Mistral’s models are best if you want full control. Because they’re open source, they’re cheap to access via APIs, which makes them ideal for testing purposes. If you’re dealing with sensitive data, you can even host them locally and avoid sending data anywhere but your own server or device.
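If you do want to keep data entirely on your own hardware, here’s a sketch of local inference with Hugging Face’s transformers library. It assumes you’ve installed transformers and accelerate and have been granted access to Meta’s gated Llama 3 weights on Hugging Face.

```python
from transformers import pipeline

# Downloads the weights once, then runs entirely on your machine.
# meta-llama/Meta-Llama-3-8B-Instruct is gated: request access on
# Hugging Face before using it.
generate = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",  # requires accelerate; uses a GPU if available
)

result = generate(
    "Question: Why might a company self-host an LLM?\nAnswer:",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```

The 8B model can run on a single high-end GPU; the 70B variant needs far more hardware, which is exactly the trade-off that makes hosted APIs attractive in the first place.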
But once you’ve chosen the AI API you want to build with, you need to make another choice: how to set up your infrastructure.
Diversifying Your AI Infrastructure
If you rely on one API to run your entire product (or a crucial feature), that vendor’s problems become your problems. Even if it never raises prices or shuts down, simple downtime affects your customer and user experience.
During OpenAI’s leadership kerfuffle, we switched our infrastructure to Microsoft Azure (which also offers APIs for OpenAI’s models). By doing this, we insulated ourselves from a potential OpenAI shutdown or major product change. This ultimately didn’t happen, but we still feel safer relying on Microsoft than on OpenAI to keep the infrastructure up.
While you may have a preferred model like GPT-4, it’s wise to have other APIs (whether those host the same models or not) to mitigate vendor risk.
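In practice, mitigation can be as simple as a failover wrapper. Here’s a minimal sketch with OpenAI as the primary and an Azure OpenAI deployment as the backup; the Azure endpoint, key variables and deployment name are placeholders for whatever you’ve provisioned.

```python
import os

from openai import AzureOpenAI, OpenAI, OpenAIError

# Primary: OpenAI directly. Backup: the same model served from Azure.
primary = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
backup = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # placeholder
    api_version="2024-02-01",
)

def complete_with_failover(messages: list[dict]) -> str:
    """Try the primary provider; fall back to Azure on any API error."""
    for client, model in ((primary, "gpt-4o"), (backup, "my-gpt-4o-deployment")):
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            return response.choices[0].message.content
        except OpenAIError:
            continue  # provider is down or erroring; try the next one
    raise RuntimeError("All providers failed")

print(complete_with_failover([{"role": "user", "content": "Say hi."}]))
```

Because Azure serves the same models behind an OpenAI-compatible client, the failover path needs almost no extra code; falling back to a different model family would also mean adapting your prompts.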
Orchestration Across Multiple Models
I’ve talked about selecting a model, but I think the state of AI apps is heading toward multi-model architecture. As models specialize, it’s likely that a single AI application would get the best performance by orchestrating across multiple models.
For example, in our chat product, we can orchestrate queries to models based on the currently available latency from each, to make sure users get the fastest responses possible (response time is a huge driver of user satisfaction with AI chat). We’ve also explored orchestrating to trade off cost and quality. For short, simple questions, we can get by using cheaper models. For more complex questions, we might want to bring in the big guns and optimize for cognition.
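Here’s a toy sketch of both routing policies: cost-based for short queries, latency-based for the rest. The model names and the 20-word threshold are arbitrary placeholders, and real routing would use better complexity signals and latency telemetry from production traffic.

```python
from collections import defaultdict

# Rolling latency estimate per model, fed by timings from real requests.
latencies: dict[str, float] = defaultdict(lambda: 1.0)

def record_latency(model: str, seconds: float, alpha: float = 0.2) -> None:
    """Update an exponentially weighted moving average of response times."""
    latencies[model] = (1 - alpha) * latencies[model] + alpha * seconds

def route(question: str, capable_models: list[str]) -> str:
    """Cheap model for short, simple questions; else the fastest capable one."""
    if len(question.split()) < 20:  # crude complexity proxy
        return "gpt-3.5-turbo"  # optimize for cost
    return min(capable_models, key=lambda m: latencies[m])  # optimize for speed

# Example: after observing that Claude is currently responding faster...
record_latency("gpt-4o", 3.2)
record_latency("claude-3-opus", 1.9)
print(route(
    "Walk me through configuring single sign-on with Okta for a multi-tenant "
    "app, including how to map roles and debug SAML assertion errors.",
    ["gpt-4o", "claude-3-opus"],
))  # -> claude-3-opus
```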
That said, orchestration can be complex. It requires being familiar with multiple models and staying up to date with the latest drops.
If you’re building a production AI application for end users, which comes with higher potential cost and more usage diversity, then it might make sense to invest in orchestration yourself or use one of the emerging vendors in the space. But if you’re building something for internal use that is relatively small scale, you’ll be fine doing a bit of testing with different models and selecting the one that seems best.