By now, most enterprises have realized that it’s critical to manage cloud spending – hence the popularity of FinOps, a discipline devoted to helping businesses understand and optimize cloud costs.
But as AI becomes an increasingly important component of the enterprise IT landscape, keeping AI costs in check is critical too. And while traditional FinOps practices offer a start in that direction, they’re not always enough on their own to address the unique spending challenges of AI. Businesses that want to benefit from AI without breaking the bank need to extend their FinOps strategies to support AI workloads too.
My business has thought about this topic quite a bit. We’re employing generative AI for what we think is a pretty novel use case: helping organizations understand and modernize legacy application code as part of the journey toward SAP Clean Core, an initiative that aims to move custom application logic out of SAP’s core platform.
Using AI for this purpose can dramatically accelerate application modernization. But it can also be costly due to factors like the charges incurred to process requests using third-party large language models. Thus, it’s critical to be able to monitor costs — and identify opportunities for reducing them — as part of the AI-powered modernization process I just described.
That’s why we’ve made granular AI cost tracking a key part of the solution we’re building. Here’s how we’re doing it, along with takeaways on how organizations of all types can implement FinOps for AI, no matter which AI workloads they run.
The Need for AI-Centric FinOps
Before diving into the details of the AI FinOps solution my company has built, let’s set the context by discussing why AI requires a special approach to FinOps.
The main reason is that, while traditional FinOps is great at providing visibility into general IaaS cloud spending, AI presents unique challenges that don’t apply to traditional cloud workloads. To measure your AI costs, you need to track metrics such as:
Which Models You Use
There are a variety of production-ready large language models (LLMs) available today, and the cost of querying them can vary widely. Indeed, we’ve found that the cost of submitting an identical query to two different models can vary by a factor of up to 100.
How Many Tokens You Process
Each request to an AI model consumes a certain number of tokens — both for the input you send and the output the model generates. The more complex a request is, the more tokens it generally requires, and because providers typically bill per token, more tokens translate to a higher bill.
Model Runtime
Some models may charge based in part on how long it takes to process a request.
Ancillary Costs
In addition to the direct costs associated with submitting queries to AI models, it’s important also to think about indirect costs, such as storage expenses for archiving the data generated by AI models.
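The metrics above can be combined into a simple per-request cost model. The sketch below is illustrative only: the pricing figures are made-up placeholders, not real vendor rates, and a production version would pull prices from each provider’s published rate card.

```python
# Sketch of a per-request AI cost model covering the metrics above:
# model choice, token counts, runtime and ancillary storage.
# All prices here are illustrative placeholders, not real vendor rates.
from dataclasses import dataclass

@dataclass
class ModelPricing:
    input_per_1k_tokens: float   # USD per 1,000 input tokens
    output_per_1k_tokens: float  # USD per 1,000 output tokens
    per_second: float = 0.0      # optional runtime-based charge

def request_cost(pricing: ModelPricing,
                 input_tokens: int,
                 output_tokens: int,
                 runtime_seconds: float = 0.0,
                 storage_gb_month: float = 0.0,
                 storage_price_per_gb: float = 0.02) -> float:
    """Combine token, runtime and ancillary storage charges for one request."""
    token_cost = (input_tokens / 1000) * pricing.input_per_1k_tokens \
               + (output_tokens / 1000) * pricing.output_per_1k_tokens
    runtime_cost = runtime_seconds * pricing.per_second
    storage_cost = storage_gb_month * storage_price_per_gb
    return token_cost + runtime_cost + storage_cost

# The same query priced against two hypothetical models:
cheap = ModelPricing(input_per_1k_tokens=0.0005, output_per_1k_tokens=0.0015)
premium = ModelPricing(input_per_1k_tokens=0.05, output_per_1k_tokens=0.15)
print(request_cost(cheap, input_tokens=2000, output_tokens=500))
print(request_cost(premium, input_tokens=2000, output_tokens=500))
```

With these placeholder rates, the identical query is 100 times more expensive on the premium model — the kind of spread mentioned above.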
Traditional cloud FinOps doesn’t factor in most of these costs. It focuses instead on tracking expenses like data egress fees or comparing the costs of various cloud server configurations — expenses that aren’t the main cost drivers for most AI workloads.
How to Implement FinOps for AI
As my organization has built our solution to help modernize SAP applications, we’ve committed to building the following FinOps capabilities into the core product.
4 Aspects of FinOps for AI
- Granular cost reporting.
- Quality tracking.
- Rebilling capabilities.
- Cost prediction.
Granular Cost Reporting
Above all, ensuring that cost reporting for AI is granular has been a critical priority. We want to be able to break down AI costs by user, date, model, query length and so on. Knowing how much AI costs us in aggregate is not enough, because we would lack the visibility to slice and dice cost data in ways that let us identify inefficiencies and optimize spending.
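One way to enable that kind of slicing is to tag every model call with its dimensions at the moment the cost is incurred. The sketch below is illustrative, not our actual product schema, and all figures are made up:

```python
# Illustrative sketch: tag every model call with dimensions (user, date,
# model, token count) so costs can be sliced by any of them, rather than
# reported only in aggregate. All figures are made up.
from collections import defaultdict
from datetime import date

cost_events = [
    # (user, day, model, tokens, cost_usd)
    ("alice", date(2024, 5, 1), "model-a", 1200, 0.06),
    ("alice", date(2024, 5, 1), "model-b", 1200, 0.002),
    ("bob",   date(2024, 5, 2), "model-a", 8000, 0.40),
]

def costs_by(events, key_index):
    """Total cost grouped by one dimension (0 = user, 1 = day, 2 = model)."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key_index]] += event[-1]  # last field is cost_usd
    return dict(totals)

print(costs_by(cost_events, 0))  # spend per user
print(costs_by(cost_events, 2))  # spend per model
```

In a real system the events would land in a database or billing pipeline, but the principle is the same: capture the dimensions up front, because aggregate totals can’t be decomposed after the fact.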
Quality Tracking
With AI in particular, you need to know both how much you’re spending and how it correlates with the quality of output. In general, lower-cost models will produce lower-quality results, although that is not universally the case.
To provide visibility into how quality aligns with cost and model usage, we’ve built comparison features into our product. They allow us to run the same query across different models, then view a summary of both the cost and the quality of each result. With this insight, we can make informed decisions about which types of queries to run on which models.
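A comparison feature of this kind might look like the following sketch. Everything here is hypothetical: `call_model` stands in for a real LLM client, and `score_quality` stands in for whatever evaluation step you use (human review, an automated rubric, etc.):

```python
# Hypothetical sketch of a cost-vs-quality comparison: run one prompt
# against several models and tabulate cost alongside a quality score.
# `call_model` and `score_quality` are placeholders, not real APIs.

def call_model(model: str, prompt: str) -> tuple[str, float]:
    """Placeholder for a real LLM client: returns (answer, cost_usd)."""
    canned = {
        "model-a": ("detailed answer", 0.18),   # made-up response and cost
        "model-b": ("short answer", 0.002),
    }
    return canned[model]

def score_quality(answer: str) -> float:
    """Placeholder scorer (0.0-1.0); a real one might be a rubric or reviewer."""
    return min(1.0, len(answer) / 20)

def compare(models: list[str], prompt: str) -> list[tuple[str, float, float]]:
    rows = []
    for m in models:
        answer, cost = call_model(m, prompt)
        rows.append((m, cost, round(score_quality(answer), 2)))
    return rows

for model, cost, quality in compare(["model-a", "model-b"], "Explain this routine"):
    print(f"{model}: ${cost:.4f}, quality {quality}")
```

The useful output is the side-by-side table: for some query types the cheaper model’s quality may be close enough that the premium model isn’t worth the cost.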
Rebilling Capabilities
When you’re using AI to support a complex, large-scale, multi-stakeholder project like ours, you must align costs with specific users or organizational units. That data makes it possible to rebill the consumers of each AI query when necessary. This is another reason why tracking costs in the aggregate is not enough.
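If per-user costs are already being tracked, rebilling reduces to rolling them up to cost centers. A minimal sketch, with an assumed user-to-cost-center mapping and made-up figures:

```python
# Sketch of rebilling: roll per-user AI costs up to cost centers for
# chargeback. The mapping and figures are illustrative, not real data.
from collections import defaultdict

user_cost_center = {"alice": "ERP-Modernization", "bob": "Finance-IT"}
user_costs = {"alice": 12.40, "bob": 3.10}  # monthly AI spend per user, USD

rebill = defaultdict(float)
for user, cost in user_costs.items():
    rebill[user_cost_center[user]] += cost

for center, total in sorted(rebill.items()):
    print(f"{center}: ${total:.2f}")
```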
Cost Prediction
Unlike cloud costs, which users can typically predict easily enough with help from the cost calculators offered by cloud service providers, estimating how much an AI query or workload will cost before you run it can be challenging. There are no official cost calculators for LLMs. If you collect data about historical AI costs, however, you can use it to predict costs for similar queries using similar models.
In this way, the types of AI costs that we’re monitoring inside our tool lay the groundwork for building a sort of AI cost calculator — one that will allow us to predict with a fair degree of accuracy how much we’ll need to invest to complete various types of queries.
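As a rough illustration of the idea, historical per-model observations can be turned into a naive cost-per-token estimate and projected onto a new query. The history below is invented for the example, and a real predictor would account for input/output token splits and pricing tiers:

```python
# Sketch of a naive AI cost predictor: estimate cost-per-token for each
# model from historical observations, then project a new query's cost.
# The history data is made up for illustration.
history = {
    # model: list of (tokens, observed_cost_usd)
    "model-a": [(1000, 0.05), (4000, 0.20), (2000, 0.10)],
    "model-b": [(1000, 0.0005), (3000, 0.0015)],
}

def predict_cost(model: str, tokens: int) -> float:
    """Project cost from the model's historical average USD-per-token rate."""
    samples = history[model]
    rate = sum(cost for _, cost in samples) / sum(t for t, _ in samples)
    return rate * tokens

print(predict_cost("model-a", 5000))  # projected cost of a 5,000-token query
```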
Lessons for Successful AI FinOps Adoption
The project my business has undertaken focuses on a specific niche — legacy platform modernization — but it has broad lessons for any organization committed to streamlining its AI costs.
The biggest, perhaps, is that now is the time to begin implementing FinOps for AI. Don’t wait for the cloud service providers who host models, or the AI vendors who develop them, to build cost reporting tools into their solutions; there’s no indication that they will do so anytime soon. The onus is on companies that use these products to track costs themselves.
A second key point of emphasis is the need for granular visibility into AI costs. Again, tracking total AI spending is not very useful because it doesn’t help you drill down into where you’re overspending and find ways to cut costs. Nor can you predict future spending for specific types of queries if you lack granular visibility into your AI spending.
Finally, ensure that AI FinOps capabilities make it possible to compare spending between different configurations. As I mentioned, the quality of a model’s output can vary widely, as can its costs. Having the ability to compare quality and costs at a glance is critical if you want to know which configuration will yield the best bang for your buck, so to speak, for a given type of query.
To be sure, it remains early days for enterprise AI adoption in general, and monitoring AI costs may be far from the top of many organizations’ priority lists. But if you don’t build FinOps capabilities into AI investments from day one, you risk undercutting the value that AI delivers due to a failure to keep costs in check. Applying FinOps to AI poses some special challenges, but they’re solvable, and addressing them is absolutely critical for long-term AI success.