How to Reduce AI Computing Costs
Brock Ferguson is a practice-over-theory kind of guy.
The Chicago-based data-science and machine-learning consultancy he co-founded in 2016, Strong Analytics, puts a major focus on productionizing AI models rather than just building out proofs of concept.
“We want to minimize that gap between research in the lab and deploying to production,” he said. “We think about that a lot.”
That means thinking a lot about cost, something that’s never far from the minds of machine-learning practitioners and consultants, but which came to the forefront again thanks to a much-circulated recent Andreessen Horowitz review that emphasized the high and ongoing computing costs of building and deploying artificial intelligence models.
The review “definitely rang true,” Ferguson said.
So what exactly can organizations do to relieve that strain? Are high cloud-provider bills an unfortunate but necessary cost of doing ML business? Does it ever make more financial sense to shift to a hybrid system? We asked Ferguson and a few other experts for advice on how to avoid perpetual sticker shock.
Crunch the Numbers Early
Perhaps it’s stating the obvious, but it’s important enough to make explicit: Companies really must consider computing costs from the very beginning. The first step in wrangling production costs is to model those figures well before deploying an AI system, David Linthicum, chief cloud strategy officer at Deloitte, told Built In.
That gives companies “a sound baseline for understanding how much will be spent and allows for some tradeoffs to be considered,” he said.
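As a rough illustration of what modeling those figures could look like, the sketch below adds up a monthly estimate from compute, storage and data-transfer line items. The workload numbers and unit prices are hypothetical placeholders, not any provider's actual pricing, and the `Workload`, `Rates` and `monthly_cost` names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical monthly workload; every field is an assumed input."""
    gpu_hours: float    # training plus inference hours on GPU instances
    storage_gb: float   # data kept in cloud storage
    egress_gb: float    # data moved out of the provider or region


@dataclass
class Rates:
    """Placeholder unit prices; substitute figures from your own provider."""
    per_gpu_hour: float
    per_gb_stored: float
    per_gb_egress: float


def monthly_cost(w: Workload, r: Rates) -> dict:
    """Return a per-category breakdown plus the total, in dollars."""
    breakdown = {
        "compute": w.gpu_hours * r.per_gpu_hour,
        "storage": w.storage_gb * r.per_gb_stored,
        "egress": w.egress_gb * r.per_gb_egress,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Assumed example inputs, chosen only to make the arithmetic easy to follow.
baseline = monthly_cost(
    Workload(gpu_hours=400, storage_gb=5_000, egress_gb=1_000),
    Rates(per_gpu_hour=3.0, per_gb_stored=0.02, per_gb_egress=0.09),
)
for item, dollars in baseline.items():
    print(f"{item:>8}: ${dollars:,.2f}")
```

Rerunning the same function with a different instance type or data layout gives a quick way to compare the tradeoffs Linthicum describes before anything is deployed.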
That includes making sure companies consider the finer points of their data infrastructure early on. In fact, the cost is often less about the AI services that process data and more about the data itself — storing, extracting, egressing and ingressing it, according to Tristan Morel L’Horset, a cloud and infrastructure growth lead at Accenture.
Data approaches vary based on the application being pursued, so forget about a one-size-fits-all solution. But some concerns are universal, like so-called data gravity. In short, compute where the data is.
“Where is most of your data? We want to process and perform AI services on that data where it sits the most,” L’Horset said.
When setting up data storage, carefully consider what data you need in real time, and what you need less quickly, he added.
Also, flexibility can be expensive. There can be big costs associated with moving data both across regions and between cloud providers. Companies have sought to minimize those costs by focusing on a single, primary provider, he said. But even if applications and data are spread across different providers, still keep them localized together to help save money, said Linthicum, who added that trying to negotiate ingress costs with your provider could also be beneficial.
Companies that don’t fully consider issues like data localization and flexibility versus innovation among cloud providers “end up not really getting the full value of the cloud,” L’Horset said.
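The data-gravity idea can be sketched as a small calculation: given how much data sits in each location, find the location where running compute requires moving the least data. The region labels, sizes and flat per-gigabyte transfer rate below are all assumptions for illustration. Real transfer pricing is usually tiered and differs by direction and provider.

```python
def cheapest_compute_region(data_gb_by_region: dict, cost_per_gb_moved: float):
    """Pick the region that minimizes the data transferred in from elsewhere.

    Assumes one flat transfer rate for every pair of regions, which is a
    simplification of how providers actually bill.
    """
    total_gb = sum(data_gb_by_region.values())
    transfer_costs = {
        region: (total_gb - local_gb) * cost_per_gb_moved
        for region, local_gb in data_gb_by_region.items()
    }
    best_region = min(transfer_costs, key=transfer_costs.get)
    return best_region, transfer_costs


# Hypothetical data layout: most of the data already lives in one place.
data_layout = {"region-a": 8_000, "region-b": 1_500, "region-c": 500}
best, costs = cheapest_compute_region(data_layout, cost_per_gb_moved=0.02)

print("Compute in:", best)
for region, dollars in costs.items():
    print(f"  {region}: ${dollars:,.2f} to pull in the remaining data")
```

In this made-up layout, computing where the bulk of the data already sits costs a fraction of the alternatives, which is the point L’Horset makes about processing data “where it sits the most.”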
Do the Easy Stuff
The household-name cloud providers all sell efficiency resources either directly, like IBM’s Cloud Cost and Asset Management, or through third-party alliances, like Amazon’s AWS Cloud Management Tools Partners. Don’t skimp here.
“Organizations need to set up cost governance, where the usage is monitored, analyzed and reported,” Linthicum said. “Limits can be set using these systems as well, and this will keep the enterprise out of cost overrun trouble.”
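The monitor, analyze and report loop Linthicum describes might reduce to something like the following sketch. It is not the API of any provider's cost tool. The team names, spend figures, limits and the 80 percent warning threshold are all hypothetical.

```python
def budget_report(spend_by_team: dict, limits: dict, warn_ratio: float = 0.8) -> dict:
    """Classify each team's month-to-date spend against its limit.

    Returns "over" at or above the limit, "warning" at or above
    warn_ratio * limit, and "ok" otherwise.
    """
    report = {}
    for team, spent in spend_by_team.items():
        limit = limits[team]  # raises KeyError if a team has no limit set
        if spent >= limit:
            report[team] = "over"
        elif spent >= warn_ratio * limit:
            report[team] = "warning"
        else:
            report[team] = "ok"
    return report


# Assumed month-to-date figures, in dollars.
spend = {"research": 4_200, "api": 9_100, "batch": 12_500}
limits = {"research": 10_000, "api": 10_000, "batch": 12_000}

report = budget_report(spend, limits)
for team, status in report.items():
    print(f"{team:>8}: {status}")
```

In practice the spend figures would come from the provider's billing export or cost-management tool, and an "over" status would trigger an alert or a hard cap rather than a printed line.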
One standout way to make usage more efficient is to autoscale. This is good practice for anyone using cloud computing, including those using it to run machine learning models, according to Ferguson.
“People tend to put up servers or clusters and then just leave them as is,” he said. “Realistically, if you’re running an API or scheduled batch jobs, you need to be tearing down that infrastructure and spinning it back up when you need it.”
“You can save like 90 percent just by turning things off,” he added. “It sounds simple but most people aren’t actually doing it and they’re just burning a ton of money.”
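A back-of-the-envelope calculation shows where a figure of that size can come from. The sketch below compares a cluster left running around the clock with one that exists only while a scheduled batch job runs. The hourly rate, instance count and two-hour daily job are assumed values, and the calculation ignores startup time and any minimum billing increments.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def always_on_cost(hourly_rate: float, instances: int) -> float:
    """Cost of leaving every instance running for the whole month."""
    return hourly_rate * instances * HOURS_PER_MONTH


def on_demand_cost(hourly_rate: float, instances: int, hours_needed_per_day: float) -> float:
    """Cost when instances run only for the hours the job needs each day."""
    fraction_of_day = hours_needed_per_day / 24
    return hourly_rate * instances * HOURS_PER_MONTH * fraction_of_day


# Hypothetical workload: four instances at $3/hour, job runs 2 hours a day.
idle_heavy = always_on_cost(hourly_rate=3.0, instances=4)
torn_down = on_demand_cost(hourly_rate=3.0, instances=4, hours_needed_per_day=2)
savings = 1 - torn_down / idle_heavy

print(f"Always on:   ${idle_heavy:,.2f}")
print(f"Torn down:   ${torn_down:,.2f}")
print(f"Savings:     {savings:.0%}")
```

The savings depend entirely on how idle the infrastructure is. A job that needs the cluster 12 hours a day would save only half under the same assumptions.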
Another more AI-specific cost-reduction measure is knowledge distillation, or training a small, lean model to reproduce the work of a larger, more resource-intensive model. “You can go from requiring some monster server or even a GPU down to something that’s cheaper, maybe CPU-based, without sacrificing much accuracy,” Ferguson said.
It’s a fairly easy option, but not a great fit for every application. The narrower the focus, the better. For instance, large language models, like text predictors, aren’t a great candidate, but a classifier model very well could be. “You can often distill those models quite a bit,” he said. “Then you’re able to run on more efficient hardware.”
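For readers who want to see the mechanics, here is a minimal sketch of one common distillation objective for a classifier: the student is trained to match the teacher's temperature-softened probabilities while also fitting the true label. This is a generic formulation, not necessarily the one Strong Analytics uses. The logits, temperature and weighting are illustrative assumptions, and a real project would compute this inside a deep-learning framework rather than in plain Python.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives softer output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    soft: KL divergence from the teacher's softened distribution to the student's
    hard: cross-entropy of the student's ordinary prediction against the label
    The soft term is scaled by temperature**2 to keep its gradients comparable.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# Made-up logits for a three-class problem where class 0 is correct.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.2, 2.5, 1.0]     # disagrees with the teacher

print("close student loss:", distillation_loss(close_student, teacher, true_label=0))
print("far student loss:  ", distillation_loss(far_student, teacher, true_label=0))
```

Minimizing this loss over a training set pushes the small model toward the large model's behavior, which is what lets it run on cheaper hardware afterward.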
There’s also a small-but-growing ecosystem of cloud provider competitors, like Paperspace, and GPU rental services, like Vast AI, that market themselves as cost-effective alternatives to AWS, Azure and Google Cloud. The experts with whom we spoke either didn’t have direct experience with them or considered them to be primarily niche services. And fairly or not, they’re still considered a nonstarter for many enterprises.
“Perhaps for small businesses, but for Global 2000 enterprises the larger public cloud providers are typically only what’s considered,” Linthicum said. “Considering the number of points-of-presence that are required, and the need for guaranteed scale, cloud providers need to spend many billions of dollars to get into that game, and the smaller players will typically only play niche roles.”
Use What’s Already Available
If you instinctively feel compelled to build your AI model from scratch, question that instinct. Companies can temper costs by experimenting with low-cost APIs and building on top of the pre-trained neural networks that are already made available by Big Cloud.
“Some folks would say any AI worth doing is worth doing custom — I’m not sure I agree with that,” said Karl Freund, a senior analyst for high-performance computing and machine learning at Moor Insights and Strategy. “There is some easy, low-hanging fruit for AI adoption that can be adequately served just by accessing Microsoft Azure.”
Many companies already have Microsoft infrastructure in place, so it might make sense to extend those applications with whatever Azure-available AI features fit well for the organization’s goals. Same with Google-favoring offices, although their tools are a bit less robust in comparison, according to Freund. You might end up feeling somewhat tethered to that provider’s pre-trained networks, but it’s a cost-benefit tradeoff worth considering.
And despite a degree of vendor lock-in, avenues for expandability do exist.
“If you want to add some of your own data elements, you can do that,” he said. “You basically build a preprocessing neural network that sits in front of their pre-trained neural network.”
Consider Hybrid or On-Premises Processing
The majority of companies tend to do their machine-learning computing work in the cloud. Freund posits that’s the case with at least 80 percent. But companies should vigilantly watch out for when a hybrid approach starts to make more sense — or, in rarer instances, when a full migration to on-premises might.
The recent Andreessen Horowitz review underscores that retraining models in order to fend off data drift is an ongoing financial commitment, even though “it’s tempting to treat this as a one-time cost.”
Freund’s advice dovetails with that sentiment, especially as it relates to the to-cloud-or-not-to-cloud (or complement-the-cloud) question.
“Training is not one and done,” he said. “You continually have to update your neural network models to keep them fresh with new data elements and patterns. That training can be pretty expensive in the cloud.”
It might be more cost-effective at that point to pony up for some GPUs and install them in on-site rack servers, he said.
Fully on-prem systems, on the other hand, are a rarer bird, but they shouldn’t be dismissed out of hand for enterprises that are doing heavy-duty deep-learning applications with software that performs the same tasks repeatedly. “The reality is that traditional on-premises computing should always be an option, even if you’ve moved workloads and data to a public cloud,” Linthicum said.
He added: “In some cases, workloads should not have moved in the first place and need to be moved back to work properly. This is the case when the public cloud provider doesn’t offer a platform analog, such as mainframes and some older processors.”
It’s a game of constant evaluation. “While public clouds are the answer the majority of the time, they are not always the answer,” he said. “Enterprises need to keep an open mind.”
What are the best options for on-prem? Freund mentions Nvidia’s popular-but-pricey DGX system, a preconfigured server with built-in GPUs. It’s a multi-thousand-dollar investment, and shifting to on-prem carries security costs that cloud users don’t face, so it’s not to be taken lightly. But heavy-lift enterprises should eye their monthly cloud bills closely to see when either a hybrid or on-prem option does start to make sense.
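One simple way to frame that decision is a break-even estimate: how many months of cloud bills it takes to equal the up-front hardware purchase plus the recurring cost of running it. The sketch below is a simplification with hypothetical figures. It does not reflect actual DGX pricing, and it ignores depreciation, financing and hardware refresh cycles.

```python
import math


def breakeven_months(hardware_cost: float,
                     onprem_monthly_cost: float,
                     cloud_monthly_cost: float):
    """Months until cumulative cloud spend reaches cumulative on-prem spend.

    onprem_monthly_cost should include power, staffing and security.
    Returns None when on-prem never pays off because its monthly cost
    is not lower than the cloud bill.
    """
    monthly_savings = cloud_monthly_cost - onprem_monthly_cost
    if monthly_savings <= 0:
        return None
    return math.ceil(hardware_cost / monthly_savings)


# Assumed figures: $150,000 of hardware, $3,000/month to run it,
# versus a $12,000/month cloud bill for the same training workload.
months = breakeven_months(150_000, 3_000, 12_000)
print("Break-even after", months, "months")

# A lighter workload with a small cloud bill never reaches break-even.
print("Light workload:", breakeven_months(150_000, 3_000, 2_500))
```

If the break-even point lands well inside the expected useful life of the hardware, the hybrid or on-prem option deserves a closer look. If it never arrives, the cloud bill is not yet the problem.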
Watch the GPU Market
One overriding hope among AI practitioners is that, as processors continue to advance, costs will correspondingly decrease. The Andreessen Horowitz review splashed some cold water on that optimism, claiming that distributed computing “primarily addresses speed — not cost,” but some experts still consider that anticipation well-founded and urge companies to bear that in mind.
“Costs will come down as these systems become much more optimized, using less storage and compute power,” Linthicum said. “Advancements such as serverless AI mean that it’s up to the cloud provider to find the best, and least-cost resources that meet the needs of the AI system exactly at time of execution. This means there will be no over- or under-buying of resources, which affects the cost of the resources significantly over time.”
Freund, too, believes processor innovation will help tamp down costs. On the other hand, he’s also seen arrival times delayed again and again. With notable exceptions like Google, which announced its TPU in 2016, and Tesla, which now uses its own proprietary processors in its vehicles, the years-long expectation that more Nvidia competitors would surface — and therefore drive down costs — mostly hasn’t played out in reality.
“In terms of data centers, Intel has been trying to get their chip out for over three years,” he said. “They finally gave up, bought a startup [Habana Labs] and are starting over.”
But having seen some design-stage work of startups like Habana and Groq, which was launched by veterans of Google’s TPU development team, Freund is optimistic that a correction is due — if not imminently.
“These guys all have really cool designs and at some point will have production products that will compete with Nvidia and provide those lower price points,” he said. “It just hasn’t happened yet.”