With headlines emerging about artificial intelligence (AI) reaching “sentience,” it’s clear that the power of AI remains both revered and feared. For any AI offering to reach its full potential, though, its executive sponsors must first be certain that the AI is a solution to a real business problem.
And as enterprises and startups alike develop their AI capabilities, a common roadblock is emerging: AI's "last mile" problem. When machine learning engineers and data scientists refer to the "last mile," they generally mean the steps required to take an AI solution from a working model to generalized, widespread use.
What Is the ‘Last Mile Problem’?
Democratizing AI involves both the logistics of deploying the code or model and choosing the right approach to track the model's performance. The latter is especially challenging because many models function as black boxes, offering little visibility into how they arrive at their answers. Determining how to track a model's performance is therefore a critical part of surmounting the last-mile hurdle. With fewer than half of AI projects ever reaching a production win, it's evident that optimizing the processes that make up the last mile will unlock significant innovation.
The biggest difficulty developers face comes after they build an AI solution: tracking its performance is both context-dependent and varies with the type of model. For instance, while we can compare the results of predictive models against a benchmark, outputs from less deterministic models, such as personalization models, must be examined in terms of their statistical characteristics. This also requires a deep understanding of what a "good result" actually entails.

For example, during my time working on Google News, we created a rigorous process to evaluate AI algorithms. This involved running experiments in production and determining how to measure their success, which meant looking at a series of metrics (long vs. short clicks, source diversity, authoritativeness, etc.) to determine whether the algorithm was in fact a "win." One metric we tracked was news source diversity in personalized feeds: in local development and experiments, the results might appear good, but at scale, and as the model evolves, they may skew.
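To make these metrics concrete, here is a minimal sketch in Python. This is not Google's actual pipeline; the dwell-time threshold and data shapes are illustrative assumptions. It shows how a source-diversity score and a long-click rate might be computed for a personalized feed.

```python
import math
from collections import Counter

def source_diversity(article_sources: list[str]) -> float:
    """Shannon entropy over the sources in a personalized feed.

    Higher values mean recommendations are spread across more sources;
    a collapsing value can signal that the model is over-concentrating
    on a few publishers as it evolves.
    """
    if not article_sources:
        return 0.0
    counts = Counter(article_sources)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def long_click_rate(clicks: list[dict], threshold_s: float = 30.0) -> float:
    """Fraction of clicks where the reader stayed past a dwell-time
    threshold ("long clicks"), a rough proxy for satisfaction.
    """
    if not clicks:
        return 0.0
    return sum(c["dwell_seconds"] >= threshold_s for c in clicks) / len(clicks)

# Illustrative usage: a feed dominated by one source scores low on diversity.
feed = ["siteA", "siteA", "siteA", "siteB", "siteC"]
print(f"diversity: {source_diversity(feed):.2f} bits")

clicks = [{"dwell_seconds": 5}, {"dwell_seconds": 45}, {"dwell_seconds": 120}]
print(f"long-click rate: {long_click_rate(clicks):.2f}")
```

In production, numbers like these would be computed continuously over live traffic and compared against experiment baselines, which is exactly where results that looked good locally can start to skew.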
The solution, therefore, is twofold:
- Organizations must improve the rigor surrounding AI and develop methodologies to assess success once an AI model goes live. Historically, traditional systems have had a tremendous number of controls around how code is deployed, tracked, and measured. In the world of AI, though, the data is the code, and we must apply the same rigor to the data driving the model.
- Additionally, engineers must gain a detailed understanding of how to quantify and monitor a model's success, since the way a model behaves after deployment can differ from the way it behaved during development (a minimal sketch of both ideas follows this list).
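As a rough illustration of both points, here is a sketch, again in Python, with hypothetical field names and thresholds; real systems would lean on dedicated tooling such as a feature store or a data-validation framework. The first function treats incoming data with code-level rigor by rejecting records that violate an expected schema; the second flags when a live metric drifts from its development baseline.

```python
import statistics

# Hypothetical schema for an incoming feature record: each field
# must be present and fall within an expected range.
FEATURE_RANGES = {
    "age": (0, 120),
    "session_length_s": (0, 86_400),
}

def validate_record(record: dict) -> list[str]:
    """Treat the data with the same rigor as code: reject records
    that violate the expected schema before they reach the model."""
    errors = []
    for field, (lo, hi) in FEATURE_RANGES.items():
        value = record.get(field)
        if value is None:
            errors.append(f"missing field: {field}")
        elif not lo <= value <= hi:
            errors.append(f"{field}={value} outside [{lo}, {hi}]")
    return errors

def drifted(live_scores: list[float], baseline_mean: float,
            baseline_stdev: float, z_threshold: float = 3.0) -> bool:
    """Flag when a live metric drifts from its development baseline,
    using a simple z-score on the mean of recent observations."""
    if not live_scores:
        return False
    z = abs(statistics.mean(live_scores) - baseline_mean) / baseline_stdev
    return z > z_threshold

# Illustrative usage: a bad record is rejected, and a metric that has
# sagged well below its development baseline is flagged.
print(validate_record({"age": 200, "session_length_s": 42}))
print(drifted([0.61, 0.58, 0.64], baseline_mean=0.75, baseline_stdev=0.02))
```

The z-score check is deliberately simple; the point is the pattern of continuously comparing post-deployment behavior against what was measured before launch, not the specific statistic.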
Machine learning operations (MLOps) is emerging as a new category of products necessary for adopting AI, establishing good patterns and the tooling required to increase confidence in AI solutions. Once AI needs are established, decision-makers must weigh the fact that while developing this tooling in-house may look attractive, it can be a costly affair given how nascent the field still is.
Looking ahead, cloud providers will start offering AI platforms as a commodity, innovators will consolidate more robust tooling, and the same rigor we see in traditional software development will become standardized and operationalized across the AI industry. Nonetheless, tooling is only one piece of the puzzle. There is significant work required to improve how we take an AI solution from idea to test to reality, and ultimately measure success. We'll get there more quickly when AI's business value and use case are determined from the outset.