Taking an AI model from the design phase all the way through to production deployment is a complicated and tedious process; this is why MLOps, which streamlines model development and management, is so valuable for any organization that wants to excel at AI and machine learning.
The better that teams implement MLOps, the faster, more reliable and more productive their AI workloads will be. Here’s a look at steps businesses can take to supercharge their approach to MLOps and, by extension, optimize their ability to create value using ML processes and AI models.
How can you get the most value out of MLOps?
- Set up data environments automatically using IaC.
- Control data and model drift with drift detection tools that run on an automated schedule.
- Perform shadow testing before deploying new AI models.
- Optimize your data cleaning process with automation.
- Use tools that automate model retraining and redeployment.
The What and Why of MLOps
Before diving into MLOps best practices, let’s talk about what MLOps does and why it’s beneficial. Short for machine learning operations, MLOps is a strategy for simplifying all the workflows necessary to create and deploy AI models.
It covers the design, development, testing and deployment of the models themselves, as well as the data management and collaboration processes that support model creation. It also covers the ongoing management of models after they have been developed and deployed.
The goal of MLOps is to automate AI/ML workflows in ways that not only save time and effort on the part of data scientists but also help to make AI data preparation, training, testing, deployment and management workflows more consistent and predictable. When you automate processes using MLOps tools, you can duplicate or repeat them as many times as you need without the inconsistency that tends to arise when humans implement processes manually.
Ultimately, MLOps frees data scientists to focus on creative tasks, such as designing and experimenting with models, as opposed to tying them down with operational tasks that can be automated.
How to Get the Most Out of MLOps
No matter how you approach MLOps, it virtually always helps to increase the efficiency of AI model development and management processes. But to get the very most out of MLOps, you must be strategic about how you approach it. The following best practices can help.
1. Use IaC to Create Data Environments
MLOps makes it possible to use Infrastructure as Code (IaC) tools to automate the setup of the environments where data scientists test and deploy models. Making full use of IaC for this purpose lets data scientists whip up environments on demand, on a self-service basis, whenever they want to run a new experiment or test, without waiting for someone else to set up the environment for them. Because every environment is built from the same versioned definition, provisioning becomes both faster and more consistent, which can increase the velocity of AI innovation.
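To make the idea concrete, here is a minimal sketch in plain Python of the pattern behind IaC: the environment is described as data, and a small function works out what has to change to reach that description. The `EnvironmentSpec` fields and the `plan` and `render` helpers are hypothetical names invented for this illustration; they are not the API of any real IaC tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of an experiment environment."""
    name: str
    python_version: str
    gpu_count: int
    packages: tuple


def render(spec: EnvironmentSpec) -> str:
    """Serialize the spec deterministically so it can live in version control."""
    return json.dumps(asdict(spec), sort_keys=True, indent=2)


def plan(desired: EnvironmentSpec, current: Optional[EnvironmentSpec]) -> list:
    """List the actions needed to move from the current state to the desired one."""
    if current is None:
        return [f"create {desired.name}"]
    actual = asdict(current)
    changes = []
    for field, wanted in asdict(desired).items():
        if actual[field] != wanted:
            changes.append(f"update {field}: {actual[field]} -> {wanted}")
    return changes
```

Running `plan` twice against an environment that already matches its spec returns an empty list, which is the property that makes self-service provisioning safe to repeat.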
2. Automate Drift Detection
Drift is a phenomenon that can occur in two ways.
- Data drift: Data can change over time in ways that make models less effective.
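- Model drift: Model performance can degrade over time, whether because the relationship between inputs and outcomes has shifted since training or because of changes in the model itself or the environment that hosts it.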
Using tools like Evidently, Frouros or drift detection engines built into cloud services like Amazon SageMaker, data teams can perform automated drift detection on a scheduled basis. They can also configure alerts that flag instances where the level of detected drift exceeds a threshold defined by data scientists.
Incorporating automated drift detection solutions into MLOps is critical; you don’t want to wait until your users complain or you encounter a major bug to realize that something has gone wrong with your data or models.
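As a rough illustration of what a scheduled drift check does, the sketch below computes the Population Stability Index (PSI) between a reference sample and a current sample, then raises an alert flag when the score passes a threshold. It uses plain Python rather than the API of Evidently, Frouros or SageMaker. The `psi` and `check_drift` names are hypothetical, and the 0.2 threshold is an assumed rule-of-thumb value that data scientists would tune for each feature.

```python
import math
from collections import Counter


def psi(reference, current, bins=10):
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bucket_shares(values):
        # Assign each value to a bucket, clamping values outside the reference range.
        counts = Counter(
            min(max(int((v - lo) / width), 0), bins - 1) for v in values
        )
        # A small floor avoids log(0) when a bucket is empty.
        return [max(counts.get(b, 0) / len(values), 1e-6) for b in range(bins)]

    ref, cur = bucket_shares(reference), bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


DRIFT_THRESHOLD = 0.2  # assumed value, not taken from the article


def check_drift(reference, current, threshold=DRIFT_THRESHOLD):
    """Return the drift score and whether it should trigger an alert."""
    score = psi(reference, current)
    return {"psi": score, "alert": score > threshold}
```

A scheduler such as a cron job or a pipeline orchestrator would call `check_drift` at a fixed interval and route any result with `alert` set to true to the team's notification channel.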
3. Shadow Test AI Models
Data scientists can reduce the risk of unanticipated problems with AI models by performing shadow testing before they deploy new versions. This includes problems that stem from subpar MLOps processes, such as erroneous data cleaning procedures.
Shadow testing means submitting the same query to the production version of a model and to an upcoming, newer version, while only the production version's response is returned to users. If the newer version doesn’t produce results that are at least as good as those of the old model, you know you have a problem to correct before you can take the new version live.
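The comparison described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: models are plain callables that take a query string, and `score` is a hypothetical quality function that the team supplies. None of these names come from a specific serving framework.

```python
from typing import Callable, Iterable


def shadow_test(
    production: Callable[[str], str],
    candidate: Callable[[str], str],
    queries: Iterable[str],
    score: Callable[[str, str], float],
) -> dict:
    """Send each query to both models; only production output is served."""
    served, wins, failures, total = [], 0, 0, 0
    for query in queries:
        total += 1
        prod_out = production(query)
        served.append(prod_out)  # users only ever see the production answer
        try:
            cand_out = candidate(query)
        except Exception:
            failures += 1  # a candidate error is recorded, never surfaced
            continue
        if score(query, cand_out) >= score(query, prod_out):
            wins += 1
    return {
        "served": served,
        "at_least_as_good_rate": wins / total if total else 0.0,
        "candidate_failures": failures,
    }
```

A team might require `at_least_as_good_rate` to stay above an agreed bar, with zero candidate failures, before promoting the new version.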
4. Maximize Data Cleaning Automation
Cleaning data in ways that improve its quality, such as removing duplicate data from a data set, drives effective AI/ML operations. To streamline the cleaning process, consider these practices.
- Automate data cleaning with tools such as scripts built on the open-source pandas library, which can analyze data, detect where cleaning is needed and perform the cleaning automatically.
- Use a centralized environment, like a cloud-based environment that everyone on the data team can access, to clean data. This approach ensures that data cleaning is consistent, which may not be the case if engineers perform cleaning on local workstations.
- Store “dirty” data (raw data that has not yet undergone a cleaning process) in a location that supports data version control, such as a cloud object storage service with versioning. This practice allows teams to automate the tracking of data and to tie each version of the data to the model versions it was used to train and test.
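As one example of the first practice, here is a minimal pandas sketch of a repeatable cleaning step. The `clean` function, its `text_columns` parameter and the report keys are hypothetical names chosen for this illustration, and a real pipeline would likely need rules specific to its data set.

```python
import pandas as pd


def clean(raw: pd.DataFrame, text_columns: list):
    """Trim text, drop duplicate rows, drop incomplete rows, and report counts."""
    df = raw.copy()  # leave the raw ("dirty") data untouched
    for col in text_columns:
        # Strip stray whitespace so near-identical rows compare as equal.
        df[col] = df[col].str.strip()
    deduped = df.drop_duplicates()
    complete = deduped.dropna().reset_index(drop=True)
    report = {
        "rows_in": len(df),
        "duplicates_removed": len(df) - len(deduped),
        "incomplete_removed": len(deduped) - len(complete),
        "rows_out": len(complete),
    }
    return complete, report
```

Because the function never modifies its input and returns a report alongside the cleaned data, the same script can run in a shared environment and its output can be logged next to the version of the raw data it was applied to.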
5. Automate Model Retraining and Redeployment
Retraining of models is frequently necessary if teams make changes to the models or if they need to mitigate model or data drift issues. After retraining a model, they redeploy it to production.
You can perform retraining and redeployment manually, and you may be tempted to take that approach if you underestimate how frequently you’ll need to retrain and redeploy. But automating both steps streamlines the workflow significantly. This not only saves time for data scientists but also helps organizations get new models into production faster, which means they can obtain value from them sooner.
A healthy approach to retraining and redeployment ensures that data scientists retain control over the process; for example, they should be in charge of deciding which drift mitigation strategies to employ beyond retraining. But the busywork of feeding new data to the model, monitoring the training process, and moving the updated model into production can be automated using tools that free data scientists to focus more on the creative portion of their jobs.
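One way to picture this split between human decisions and automated busywork is the sketch below. Data scientists choose the drift threshold and the evaluation function, and the automation handles the retrain, compare and deploy steps. All names here (`retrain_if_needed` and its parameters) are hypothetical, and the callables stand in for whatever training and deployment tooling a team actually uses.

```python
from typing import Any, Callable


def retrain_if_needed(
    drift_score: float,
    threshold: float,
    train: Callable[[], Any],
    evaluate: Callable[[Any], float],
    current_model: Any,
    deploy: Callable[[Any], None],
) -> str:
    """Retrain when drift exceeds the threshold; deploy only if quality holds."""
    if drift_score <= threshold:
        return "skipped"  # no meaningful drift, keep the current model
    candidate = train()  # feed fresh data to the model
    if evaluate(candidate) < evaluate(current_model):
        return "rejected"  # never replace a model with a worse one
    deploy(candidate)  # move the updated model into production
    return "promoted"
```

The returned status string gives the team an audit trail for every scheduled run, so a "rejected" result can prompt a data scientist to choose a different mitigation strategy.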
The More Automation the Better
In short, the more teams take advantage of automated, repeatable processes as part of their MLOps strategy, the more value they’ll get out of MLOps. At the same time, they prime themselves for creating better AI models, because they free their data scientists to focus on what really matters, like model design, development and testing, rather than grunt work like data preparation or environment setup.