In machine learning, model deployment is the process of integrating a machine learning model into an existing production environment where it can take in an input and return an output.
Imagine that you’ve spent several months creating a machine learning model that can determine whether a transaction is fraudulent with a near-perfect F1 score. That’s great, but you’re not done yet. Ideally, you want the model to flag a fraudulent transaction in real time so that you can stop it before it goes through. This is where model deployment comes in.
Machine Learning Model Deployment Explained
Model deployment in machine learning is the process of integrating your model into an existing production environment where it can take in an input and return an output. The goal is to make the predictions from your trained machine learning model available to others.
Most online resources focus on the earlier steps of the machine learning life cycle, such as exploratory data analysis (EDA), model selection and model evaluation. Model deployment is discussed far less often, partly because it can be complicated and partly because it isn’t well understood by those without a background in software engineering or DevOps.
In this article, you’ll learn what model deployment is, the high-level architecture of a machine learning system, different methods of deploying a model and factors to consider when choosing your deployment method.
What Is Model Deployment?
Deploying a machine learning model, also known as model deployment, simply means integrating a machine learning model into an existing production environment where it can take in an input and return an output. The purpose of deploying your model is so that you can make the predictions from a trained machine learning model available to others, whether that be users, management or other systems.
Model deployment is closely related to machine learning systems architecture, which refers to the arrangement and interactions of software components within a system to achieve a predefined goal.
Model Deployment Criteria
Before it’s ready for deployment, your machine learning model needs to meet a couple of criteria:
- Portability: This refers to the ability of your software to be transferred from one machine or system to another. A portable model is one that can be moved to a new environment, or rewritten for it, with minimal effort while keeping a relatively low response time.
- Scalability: This refers to how well your model handles growth in data volume or request load. A scalable model is one that doesn’t need to be redesigned to maintain its performance as that load increases.
This will all take place in a production environment, which is a term used to describe the setting where software and other products are actually put into operation for their intended uses by end users.
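One way to make the portability criterion concrete is to store a model’s learned parameters in a plain, language-neutral format so that another machine or runtime can reload them. The sketch below is only an illustration using Python’s standard library. The `LinearModel` class, its fields and the JSON layout are hypothetical and are not taken from any particular framework.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LinearModel:
    """Hypothetical toy model: prediction = bias + sum(w_i * x_i)."""
    weights: list
    bias: float

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


def export_model(model):
    """Serialize the learned parameters to a JSON string."""
    return json.dumps(asdict(model))


def import_model(payload):
    """Rebuild the model from JSON, for example on a different machine."""
    data = json.loads(payload)
    return LinearModel(weights=data["weights"], bias=data["bias"])


# Parameters are made up for the example; a real model would learn them.
trained = LinearModel(weights=[0.5, -0.25], bias=1.0)
restored = import_model(export_model(trained))
```

Because the exported payload holds only numbers and lists, any system that can parse JSON could reimplement `predict` without depending on the training code.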
Machine Learning System Architecture for Model Deployment
At a high level, there are four main parts to a machine learning system:
- Data layer: The data layer provides access to all of the data sources that the model will require.
- Feature layer: The feature layer is responsible for generating feature data in a transparent, scalable and usable manner.
- Scoring layer: The scoring layer transforms features into predictions. Scikit-learn is a commonly used library for this step.
- Evaluation layer: The evaluation layer checks the equivalence of two models and can be used to monitor production models. It’s used to monitor and compare how closely the training predictions match the predictions on live traffic.
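The four layers above can be sketched as four small functions chained together. This is a simplified illustration, not a reference architecture: the record fields, the linear scoring rule and its weights are all assumptions made up for the example, and a real system would read from actual data sources and load a trained model.

```python
def data_layer():
    """Data layer: return raw records (hard-coded here; normally a database or file)."""
    return [
        {"amount": 20.0, "country": "US", "home_country": "US"},
        {"amount": 9500.0, "country": "BR", "home_country": "US"},
    ]


def feature_layer(record):
    """Feature layer: turn one raw record into numeric features."""
    return [
        record["amount"] / 1000.0,
        1.0 if record["country"] != record["home_country"] else 0.0,
    ]


def scoring_layer(features, weights=(0.1, 0.5), bias=0.0):
    """Scoring layer: map features to a prediction (an assumed linear score)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def evaluation_layer(training_scores, live_scores):
    """Evaluation layer: mean absolute gap between training and live predictions."""
    gaps = [abs(t - l) for t, l in zip(training_scores, live_scores)]
    return sum(gaps) / len(gaps)


live_scores = [scoring_layer(feature_layer(r)) for r in data_layer()]
```

Keeping each layer behind its own function means the data source or the scoring model can be swapped without touching the other layers.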
3 Model Deployment Methods to Know
There are three general ways to deploy your ML model: one-off, batch, and real-time.
1. One-off
You don’t always need to continuously train a machine learning model to deploy it. Sometimes a model is only needed once or periodically. In this case, the model can simply be trained ad hoc when it’s needed and pushed to production, where it stays until its performance deteriorates enough to require fixing.
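Deciding when a one-off model has deteriorated “enough” requires some kind of check. The following is a minimal sketch of one possible rule that compares recent error against the error measured at training time. The function name `needs_retraining` and the 10 percent tolerance are assumptions chosen for illustration, not a standard.

```python
def needs_retraining(baseline_error, recent_errors, tolerance=0.10):
    """Return True when the mean recent error exceeds the baseline
    by more than `tolerance` (a relative margin, assumed here to be 10%)."""
    if not recent_errors:
        return False  # nothing observed yet, so no evidence of deterioration
    recent_mean = sum(recent_errors) / len(recent_errors)
    return recent_mean > baseline_error * (1.0 + tolerance)


# Error values below are invented for the example.
stable = needs_retraining(0.20, [0.21, 0.20, 0.19])
degraded = needs_retraining(0.20, [0.30, 0.35])
```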
2. Batch
Batch training allows you to constantly have an up-to-date version of your model. It is a scalable method that takes a subsample of data at a time, eliminating the need to use the full data set for each update. This is good if you use the model on a consistent basis but don’t necessarily require the predictions in real-time.
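To illustrate updating a model from a subsample at a time, here is a minimal sketch of mini-batch gradient descent for a linear model with squared error. The helper names, the learning rate, the batch size and the synthetic data are assumptions for the example. In practice the loop would run on a schedule over newly arrived data.

```python
def minibatch_update(weights, bias, batch, learning_rate=0.01):
    """Apply one gradient step for mean squared error using only the rows in `batch`."""
    n = len(batch)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, target in batch:
        error = bias + sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad_w[i] += 2.0 * error * x / n
        grad_b += 2.0 * error / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - learning_rate * grad_b


def batches(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Synthetic data following y = 2x + 1 (made up for the example).
rows = [([x / 10.0], 2.0 * (x / 10.0) + 1.0) for x in range(20)]
weights, bias = [0.0], 0.0
for _ in range(500):  # repeated passes stand in for scheduled batch runs
    for batch in batches(rows, size=5):
        weights, bias = minibatch_update(weights, bias, batch, learning_rate=0.05)
```

Each update touches only five rows, so the cost of a step does not grow with the size of the full data set.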
3. Real-time
In some cases, you’ll want a prediction in real time, such as when determining whether a transaction is fraudulent. This is possible with online machine learning models, which update incrementally as each new example arrives. One example is linear regression trained with stochastic gradient descent.
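As a rough sketch of what online learning looks like, the class below implements linear regression updated by stochastic gradient descent one example at a time. The class name, the method names and the simulated stream are assumptions for illustration. The loop serves each prediction first and updates the model only once the true value is known.

```python
class OnlineLinearRegression:
    """Toy online learner: linear regression updated by SGD, one example at a time."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def learn_one(self, features, target):
        """Take a single SGD step on the squared error of one example."""
        error = self.predict(features) - target
        step = self.learning_rate * 2.0 * error
        self.weights = [w - step * x for w, x in zip(self.weights, features)]
        self.bias -= step


model = OnlineLinearRegression(n_features=1)
# Simulated stream following y = 3x - 1 (synthetic, for the example only).
for i in range(5000):
    x = (i % 10) / 10.0
    prediction = model.predict([x])       # answer the request immediately
    model.learn_one([x], 3.0 * x - 1.0)   # update once the true value arrives
```

The model never needs the full history in memory, which is what makes this style suitable for low-latency settings.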
4 Model Deployment Factors to Consider
There are a number of factors and implications that one should consider when deciding how to deploy a machine learning model. These factors include the following:
- How frequently predictions will be generated and how urgently the results are needed.
- Whether predictions should be generated individually or in batches.
- The latency requirements of the model, the computing power available, and the desired service level agreement (SLA).
- The operational implications and costs required to deploy and maintain the model.
Understanding these factors will help you decide among the one-off, batch and real-time model deployment methods.
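The factors above could be turned into a rough rule of thumb, as in the sketch below. This is purely illustrative: the function name, its parameters and every threshold (such as the one-second latency cutoff) are assumptions, and a real decision would also weigh cost and operational constraints that this sketch ignores.

```python
def suggest_deployment_method(predictions_per_day, max_latency_seconds,
                              needs_individual_predictions):
    """Map a few of the listed factors to one of the three methods.
    All thresholds are hypothetical and chosen only for illustration."""
    if needs_individual_predictions and max_latency_seconds <= 1.0:
        return "real-time"
    if predictions_per_day < 1:
        return "one-off"  # the model is used less than once a day
    return "batch"


# Example scenarios (invented numbers).
fraud_check = suggest_deployment_method(100_000, 0.2, True)
quarterly_report = suggest_deployment_method(0.01, 86_400, False)
nightly_scoring = suggest_deployment_method(5_000, 3_600, False)
```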