When productionizing machine learning models, take the time to automate and standardize as many parts of the process as possible.

According to Data Scientist Alicia Erwin, manually preparing data or acting on predictions introduces opportunities for unnecessary error and bias to creep in. 

“The more automated the process is, the easier it is to execute and the more likely it is to be used exactly as you intended,” Erwin, who works at local tech company Pareto Intelligence, said.

Further, having a well-defined production process ensures teams will be more easily alerted to abnormalities in the data or system, she added.

She and fellow data science professionals agree that in addition to standardization, plenty of model testing helps teams check that features are behaving as expected and that any errors match what they might have encountered in development. 

Machine learning experts and teams dove into how they productionize machine learning models that work for their businesses.

What is a machine learning model?

A machine learning model is a file that has been trained by an algorithm to recognize patterns in data and make predictions.

 

Graphika

Alex Ruch

MACHINE LEARNING RESEARCH ENGINEER


What tools have you found to be most effective for productionizing Graphika’s ML models?

At Graphika, we use a variety of Python-based ML frameworks to examine how behavior and language unfold over the cybersocial landscapes of our network maps. Many of our deep learning models use PyTorch as a back end. For example, the Deep Graph Library provides a flexible way for us to develop semi-supervised classification models for nodes. And the DGL-KE package lets us scale our knowledge graphs to millions of nodes. Hugging Face’s Tokenizers and Transformers libraries also enabled us to produce and test language models for text classification and multi-label models of sentiment while avoiding huge amounts of boilerplate code. 
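For readers less familiar with these libraries, here is a minimal sketch of the kind of boilerplate-free text classification the Transformers library enables. The checkpoint and example text are generic placeholders, not Graphika’s actual models.

```python
# Minimal sketch of a Transformers text-classification setup (illustrative only;
# the checkpoint and input text are generic placeholders, not Graphika's models).
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The pipeline wraps tokenization, inference and label mapping with no boilerplate.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("The rollout went smoothly and users are happy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```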

These tools greatly enhanced our ability to quickly generate results given their easy integration with GPU processing. But we also use more traditional ML frameworks like scikit-learn, Gensim, and scattertext for in-house analyses.

Teams should carefully evaluate when the benefits of new models outweigh their costs.”

 

What are your best practices for deploying a machine learning model to production?

We use MLflow to package, track, register and serve machine learning projects. It’s helped us make improvements to ensure model integrity while letting us efficiently replicate runtime environments across servers. For example, MLflow automatically logs our automated hyperparameter tuning trials with Optuna. It also saves the best-performing model to our registry along with pertinent information on how and on what data it was trained. 
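A rough sketch of that pattern, assuming a generic scikit-learn model and placeholder names rather than Graphika’s actual pipeline, might look like this:

```python
# Rough sketch of logging Optuna trials to MLflow and registering the best model.
# Model, data and names are placeholders, not Graphika's actual pipeline.
import mlflow
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 12),
    }
    with mlflow.start_run(nested=True):
        score = cross_val_score(
            RandomForestClassifier(**params, random_state=0), X, y, cv=3
        ).mean()
        mlflow.log_params(params)            # hyperparameters for this trial
        mlflow.log_metric("cv_accuracy", score)
    return score

with mlflow.start_run(run_name="tuning"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    best = RandomForestClassifier(**study.best_params, random_state=0).fit(X, y)
    # Saving to the registry assumes a tracking server with a registry backend.
    mlflow.sklearn.log_model(best, artifact_path="model",
                             registered_model_name="demo-classifier")
```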

Then, MLflow allows us to easily serve models accessible by API requests. Together, this training and deployment pipeline lets us know how each of our models was created. It helps us better trace the root cause of changes and issues over time as we acquire new data and update our models. We have greater accountability over our models and the results they generate.
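As a rough illustration of the serving step, a registered model can be exposed over HTTP and queried by API request. The model name, port and feature vector below are placeholders.

```python
# Sketch of querying a model served by MLflow. Assumes a registered model is
# already being served, e.g. via:
#   mlflow models serve -m "models:/demo-classifier/1" -p 5001
# The model name, port and feature values are placeholders.
import requests

payload = {"inputs": [[0.1, 0.2, 0.3, 0.4]]}  # shape must match the model's features
resp = requests.post("http://localhost:5001/invocations", json=payload, timeout=10)
print(resp.json())
```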

 

What advice do you have for other data scientists looking to better productionize ML models?

Productionizing machine learning models is a complex decision-making process. New model architectures are created almost daily, but the purported gains of such approaches often fail to outweigh the technical debt. For example, a simple logistic regression model can often perform within an acceptable range compared to a deep neural network if the data is high quality.
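A generic comparison along those lines (using a public scikit-learn dataset, not Graphika’s data or models) illustrates how close a simple baseline can come to a more complex model:

```python
# Illustrative comparison of a simple baseline against a more complex model
# (generic scikit-learn example, not Graphika's data or models).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("small neural network", MLPClassifier(hidden_layer_sizes=(64, 32),
                                           max_iter=1000, random_state=0)),
]:
    pipe = make_pipeline(StandardScaler(), model)
    print(name, cross_val_score(pipe, X, y, cv=5).mean().round(3))
```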

Teams should carefully evaluate when the benefits of new models outweigh their costs, and when they should update or upgrade modeling approaches. Perhaps effort would be better spent on improving data quality, which would not only help the present modeling pipeline but boost the performance of future models. This approach minimizes technical debt now and in the future, whereas changing models may only affect immediate performance gaps.

 

Pager

Jaime Ignacio Castro Ricardo

DATA ENGINEER


What tools have you found to be most effective for productionizing Pager’s ML models?

Like most of the industry, at Pager we use Python and Jupyter Notebooks for exploratory data analysis and model development. We recently switched from self-hosting Jupyter Notebooks to using Google Colab since much of our tech stack is already on the Google Cloud Platform. Colab offers an easy medium for collaboration between team members.

We deal primarily with chatbots, so our ML stack is geared toward natural language processing. We use scikit-learn, spaCy, and Rasa as the main ML and NLP libraries to build our models. There’s also an in-house framework we developed around them to streamline our experimentation and deployment process.
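As a small, hedged illustration of the kind of NLP preprocessing spaCy enables for chat messages (the pipeline and text are generic, not Pager’s models):

```python
# Minimal sketch of spaCy-based preprocessing for a chat message
# (generic example; not Pager's actual pipeline).
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this small English pipeline is installed
doc = nlp("I need to schedule an appointment with a cardiologist next Tuesday.")

print([(ent.text, ent.label_) for ent in doc.ents])    # named entities, e.g. DATE
print([tok.lemma_ for tok in doc if not tok.is_stop])  # lemmas with stop words removed
```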

The engineering department integrated GitOps into our continuous integration and delivery pipelines. When we merge into master, Google Cloud Build Dockerizes new versions of our models and deploys them to a production Kubernetes cluster.

User, client and clinical input gets used to design product features that leverage our models.”

 

What are your best practices for deploying a machine learning model to production?

We perform thorough unit testing with pytest, and exhaustive integration and user testing in multiple lower-level environments. User, client and clinical input gets used to design product features that leverage our models. Then we iterate on proof-of-concept feedback from users. We also quantify how new ML models affect efficiency and productivity to gauge their real-world effectiveness.
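A minimal sketch of what such a unit test can look like with pytest follows; `predict_intent` is a hypothetical stand-in, not Pager’s actual code.

```python
# Sketch of unit-testing a model wrapper with pytest.
# predict_intent is a hypothetical stand-in for a real model call.
import pytest

def predict_intent(message: str) -> dict:
    """Stand-in for a real model call; returns an intent label and a confidence score."""
    if not message.strip():
        raise ValueError("message must be non-empty")
    return {"intent": "schedule_appointment", "confidence": 0.92}

def test_prediction_has_expected_schema():
    result = predict_intent("Can I book an appointment?")
    assert set(result) == {"intent", "confidence"}
    assert 0.0 <= result["confidence"] <= 1.0

def test_empty_message_raises():
    with pytest.raises(ValueError):
        predict_intent("")
```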

Because of these practices, our ML team rarely introduces bugs in production. Users are usually happy when models are deployed since they were involved in feature planning, discussion and reviews. Additional training and input help users be more efficient when using the features in production.

 

What advice do you have for data scientists looking to better productionize ML models?

Always keep track of experiments with versioning, not just for trained models but also for the input data, hyperparameters and results. Such metadata proves useful when we develop new models for the same problem and reproduce old ones for comparison and benchmarking. We use an in-house framework for developing new models, but MLflow is a great open-source solution as well.
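A rough sketch of that kind of experiment record, with placeholder data and values rather than Pager’s in-house framework, might tie a run to a hash of its training data alongside its hyperparameters and results:

```python
# Rough sketch of versioning an experiment's inputs alongside its results with MLflow
# (the dataset, parameters and metric values are placeholders).
import hashlib
import mlflow
import pandas as pd

train_df = pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]})  # placeholder dataset
data_hash = hashlib.sha256(
    pd.util.hash_pandas_object(train_df).values.tobytes()
).hexdigest()

with mlflow.start_run(run_name="baseline-v2"):
    mlflow.log_param("training_data_sha256", data_hash)   # ties the run to this exact data
    mlflow.log_params({"learning_rate": 0.01, "epochs": 20})
    mlflow.log_metric("val_f1", 0.87)  # result from the (omitted) training step
```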

 


Pareto Intelligence

Data Scientist Alicia Erwin speaks highly of SHAP (SHapley Additive exPlanations), a tool that assigns each machine learning feature an importance value for a particular prediction. At healthcare data company Pareto Intelligence, she said the Python package has become a critical component in ensuring the success of her productionized ML models, thanks in part to the transparency it provides end users.

 

What frameworks, languages or other tech tools have you found to be most effective for productionizing machine learning models, and why? 

Machine learning models are often described as a black box. They can be difficult for non-data scientists to understand and trust. Without this understanding and trust, state-of-the-art productionized ML models may not get used to their full potential regardless of how accurate the predictions might be. 

I find SHAP useful in helping to bridge this gap. SHAP is a Python package that explains ML model predictions using Shapley values. I have incorporated this tool into my productionized ML models to provide end users with the top five most important features in making each prediction, as well as the exact values of those features for that record.
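A sketch of that pattern on a generic tree model and public dataset (not Pareto Intelligence’s actual pipeline) shows how the top five features for a single record can be surfaced:

```python
# Sketch of surfacing the top five SHAP features for a single prediction
# (generic model and public dataset; not Pareto Intelligence's pipeline).
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer()
model = GradientBoostingClassifier(random_state=0).fit(data.data, data.target)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

record = 0  # explain the first record
top5 = np.argsort(np.abs(shap_values[record]))[::-1][:5]
for i in top5:
    print(f"{data.feature_names[i]}: value={data.data[record, i]:.3f}, "
          f"shap={shap_values[record][i]:+.3f}")
```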

We productionize models so that end users can act on the final prediction. SHAP makes doing so easier by providing insight into what led to each prediction. In turn, this insight creates understanding and trust in the model prediction and empowers end users with additional information on how to act.

 

What best practices do you follow when deploying a machine learning model to production? 

There are several well-known best practices that I think most data scientists try to follow when deploying a machine learning model to production. They include running the production process in a test environment before deployment, checking the reproducibility of the process and monitoring production performance. 

I sometimes find it difficult to preprocess the data consistently between training and production. New data you want to use for prediction won’t always be in the same format as your original training data. The discrepancy might require you to make adjustments to preprocessing procedures if production data doesn’t work with your model. When making these changes, you might unknowingly affect model predictions. 

To avoid this problem, I take the time to test any updated preprocessing methods, no matter how small, on the training data. This way, I am able to see what effect the adjustments have on the input data and ensure we see the predictions we expect. With this best practice, deployed models produce more reliable predictions and cause fewer interruptions to the people and workflows they are designed to support.
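One general way to keep training and production preprocessing in lockstep, sketched here with scikit-learn and placeholder columns rather than her actual workflow, is to bundle the preprocessing and the model into a single artifact so raw production data always passes through the same transformations:

```python
# Sketch of bundling preprocessing with the model so training and production
# apply identical transformations (placeholder columns and data).
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.DataFrame({
    "age": [34, 51, 29, 62],
    "plan": ["A", "B", "A", "C"],
    "target": [0, 1, 0, 1],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),  # tolerate unseen categories
])
pipeline = Pipeline([("prep", preprocess), ("model", LogisticRegression())])
pipeline.fit(train[["age", "plan"]], train["target"])

# Ship one artifact; production calls pipeline.predict on raw data.
joblib.dump(pipeline, "model.joblib")
```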

The more automated the process is, the easier it is to execute.”

What advice do you have for other data scientists looking to better productionize ML models?

Advocate for and take the extra time to automate and standardize as many parts of the process as possible. Data science teams might feel pressure to cut corners in order to produce something faster. However, manually preparing data, making decisions, or generating and acting on predictions introduces opportunities for unnecessary error and bias to creep in. Therefore, it is important to have a well-defined production process that alerts you to abnormalities in the data or in the system, which are often a clue that something is off and needs to be investigated.

Additionally, the more automated the process is, the easier it is to execute and the more likely it is to be used exactly as you intended. Systematic methods take more time to build initially but make for better ML implementations in the long run. Discuss these benefits with business stakeholders to help them understand the value of the time spent building out these tools.

 


 

KAR Global

At KAR Global, the data science team remains ready to adapt to machine learning’s evolving landscape. Chris Simokat, vice president of data science and analytics at the automotive marketplace platform, said he tries to strike a balance between tools that lend themselves to natural development and those their DevOps partners will find most operationally sustainable.

 

What frameworks, languages or other tech tools have you found to be most effective for productionizing machine learning models, and why?

KAR Global’s DRIVIN data science and analytics teams mainly work in Python, R and SQL to develop our models. We productionize using a mix of open-source and cloud-based technologies: H2O POJOs (Plain Old Java Objects); Python modules such as Flask APIs running in containers elastically scaled on AWS ECS, or in AWS Lambda functions; and models embedded directly in SQL statements or as SQL UDFs.

We use these tools because they align with our overall tech strategy and because they lend themselves well to productionizing. 
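As a rough illustration of the Flask-in-a-container pattern mentioned above, a scoring endpoint can be as small as the sketch below; the model path and feature names are placeholders, not KAR Global’s actual service.

```python
# Minimal sketch of a Flask scoring endpoint of the kind that could be containerized
# and scaled (model path and feature names are placeholders, not KAR Global's).
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # pre-trained model baked into the container image

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    features = [[payload["feature_a"], payload["feature_b"]]]  # order must match training
    prediction = model.predict(features)[0]
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)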

 

What best practices do you follow when deploying a machine learning model to production? 

Our data science and analytics teams strive to strike a balance between what is natural for the tools we develop in and what is operationally sustainable for our DevOps partners. This process often takes on a different form depending on the situation. So rather than having an overly prescriptive or rigid process, we look to high-level guiding principles. 

We empower our data scientists to make the right decisions and encourage them to consult with and gain buy-in from their various stakeholders. Open communication with technology and business stakeholders about trade-offs also plays a key role in defining requirements and setting rational expectations for everyone involved. This feedback loop also validates that we are focusing our efforts on the highest-impact projects. 

Solutioning for the wrong use case may result in costly rework or missed delivery dates.”

What advice do you have for other data scientists looking to better productionize ML models?

Have a process but remain flexible and be ready to adapt to the evolving landscape of machine learning. Also, document your decisions and be able to justify them when communicating with stakeholders. Listen to feedback and ensure you clearly understand how models will be used in production. Building for batch is very different than building for real time or needing to support both. Solutioning for the wrong use case may result in costly rework or missed delivery dates.

 

Quantium

Anton Bubna-Litic, lead data scientist at Quantium, recommends reusing as much code as possible between training and serving. Bubna-Litic said that doing so will minimize or eliminate any unwelcome surprises once the model goes to production. And for Quantium, a company that provides clients with bespoke, complex predictive models to help them improve their business processes, ensuring data precision matters. 

 

What frameworks, languages or other tech tools have you found to be most effective for productionizing machine learning models, and why?

The most effective tools we have been using on recent projects are Docker and Kubernetes. They have been set up by our engineers and allow us to manage and scale the training and serving of multiple models without having to worry about changing environments and packages. 

Of course, as a consulting company, it’s important that we use the framework, languages and tools our clients use to ensure the maintainability of the models.

 

What best practices do you follow when deploying a machine learning model to production?

Plenty of testing and checking. For example, always test your models on an out-of-sample holdout before deploying to production. Check that the features are behaving as expected and that any errors match what you’ve seen in development. There have been a few times in my experience where these checks have caught upstream issues and we’ve been able to avoid deploying a faulty model to some of our key clients.
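A generic sketch of that kind of pre-deployment check (placeholder data and thresholds, not Quantium’s actual tests) scores the candidate model on an out-of-sample holdout and verifies that features still look like what the model saw in development:

```python
# Sketch of a pre-deployment check: score the candidate model on held-out data
# and compare feature distributions to training (placeholder data and thresholds).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.2,
                                                          random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# 1. Out-of-sample performance must clear an agreed threshold before deployment.
auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
assert auc > 0.8, f"holdout AUC {auc:.3f} below deployment threshold"

# 2. Features should behave as they did in development.
drift = np.abs(X_train.mean(axis=0) - X_holdout.mean(axis=0)) / (X_train.std(axis=0) + 1e-9)
assert (drift < 0.5).all(), "feature distribution shifted more than expected"
```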

You also want to re-use as much code between training and serving as possible to ensure that the model you’ve trained stays steady in production. Once productionized, always track your model’s outputs with both metrics and visualizations. This will help you monitor whether your models are getting stale sooner than expected and ensure poor data doesn’t break them.
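A small, hedged sketch of tracking model outputs over time (the score distributions below are simulated for illustration) shows how staleness or bad input data can be made visible with a metric and a plot:

```python
# Sketch of tracking a deployed model's output distribution so staleness or bad
# input data shows up quickly (the score distributions here are simulated).
import matplotlib.pyplot as plt
import numpy as np

baseline_scores = np.random.beta(2, 5, size=1000)       # score distribution at launch
todays_scores = np.random.beta(2, 5, size=1000) + 0.05  # today's production scores

print("baseline mean:", baseline_scores.mean().round(3),
      "today:", todays_scores.mean().round(3))

plt.hist(baseline_scores, bins=30, alpha=0.5, label="baseline")
plt.hist(todays_scores, bins=30, alpha=0.5, label="today")
plt.legend()
plt.title("Model score distribution over time")
plt.savefig("score_drift.png")  # reviewed alongside summary metrics in monitoring
```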

We choose simple models and features over more complex ones.”

What advice do you have for other data scientists looking to better productionize ML models?

Quantium’s data science problems tend to be quite statistically complex, but productionizing ML models is much more engineering than it is statistics. For us, having close relationships with our engineers and making sure we’re having fun as a team is important. 

As a data scientist, you want to try to learn as much as you can by working with engineers in this process. We choose simple models and features over more complex ones. Then if things go wrong, it is easier to diagnose and correct them.

Lastly, build in plenty of checks. Machine learning pipelines are prone to silent errors.

 

Caterpillar

The Cat Digital Analytics Center of Excellence serves as a bridge between Caterpillar and Cat Digital. It coordinates and supports an analytics community by sharing a common set of standards and processes that analytics practitioners can reuse. During her time as part of that team, Analytics Manager Fiona O’Laughlin said she’s learned the importance of keeping code clean, as well as documenting and versioning everything.

 

What frameworks, languages or other tech tools have you found to be most effective for productionizing machine learning models, and why?

For non-deep learning models, the scikit-learn framework is a go-to. For deep learning models, Keras, PyTorch and TensorFlow are key. TensorFlow also supports custom models. For example, it allows us to apply additional weights to particular layers. Docker, a containerization tool, allows us to package and deploy models with their necessary parts, including libraries and other dependencies.
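One generic reading of per-layer control is giving a particular layer its own regularization penalty while the rest of the network is left alone; the Keras sketch below illustrates that idea and is not Cat Digital’s actual model.

```python
# Hedged sketch of weighting a particular layer in a Keras model by giving it
# its own L2 penalty (a generic illustration, not Cat Digital's actual models).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    # This layer gets an additional weight penalty that the others do not.
    tf.keras.layers.Dense(32, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```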

Being able to reproduce model outputs is key for our data science solutions. Git and DVC can help enforce reproducibility across the entire development process, from research and prototyping to production implementation. These open-source tools help us maintain combinations of data, configurations, models and code.
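As a hedged sketch of that reproducibility idea, DVC’s Python API can pin an input dataset to an exact Git revision; the repository URL, path and tag below are placeholders.

```python
# Sketch of pinning an input dataset to a specific Git/DVC revision so a model
# run is reproducible (repo URL, path and revision are placeholders).
import io

import dvc.api
import pandas as pd

data_bytes = dvc.api.read(
    path="data/train.csv",
    repo="https://github.com/example-org/example-repo",
    rev="v1.2.0",  # the exact tag or commit the model was trained on
    mode="rb",
)
train_df = pd.read_csv(io.BytesIO(data_bytes))
print(train_df.shape)
```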

 

What best practices do you follow when deploying a machine learning model to production?

Deploying machine learning models into production depends heavily on the model trained and the data available. By keeping things simple and understanding the hardware where models will run, we can better identify the performance expectations from the model. 

For many of our models, retraining is not automated since these are supervised models with labeled data. We train the models manually and then replace the deployed production version. We need additional infrastructure to handle the on-demand requests and interaction between the data and model. AWS SageMaker can automate a lot of this process.
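A rough sketch of how SageMaker’s Python SDK can automate training and deployment follows; the IAM role, training script, S3 path and instance types are placeholders, not Cat Digital’s actual setup.

```python
# Rough sketch of automating training and deployment with the SageMaker Python SDK
# (role, script, S3 path and instance types are placeholders).
from sagemaker.sklearn.estimator import SKLearn

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

estimator = SKLearn(
    entry_point="train.py",       # training script that fits and saves the model
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    role=role,
)
estimator.fit({"train": "s3://example-bucket/train/"})  # placeholder S3 path

# Deploy the trained model behind a managed endpoint for on-demand requests.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```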

Another best practice is version control for tracking code changes. Our team uses a variety of open-source tools for versioning data and keeping track of experiments and model versions. Finally, an automated pipeline makes a huge impact on ML model deployment: models can easily be retrained on new data and validated against test data.

Don’t be afraid to experiment with data manipulation.”

What advice do you have for other data scientists looking to better productionize ML models?

Our Cat Digital team is all about keeping code and notebooks clean and documenting and versioning everything. Make sure you research and read documentation and engage with the wider community. There are so many resources out there. Ask questions of fellow coders or read articles by leading experts on Twitter, Stack Overflow and Towards Data Science.

Finally, don’t be afraid to experiment with data manipulation such as feature engineering and dimension reduction. When you experiment and explore, you learn.

 


 
Responses have been edited for length and clarity. Images via listed companies.
