Function Wrappers in Python: Model Runtime and Debugging

In Python, function wrappers are called decorators, and they have a variety of useful applications in data science. This guide covers how to use them for managing model runtime and debugging.

Written by Sadrach Pierre
Updated by Brennan Whitfield | May 21, 2024

Function wrappers are useful tools for modifying the behavior of functions. In Python, they’re called decorators. Decorators allow us to extend the behavior of a function or a class without changing the original implementation of the wrapped function.

What Are Python Wrappers?

Function wrappers, called decorators in Python, are used to modify the behavior of functions without changing the original implementation of the wrapped function. Common applications include monitoring the runtime of function calls or debugging other functions.

Python’s functools module makes it easy to define custom decorators that “wrap” (modify or extend) the behavior of another function. Defining a function wrapper is very similar to defining an ordinary function. Once the decorator is defined, we apply it by placing the “@” symbol and the name of the wrapper function on the line directly above the definition of the function we’d like to modify or extend. Defining timer and debugger function wrappers follows the same steps.
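As a minimal sketch of the “@” syntax (the names announce, add and multiply are hypothetical and chosen only for illustration), the following shows that placing a decorator above a function has the same effect as reassigning the function to the decorator’s return value:

```python
import functools


def announce(input_function):
    """Hypothetical decorator: print a message before each call."""
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        print(f"Calling {input_function.__name__}")
        return input_function(*args, **kwargs)
    return wrapper


# Applying the wrapper with the "@" symbol...
@announce
def add(a, b):
    return a + b


# ...has the same effect as defining the function and reassigning it by hand.
def multiply(a, b):
    return a * b


multiply = announce(multiply)

print(add(2, 3))       # prints "Calling add", then 5
print(multiply(2, 3))  # prints "Calling multiply", then 6
```

Both forms leave the original function body untouched; only the name now points at the wrapper.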


Below, we cover how to define and apply function wrappers for profiling machine learning model runtime for a simple classification model. We will use this function wrapper to monitor the runtime of the data preparation, model fit and model predict steps in a simple machine learning workflow. We will also see how to define and apply function wrappers for debugging these same steps. 

I will be working with Deepnote, a data science notebook that makes running reproducible experiments simple, and with the fictitious Telco Churn data set, which is publicly available on Kaggle.


 

Monitoring Runtime of Machine Learning Workflow Using Wrappers

Let’s go through an example to see how this process works.

Data Preparation

Let’s start the data preparation process by navigating to the Deepnote platform (sign up is free if you don’t already have an account). Let’s create a project. 

Image: Screenshot by the author.

And name our project function_wrappers and name our notebook profiling_debugging_mlworkflow:

Image: Screenshot by the author.

Let’s add our data to Deepnote:

Image: Screenshot by the author.

We will be using the Pandas library to handle and process our data. Let’s import it:

import pandas as pd

Next, let’s define a function that we will call data_preparation:

def data_preparation():
   pass

Let’s add some basic data processing logic. This function will perform five tasks:

  1. Read in data
  2. Select relevant columns: The function will take a list of column names as input
  3. Clean the data: Specify column data types
  4. Split data for training and testing: The function will take test size as input
  5. Return training and testing set 

Let’s first add the logic to read in the data. Let’s also add logic to display the first five rows:

def data_preparation(columns, test_size):
  df = pd.read_csv("telco_churn.csv")
  print(df.head())

Let’s call our data prep function. For now, let’s pass None as the argument for both columns and test size:

def data_preparation(columns, test_size):
  df = pd.read_csv("telco_churn.csv")
  print(df.head())

data_preparation(None, None)

Image: Screenshot by the author.

Next, within our data_preparation function, let’s use the columns parameter to filter our data frame. We then define a list of the column names we want to use and call our function with it:

def data_preparation(columns, test_size):
  df = pd.read_csv("telco_churn.csv")
  df_subset = df[columns].copy()
  print(df_subset.head())

columns = ["gender", "tenure", "PhoneService", "MultipleLines", "TotalCharges", "Churn"]
data_preparation(columns, None)

Next, let’s add another function argument, datatype_dict, a dictionary that maps each column name to its data type. Within a for loop in our function, we cast each column to the data type given by this dictionary:

def data_preparation(columns, test_size, datatype_dict):
  df = pd.read_csv("telco_churn.csv")
  df_subset = df[columns].copy()

  for col in columns:
     df_subset[col] = df_subset[col].astype(datatype_dict[col])
  print(df_subset.head())

columns = ["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges", "Churn"]
datatype_dict = {"gender":"category", "tenure":"float", "PhoneService":"category", "MultipleLines":"category", "MonthlyCharges":"float", "Churn":"category"}
data_preparation(columns, None, datatype_dict)

Within another for loop, we will convert all categorical columns to machine-readable codes:

def data_preparation(columns, test_size, datatype_dict):
  df = pd.read_csv("telco_churn.csv")
  df_subset = df[columns].copy()

  for col in columns:
     df_subset[col] = df_subset[col].astype(datatype_dict[col])

  for col in columns:
   if datatype_dict[col] == "category":
     df_subset[col] = df_subset[col].cat.codes

columns = ["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges", "Churn"]
datatype_dict = {"gender":"category", "tenure":"float", "PhoneService":"category", "MultipleLines":"category", "MonthlyCharges":"float", "Churn":"category"}
data_preparation(columns, None, datatype_dict)
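To see what the two loops do in isolation, here is a small sketch on a toy data frame. The rows are made up for illustration and are not taken from the churn data set; the example assumes only that Pandas is installed:

```python
import pandas as pd

# Toy data standing in for two columns of the churn data (values are made up).
toy = pd.DataFrame({"gender": ["Female", "Male", "Female"],
                    "tenure": [1, 34, 2]})
toy_types = {"gender": "category", "tenure": "float"}

# First loop: cast each column to the type named in the dictionary.
for col in toy.columns:
    toy[col] = toy[col].astype(toy_types[col])

# Second loop: replace each category with its integer position
# in the sorted list of categories.
categories = list(toy["gender"].cat.categories)
for col in toy.columns:
    if toy_types[col] == "category":
        toy[col] = toy[col].cat.codes

print(categories)               # ['Female', 'Male']
print(toy["gender"].tolist())   # [0, 1, 0]
```

The codes carry no numeric meaning beyond identifying the category, which is usually acceptable for tree-based models such as the random forest used later in this guide.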

Finally, let’s specify our input and output, split our data for training and testing, and return our train and test sets. First, let’s import the train_test_split function from the model_selection module in scikit-learn:

from sklearn.model_selection import train_test_split

Next, let’s specify our inputs, outputs, training and testing sets:

def data_preparation(columns, test_size, datatype_dict):
  df = pd.read_csv("telco_churn.csv")
  df_subset = df[columns].copy()

  for col in columns:
     df_subset[col] = df_subset[col].astype(datatype_dict[col])

  for col in columns:
   if datatype_dict[col] == "category":
     df_subset[col] = df_subset[col].cat.codes
  X = df_subset[["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges",]]
  y = df_subset["Churn"]
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
  return X_train, X_test, y_train, y_test

columns = ["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges", "Churn"]
datatype_dict = {"gender":"category", "tenure":"float", "PhoneService":"category", "MultipleLines":"category", "MonthlyCharges":"float", "Churn":"category"}
X_train, X_test, y_train, y_test = data_preparation(columns, 0.33, datatype_dict)

Model Training 

Now that we have our training and test data prepared, let’s train our classification model. For simplicity, let’s define a function that trains a random forest classifier with default parameters and sets a random state for reproducibility. The function will return the trained model object. Let’s start by importing the random forest classifier:

from sklearn.ensemble import RandomForestClassifier

Next, let’s define our fit function and store the trained model object:

def fit_model(X_train,y_train):
   model = RandomForestClassifier(random_state=42)
   model.fit(X_train,y_train)
   return model

model = fit_model(X_train,y_train)

Model Predictions and Performance

Let’s also define a predict function that returns model predictions:

def predict(X_test, model):
   y_pred = model.predict(X_test)
   return y_pred

y_pred = predict(X_test, model)

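Finally, let’s define a function that reports classification performance metrics. It relies on the f1_score, accuracy_score and precision_score functions from the metrics module in scikit-learn, so we import them first:

from sklearn.metrics import accuracy_score, f1_score, precision_score

def model_performance(y_pred, y_test):
   print("f1_score", f1_score(y_test, y_pred))
   print("accuracy_score", accuracy_score(y_test, y_pred))
   print("precision_score", precision_score(y_test, y_pred))

model_performance(y_pred, y_test)

Image: Screenshot by the author.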

Now, if we want to use function wrappers to define our timer, we need to import the functools and time modules: 

import functools
import time

Next, let’s define our timer function. We will call it runtime_monitor. It takes a function, input_function, as its argument. We will also pass the input function to the wraps decorator from the functools module, which we place directly above our actual timer function, called runtime_wrapper:

def runtime_monitor(input_function):
   @functools.wraps(input_function)
   def runtime_wrapper(*args, **kwargs):

Next, within the scope of runtime_wrapper, we specify the logic for calculating the runtime of our input function. We record a start time, call the input function and store its return value, record an end time, and compute the runtime as the difference between the end and start times:

   def runtime_wrapper(*args, **kwargs):
       start_value = time.perf_counter()
       return_value = input_function(*args, **kwargs)
       end_value = time.perf_counter()
       runtime_value = end_value - start_value
       print(f"Finished executing {input_function.__name__} in {runtime_value} seconds")
       return return_value

Our timer function (runtime_wrapper) is defined within the scope of our runtime_monitor function. The full function is as follows:

def runtime_monitor(input_function):
   @functools.wraps(input_function)
   def runtime_wrapper(*args, **kwargs):
       start_value = time.perf_counter() 
       return_value = input_function(*args, **kwargs)
       end_value = time.perf_counter()
       runtime_value = end_value - start_value 
       print(f"Finished executing {input_function.__name__} in {runtime_value} seconds")
       return return_value
   return runtime_wrapper
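To try the wrapper without the churn data, here is a small self-contained sketch. It repeats the runtime_monitor definition from above and applies it to slow_square, a hypothetical stand-in function that sleeps briefly before returning:

```python
import functools
import time


def runtime_monitor(input_function):
    @functools.wraps(input_function)
    def runtime_wrapper(*args, **kwargs):
        start_value = time.perf_counter()
        return_value = input_function(*args, **kwargs)
        end_value = time.perf_counter()
        runtime_value = end_value - start_value
        print(f"Finished executing {input_function.__name__} in {runtime_value} seconds")
        return return_value
    return runtime_wrapper


@runtime_monitor
def slow_square(x):
    """Hypothetical stand-in for an expensive step."""
    time.sleep(0.05)  # simulate work
    return x * x


result = slow_square(4)      # prints the runtime message
print(result)                # 16
print(slow_square.__name__)  # slow_square, preserved by functools.wraps
```

The reported runtime will differ from run to run and from machine to machine. The perf_counter function is a reasonable choice here because it is a high-resolution clock intended for measuring short intervals.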

We can then use runtime_monitor to wrap our data_preparation, fit_model, predict, and model_performance functions. For data_preparation, we have the following:

@runtime_monitor
def data_preparation(columns, test_size, datatype_dict):
  df = pd.read_csv("telco_churn.csv")
  df_subset = df[columns].copy()
  for col in columns:
     df_subset[col] = df_subset[col].astype(datatype_dict[col])
  for col in columns:
   if datatype_dict[col] == "category":
     df_subset[col] = df_subset[col].cat.codes
  X = df_subset[["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges",]]
  y = df_subset["Churn"]
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
  return X_train, X_test, y_train, y_test

columns = ["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges", "Churn"]
datatype_dict = {"gender":"category", "tenure":"float", "PhoneService":"category", "MultipleLines":"category", "MonthlyCharges":"float", "Churn":"category"}
X_train, X_test, y_train, y_test = data_preparation(columns, 0.33, datatype_dict)

We see that our data preparation function takes about 0.04 seconds to execute. For fit_model, we have:

@runtime_monitor
def fit_model(X_train,y_train):
   model = RandomForestClassifier(random_state=42)
   model.fit(X_train,y_train)
   return model

model = fit_model(X_train,y_train)

For our predict function:

@runtime_monitor
def predict(X_test, model):
   y_pred = model.predict(X_test)
   return y_pred

y_pred = predict(X_test, model)

And finally, for model performance:

@runtime_monitor
def model_performance(y_pred, y_test):
   print("f1_score", f1_score(y_test, y_pred))
   print("accuracy_score", accuracy_score(y_test, y_pred))
   print("precision_score", precision_score(y_test, y_pred))

model_performance(y_pred, y_test)

Image: Screenshot by the author.

We see that the fit step is the most time-consuming, which we would expect. Being able to reliably monitor the runtime of these functions is essential for resource management, even when building simple machine learning workflows such as this one.

 

Debugging Machine Learning Models Using Wrappers

Defining a debugger function wrapper is also a straightforward process. Let’s start by defining a function called debugging_method. Similar to our timer function, it will take a function as input. We will also pass the input function to the wraps decorator from the functools module, which we place directly above our actual debugger function, called debugging_wrapper. The debugging_wrapper will take positional arguments and keyword arguments as inputs:

def debugging_method(input_function):
   @functools.wraps(input_function)
   def debugging_wrapper(*args, **kwargs):

Next, we will store the string representations of the positional arguments, and the keyword arguments with their values, in lists called arguments and keyword_arguments respectively:

def debugging_wrapper(*args, **kwargs):
       arguments = []
       keyword_arguments = []
       for a in args:
          arguments.append(repr(a))    
       for key, value in kwargs.items():
          keyword_arguments.append(f"{key}={value}")

Next, we will concatenate arguments and keyword_arguments and then join them into a single string:

   def debugging_wrapper(*args, **kwargs):
      ...#code truncated for clarity
      function_signature = arguments + keyword_arguments
      function_signature = "; ".join(function_signature)  

Finally, we will print the function name, its signature and its return value:

   def debugging_wrapper(*args, **kwargs):
       ...  # code truncated for clarity
       print(f"{input_function.__name__} has the following signature: {function_signature}")
       return_value = input_function(*args, **kwargs)
       print(f"{input_function.__name__} has the following return: {return_value}")

The debugging_wrapper function will also return the return value of the input function. The full function is as follows:

def debugging_method(input_function):
   @functools.wraps(input_function)
   def debugging_wrapper(*args, **kwargs):
       arguments = []
       keyword_arguments = []
       for a in args:
          arguments.append(repr(a))    
       for key, value in kwargs.items():
          keyword_arguments.append(f"{key}={value}")
       function_signature = arguments + keyword_arguments
       function_signature = "; ".join(function_signature)      
       print(f"{input_function.__name__} has the following signature: {function_signature}")
       return_value = input_function(*args, **kwargs)
       print(f"{input_function.__name__} has the following return: {return_value}") 
       return return_value
   return debugging_wrapper
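As a quick check of what the debugger prints, here is a self-contained sketch that repeats the debugging_method definition from above and applies it to scale, a hypothetical helper that takes one positional argument and one keyword argument:

```python
import functools


def debugging_method(input_function):
    @functools.wraps(input_function)
    def debugging_wrapper(*args, **kwargs):
        arguments = []
        keyword_arguments = []
        for a in args:
            arguments.append(repr(a))
        for key, value in kwargs.items():
            keyword_arguments.append(f"{key}={value}")
        function_signature = arguments + keyword_arguments
        function_signature = "; ".join(function_signature)
        print(f"{input_function.__name__} has the following signature: {function_signature}")
        return_value = input_function(*args, **kwargs)
        print(f"{input_function.__name__} has the following return: {return_value}")
        return return_value
    return debugging_wrapper


@debugging_method
def scale(values, factor=1):
    """Hypothetical example function: multiply every value by factor."""
    return [v * factor for v in values]


scaled = scale([1, 2, 3], factor=2)
# scale has the following signature: [1, 2, 3]; factor=2
# scale has the following return: [2, 4, 6]
```

With large inputs such as data frames, the printed representations can become long, so it may be worth printing a summary of each argument instead.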

Data Preparation 

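We can now wrap our data_preparation function with our debugging_method. Here we stack it on top of runtime_monitor, so each call prints both the debugging output and the runtime. Decorators are applied from the bottom up, which means debugging_method wraps the already-timed function: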

@debugging_method
@runtime_monitor
def data_preparation(columns, test_size, datatype_dict):
  df = pd.read_csv("telco_churn.csv")
  df_subset = df[columns].copy()

  for col in columns:
     df_subset[col] = df_subset[col].astype(datatype_dict[col])

  for col in columns:
   if datatype_dict[col] == "category":
     df_subset[col] = df_subset[col].cat.codes
  X = df_subset[["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges",]]
  y = df_subset["Churn"]
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
  return X_train, X_test, y_train, y_test

columns = ["gender", "tenure", "PhoneService", "MultipleLines","MonthlyCharges", "Churn"]
datatype_dict = {"gender":"category", "tenure":"float", "PhoneService":"category", "MultipleLines":"category", "MonthlyCharges":"float", "Churn":"category"}
X_train, X_test, y_train, y_test = data_preparation(columns, 0.33, datatype_dict)
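The order in which stacked decorators run can be checked with a small sketch. The make_tracer helper below is hypothetical and exists only to record when each layer starts and finishes; the "outer" layer plays the role of debugging_method and the "inner" layer plays the role of runtime_monitor:

```python
import functools

call_order = []  # records the order in which each layer runs


def make_tracer(label):
    """Hypothetical decorator factory that logs when a layer starts and ends."""
    def decorator(input_function):
        @functools.wraps(input_function)
        def wrapper(*args, **kwargs):
            call_order.append(f"{label} start")
            return_value = input_function(*args, **kwargs)
            call_order.append(f"{label} end")
            return return_value
        return wrapper
    return decorator


@make_tracer("outer")  # applied last, runs first
@make_tracer("inner")  # applied first, sits closest to the function
def prepare():
    call_order.append("prepare runs")
    return "done"


prepare()
print(call_order)
# ['outer start', 'inner start', 'prepare runs', 'inner end', 'outer end']
```

Because each layer uses functools.wraps, the outer wrapper still sees the original function name, which is why the debugging output reports data_preparation rather than runtime_wrapper.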

Model Training 

We can do the same for our fit function:

@debugging_method
@runtime_monitor
def fit_model(X_train,y_train):
   model = RandomForestClassifier(random_state=42)
   model.fit(X_train,y_train)
   return model

model = fit_model(X_train,y_train)

Image: Screenshot by the author.

Model Predictions and Performance

And for our predict function:

@debugging_method
@runtime_monitor
def predict(X_test, model):
   y_pred = model.predict(X_test)
   return y_pred

y_pred = predict(X_test, model)

Image: Screenshot by the author.

And finally, for our performance function:

@debugging_method
@runtime_monitor
def model_performance(y_pred, y_test):
   print("f1_score", f1_score(y_test, y_pred))
   print("accuracy_score", accuracy_score(y_test, y_pred))
   print("precision_score", precision_score(y_test, y_pred))

model_performance(y_pred, y_test)

Image: Screenshot by the author.

The code in this post is available on GitHub.


 

Function Wrapper Uses in Python

Function wrappers have a wide range of applications in software engineering, data analytics and machine learning.

Monitoring Function and Operation Runtimes

When developing and fitting machine learning models, runtimes for steps like data preparation, model training and prediction can vary depending on the size of the data and the complexity of the operation. With this in mind, it is useful to use wrappers to monitor how the runtime of these operations changes when the data changes. For example, wrappers allow developers to measure how long a function takes to execute. This information is essential for managing computational resources like time and cost.
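One way to track how runtime changes with data size is to store the measurements instead of printing them. The sketch below is an illustrative variant, not the wrapper defined earlier in this guide: runtime_recorder, runtime_log and sort_values are hypothetical names, and the wrapper assumes the first argument is the data and supports len():

```python
import functools
import time

runtime_log = []  # (function name, input size, seconds) tuples


def runtime_recorder(input_function):
    """Hypothetical timer wrapper that stores runtimes instead of printing them."""
    @functools.wraps(input_function)
    def wrapper(data, *args, **kwargs):
        start_value = time.perf_counter()
        return_value = input_function(data, *args, **kwargs)
        runtime_value = time.perf_counter() - start_value
        runtime_log.append((input_function.__name__, len(data), runtime_value))
        return return_value
    return wrapper


@runtime_recorder
def sort_values(data):
    """Stand-in for a data preparation step."""
    return sorted(data)


# Run the same step on progressively larger inputs.
for size in (1_000, 10_000, 100_000):
    sort_values(list(range(size, 0, -1)))

for name, size, seconds in runtime_log:
    print(f"{name} on {size} rows: {seconds:.6f} seconds")
```

Keeping the measurements in a list makes it straightforward to compare runs or to plot runtime against input size later.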

Debugging Other Functions 

Debugging with function wrappers is also useful when building machine learning models. In Python, defining a debugger function wrapper that prints the function arguments and return values is straightforward. Function wrappers can be used to inspect causes of failed function executions using a few lines of code, which can help resolve issues with data preparation, model fit calls and model prediction calls. 

For example, in data preparation, issues and bugs may arise when data is refreshed or model inputs for training are modified. In this case, function wrappers can be used to help indicate how changes in inputs, array shapes and array lengths may be causing fit calls to fail. 
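A wrapper along these lines could report the shape or length of each argument before the call. The sketch below uses plain lists so it runs without any third-party library; shape_logger, describe and fit_stub are hypothetical names, and fit_stub only imitates the length check that a real fit call might perform:

```python
import functools


def describe(value):
    """Summarize an argument by its shape or length when it has one."""
    if hasattr(value, "shape"):    # e.g. NumPy arrays and Pandas objects
        return f"{type(value).__name__} with shape {value.shape}"
    if hasattr(value, "__len__"):  # e.g. lists, tuples, dicts, strings
        return f"{type(value).__name__} of length {len(value)}"
    return repr(value)


def shape_logger(input_function):
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        summaries = [describe(a) for a in args]
        summaries += [f"{key}={describe(value)}" for key, value in kwargs.items()]
        print(f"{input_function.__name__} received: {'; '.join(summaries)}")
        return input_function(*args, **kwargs)
    return wrapper


@shape_logger
def fit_stub(features, labels):
    """Stand-in for a fit call that needs one label per row of features."""
    if len(features) != len(labels):
        raise ValueError("features and labels have different lengths")
    return "fitted"


fit_stub([[1, 2], [3, 4]], [0, 1])

try:
    fit_stub([[1, 2], [3, 4]], [0])  # mismatched lengths
except ValueError as error:
    print(f"fit_stub failed: {error}")
```

Because the summary is printed before the wrapped function runs, the mismatched lengths are visible in the log even when the call itself raises an error.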

Function wrappers in Python can help simplify runtime monitoring and debugging. Having reliable tools for runtime monitoring and debugging is valuable for both data scientists and machine learning engineers.

 

Frequently Asked Questions

What are Python wrappers?

Python wrappers are functions or classes that can encapsulate, or “wrap,” the behavior of another function. Wrappers allow an existing function to be modified or extended without changing its core source code.

What does @wraps do in Python?

In Python, “@wraps” is a decorator provided by the functools module. Using @wraps transfers metadata (attributes like __name__ and __doc__) from another function or class to its wrapper function.

What is a wrapper in programming?

A wrapper is a function in a programming language used to encapsulate another function and its behavior. Encapsulation in programming means combining data and associated data methods into one unit.

What is the difference between a wrapper and a decorator in Python?

Wrappers are called decorators in Python. A decorator is a kind of wrapper that wraps another function to modify its behavior.
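To illustrate the metadata transfer that functools.wraps performs, the following sketch compares two otherwise identical decorators, one written without it and one with it. The names without_wraps, with_wraps, fit_a and fit_b are hypothetical:

```python
import functools


def without_wraps(input_function):
    def wrapper(*args, **kwargs):
        return input_function(*args, **kwargs)
    return wrapper


def with_wraps(input_function):
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        return input_function(*args, **kwargs)
    return wrapper


@without_wraps
def fit_a():
    """Fit model A."""


@with_wraps
def fit_b():
    """Fit model B."""


print(fit_a.__name__, fit_a.__doc__)  # wrapper None
print(fit_b.__name__, fit_b.__doc__)  # fit_b Fit model B.
```

Without functools.wraps, log messages built from __name__, like the ones printed by the timer and debugger wrappers in this guide, would report the wrapper’s name instead of the name of the function being monitored.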
