A Guide to Pandas Pivot Table

The pandas pivot_table function works much like a spreadsheet pivot table, making it easier to group, summarize and analyze your data. Here’s how to create your own.

Written by Rebecca Vickery
Published on Sep. 29, 2022

The pandas library is a popular Python package for data analysis. When you first load a data set into pandas, it is stored in a two-dimensional structure of rows and columns known as a DataFrame. An important part of data analysis is the process of grouping, summarizing, aggregating and calculating statistics about this data. Pandas pivot tables provide a powerful tool to perform these analysis techniques with Python.

Create Your Own Pandas Pivot Table in 4 Steps

  1. Download or import the data that you want to use.
  2. In the pivot_table function, specify the DataFrame you are summarizing, along with the names for the indexes, columns and values.
  3. Specify the type of calculation you want to use, such as the mean. 
  4. Use multiple indexes and column-level grouping to create a more powerful summary of the data. 

If you are a spreadsheet user, you may already be familiar with the concept of pivot tables. Pandas pivot tables work in a very similar way to those found in spreadsheet tools such as Microsoft Excel. The pivot_table function takes in a DataFrame and parameters describing the shape you want the data to take. It then outputs the summarized data in the form of a pivot table.

I will give a brief introduction, with code examples, to the pandas pivot table tool. The examples use a data set called “autos,” which contains a range of features about cars, such as the make, price, horsepower and miles per gallon.

You can download the data from OpenML, or import it directly into your code using the scikit-learn API as shown below.

import pandas as pd
import numpy as np
from sklearn.datasets import fetch_openml

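# Fetch the "autos" data set from OpenML: X holds the features, y holds the target
X, y = fetch_openml("autos", version=1, as_frame=True, return_X_y=True)
# Keep the features and the target together in a single DataFrame
data = X
data['target'] = y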

 

How to Create a Pandas Pivot Table

A pandas pivot table has three main elements:

  • Index: This specifies the row-level grouping.
  • Columns: This specifies the column-level grouping.
  • Values: These are the numerical values you are looking to summarize.
Basic anatomy of a pandas pivot table. | Image: Rebecca Vickery

The code below creates this pivot table. In the pivot_table function, we specify the DataFrame we are summarizing, and then the column names for the values, index and columns. Additionally, we specify the type of calculation we want to use. In this case, we’re computing the mean.

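# Mean price by number of doors (rows) and fuel type (columns), rounded to 2 decimals.
# The string 'mean' is used because recent pandas versions warn when NumPy
# functions such as np.mean are passed as aggfunc.
pivot = np.round(pd.pivot_table(data, values='price', 
                                index='num-of-doors', 
                                columns='fuel-type', 
                                aggfunc='mean'),2)
pivot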
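
To see how the three elements fit together without downloading anything, here is a minimal sketch on a small made-up table. The cars DataFrame and its five prices are hypothetical; only the column names are borrowed from the autos data set.

```python
import pandas as pd

# Hypothetical toy data (not taken from the autos data set).
cars = pd.DataFrame({
    "num-of-doors": ["two", "two", "four", "four", "four"],
    "fuel-type": ["gas", "gas", "gas", "diesel", "diesel"],
    "price": [10000.0, 14000.0, 18000.0, 20000.0, 24000.0],
})

# Rows come from index, columns from columns, and each cell is the mean of values.
toy_pivot = pd.pivot_table(cars, values="price", index="num-of-doors",
                           columns="fuel-type", aggfunc="mean")
print(toy_pivot)

# The same summary written as a groupby followed by an unstack.
by_hand = (cars.groupby(["num-of-doors", "fuel-type"])["price"]
               .mean()
               .unstack("fuel-type"))
print(by_hand)
```

The two-door diesel cell is NaN because no row in the toy data has that combination.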

Pivot tables can be multi-level. We can pass lists of column names to index and columns to create more powerful summaries of a data set. In the code below, fill_value=0 replaces combinations that have no data with 0 instead of NaN.

# Two row levels (doors, body style) and two column levels (fuel type, fuel system)
pivot = np.round(pd.pivot_table(data, values='price', 
                                index=['num-of-doors', 'body-style'], 
                                columns=['fuel-type', 'fuel-system'], 
                                aggfunc='mean',
                                fill_value=0),2)
pivot
Basic anatomy of a multi-level pandas pivot table. | Image: Rebecca Vickery
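
A multi-level pivot table has a MultiIndex on its rows, so selecting from it works slightly differently. The sketch below uses a small hypothetical cars table (the values are made up) to show two common lookups with .loc.

```python
import pandas as pd

# Hypothetical toy data (not taken from the autos data set).
cars = pd.DataFrame({
    "num-of-doors": ["two", "two", "four", "four"],
    "body-style": ["hatchback", "sedan", "sedan", "wagon"],
    "fuel-type": ["gas", "gas", "diesel", "gas"],
    "price": [8000.0, 12000.0, 20000.0, 16000.0],
})

multi = pd.pivot_table(cars, values="price",
                       index=["num-of-doors", "body-style"],
                       columns="fuel-type",
                       aggfunc="mean", fill_value=0)

# Select a single row by its (num-of-doors, body-style) pair.
four_door_sedans = multi.loc[("four", "sedan")]
print(four_door_sedans)

# Select every body style for two-door cars by giving only the first level.
two_door = multi.loc["two"]
print(two_door)
```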

How to Plot with Pandas Pivot Table

Pandas pivot tables can be used in conjunction with the pandas plotting functionality to create useful data visualizations.

Chaining a plotting method such as .plot.barh() onto the end of your pivot table code will create a plot of the data. As an example, the code below creates a horizontal bar chart showing the mean car price by make and number of doors.

# Build the pivot table, then draw one horizontal bar per make and door count
np.round(pd.pivot_table(data, values='price', 
                        index=['make'], 
                        columns=['num-of-doors'], 
                        aggfunc='mean',
                        fill_value=0),2).plot.barh(figsize=(10,7),
                                                  title='Mean car price by make and number of doors')
Plot created using a pandas pivot table analyzing the mean car price by make and number of doors. | Image: Rebecca Vickery 

 

How to Calculate With Pandas Pivot Table 

The aggfunc argument in the pivot_table function accepts either a single aggregation or a list of aggregations.

The following code calculates the mean and median price for each combination of body style and number of doors.

# Passing a list to aggfunc produces one block of columns per aggregation
np.round(pd.pivot_table(data, values='price', 
                        index=['body-style'], 
                        columns=['num-of-doors'], 
                        aggfunc=['mean', 'median'],
                        fill_value=0),2)
Pivot table aggregations. | Image: Rebecca Vickery
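
When aggfunc is a list, the resulting columns gain an extra level named after each aggregation. Here is a small sketch on hypothetical data (the prices are made up) showing how to pull one aggregation, or one cell, back out.

```python
import pandas as pd

# Hypothetical toy data (not taken from the autos data set).
cars = pd.DataFrame({
    "body-style": ["sedan", "sedan", "sedan", "hatchback", "hatchback"],
    "num-of-doors": ["four", "four", "four", "two", "two"],
    "price": [10000.0, 12000.0, 20000.0, 8000.0, 9000.0],
})

stats = pd.pivot_table(cars, values="price",
                       index="body-style", columns="num-of-doors",
                       aggfunc=["mean", "median"], fill_value=0)

# The top column level holds the aggregation names.
print(stats.columns.tolist())

# Select the block of columns for a single aggregation.
mean_only = stats["mean"]
print(mean_only)

# Select a single cell with an (aggregation, column value) pair.
sedan_four_door_median = stats.loc["sedan", ("median", "four")]
print(sedan_four_door_median)
```

The mean and median differ for four-door sedans here because one of the three made-up prices is much higher than the other two.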

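You can add the argument margins=True to append an extra row and column that apply the aggregation across all rows and columns. With a sum aggregation, these are the totals. The extra row and column are labeled “All” by default, and you can specify a different name using margins_name.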

# Sum of prices with a 'Total' row and column added by margins=True
np.round(pd.pivot_table(data, values='price', 
                        index=['body-style'], 
                        columns=['num-of-doors'], 
                        aggfunc=['sum'],
                        fill_value=0,
                        margins=True, margins_name='Total'),2)
A table showing the totals added using a pandas pivot table. | Image: Rebecca Vickery
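
One detail worth checking is what the margins contain when the aggregation is not a sum. The sketch below uses hypothetical prices to show that, with a mean, the margin is the mean of all underlying rows rather than the mean of the cell means.

```python
import pandas as pd

# Hypothetical toy data (not taken from the autos data set).
cars = pd.DataFrame({
    "body-style": ["sedan", "sedan", "sedan", "hatchback"],
    "num-of-doors": ["four", "four", "two", "two"],
    "price": [10000.0, 20000.0, 30000.0, 8000.0],
})

# With a sum, the margins are row, column and grand totals.
sums = pd.pivot_table(cars, values="price", index="body-style",
                      columns="num-of-doors", aggfunc="sum",
                      fill_value=0, margins=True, margins_name="Total")
print(sums)

# With a mean, each margin is the mean over the original rows.
means = pd.pivot_table(cars, values="price", index="body-style",
                       columns="num-of-doors", aggfunc="mean",
                       fill_value=0, margins=True, margins_name="Total")
print(means)
```

For sedans, the two cell means are 15,000 and 30,000, but the margin is 20,000 because it averages the three sedan rows directly.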
How to Style Your Pandas Pivot Table

When summarizing data, styling is important. We want to ensure that the patterns and insights that the pivot table is providing are easy to read and understand. In the pivot tables used in earlier parts of the article, very little styling has been applied. As a result, the tables are not easy to understand or visually appealing.

We can use the DataFrame’s style property, which returns a Styler object, to make the tables look better and easier to draw insights from. The code below adds appropriate formatting and units of measurement to each of the values used in this pivot table. It is now much easier to distinguish between the two columns and to comprehend what the data is telling you.

pivot = np.round(pd.pivot_table(data, values=['price', 'horsepower'], 
                                index=['make'], 
                                aggfunc='mean',
                                fill_value=0),2)

# Show price as whole dollars with thousands separators and horsepower with an 'hp' suffix
pivot.style.format({'price':'${0:,.0f}',
                   'horsepower':'{0:,.0f}hp'})
Adding styling to the pivot table. | Image: Rebecca Vickery
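
The strings passed to format are ordinary Python format specifications, so you can try them out with str.format before applying them to a table. A minimal sketch, with two made-up numbers standing in for pivot table cells:

```python
# The same format specifications used in the styling code above.
price_format = "${0:,.0f}"
horsepower_format = "{0:,.0f}hp"

# Hypothetical values standing in for two pivot table cells.
formatted_price = price_format.format(15498.67)
formatted_horsepower = horsepower_format.format(111.4)

print(formatted_price)       # $15,499
print(formatted_horsepower)  # 111hp
```

The comma adds a thousands separator and .0f rounds to zero decimal places; the dollar sign and the hp suffix are literal text around the placeholder.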

We can chain several Styler methods and use the pandas built-in styles to make patterns in the data stand out. In the code and pivot table shown below, we have kept the 10 makes with the highest mean price, ordered from high to low, added appropriate formatting to the numbers and added a bar chart overlaying the values in both columns. This makes it easier to draw conclusions from the table, such as which make of car is the most expensive and how horsepower relates to the price for each car make.

pivot = np.round(pd.pivot_table(data, values=['price', 'horsepower'], 
                                index=['make'], 
                                aggfunc='mean',
                                fill_value=0),2)

# Keep the 10 makes with the highest mean price; nlargest returns them sorted high to low
pivot = pivot.nlargest(10, 'price')

pivot.style.format({'price':'${0:,.0f}',
                   'horsepower':'{0:,.0f}hp'}).bar(color='#d65f5f')
Using built-in styles within pandas to make it easier to draw insights from the data. | Image: Rebecca Vickery
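
When ranking a pivot table, it helps to know that nlargest already returns its rows sorted from high to low, so it gives the same result as sorting and then taking the top rows. A minimal sketch, using a hypothetical summary table with made-up makes and values:

```python
import pandas as pd

# Hypothetical mean horsepower and price for four made-up makes.
summary = pd.DataFrame(
    {"horsepower": [100.0, 180.0, 70.0, 140.0],
     "price": [9000.0, 30000.0, 6000.0, 15000.0]},
    index=pd.Index(["make_a", "make_b", "make_c", "make_d"], name="make"),
)

# Top two makes by price, already ordered from high to low.
top_two = summary.nlargest(2, "price")
print(top_two)

# The equivalent explicit sort followed by head.
sorted_head = summary.sort_values("price", ascending=False).head(2)
print(sorted_head)
```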

Advantages of Pandas Pivot Tables

Pivot tables have been in use since the early ’90s, with Microsoft trademarking the term “PivotTable” for its famous Excel version in 1994. They are still widely used today because they are such a powerful tool for analyzing data. The pandas pivot table brings this tool out of the spreadsheet and into the hands of Python users.

This guide gave a brief introduction to the pivot table tool in pandas. It is meant to give a beginner a quick tutorial to get up and running with the tool, but I suggest digging into the pandas documentation, which gives a more in-depth guide to this function.
