When data is imported into a Pandas DataFrame, it sometimes arrives with incorrect or messy column names, and you have to go through the tedious process of renaming some or all of them.
4 Ways to Rename Columns in Pandas
- Rename columns in Pandas using a dictionary with the pandas.DataFrame.rename() function.
- Rename columns in Pandas by passing a function to the columns parameter.
- Rename columns in Pandas with pandas.DataFrame.columns.
- Rename columns in Pandas with pandas.DataFrame.set_axis.
Replacing messy column names with meaningful ones is an essential step in data cleaning. It makes the entire code more readable and saves a lot of time during the next steps of data processing.
I’m going to demonstrate the four best methods to easily change the Pandas DataFrame column names.
I’ll be using a self-created Dummy_Sales_Data which you can get on my Github repo.
Let’s import the data set first:
import pandas as pd
df = pd.read_csv("Dummy_Sales_Data_v1.csv")
df.head()
It is a simple 10000 x 12 data set, which I created.
Now, let’s get started. We’ll begin with the simplest, most straightforward method before moving on to other approaches.
Rename Columns in Pandas Using Dictionary
pandas.DataFrame.rename() is a DataFrame method that alters the axis labels. Here, the word axis refers to either rows or columns, depending on the value we set for the axis parameter in this function.
As we are more interested in understanding how to change the column names, let’s focus on that. So, the important parameter for us in the .rename() function is columns.
To replace some or all of the column names, all you need to do is pass a dictionary in which the keys are the old column names and the values are the new column names.
For example, passing a dictionary to the columns parameter of df.rename() whose keys are Status and Quantity (the old column names) and whose values are Order_Status and Order_Quantity (the new column names) renames just those two columns and leaves the rest untouched.
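A minimal sketch of that call is below. To keep it self-contained, it uses a tiny stand-in DataFrame with made-up values instead of the full sales data set; only the Status and Quantity column names come from the example above.

```python
import pandas as pd

# Tiny stand-in for the sales data; the values are made up for illustration.
df = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "Quantity": [10, 5, 8],
    "Status": ["Delivered", "Shipped", "Delivered"],
})

# Keys are the old column names, values are the new column names.
renamed = df.rename(columns={"Status": "Order_Status",
                             "Quantity": "Order_Quantity"})

print(renamed.head())

# rename() returned a new DataFrame, so df still has its original names.
print(df.columns.tolist())
```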
df.rename() has an inplace parameter that is False by default, which means it returns a new DataFrame and leaves the original untouched. To keep the changed column names, either set inplace = True or assign the returned DataFrame back to a variable.
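Here is a short sketch of the two ways to keep the new names. The DataFrames and the mapping variable are hypothetical stand-ins, not part of the sales data set.

```python
import pandas as pd

mapping = {"Status": "Order_Status", "Quantity": "Order_Quantity"}

# Option 1: assign the returned DataFrame back to the variable.
df_a = pd.DataFrame({"Status": ["Delivered"], "Quantity": [10]})
df_a = df_a.rename(columns=mapping)

# Option 2: modify the DataFrame in place.
# With inplace=True, rename() returns None, so don't assign the result.
df_b = pd.DataFrame({"Status": ["Delivered"], "Quantity": [10]})
result = df_b.rename(columns=mapping, inplace=True)

print(df_a.columns.tolist())
print(df_b.columns.tolist())
print(result)
```

Both options end up with the same column names. Assigning the result back is often easier to follow, because the change is visible on the left-hand side of the statement.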
Because I didn’t want to keep the changed column names, I left inplace at its default and used the .head() method to preview how the DataFrame looks with the new names.
Before setting inplace = True in any function, it’s always a good idea to run it without inplace first and use .head() to see how the change looks before you finalize it. The next method is a slight variation on the .rename() function.
Rename Columns in Pandas Using Functions
Just like the first method above, we will still use the columns parameter of the .rename() function. But instead of passing old name-new name key-value pairs, we can also pass a function to the columns parameter.
For example, converting all column names to upper case is quite simple using this trick below.
df.rename(columns=str.upper).head()
I simply used the string function str.upper to convert all column names to upper case. Pandas calls the function once per column name and uses the return value as the new name.
In this way, all the column names will be altered in one go. However, this can be made flexible through user-defined functions.
You can pass any user-defined function to the parameter columns to change the column names based on a function.
For example, you can write a simple function to split the column names on the underscore ( _ ) and select only the first part. You then pass this function to the columns parameter as shown below.
def function_1(x):
    # Keep only the part of the column name before the first underscore
    x = x.split('_')[0]
    return x

df.rename(columns=function_1).head()
The output shows the changed column names. As per the applied function, the column names containing _ are split on _ and only the first part is assigned as the new column name. For example, Product_Category becomes Product. Names without an underscore are left unchanged.
And if it’s a simple function, like the one above, you can use the lambda function as well.
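As a sketch, the same split-on-underscore logic can be written inline with a lambda. The small DataFrame here is a made-up stand-in that reuses a few column names from the data set.

```python
import pandas as pd

# Made-up stand-in rows; only the column names matter for this example.
df = pd.DataFrame({
    "Product_Category": ["Office"],
    "Sales_Manager": ["Pablo"],
    "Quantity": [10],
})

# The lambda receives each column name and returns the new name.
shortened = df.rename(columns=lambda name: name.split("_")[0])

print(shortened.columns.tolist())  # ['Product', 'Sales', 'Quantity']
```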
Rename Columns in Pandas With pandas.DataFrame.columns
This is an attribute, not a method, that returns the column labels of the DataFrame as an Index object, such as: df.columns
However, it also works in reverse: you can assign a list of new column names to df.columns, and the new names replace the existing ones on the DataFrame.
Here is how it works:
df.columns = ['OrderID', 'Order_Quantity',
'UnitPrice(USD)', 'Order_Status',
'OrderDate', 'Product_Category',
'Sales_Manager', 'Shipping_Cost(USD)',
'Delivery_Time(Days)', 'Shipping_Address',
'Product_Code', 'OrderCode']
df.head()
As you can see, I assigned a list of new column names to df.columns and the names of all columns are changed accordingly.
For this to work, you need to pass the names of all the columns. The length of this names list must be exactly equal to the total number of columns in the DataFrame.
And without any other options like inplace, the column names are changed directly and permanently. As a result, this method is a bit risky.
So, I would suggest using it only when you are 100 percent sure that you want to change the column names.
You should also remember that the sequence of the column names in the list should match the columns in the DataFrame. Otherwise, the column names can be assigned incorrectly.
With all of the above points kept in mind, this is the best method to change all columns in one go.
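The points above can be sketched with a small stand-in DataFrame (hypothetical values, three columns only). Assigning too few names raises a ValueError, and names are matched to columns purely by position.

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Delivered"], "Quantity": [10], "OrderID": [1]})

# Too few names: pandas refuses the assignment and raises ValueError.
error_message = None
try:
    df.columns = ["Order_Status", "Order_Quantity"]
except ValueError as err:
    error_message = str(err)
    print(error_message)

# The failed assignment left the original names in place.
print(df.columns.tolist())

# Names are assigned by position, so the order of the list matters.
df.columns = ["Order_Status", "Order_Quantity", "Order_ID"]
print(df.columns.tolist())
```

If you want to be able to undo the change, one option is to save the old names first with something like old_names = df.columns.tolist() before assigning the new ones.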
Rename Columns in Pandas With pandas.DataFrame.set_axis
This method assigns a new set of labels to one axis of the DataFrame, so it can be used to label columns as well as rows.
All you need to do is pass the list of column names to the .set_axis() function and specify axis = 1 to rename columns, like below:
df.set_axis(['A', 'B', 'C', 'D', 'E', 'F',
'G', 'H', 'I', 'J', 'K', 'L'], axis=1).head()
This is how you can change all the column names at once. As with the previous method, the list must contain exactly one name per column, in the same order as the existing columns.
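However, .set_axis() is a safer version of the previous method df.columns because it returns a new DataFrame instead of changing the original one. So even before applying the changes, you can preview them.
To keep the changed column names, assign the result back to df. Older versions of Pandas also accepted inplace = True here, but that parameter was deprecated in Pandas 1.5 and removed in Pandas 2.0.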
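A minimal sketch of the preview-then-keep workflow with .set_axis(), using a made-up three-column DataFrame in place of the sales data:

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Delivered"], "Quantity": [10], "OrderID": [1]})

# Preview: set_axis() returns a new DataFrame and leaves df unchanged.
preview = df.set_axis(["A", "B", "C"], axis=1)
print(preview.head())
original_names = df.columns.tolist()
print(original_names)

# Keep the new names by assigning the result back.
df = df.set_axis(["A", "B", "C"], axis=1)
print(df.columns.tolist())
```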
That’s all you need to know about changing column names. I hope you’ve found this article interesting, useful and refreshing. It’s always important to have column names in a more readable and uniform style. Therefore, renaming columns is one of the essential steps that needs to be done at the beginning of your project.
Frequently Asked Questions
How do I rename a column name in Pandas?
There are four common ways to rename a column in Pandas:
- Using a dictionary with the pandas.DataFrame.rename() function.
- Passing a function to the columns parameter.
- Renaming columns with pandas.DataFrame.columns.
- Renaming columns with pandas.DataFrame.set_axis.
How do I rename the last column in Pandas?
Basic methods for renaming only the last column in Pandas include directly assigning a new name to the column or using the rename() function.
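Both approaches can be sketched as follows, assuming a small made-up DataFrame. The last column is looked up by position with df.columns[-1], so you don't need to know its current name.

```python
import pandas as pd

df = pd.DataFrame({"OrderID": [1], "Quantity": [10], "Status": ["Delivered"]})

# Option 1: look up the last column's current name and pass it to rename().
# Note: rename() matches by label, so duplicate labels would all be renamed.
last_name = df.columns[-1]
renamed = df.rename(columns={last_name: "Order_Status"})
print(renamed.columns.tolist())

# Option 2: replace the last entry in the list of names and assign it back.
new_names = df.columns.tolist()
new_names[-1] = "Order_Status"
df.columns = new_names
print(df.columns.tolist())
```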
How do I fix column names in Pandas?
The same methods for renaming columns can also be used for fixing column names, including using the rename() function and the set_axis function. Another option is to simply assign new names to each column.
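As one possible sketch of fixing messy names, the function-based approach can strip stray spaces, lower-case everything and replace inner spaces with underscores. The messy column names and the tidy helper below are hypothetical examples, not taken from the sales data set.

```python
import pandas as pd

# Hypothetical messy column names, as they might arrive from a CSV export.
df = pd.DataFrame({" Order ID ": [1], "unit price": [9.5], "STATUS": ["Delivered"]})

def tidy(name):
    # Trim surrounding whitespace, lower-case, and use underscores for spaces.
    return name.strip().lower().replace(" ", "_")

cleaned = df.rename(columns=tidy)
print(cleaned.columns.tolist())  # ['order_id', 'unit_price', 'status']
```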