Pandas is a data analysis and manipulation library for Python. It provides numerous functions and methods to manage tabular data. The core data structure of pandas is DataFrame, which stores data in tabular form with labeled rows and columns.
How to Add a Column to a pandas DataFrame
How to add a column onto the end of a pandas DataFrame:
df["new column"] = 1
or
df["new column"] = [1, 2, 3]
In this code, the brackets after df hold the name of the new column, and whatever follows the = sign is the value or values assigned to that column. A single value fills every row, while a list must contain one value per row.
How to insert a column in a pandas DataFrame:
df.insert(1, "new column", 1)
or
df.insert(1, "new column", [1, 2, 3])
This code uses insert(), which takes three arguments: the position at which the new column will be placed (counting from zero), the name of the new column and the value or values to store in it.
Both methods above use example code, so make sure to replace the values and parameters with the ones you need for your own DataFrame.
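As a quick check of the two snippets above, here is a minimal sketch on a made-up three-row DataFrame. The column names name, flag, rank and group are hypothetical and chosen only for illustration. It also shows what happens when a list does not have one value per row.

```python
import pandas as pd

# Hypothetical three-row frame, sized so that the list [1, 2, 3] fits.
df = pd.DataFrame({"name": ["a", "b", "c"]})

df["flag"] = 1                     # a scalar is repeated in every row
df["rank"] = [1, 2, 3]             # a list needs one value per row
df.insert(1, "group", [1, 2, 3])   # placed at position 1, right after "name"

columns = list(df.columns)

# A list with the wrong number of values is rejected, not padded.
try:
    df["bad"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True
```

Here columns ends up in the order name, group, flag, rank, and length_error is True because the two-item list does not match the three rows.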
From a data perspective, rows represent observations or data points. Columns represent features or attributes about the observations. Consider a DataFrame of house prices. Each row is a house and each column is a feature about the house such as age, number of rooms, price and so on.
Adding or dropping columns is a common operation in data analysis. We’ll go over four different ways of adding a new column to a DataFrame.
First, let’s create a simple DataFrame to use in the examples.
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [5, 6, 7, 8]})
df
4 Pandas Add Column Methods
Below are four methods for adding columns to a pandas DataFrame.
Method 1: Adding Columns on the End
This might be the most commonly used method for creating a new column.
df["C"] = [10, 20, 30, 40]
df
We specify the column name as if we were selecting an existing column in the DataFrame, then assign the values to it. The new column is added as the last column, i.e. the one with the highest position.
We can also add multiple columns at once. Column names are passed in a list, and the values must be two-dimensional, with a shape that matches the number of rows and the number of new columns. For instance, the following code adds three columns filled with random integers from zero to nine (the upper bound of 10 passed to randint is exclusive).
df[["1of3", "2of3", "3of3"]] = np.random.randint(10, size=(4,3))
df
Let’s drop these three columns before going to the next method.
df.drop(["1of3", "2of3", "3of3"], axis=1, inplace=True)
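The add-then-drop sequence above can also be written so that it is reproducible and avoids inplace=True. The sketch below is one possible variant, not the only way to do it: it assumes NumPy's seeded default_rng generator in place of np.random.randint, and it drops the columns by reassigning the result of drop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Seeded generator so the "random" values are the same on every run.
rng = np.random.default_rng(seed=0)

# One row of values per DataFrame row, one column per new column name.
# The upper bound is exclusive, so the integers range from 0 through 9.
values = rng.integers(0, 10, size=(len(df), 3))

new_cols = ["1of3", "2of3", "3of3"]
df[new_cols] = values
columns_with_extras = list(df.columns)

# drop returns a new DataFrame; reassigning replaces inplace=True.
df = df.drop(columns=new_cols)
```

Building the shape from len(df) keeps the array in step with the DataFrame if the number of rows changes.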
Method 2: Add Columns at a Specific Index
In the first method, the new column is added at the end. Pandas also allows for adding new columns at a specific index. The insert function can be used to customize the location of the new column. Let’s add a column next to column A.
df.insert(1, "D", 5)
df
The insert function takes three parameters: the index, the name of the column and the values. Column indices start from zero, so we set the index parameter to one to place the new column right after column A. Because we pass a single constant, it fills every row.
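Two details of insert are easy to miss, so here is a small sketch on a fresh copy of the example DataFrame. The column name Z and its values are hypothetical. It shows that an index equal to the current number of columns appends at the end, and that inserting a name that already exists raises an error by default.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

df.insert(1, "D", 5)                            # constant, right after "A"
df.insert(len(df.columns), "Z", [9, 8, 7, 6])   # index == column count appends

order = list(df.columns)

# Reusing an existing column name is rejected unless allow_duplicates=True.
try:
    df.insert(0, "D", 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

After these calls the column order is A, D, B, Z, and duplicate_rejected is True.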
Method 3: Add Columns with Loc
The loc indexer lets you select rows and columns by their labels. It can also be used to create a new column.
df.loc[:, "E"] = list("abcd")
df
In order to select rows and columns, we pass the desired labels. The colon indicates that we want to select all the rows. In the column part, we specify the labels of the columns to be selected. Since the DataFrame does not have column E, pandas creates a new column.
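Because loc takes a row selector as well as a column label, it can also create a column while filling only some of the rows. The following is a minimal sketch of that idea. The column name size and the threshold are hypothetical, and the rows that are not selected are left as missing values.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# All rows (the colon) and a label that does not exist yet: a new column.
df.loc[:, "E"] = list("abcd")

# Only rows where A > 2 and a new label: the other rows become missing.
df.loc[df["A"] > 2, "size"] = "large"

missing_mask = df["size"].isna().tolist()
```

This is one reason to reach for loc over plain bracket assignment: the row condition and the new column are expressed in a single step.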
Method 4: Add Columns With the Assign Function
The last method is the assign function.
df = df.assign(F = df.C * 10)
df
We specify both the column name and values inside the assign function. You may notice that we derive the values using another column in the DataFrame. The previous methods also allow for similar derivations.
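To illustrate the last point, the sketch below derives the same values from column C with each of the three earlier methods. The new column names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "C": [10, 20, 30, 40]})

df["via_bracket"] = df["C"] * 10            # Method 1: added at the end
df.insert(0, "via_insert", df["C"] * 10)    # Method 2: placed at position 0
df.loc[:, "via_loc"] = df["C"] * 10         # Method 3: label-based assignment

derived = {
    name: df[name].tolist()
    for name in ["via_bracket", "via_insert", "via_loc"]
}
```

All three columns hold the same values. The methods differ only in where the column lands and in how the assignment is written.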
There is an important difference between the insert and assign functions. The insert function works in place: it modifies the existing DataFrame directly, so the new column is there without any reassignment.
The situation is a little different with the assign function. It returns the modified DataFrame but does not change the original one. In order to use the modified version with the new column, we need to explicitly assign it.
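The difference is easy to verify. Here is a short sketch on a small DataFrame that already has a column C. The variable names returned and unassigned are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "C": [10, 20, 30, 40]})

# insert changes df itself and returns None.
returned = df.insert(1, "D", 5)

# assign builds a new DataFrame; df is left as it was.
unassigned = df.assign(F=df["C"] * 10)
has_f_before = "F" in df.columns

# Reassigning the result is what makes the new column stick.
df = df.assign(F=df["C"] * 10)
has_f_after = "F" in df.columns
```

Here returned is None, has_f_before is False and has_f_after is True. A practical consequence is that writing df = df.insert(...) would replace the DataFrame with None, while calling df.assign(...) without reassigning has no lasting effect on df.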
We’ve now covered four different methods for adding new columns to a pandas DataFrame, a common operation in data analysis and manipulation. One of the things I like about pandas is that it usually provides multiple ways to perform a given task, making it a flexible and versatile tool for analyzing and manipulating data.