In computer science, a heap is a type of tree-shaped data structure that has the special property of being an almost-completely binary structure satisfying the heap property. A heap can be either a min heap or a max heap. A max heap is a data structure in which each child node is less than or equal to its parent node. A min heap is a similar type of data structure where each child node is greater than or equal to its parent node. Thus, heapifying a heap tree means reordering the child nodes so that they conform to either min heap or max heap rules.
When min heap or max heap constraints are placed on tree data structures, we end up with trees of relatively short length. Because heaps are shorter, traversing from the minimum to the maximum value takes less time, which makes the process of searching for values within the tree much faster.
How to Heapify a Heap Tree in C++
What Are Min Heaps and Max Heaps?
Let’s consider the following max heap.
The property of the max heap is that the root node has the maximum value. Further, the value of each node is less than or equal to its parent node. At the top of the tree we have the root node with a value of 90, which is the largest value in the tree. Further, at the second level we see the values 79 and 72, which are less than 90, and then 30 and 65 which are less than 79, and so on.
Conversely, take a look at the example of the min heap below.
If we look at the value at the root compared to the values at each node below the root, we see that 12 is the smallest value in the tree. At the level below, we have 20 and 29 which are both greater than 12, and so on.
How to Heapify a Heap Tree in C++
The task of heapifying a heap tree is the process of reordering the elements of a tree such that it has the properties of a min or max heap.
Max Heapify
Specifically, max heapify is the process of taking an array that is represented as a binary tree and recording the values at each node such that the child nodes are either less than or equal to the parent, satisfying a max heap:
Min Heapify
Min heapify is the process of recording the values at each node such that the child is greater than or equal to the parent node, satisfying a min heap:
Heap data structures can also be used for finding and keeping track of the minimum/maximum value in an array. This can be useful for scheduling tasks in a priority queue for customers, where customers with issues that take the shortest amount of time are prioritized. This can lead to a lower average waiting time for all customers. Heaps are also used in graph algorithms such as Djiktra’s algorithm, which is used to find the shortest path between two nodes in an array. This can be used for infrastructure planning tasks such as establishing a road network, electricity line, or oil pipeline.
Defining a Heapify Function
To explain how best to heapify your data, let’s consider the following array:
array_in = [3, 5, 8, 10, 17, 11, 13, 19, 22, 24, 29]
This array has the corresponding complete binary tree, meaning that each parent node has precisely two child nodes:
We can define a heapify function that takes the array as input and converts it into a max or min heap. Let’s consider converting this binary tree into a max heap. The first thing we need to do is find the last node that isn’t a leaf. A leaf is a node that doesn’t have any children. We see that 11, 13, 19, 22, 24, and 29 are all leaves since they do not point to any children:
Further, reading the nodes in each tree level (that is, a level above the leaf nodes) from left to right we see that the last non-leaf node is 17. This is also the parent of the last node:
If we want to build a max-heap from our binary tree, we can do this by heapifying the nodes up to the last non-leaf node [3,5,8,10,17] going in reverse order. We apply the heapify operation in reverse level order, meaning starting from right to left at each level we compare each child node to its parent. For max-heapify, if the child node is greater than its parent, swap the values. For example, we start the heapify operation by swapping 17 with the value of its furthest right child, 29, since the child is greater than the parent:
We then move to the next node, going from right to left, and compare 24 with 29. This satisfies the max-heap property so we then move on to 22, which we compare to 10. Since 10 is at a parent node and is less than 22, it does not satisfy the heap property, so we swap:
We then move to the next node. Since 19 is less than 22, it satisfies the max heap principle so we move on to the next level. We start at 13 and compare it to its parent. It does not satisfy the heap property so we swap 8 and 13:
The next node values to swap are 5 with 29, then 5 with 24:
Then we swap 3 and 29, 3 and 24, and then 3 and 17, so that 3 works its way down the tree to become a leaf node and 29 is the root:
Indexing a Heap
Before we can move on to write code to heapify our array in C++, we need to quickly identify the index.
We can find the index, which is simply the position in the tree, of the last non-leaf node by taking the floor of half the number of nodes minus one, where the floor is the greatest integer less than or equal to the input. For example, the floor of 2.2 is 2.0. Let’s calculate the index of the last non-leaf node:
index of last non-leaf node = floor of (number of nodes)/2 - 1.
In our earlier example there are 11 nodes so the index of the last non-leaf node is:
index of last non-leaf node = floor of 11/2-1, which would be 5.5 -1 = 4.5. The floor of 4.5 = 4.0.
So the index of the last non-leaf node is 4, which has the value of 17. As a reminder, we start indexing nodes with 0. For example, the first element has an index of 0, the second has an index of 1, the fifth element has an index of 4, and so forth.
Writing Code for Heapifying in C++
Let’s write some C++ code that implements this heapify logic. Let’s create a .cpp file called heapify_code.cpp
. In terminal type:
vi heapify_code.cpp
Let’s start by including iostream
which allows us to write to the standard input/output streams.
#include <iostream>
Let’s also define a function called heapify
that returns void:
void heapify()
{
}
The function will take an integer array input. Let’s call the integer array array_in
. It will also take an integer, subtree_root_index
, for the index of subtree root . It will also take an integer, array_size
, for the size of the array:
void heapify(int array_in[], int index, int array_size)
{
}
Next, we need to define a few variables within the scope of our function. Let’s initialize a variable called largest_value
. Let’s also initialize variables for the left and right children. For the left child, the index is 2*subtree_root_index +1
and the right child is 2*subtree_root_index +2
.
void heapify(int array_in[], int array_size, int subtree_root_index)
{
int largest_value = subtree_root_index;
int left = 2*subtree_root_index + 1;
int right = 2*subtree_root_index + 2;
}
Next let’s add logic that checks if the left child is larger than the root. If the left child is greater than the root, we redefine the largest_value
as the left child. Within this logic we also need to make sure that the index of the left child is less than the size of the array:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (left < array_size && array_in[left] > array_in[largest_value]){
largest_value = left;
}
}
Next we need to add logic that checks if the right child is larger than the root. Like the previous check, if the right child is greater than the root, we redefine the largest_value
as the right child. We also need to make sure that the index of the right child is less than array_size
:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (left < array_size && array_in[left] > array_in[largest_value]){
largest_value = left;
}
if (right < array_size && array_in[right] > array_in[largest_value]){
largest_value = right;
}
}
Finally, we need to check if the largest value is equal to the value at the root. If it is not we swap the values at the root with the largest value:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (largest_value != subtree_root_index )
{
swap(array_in[subtree_root_index], array_in[largest_value];
}
}
And finally we recursively call the heap function on the subtree under the condition largest_value
is not equal to subtree_root_index
:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (largest_value != subtree_root_index )
{
swap(array_in[subtree_root_index], array_in[largest_value]
heapify(array_in, array_size, subtree_root_index);
}
}
The full function is as follows:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
int largest_value = subtree_root_index;
int left = 2*subtree_root_index + 1;
int right = 2*subtree_root_index + 2;
if (left < array_size && array_in[left] > array_in[largest_value]){
largest_value = left;
}
if (right < array_size && array_in[right] > array_in[largest_value]){
largest_value = right;
}
if (largest_value != subtree_root_index )
{
swap(array_in[subtree_root_index], array_in[largest_value]
heapify(array_in, array_size, largest_value);
}
}
Using Our Heapify Function to Define a Heap Constructor
Now that we’re done writing our heapify function, we can write another function that allows us to construct a heap given an input array. This function will take an array and its size as inputs, and within a for loop call the heapify function on the array starting from the last-node leaf node. We will call the function construct Heap
:
void construct_heap(int array_in[], int array_size){
}
Let’s define a variable called last_non_leaf_node
which is the (array_size/2)-1:
void construct_heap(int array_in[], int array_size){
int last_non_leaf_node = (array_size/2) -1;
}
Next we can loop in reverse order starting from the last leaf node, iteratively reducing the index by 1, and call on the heapify function with each value for the index:
void construct_heap(int array_in[], int array_size){
int last_non_leaf_node = (array_size/2) -1;
for (int subtree_root_index = last_non_leaf_node; subtree_root_index >=0; subtree_root_index-=1){
heapify(array_in, array_size, subtree_root_index);
}
}
Next let’s define a print function that will allow use to print out the values in our heap:
void print_heap(int array_in[], int array_size){
cout << "Printing values at each node in heap" << endl;
for (int index = 0; index < array_size; index+=1){
cout<< array_in[index] << endl;
}
}
Now we can define our main function which will serve as the driver code for executing our heapify
, construct_heap
and print_heap
functions. Let’s define the array we were working with earlier, array_in = [3, 5, 8, 10, 17, 11, 13, 19, 22, 24, 29], which has the corresponding tree representation:
int main(){
int array_in[] = { 3, 5, 8, 10, 17, 11, 13, 19, 22, 24, 29};
int array_size = sizeof(array_in) / sizeof(array_in[0]);
construct_heap(array_in, array_size);
print_heap(array_in, array_size);
}
Let’s compile our script:
g++ heapify_code.cpp
And run our compiled script:
./a.out
And we should get the following output:
Printing values at each node in heap
29
24
13
22
17
11
9
19
10
5
3
This has the array representation heap = [29, 24, 13, 22, 17, 11, 8, 19, 10, 5, 3] and the following transformation we performed is as follows:
The code used in this post is available on GitHub.
Why and When Heapifying Is a Good Idea
In summary, heapifying a tree is important since it allows us to benefit from the favorable properties of the heap data structure. Heap data structures can significantly speed up these algorithmic tasks. Heaps also have applications such as min/max searching, order statististics, and finding the shortest paths. An order statistic corresponds to the Kth smallest (or largest) value in a collection of items. This has applications in tasks such as quickly finding the median in an array. It is generally useful any time you need to repeatedly select largest or smallest values from a collection of items which is the case with priority queues and order statistics.