Sorting Arrays using Heaps — Heap Sort
Heap Sort is one of the popular and easy to understand sorting techniques for arrays in computer science. This technique was first invented by J.W.J. Williams in 1964. It is a very well-known comparison based sorting technique where a certain element in picked and compared with several other elements and arranged in a special order so as to form a heap, learning this sorting technique requires the knowledge of mainly two data structures — arrays because we are eventually interested in sorting arrays and trees because heap are typically formed similar to complete binary trees.
The general working of Heap Sort can be described by assuming an array as a heap, we try to find the maximum or the minimum element from the array and place it at the end position this step is recursively done to finally get a completely sorted array at the end. This algorithm typically places elements of an array into a sorted and unsorted region after each iteration our goal is to find maximum element or minimum element from the unsorted region and place it in the sorted region. It is not particularly a stable sort algorithm which means if we have two equal elements initially in the array then their order is not necessarily maintained while sorting rather it’s an in-place algorithm which means it does not use any auxiliary data structure to transform input.
What is a Binary Heap?
Binary heap is a special type of data structure which is very similar to a complete binary tree except for the last level. Binary heaps are maintained in special order but first let’s focus on their structure, it starts with a typical root node where the whole tree is supposed to originates from, each node has exactly two nodes here except for the last level. Root node is always at level 0 then its child nodes are on level 1 and so on. Excluding the last level, we can find total nodes at each level by the formula 2^n where “n” is value of level. At the last level, the previous level nodes can have either 0,1 or 2 nodes.
Max and Min Heaps
A binary heap is mainly divided into two types — max heap and min heap.
Now a max heap is the heap in which all the nodes are greater than their children nodes, so in a typical max heap the root node is always the greatest element in the whole tree, and its children are always smaller than it and so on. Now when we form a max heap, we particularly take largest element from the root and place it at the last position of array after which it is removed from heap then we again form the max heap take the largest element and place it at 2nd last position in the array and so on, in this manner we finally get an array sorted in increasing order.
Min heap is exactly same as the max heap, but the only difference is that here all the nodes are smaller than their children nodes, so in a typical min heap the root node is the smallest element in the whole tree, and its children are always greater than it and so on. Using min heap technique on an array we finally get a sorted array in decreasing order.
Array Index and Heap Element
Now, when we sort an array using heaps then we first need to represent an array in the form of a heap and to do that we have a really interesting relation between array index and nodes of the heap. For that the first point to be noted is that each array index would have a node in the heap representing it. Now lets see how array index form the heap. So, if the index of array element is “i” then its left child is at index “2*i+1”, its right child is at index “2*i+2” and and its parent is at index “(i-1)/2”. To form the heap we start from array index 0 which is initially the root node in the heap and the rest of the heap is formed by finding the children nodes using the above given formulas.
How to Build a Heap?
Till now we have seen how heap works, general sorting process and all, now lets see how we actually build heaps. For this lets consider forming a max-heap. By running a function called heapify on all the non-leaf element of a binary tree we form max-heap. Now heapify is a recursion technique where we first find the max element among a parent and its children and then if the largest element is a child node then we swap it with parent node and again call the heapify function on the swapped parent node, this is done so as to maintain the max-heap property on the sub-trees because a swapping can disrupt the max-heap for sub-trees. Let’s take a quick look on the pseudo code of heapify function:
heapify(array, i)
largest = array[i]
largest = max(array[i],array[2*i+1] ,array[2*i+2])
if(largest != array[i])
swap(array[i],largest)
heapify(array, largest)
Building Max-Heap and Implementation
To build a max-heap, first we need to heapify the whole tree, this process starts from the last non-leaf node which is the last node in the second last level of tree and then we move towards other non-leaf nodes that are before it, this process takes place in a bottom-up manner. This step is practically done as:
for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);
Let’s take a sample array of size 5 and we will implement the above steps on it.
The last non leaf node is 4 with two children node 10 and 1, here 10 is largest among parent 4 and its children 10 and 1, so we need to swap 10 and 4. Now next non-leaf node for this case is 5 which is the root node, and here 10 is largest among parent 5 and its children 10 and 3, so we swap 10 and 5. Now the largest element of whole tree is at the root node and the tree is heapified.
Now, starts the heap sorting where we swap the root of tree with the last leaf node and then remove it from the tree. This step is practically done as:
for (int i = n - 1; i > 0; i--)
{
swap(arr[0], arr[i]);
heapify(arr, i, 0);
}
After swapping root 10 with last leaf node 1, we removed 10 from tree and now 1 is root. Now, again heapifying from last non-leaf node 5, we can see that 5 is already greater than its child 4 so no change, now heapifying the root node we can see 5 is largest among parent 1 and its children 5 and 3, so we swap them, this affects the below sub-tree as child of 1 is greater than it so we swap them as well now. And the tree is again heapified.
After swapping root 5 with last leaf node 1, we removed 5 from tree and now 1 is root. Now, again heapifying from last non-leaf node 1 that is root, we can see that 4 is largest among parent 1 and its children 4 and 3, so we swap them. And the tree is now heapified. Swap root 4 with last leaf node 3, and remove 4 from the tree so 3 is now the root.
Here 3 was already greater than its child 1, so swap 3 and 1 and remove 3 from tree. Now we are only left with root 1 and it can’t compare itself with anything so remove it as well and we get the final sorted array.
Source Code for Heap Sort
// Source code of Heap Sort in C++
void heapify(int arr[], int n, int i)
{
int largest = i;
int l = 2 * i + 1;
int r = 2 * i + 2; if (l < n && arr[l] > arr[largest])
largest = l;
if (r < n && arr[r] > arr[largest])
largest = r;
if (largest != i)
{
swap(arr[i], arr[largest]);
heapify(arr, n, largest);
}
}void heapSort(int arr[], int n)
{
for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);
for (int i = n - 1; i > 0; i--)
{
swap(arr[0], arr[i]);
heapify(arr, i, 0);
}
}
Complexity
Heapify function has a time complexity of O(Logn). Overall heap sort has a time complexity of O(nLogn) for all cases. Space complexity for heap sort is O(1).
Thanks for sticking around till the end. Having a good understanding of different algorithm is very important in computer science and it was really fun to study this sorting technique called Heap Sort.