Sorting Notes
Author: Kelly Lin

A. Comparison Sorts

Below is a table with the best and worst case runtimes and stability of the following comparison sorting algorithms with respect to N, the length of the list being sorted.

Algorithm        Worst Case   Best Case   Stable? (Yes/No)
Selection sort   ϴ(N²)        ϴ(N²)       No
Insertion sort   ϴ(N²)        ϴ(N)        Yes
Merge sort       ϴ(NlogN)     ϴ(NlogN)    Yes
Quicksort        ϴ(N²)        ϴ(NlogN)    No
Heapsort         ϴ(NlogN)     ϴ(N)        No

Selection Sort

Worst case: On the first iteration of the selection sort algorithm, selection sort looks through all N items to find the smallest one. On the next iteration, it looks through the remaining N-1 items to find the second-smallest item. The same logic applies for the rest of the sorting iterations. The runtime can therefore be determined by counting the total number of times an item was compared throughout the sorting process, calculated as follows:

(N - 1) + (N - 2) + ... + 2 + 1 = N(N - 1)/2 ∈ ϴ(N²)

Therefore, the worst case runtime for selection sort is ϴ(N²).

Best case: The selection sort algorithm always goes through the same steps regardless of the state of the list/array being sorted (i.e. selection sort does not notice or care if the list is sorted/partially sorted). Because selection sort will perform the same number of comparisons on all lists, regardless of their state, the best case runtime is the same as the worst case runtime: ϴ(N²).

Stable? No. The swapping process in selection sort is not guaranteed to preserve the original ordering of equivalent items.

For example, consider sorting the list [2a, 2b, 1] where 2a and 2b both have the value of 2. After running selection sort, we get the sorted order [1, 2b, 2a]. The original ordering of elements 2a and 2b was not preserved, proving that selection sort is not a stable sort.
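
To make the example concrete, here is a minimal Python sketch (the function and variable names are my own illustration, not from these notes) that runs selection sort on [2a, 2b, 1], modeled as (value, label) pairs:

```python
def selection_sort(items, key=lambda x: x):
    """In-place selection sort: swap the smallest remaining item into the
    front of the unsorted suffix on each pass."""
    n = len(items)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):               # scan the unsorted suffix
            if key(items[j]) < key(items[smallest]):
                smallest = j
        # This long-distance swap is what can reorder equal items.
        items[i], items[smallest] = items[smallest], items[i]
    return items

# The notes' example [2a, 2b, 1], modeled as (value, label) pairs.
print(selection_sort([(2, "a"), (2, "b"), (1, "")], key=lambda p: p[0]))
# -> [(1, ''), (2, 'b'), (2, 'a')]: 2a and 2b swapped, so not stable
```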

Insertion Sort

General case: The runtime of insertion sort is ϴ(N+K), where K is the total number of inversions (recall that an inversion is defined as any pair of elements that are not in the correct sorted order). We get the value N because we must look at all N items in the list in order to sort the list. We get the value K because we perform one pairwise swap for each inversion that we find in our array.

Worst case: We get a worst case scenario when we try to sort a list with the maximum possible number of inversions, i.e. a fully reversed list. For each item at index i, we must shift its position to the left by i places. To get the worst case runtime we simply sum up the total number of swaps needed to sort items from index 0 through N-1. We can compute this as follows:

0 + 1 + 2 + ... + (N - 1) = N(N - 1)/2 ∈ ϴ(N²)

Note that we perform a swap each time we encounter an inversion in the array. Counting the number of swaps performed is therefore the same as counting the number of inversions in the array. Because we calculated that ϴ(N²) swaps are needed to sort the array, we can conclude that we have K ∈ ϴ(N²) inversions. Plugging this result into our general equation for insertion sort runtime (ϴ(N + K)), we get a worst case runtime of ϴ(N + N²) ∈ ϴ(N²).

Best case: If the list is almost sorted and has only ϴ(N) inversions, then we only have to do ϴ(N) swaps over all the items, giving us a best case runtime of ϴ(N).
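
To tie the ϴ(N + K) analysis to code, here is a minimal Python sketch of insertion sort (my own illustration, not part of the notes) that counts its pairwise swaps; the swap count is exactly the number of inversions K in the input:

```python
def insertion_sort(items):
    """In-place insertion sort. Returns the number of pairwise swaps performed,
    which equals the number of inversions K in the original list."""
    swaps = 0
    for i in range(1, len(items)):
        j = i
        # Swap items[j] leftward until its left neighbor is no larger than it.
        while j > 0 and items[j] < items[j - 1]:
            items[j - 1], items[j] = items[j], items[j - 1]
            swaps += 1
            j -= 1
    return swaps

print(insertion_sort([5, 4, 3, 2, 1]))  # fully reversed: 10 swaps = N(N-1)/2 inversions
print(insertion_sort([1, 2, 4, 3, 5]))  # nearly sorted: 1 swap = 1 inversion
```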

Stable? Yes. Insertion sort is a stable sort. During the insertion sort process, we will only swap the ordering of two adjacent items if the item on the right is less than the item to its left. Therefore, the ordering of two equivalent items will always be preserved in insertion sort.

Merge Sort

Worst case: At each level of our tree, we split the list into two halves, resulting in a tree with a height of logN. At each one of the logN merge levels, we merge a total of N items together. As a result of performing N comparisons at logN levels, we get a worst case runtime of ϴ(NlogN).

Another approach is to sum up the amount of work done at each level of the call tree. Level i has 2^i nodes, and each node does N/2^i work from the linear-time merging operation. So the total amount of work is:

2^i · (N/2^i) summed over all logN levels = N + N + ... + N = NlogN ∈ ϴ(NlogN)

Best case: The merge sort algorithm does the same amount of work regardless of the initial ordering of the list being sorted. We still have to do all of the comparisons between items regardless of their initial ordering, so the best case runtime is the same as the worst case runtime: ϴ(NlogN).

Stable? Yes. During the merging process, we preserve the ordering of equivalent items. If the first element of the left list being merged is equivalent to the first element of the right list, we always insert the element from the left list before the element from the right list. Through this process we always preserve the initial relative ordering of equivalent items in the list, resulting in a stable sorting algorithm.
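
Here is a minimal Python sketch of the merge step and a top-down merge sort built on it (my own illustration): breaking ties in favor of the left half is exactly what makes merge sort stable.

```python
def merge(left, right):
    """Merge two sorted lists; ties go to `left`, preserving relative order."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= (not <): on a tie, take the left element first
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]  # one side is exhausted; append the leftovers

def merge_sort(items):
    """Top-down merge sort: split in half, sort each half, merge the results."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```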

Quicksort

Worst case: In the quicksort algorithm, the work done at each iteration comes from moving the unsorted items to the correct side of the pivot. In the worst case we choose bad pivots, i.e. pivots that do not split the unsorted array into relatively even halves. In the worst case, we consistently choose pivots that are either the max or the min of the unsorted items. When we choose bad pivots, our quicksort algorithm begins to look a bit like selection sort. For the first iteration, we move N items. For the second iteration, we shift N-1 items. For the third iteration, we shift N-2 items. For a list of length N we end up recursing N times. We sum up the total amount of work to get our worst case runtime as follows:

N + (N - 1) + (N - 2) + ... + 2 + 1 = N(N + 1)/2 ∈ ϴ(N²)

Therefore the worst case runtime for quicksort is ϴ(N²).

If we end up always choosing the smallest or largest item as our next pivot, each partitioning of a sub-array of size m leaves us with only one non-empty partition, of size m-1, since every other item is larger than (or smaller than) our pivot. With a total of N elements, we end up recursing N times, each time doing a linear-time partition, for a total of ϴ(N²) time.

Best case: If we can always partition our sub-arrays roughly in half, our recursive call tree is identical to that of merge sort, resulting in a best case runtime of ϴ(NlogN).
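
These notes don't fix a particular partitioning scheme, so as an illustration here is a minimal out-of-place Python sketch (my own, with a deliberately naive pivot choice) that always picks the first element as the pivot; on an already-sorted list this choice degenerates into the ϴ(N²) behavior described in the worst case.

```python
def quicksort(items):
    """Out-of-place quicksort with a naive pivot choice (the first element).
    On an already-sorted list, `left` is always empty, so we recurse N times
    over sub-lists that shrink by one item, giving the ϴ(N²) worst case."""
    if len(items) <= 1:
        return items
    pivot = items[0]
    left = [x for x in items[1:] if x < pivot]     # items on the small side of the pivot
    right = [x for x in items[1:] if x >= pivot]   # items on the large side of the pivot
    return quicksort(left) + [pivot] + quicksort(right)
```

Choosing a random pivot (or a median-of-sample pivot) instead makes the roughly even splits of the best case far more likely.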

Stable? For in-place implementations of quicksort, no. Through the process of moving items across the partition, there is no guarantee that equivalent items will stay in the same relative positions they had in the initial unsorted list. For implementations of quicksort that are not in-place, it is easier to make the sort stable. There are in-place implementations of quicksort that are stable, but they require extra logic and are a bit more annoying to implement.

Heapsort

General case: In-place heapification (bottom-up, bubble-down heapification) takes O(N) time (naive heapification, i.e. top-down, bubble-up heapification, takes ϴ(NlogN) time). After heapification, we simply remove items from the max-heap one at a time. The runtime of heapsort is ϴ(N + (time to remove all items)). The time it takes to remove all N items from the heap depends on the values in the heap, which differ in the worst and best case.

Worst case: As mentioned above, the runtime of heapsort is ϴ(N + (time to remove all items)), where the ϴ(N) comes from the in-place heapification procedure. In the worst case, each removal from the max-heap results in the new root being bubbled all the way down to the bottom level of the heap. As a result, each removal takes ϴ(logN) time to bubble the new root down from the top level through logN levels. Removing N items, each taking ϴ(logN) time, takes ϴ(NlogN) time. Therefore, the worst case runtime is ϴ(N + (time to remove all items)), or ϴ(N + NlogN) ∈ ϴ(NlogN).

Best case: In the rare event where all the items in the list are the same, removing the maximum valued item takes ϴ(1) time (we don't have to bubble the new root down whenever we remove an item). Performing N removals that each take ϴ(1) time gives us ϴ(N) time to remove all items from the heap. Therefore, the best case runtime for heapsort is ϴ(N + N) ∈ ϴ(N).
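
Here is a minimal in-place Python sketch of heapsort (my own illustration): a bottom-up heapification pass followed by N removals, each of which bubbles the displaced root back down.

```python
def heapsort(items):
    """In-place heapsort: bottom-up heapification in ϴ(N), then N removals from
    the max-heap, each one bubbling the displaced root down in O(logN)."""
    n = len(items)

    def sink(i, size):
        # Bubble items[i] down until neither child is larger than it.
        while 2 * i + 1 < size:
            child = 2 * i + 1
            if child + 1 < size and items[child + 1] > items[child]:
                child += 1                      # pick the larger child
            if items[i] >= items[child]:
                break
            items[i], items[child] = items[child], items[i]
            i = child

    # Bottom-up heapification: sink every non-leaf node, starting from the last one.
    for i in range(n // 2 - 1, -1, -1):
        sink(i, n)

    # Repeatedly move the max to the end of the unsorted prefix, then restore the heap.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sink(0, end)
    return items

print(heapsort([5, 1, 4, 2, 3]))  # -> [1, 2, 3, 4, 5]
```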

Stable? No. Through the insertion and removal process, with all the bubbling up and bubbling down, there is no guarantee of stability.

B. Counting Sorts

Below is a table with the best and worst case runtimes and stability of the following counting-based sorting algorithms with respect to N, the length of the list being sorted. Assume we are sorting integers, and let L be the average number of digits in the integers being sorted. Let R be the size of the radix being used.

Algorithm              Worst Case   Best Case   Stable? (Yes/No)
Distribution counting  ϴ(N)         ϴ(N)        Yes
LSD radix sort         ϴ(NL)        ϴ(NL)       Yes
MSD radix sort         ϴ(NL)        ϴ(NL)       Yes

Distribution Counting

Worst case: For the distribution counting algorithm, we take one linear pass through the list to calculate the starting indices for each group. Then we take another linear pass through the list to populate the final sorted list based on the indices calculated in the previous pass. Therefore, the worst case runtime for distribution counting is ϴ(N + N) ∈ ϴ(N).

Best case: Same as worst case. The runtime of distribution counting does not depend on the ordering of the list being sorted, yielding a best case runtime of ϴ(N).

Stable? Yes! When we're populating the final sorted array, we process and add items in the order in which we encounter them, from index 0 to length-1. Because we're only adding equivalent items in the order in which they initially appeared, distribution counting is a stable sorting algorithm.
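
Here is a minimal Python sketch of distribution counting (the function name and the key/radix parameters are my own illustration): one pass to count, a prefix sum to turn counts into starting indices, and one pass that places items in encounter order, which is what makes it stable.

```python
def counting_sort(items, key, radix):
    """Stable distribution counting sort. `key(x)` must be an int in [0, radix)."""
    counts = [0] * radix
    for x in items:                      # pass 1: count how many items have each key
        counts[key(x)] += 1

    starts = [0] * radix                 # starts[k] = index where key k's block begins
    for k in range(1, radix):
        starts[k] = starts[k - 1] + counts[k - 1]

    out = [None] * len(items)
    for x in items:                      # pass 2: place items in encounter order -> stable
        k = key(x)
        out[starts[k]] = x
        starts[k] += 1
    return out
```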

LSD Radix Sort

Worst Case: There are N items, and each has approximately L digits. For each of these digits, we have to look through all N numbers and sort them by that digit. Since there are L digits and N integers, this gives us a runtime of ϴ(NL).

Best Case: For LSD, you still have to look through all N items L times regardless of the input, resulting in a best case runtime of ϴ(NL).

Stable? Recall that distribution counting is a stable sorting algorithm. Since LSD is just a result of running distribution counting across all the digits of the numbers being sorted, LSD radix sort is also a stable sorting algorithm. Note that LSD radix sort would not work if distribution counting was not a stable sorting algorithm.
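
Building on the counting_sort sketch from the Distribution Counting section above (again my own illustration, for non-negative integers), LSD radix sort is just one stable counting pass per digit, least significant digit first:

```python
def lsd_radix_sort(nums, radix=10):
    """LSD radix sort for non-negative ints: one stable counting-sort pass per
    digit, starting from the least significant digit."""
    digits = len(str(max(nums))) if nums else 0   # number of digit passes, L
    for d in range(digits):
        nums = counting_sort(nums, key=lambda x: (x // radix ** d) % radix, radix=radix)
    return nums

print(lsd_radix_sort([329, 457, 657, 839, 436, 720, 355]))
# -> [329, 355, 436, 457, 657, 720, 839]
```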

MSD Radix Sort

Worst Case: Same explanation as the worst case runtime for LSD radix sort. In the worst case we must perform distribution counting across all the digits of the N numbers being sorted, giving us a worst case runtime of ϴ(NL).

Best Case: MSD radix sort can short-circuit once each item is in its own bucket. If N ≤ R (R is the size of the radix), then it's possible to get each item in its own bucket after only 1 pass of distribution counting. This yields a runtime of ϴ(N). However, this is not a strictly correct asymptotic runtime. As N grows towards infinity and becomes greater than R, it's not possible to put every item in its own bucket after looking at only the most significant digit. We need to look at at least log_R(N) digits to get all of our items in their own buckets and end MSD early in the best case. If log_R(N) is greater than L, the number of digits in our items, we can't end MSD early and will simply go through all L passes of distribution counting, yielding a best case runtime of ϴ(NL).

Stable? Yes. Just like LSD radix sort, MSD radix sort is just distribution counting applied to each digit of the numbers being sorted, so MSD radix sort is also a stable sort.
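
For completeness, here is a minimal Python sketch of MSD radix sort for non-negative integers (my own illustration): bucket by the most significant digit, then recurse within each bucket on the next digit, stopping early once a bucket holds at most one item (the short-circuit described above).

```python
def msd_radix_sort(nums, radix=10):
    """MSD radix sort sketch for non-negative ints."""
    if not nums:
        return []
    width = len(str(max(nums)))          # number of digits in the widest number

    def sort(bucket, d):
        if len(bucket) <= 1 or d < 0:    # short-circuit: nothing left to distinguish
            return bucket
        groups = [[] for _ in range(radix)]
        for x in bucket:                 # stable: items appended in encounter order
            groups[(x // radix ** d) % radix].append(x)
        out = []
        for g in groups:
            out.extend(sort(g, d - 1))   # recurse on the next-most-significant digit
        return out

    return sort(nums, width - 1)

print(msd_radix_sort([329, 457, 657, 839, 436, 720, 355]))
# -> [329, 355, 436, 457, 657, 720, 839]
```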