Merge Sorting

Merge sorts are O(N lg N) algorithms for sorting a list of N items using comparisons. There are many variations, all of which follow a divide-and-conquer scheme: divide the data into subsequences, recursively sort the subsequences, and finally merge the sorted subsequences into increasingly large sorted sequences.

To merge two or more sorted sequences, one repeatedly removes the smallest item from the input sequences, which because they are sorted, is always at the front of one of them. One can actually merge any number of sequences in this fashion; for large numbers of sequences, one might use some variety of priority queue (or heap) to keep track of which sequence currently has the smallest item. At any given time, one need only look at the initial member of each sequence.

Because it deals with data sequentially and requires only enough memory to buffer its input and output—a finite amount—merge sorting is ideally suited to external sorting, in which the amount of data to be sorted exceeds the capacity of primary memory by arbitrarily large amounts. The size of the sorted data is limited only by the size of one's secondary memory. In addition, because of its divide-and-conquer character, merge sorting is also eminently parallelizable, as suggested in the variation shown here.

The classic recursize mergesort algorithm looks like this (where merge(A, B, C) merges sorted arrays B and C, putting the result in A):

    To sort elements L through U of A
    mergesort(A, L, U):
        if L < U:
            M = ⌊(L+U)/2⌋
            mergesort(A, L, M)
            mergesort(A, M+1, U)
            merge(A[L..U], A[L..M], A[M+1..U])
The base case here is a single element (which is already sorted). The algorithm merges pairs of elements into 2-element sequences, pairs of 2-element sequences into 4-element sequences, etc. When U-L+1 is not a power of two, there will be merges of differing-length sequences at some point. The algorithm below uses a different sequence of merges, sorting non-overlapping 2-element subsequences, then non-overlapping 4-element subsequences, etc., pairing up sequences from the left. The 2-element sequences are sorted in place, and subsequent merges use a temporary array to hold the result.

Fast Slow
Input size:
Initial array: Array: # comparisons: