Homework 7 | CS 61B Fall 2019

Homework 7: A Balanced Variety of all Sorts of Fun!

This homework is intended to give you a chance to better understand sorting and balanced search trees.

Implementing Sorting Algorithms
Quicksort and Mergesort Mechanics
Sorting Problems
Left-Leaning Red-Black Trees (LLRB)
Making a Balanced Tree
Submission

A. Implementing Sorting Algorithms

The skeleton, MySortingAlgorithms.java provides a template for implementing various sorting algorithms.

Implement the algorithms below (the ones with an asterisk next to it are required). Clicking on the links will take you to an animation of the sort to refresh your memory.

*Selection sort: Easier
*Insertion sort: Easier
*Mergesort: Moderate
Distribution sort: Moderate
Heapsort: Harder
Quicksort: Harder
*LSD radix sort: Harder
MSD radix sort: Harder

You can use MySortingAlgorithmsTest.java to test your implementations.

Once you're done, here are some questions to ponder (you don't need to record nor submit your responses):

How many items are sorted in the video for selection sort?
Why does insertion sort take longer/more compares than selection sort?
At what time stamp does the first partition complete for Quicksort?
Could the size of the input used by mergesort in the video be a power of 2?
What do the colors mean for heapsort?
How many characters are in the alphabet used for the LSD sort problem?
How many digits are in the keys used for the LSD sort problem?
The complete video ends with a rather odd sort that doesn't complete. Without looking at the captions, can you tell what it's doing?
How does the standard sort from the GCC library, a form of quicksort, differ from the other version of quicksort shown?

Additionally, the RunBenchmarks class runs timing tests on our various sorts. Currently, it is configured to perform two timing tests:

Compare (y)our implementation against Arrays.sort (which uses Quicksort for ints).
Time insertion sort for almost sorted arrays.

Run the code, and you should see results that are in line with things we've learned in class.

If the second test takes too long to complete, you can "Edit Configurations" in IntelliJ and type a number for program arguments to run the test on a smaller input size.

Other interesting tests you might try:

How do LSD and MSD compare with Quicksort and Mergesort? How large must N be before they become faster?
Find a case where LSD is faster than MSD.
For what N is insertion sort faster than Quicksort?
How do selection sort and insertion sort compare?

Feel free to post interesting tests and/or observations on Piazza. You don't need to submit anything related to this Benchmark test.

B. Quicksort and Mergesort Mechanics

Interestingly enough, quicksorting an array is equivalent to inserting all of its items into a BST. In this problem, we'll see why.

First, we need a specific implementation for partitioning.

Consider the array [5, 3, 2, 1, 7, 8, 4, 6], and suppose that we pick the leftmost item 5 as our pivot. One approach to partitioning is to perform a "stable" partitioning where all items that are less than 5 appear in the same order as they did before partitioning, and likewise for the greater items. For the given array, we'd get [3, 2, 1, 4, 5, 7, 8, 6].

One inefficient but simple way to implement this stable partitioning algorithm is to perform the following steps:

Create three empty Lists for storing integers smaller, equal to, and larger than the pivot, respectively.
Go through each item of the list, comparing it to the pivot, and adding items to the respective list based on the comparison.
Concatenting the three Lists into a single concatenated List.
Copying the concatenated List back into the array.

So for example if we partition [5, 3, 2, 1, 7, 8, 4, 6] from index 0 to index 7, we'd get:

Smaller items: [3, 2, 1, 4]
Equal items  : [5]
Larger items : [7, 8, 6]

The concatenation of these lists is just [3, 2, 1, 4, 5, 7, 8, 6]. Along the way, we compared the following pairs of numbers: 5-3, 5-2, 5-1, 5-4, 5-7, 5-8, and 5-6.

If we are using partition to sort, we then repeat this process for the left half and right half sides as discussed in class.

In sortingProblems.txt, fill out the list of comparisons used by Quicksort. You might find running Quicksort.java from the skeleton to be useful.

BSTs (1b, 1c, 1d in sortingProblems.txt)

Now draw the BST that results when you insert [5, 3, 2, 1, 7, 8, 4, 6] (in that order) into an initially empty BST. Record the comparions you observe in sortingProblems.txt. Answer questions 1c and 1d in sortingProblems.txt.

Mergesort (1e in sortingProblems.txt)

Finally, for 1e, give an example of a comparison performed by mergesort that is not performed by Quicksort or BST. This isn't particularly interesting, but we're just asking to make sure you understand how Mergesort works.

C. Sorting Problems

There are many problems for which sorting provides a fast solution, even though the problem isn't really about sorting. We encourage you to do all of these, but you only need to do any one of them for full credit.

Intervals (Intervals.java)

Define an interval to be a pair of numbers $[x_i, x_i']$ such that $x_i < x_i'$. An interval specifies a range of values in 1D space, where we can call $x_i$ the start point, and $x_i'$ the end point.

Given a list of such intervals, we want to know the total length of the regions covered by one or more of the intervals. This is not simply the sum of their lengths, $\Sigma x_i' - x_i$, since several may cover the same span.

For example, if we have intervals $[19, 30]$, $[6, 12]$, $[4, 5]$, $[8, 15]$, and $[3, 10]$, then the total length covered is 23: the last four intervals together totally cover the interval $[3, 15]$ of length 12, and the first covers a disjoint interval of length 11.

Fill in Intervals.java so that the coveredLength method returns the correct total length in $\Theta(N log N)$ time.

There is a clever trick to this problem that you'll need to figure out to get $\Theta(N log N)$ time. You do not need to use any nested loops for this problem, and in fact if you find yourself using them, you probably haven't found the right approach.

Distribution Count for Large Numbers (SortInts.java)

[Goodrich & Tamassia] Given a sequence of n distinct integers, each one of which is in the range $[0, n^2 - 1]$, develop an $O(n)$ algorithm for sorting them. See the skeleton file SortInts.java. You can't use ordinary distribution sort for this, because that would require initializing and traversing arrays of size $n^2$, which would take too long

Inversion Counting (Inversions.java)

Find an algorithm that runs in $O(n \lg n)$ time for computing the number of inversions in a list of n items. Array elements that are "out of order" can be corrected by swapping two adjacent elements at a time, each of which counts as a single inversion. See the skeleton file Inversions.java. To test that your code is actually $O(n \lg n)$, provide it a very large list (say hundreds of thousands).

Two Sum (Sum.java)

[Goodrich & Tamassia] Given two sequences of integers, $A$ and $B$, find an algorithm that runs in $O(n \lg n)$ time (where n is the total number of integers in A and B) that determines, for a given parameter $m$, whether there is an integer $a$ in $A$ and an integer $b$ in $B$ such that $m = a + b$. See the skeleton file Sum.java. To test that your algorithm runs in $O(n \lg n)$ time, provide sequences of hundreds of thousands of integers. Feel free to use any of the methods in java.util.Arrays.

D. Left-Leaning Red-Black Trees (LLRB)

In discussion, we examined 2-4 trees their relationship to Red-Black trees. If you need a reference or refresher, take a look at this page. Here, we will examine 2-3 trees and their corresponding Left-Leaning Red-Black trees.

2-3 trees are B-trees, just like 2-4 trees. However, they can have up to 2 elements and 3 children, whereas 2-4 trees could have one more of each. Naturally, we can come up with a Red-Black tree counterpart. However, since we can only have nodes with 1 or 2 elements, we can either have a single black node (for a one-element 2-3 node) or a "section" with a black node and a red node (for a two-element 2-3 node). As seen in the previous section, we can either let the red node be a left child or a right child. However, we choose to always let the red node be the left child of the black node. This leads to the name, "Left-Leaning" Red-Black tree.

The advantages of the LLRB tree over the usual Red-Black tree is the ease of implementation. Since there are less special cases for each "section" that represents a 2-3 node, the implementation is much simpler.

Normal binary search tree insertions and deletions can break the Red-Black tree invariants, so we need additional operations that can "restore" the Red-Black tree properties. In LLRB trees, there are two key operations that we use to restore the properties: rotations and color flips.

Rotations

Consider the following tree:

   parent
     |
     7
   /   \
  1     c
 / \
a   b

a is a subtree with all elements less than 1, b is a subtree with elements between 1 and 7, and c is a subtree with all elements greater than 7. Now, let's take a look at another tree:

parent
  |
  1
 / \
a   7
   / \
  b   c

There are few key things to notice:

The root of the tree has changed from 7 to 1.
a, b, and c are still correctly placed. That is, their items do not violate the binary search tree invariants.
The height of the tree can change by 1.

Here, we call this transition from the first tree to the second a "right rotation on 7".

Now, convince yourself that a "left rotation on 1" on the second tree will give us the first tree. The two operations are symmetric, and both maintain the binary search tree property!

Discussion: Rotation by Hand

We are given an extremely unbalanced binary search tree:

Write down a series of rotations (i.e. rotate right on 2) that will make tree balanced and have height of 2. HINT: Two rotations are sufficient.

Exercise: Rotation Implementation

Now we have seen that we can rotate the tree to balance it without violating the binary search tree invariants. Now, we will implement it ourselves! In RedBlackTree.java, implement rotateRight and rotateLeft. For your implementation, make the new root have the color of the old root, and color the old root red. Why should we have the colors change here and what might happen if we did not change the colors?

Hint: The two operations are symmetric. Should the code significantly differ?

Color Flip

Now we consider the color flip operation that is essential to LLRB tree implementation. Given a node, this operation simply flips the color of itself, and the left and right children. However simple it may look now, we will examine its consequences later on.

For now, take a look at the implementation provided in RedBlackTree.java.

Insertion

Finally, we are ready to put the puzzles together and see how insertion works on LLRB trees!

Say we are inserting x.

If the tree is empty, let x be the root with black color.
Otherwise do the normal binary search tree insertion, and color x red.
Restore LLRB properties.

Restoring LLRB Properties after Insertion.

First, let's assume that our new node x is the only child of a black node. That is:

     parent (black)
    /
   x (red)

   parent (black)
        \
         x (red)

Since we decided to make our tree left leaning, we know that the first tree is the valid form and we will not have to do anything else. If we end up with the second tree (x > parent) we can simply apply rotateLeft on parent to get the first tree.

Now, let's consider the case when our new node x becomes a child to a black node which already has a left child, or a child to a red node. LLRB have a one-to-one mapping to 2-3 trees. This is like inserting x into a 2-3 tree node that already has 3 children!

Here, we have to deal with 3 different cases, and we will label them case A, B, C.

Case A: x ends up as the right child of the black node.

         |
         5(black)
     /      \
    1(red)   x(red)

For case A, the resulting section is the same as a 2-3 tree node with one extra element:

      |
   1  5  x

To fix it, we "split" the 2-3 node into two halves, "pushing" up the middle element to its parent:

        |
     5  (sibling)
   |  |         |
   1  x     (nephews)

Analogously, for our LLRB section, we can apply flipColor on 5. This results in:

      |
      5 (red)
  /      \
1(black)  x(black)

This exactly models the 2-3 node we desired. 5 is now a red node, which means that it is now part of the "parent 2-3 node section". Now, if 5 as a new red node becomes a problem, we can recursively deal with it as we are dealing with x now. Also, the root of the whole tree should always be black, and it is perfectly fine for the root to have two black children. It is simply a root 2-3 node with single element and two children, each with single element.

Case B: x ends up as the left child of the red node.

     |
     5 (black)
    /
   1 (red)
  /
 x (red)

In this case, we can apply rotateRight on 5, which will result in:

        |
        1 (black)
   /        \
  x(red)     5 (red)

This should look familiar, since it is exactly case A that we just examined before! After a rotation, our problem reduces to solving case A. Convince yourself that rotation performed here correctly handles the color changes and maintains the binary search tree properties.

Case C: x ends up as the right child of the red node.

          |
          5 (black)
       /
     1 (red)
      \
       x (red)

In this case, we can apply rotateLeft on 1, which will result in:

         |
         5 (black)
        /
       x (red)
      /
     1 (red)

This also should look familiar, since it is exactly case B that we just examined. We just need one more rotation and color flip to restore LLRB properties.

Exercise:

Now, we will implement insert in RedBlackTree.java. We have provided you with most of the logic structure, so all you need to do is deal with normal binary search tree insertion and handle case A, B, and C.

Discussion.

We have seen that even though the LLRB tree guarantees that the tree will be almost balanced, LLRB insert operation requires many rotations and color flips. Examine the procedure for the insertion and convince yourself that the insert operation still takes O(log(n)) as in balanced binary search trees.

Hint: How long is the path from root to the new leaf? For each node along the path, are additional operations limited to some constant number? What does that mean?

Deletion

Deletion deals with many more corner cases and is generally more difficult to implement. For time's sake, deletion is left out for this assignment.

E. Making a Balanced Tree

Exercise: Build a Balanced Tree from a Linked List

In BST.java complete the linkedListToTree method, which should build a balanced binary search tree out of an already sorted LinkedList. Also, provide a good comment for the method linkedListToTree. In the BST constructor, we pass in list.iterator() to linkedListToTree. What do we know about repeated calls to iter.next() given that the LinkedList is already sorted?

HINT: Recursion!

Question: If it's a BST, why can the items just be of Object type in order to do the problem? Why not Comparable?

Answer: Because you shouldn't need to ever do any comparisons yourself. Just trust that the order in the LinkedList is correct.

Self-test: `linkedListToTree` Speed

Give the runtime of linkedListToTree, where N is the length of the linked list. The runtime is in (stroke over the areas after ||| to see if an answer is corect).

O(N) ||| Correct! We can only call iter.next() N times, and since we call it at most once per recursive call, this must be the runtime of the algorithm.
O(N^2) ||| Incorrect. In each recursive call we call iter.next() exactly once (or return null). How many of these recursive calls can there be?
O(log N) ||| Incorrect. Even though we make a recursive call where we divide the problem in half, we make two of these recursive calls, so we don't reduce to log N time.
(N*log N) ||| Incorrect. In each recursive call we call iter.next() exactly once (or return null). How many of these recursive calls can there be?

F. Submission

You will be required to submit:

MySortingAlgorithms.java with Selection Sort, Insertion Sort, Mergesort, and LSD Radix Sort implemented.
sortingProblems.txt.
RedBlackTree.java.
At least one of Intervals.java, SortInts.java, Inversions.java, and Sum.java. (We urge you to do all of them, however.)
BST.java.

Navigation

Navigation

A. Implementing Sorting Algorithms

B. Quicksort and Mergesort Mechanics

BSTs (1b, 1c, 1d in sortingProblems.txt)

Mergesort (1e in sortingProblems.txt)

C. Sorting Problems

Intervals (Intervals.java)

Distribution Count for Large Numbers (SortInts.java)

Inversion Counting (Inversions.java)

Two Sum (Sum.java)

D. Left-Leaning Red-Black Trees (LLRB)

Rotations

Discussion: Rotation by Hand

Exercise: Rotation Implementation

Color Flip

Insertion

Restoring LLRB Properties after Insertion.

Exercise:

Discussion.

Deletion

E. Making a Balanced Tree

Exercise: Build a Balanced Tree from a Linked List

Self-test: `linkedListToTree` Speed

F. Submission

Navigation

Navigation

A. Implementing Sorting Algorithms

B. Quicksort and Mergesort Mechanics

BSTs (1b, 1c, 1d in sortingProblems.txt)

Mergesort (1e in sortingProblems.txt)

C. Sorting Problems

Intervals (Intervals.java)

Distribution Count for Large Numbers (SortInts.java)

Inversion Counting (Inversions.java)

Two Sum (Sum.java)

D. Left-Leaning Red-Black Trees (LLRB)

Rotations

Discussion: Rotation by Hand

Exercise: Rotation Implementation

Color Flip

Insertion

Restoring LLRB Properties after Insertion.

Exercise:

Discussion.

Deletion

E. Making a Balanced Tree

Exercise: Build a Balanced Tree from a Linked List

Self-test: linkedListToTree Speed

F. Submission

Self-test: `linkedListToTree` Speed