Homework 8: Midterm Preparation: Sorting, et. al.

Due: Wednesday, 11 November 2015 at 1:00PM

Navigation

This homework is intended to give you a chance to better understand sorting and searching in preparation for the test. Feel free to use as much as you find useful. Just submit as usual.

A. HW6 and HW7 Solutions

Compare your answers to HW6 and HW7 to the reference solutions. Look for places where your code is overly complex relative to the reference.

Direct links for convenience: HW6, HW7.

B. Sorting Algorithms: Big Ideas

Here are some animations of selected sorting algorithms from the YouTube video I showed in lecture:

Questions to ponder:

C. Implementing Sorting Algorithms

In the HW8 files, MySortingAlgorithms.java provides a skeleton for implementing the various sorting algorithms from 61B. SortingAlgorithms.java provides solutions.

If you'd like, try implementing the sorting algorithms. If you're optimizing your study time, I'd recommend against doing any of the moderate (or harder) sorts. While you'll certainly gain a very deep understanding of these algorithms by implementing them, we don't expect you to understand various second-order issues (like auxiliary arrays, resizing for heapsort, bit hacking for LSD and MSD sort, etc.) for 61B.

You can use MySortingAlgorithmsTest to test your implementations.

Alternately, simply copy and paste from SortingAlgorithms.java to MySortingAlgorithms.java. Be warned that the code for SortingAlgorithms is a bit messy.

D. Timing Sorting Algorithms

The RunBenchmarks class runs timing tests on our various sorts. Currently, it is configured to perform two timing tests:

Run the code, and you should see results that are in line with things we've learned in class.

Other interesting tests you might try:

Feel free to post interesting tests and/or observations on Piazza.

E. Quicksort and Mergesort Mechanics (sortingProblems.txt)

Interestingly enough, quicksorting an array is equivalent to inserting all of its items into a BST. In this problem, we'll see why.

First, we need a specific implementation for partitioning.

Consider the array [5, 3, 2, 1, 7, 8, 4, 6], and suppose that we pick the leftmost item 5 as our pivot. One approach to partitioning is to perform a "stable" partitioning where all items that are less than 5 appear in the same order as they did before partitioning, and likewise for the greater items. For the given array, we'd get [3, 2, 1, 4, 5, 7, 8, 6].

One inefficient but simple way to implement this stable partitioning algorithm is to perform the following steps:

So for example if we partition [5, 3, 2, 1, 7, 8, 4, 6] from index 0 to index 7, we'd get:

Smaller items: [3, 2, 1, 4]
Equal items  : [5]
Larger items : [7, 8, 6]

The concatenation of these lists is just [3, 2, 1, 4, 5, 7, 8, 6]. Along the way, we compared the following pairs of numbers: 5-3, 5-2, 5-1, 5-4, 5-7, 5-8, and 5-6.

If we are using partition to sort, we then repeat this process for the left half and right half sides as discussed in class.

In sortingProblems.txt, fill out the list of comparisons used by Quicksort. You might find running Quicksort.java from the skeleton to be useful.

BSTs (1b, 1c, 1d in sortingProblems.txt)

Now draw the BST that results when you insert [5, 3, 2, 1, 7, 8, 4, 6] (in that order) into an initially empty BST. Record the comparions you observe in sortingProblems.txt. Answer questions 1c and 1d in sortingProblems.txt.

Mergesort (1e in sortingProblems.txt)

Finally, for 1e, give an example of a comparison performed by mergesort that is not performed by Quicksort or BST. This isn't particularly interesting, but we're just asking to make sure you understand how Mergesort works.

F. Sorting as a Tool

[UPDATED. The original problem has a trivial \((\Theta(N)\)) solution. This is, perhaps, a bit more interesting.]

There are a large number of problems for which sorting provides a fast solution, even though the problem isn't really about sorting.

Define an interval to be a pair of numbers \(([x_i, x_i']\)) such that \((x_i < x_i'\)). An interval specifies a range of values in 1D space, where we can call \((x_i\)) the start point, and \((x_i'\)) the end point.

Given a list of such intervals, we want to know the total length of the regions covered by one or more of the intervals. THis is not simply the sum of their lengths, \((\Sigma x_i' - x_i\)), since several may cover the same span.

For example, if we have intervals \(([19, 30]\)), \(([6, 12]\)), \(([4, 5]\)),
\(([8, 15]\)), and \(([3, 10]\)), then the total length covered is 23: the last four intervals together totally cover the interval \(([3, 15]\)) of length 12, and the first covers a disjoint interval of length 11.

Fill in Intervals.java so that the coveredLength method returns the correct total length in \((\Theta(N log N)\)) time.

There is a clever trick to this problem that you'll need to figure out to get \((\Theta(N log N)\)) time. You do not need to use any nested loops for this problem, and in fact if you find yourself using them, you probably haven't found the right approach.

G. Various Problems

For those of you who want to get a little more practice with algorithm design, consider the following problems.

Distribution Count for Large Numbers (SortInts.java)

[Goodrich & Tamassia] Given a sequence of n distinct integers, each one of which is in the range \(([0, n^2 - 1]\)), develop an \((O(n)\)) algorithm for sorting them. See the skeleton file SortInts.java. You can't use ordinary distribution sort for this, because that would require initializing and traversing arrays of size \((n^2\)), which would take too long

Inversion Counting (Inversions.java)

Find an algorithm that runs in \((O(n \lg n)\)) time for computing the number of inversions in a list of n items. See the skeleton file Inversions.java. To test that your code is actually \((O(n \lg n)\)), provide it a very large list (say hundreds of thousands).

Two Sum (Sum.java)

[Goodrich&Tamassia] Given two sequences of integers, \((A\)) and \((B\)), find an algorithm that runs in \((O(n \lg n)\)) time (where n is the total number of integers in A and B) that determines, for a given parameter \((m\)), whether there is an integer \((a\)) in \((A\)) and an integer \((b\)) in \((B\)) such that \((m = a + b\)). See the skeleton file Sum.java. To test that your algorithm runs in \((O(n \lg n)\)) time, provide sequences of hundreds of thousands of integers. Feel free to use any of the methods in java.util.Arrays.

H. Balanced Search Trees

Put answers to the following problems in the file treeProblems.txt.

  1. Give a formula for the exact height of a binary search tree (as you implemented in homework 6) as a function of N if we insert N values in increasing order.
  2. Give a formula for the exact height of a 2-4 tree in terms of N if we insert N values in increasing order.
  3. Test your understanding of the isometry between 2-4 trees and red- black trees by completing this exercise.