Homework 7: Hashing and Ranges

Due: Monday, 2 November at 1300

A. ECHashStringSet

For this problem, you'll expand on your work from HW6 by creating a hashing-based implementation of the StringSet interface.

To implement the interface, you'll be working completely from scratch, with no skeleton code. This one will be a fair bit more difficult than the BSTStringSet from HW6.

Create a class ECHashStringSet that implements the StringSet interface using an external chaining hash table as its core data structure. In order to ensure that put and contains operations are fast, you should resize the hash table as soon as the load factor exceeds 5. For memory efficiency, you should ensure that the load factor is never less than 0.2, except for empty lists, for which the load factor is allowed to be zero. You do not need to worry about downsizing the set since we don't have a remove operation.

There is one annoying issue you'll enounter: If the hashCode() is negative, we need to remove the top bit (since negative array indices are not allowed in Java). You can do this using the bit operations we discussed earlier in class, e.g. s.hashCode() & 0x7fffffff) % bucketCount.

There's no restriction on what you are allowed to use, but you should avoid using any library class or method that has "hash" in the name, since that would defeat the whole point of this problem.

For an extra challenge, don't use anything that requires using a Java library class.

As references, you might find the following resources useful:

Your correctness tests from hw6 should work almost without modification (you'll just need to change the type of object that is instantiated). Make sure to test something that inserts a large number of strings. One testing approach is to generate random strings, insert them into both your ECHashStringSet and a TreeSet<String>, then iterate through the entire TreeSet and ensure that all of its members are also contains in your ECHashStringSet.

One solution from a previous semester (which does use an import for handling the list of items that go in each bucket) is 66 lines including comments and whitespace.

B. ECHashStringSet Timing

Run the provided timing test InsertRandomSpeedTest. You'll need a copy of your BSTStringSet from HW6, or you can use the skeleton code (a copy of our solution). Fill out hw7timing.txt as directed.

Repeat this for InsertInOrderSpeedTest.

If you're having timing issues, make sure your hash table is actually resizing.

C. BSTStringSet Bounded Iterator

Starting from either your own BSTStringSet implementation from HW6 or the solution provided in the skeleton here, create a version that implements SortedStringSet, providing an iterator that returns elements within a specified range of values in ascending order. The iterator should produce its M values in time O(M + h), where h is the height of the tree, as described in lecture.

The tricky part of this assignment is producing an inorder tree iterator. You'll definitely find a stack useful for this purpose.