CompSci-61B Lab 7a
Hash Tables With Linear Probing

Instructor: Prof. Robert Burns

Hash tables are implementations of the key-value dictionary interface whose algorithms for "add", "remove", "getValue", and "contains" are all O(1). This is accomplished by (almost) eliminating iteration to locate an entry. Instead, each entry is assigned a unique location in an array, based on value of its key -- kind of like personal reserved spaces in a car parking lot. The procedure for converting a key value into an array index is called "hashing".

The practical problem of this approach is that there has to be lots of empty space in the array to accommodate the possible range of key values. But there are techniques such as "probing" and "chaining" that help to minimize this problem.

In this assignment you will use the compsci61b.MyDictionary interface to create 2 new classes: compsci61b.testing.MyHashTable and compsci61b.MyHashTableLP. The first is a simple hash table that does not account for "collisions" and is therefore not very useful -- but it allows us to be introduced to the concept of hashing without the complication of collision handling at the same time. The second is a full implementation of hashing, using linear probing to manage collisions. You will also do big oh analyses to determine how full (percentage-wise) a hash table has to be before its approximate O(1) behavior deteriorates appreciably.

Also in this assignment, you will be introduced to the protected keyword in Java, which is an alternative for private. The difference is that otherwise inaccessible class members become directly accessible to any sub-class. For example, is we had used this in MyArrayList, we could have accessed nValues directly in MySortedArrayList instead of using the "size" accessor.

One more Java construct that is introduced for the first time in our lab assignments is a constructor that has a parameter list.

This assignment is worth 40 points, and consists of 4 exercises. Zero points will be awarded for programs with misspelled filenames, or incorrect case. Zero points will be awarded for programs that do not compile or run on the department's UNIX server. Partial points will be awarded for programs that are run but do not conform to all specifications, at the discretion of the grader. Be sure to complete both exercises and post the lab7a.jar before midnight of the due date, because there is a 5-point-per-day lateness penalty for late work. Check the course outline for due dates for lab assignments.

EXERCISE 1: A "Perfect" Hash Table (10 points)
Write MyHashTable.java in package compsci61b.testing, using compsci61b.MyArrayDictionary as a model, with an array for data storage and an int to track size. Extend nothing and implement compsci61b.MyDictionary<K, V> and java.io.Serializable.

Make the otherwise private members protected, except for the ones associated with iterators. There is no need for sub-classes to access the iterators directly, so don't bother to make them protected. This will have no effect on this class, but it will be meaningful in the following exercise.

Make the inner class (Entry) and all of its members public, so that this inner class can be accessed directly by classes that extend its owner class, even though they might be in a diffent package. If you were to make these protected instead, they would be accessible only by extending classes in the same package. This is not a problem with the data, accessor, and mutator methods, so they can be protected.

Rename private static final int INITIAL_CAPACITY to something that refers to default size instead, because there will be no array expansion in this implementation. (But there will be in the next exercise’s implementation.) The array elements are to be references to "Entry" objects, encapsulating key/value pairs. You may have constructors, accessors, and mutators in your Entry class, or not -- it's your choice.

Note that the array size never changes, so it is possible for the data structure to become full. It will also be possible for there to be "holes" in the array. A hole is represented as a null reference to an Entry object -- the array values are all initially null.

Follow these specs for implementing the methods:

Include a default constructor and another that has one int parameter, used to size the data storage array. If the array size is to be set to a non-prime number, determine the next higher prime number, and use that instead. Use the this(...) construct in the writing of this pair of constructors.
The add mutator should calculate the hash code of the key to get the "desired index" for the entry, and then modulus that with the array size, to determine the "wrapped index" at which to store the value. Do not accept null keys or values -- return false for that (although it would be okay for any data fields of a non-null record key or value to be null). Do not allow duplicate keys -- if there is already an entry at the wrapped index, and its key matches, replace the value for the existing matching key’s entry with the value to be added. But if the keys do not match, return false -- that’s a "collision", and we do not allow collisions in our "perfect" hash table. (Note that the interface has no replace mutator -- adding is the mechanism for this.)
The remove mutator should calculate the hash code of the key, and modulus that with the array size, to determine the index at which to store the value. If there is a value there, save it so that you can return it, and set the location in the array to null. Otherwise return null.
The contains accessor should calculate the hash code of the key, and modulus that with the array size, to determine the index at which to store the value. If there is a value there, return true. Otherwise return false.
The iterator accessor should work just like the iterator in compsci61b.MyArrayList, except that it should skip over "holes".
The keyIterator accessor should be the same as iterator, but it should return the "key" instead of the "value".
The isEmpty mutator should return true if the number of entries is zero.
The isFull accessor should return true if there are any values stored at all. The purpose of this method is to determine if the next "add" is guaranteed not to fail on account of there not being a space available to store the value. (It may still fail if it has a duplicate key, and the implementation does not allow duplicate keys). In this implementation, if any index has a value in it, there is no guarantee that the next "add" will not try to use that index.
The size accessor should return the number of used (non-null) index locations.
The clear mutator should set all locations in the array to null, or recreate the array. The array size should remain the same before and after recreation -- that is, if the programmer specified the array size upon construction, use that size instead of the default initial size.
The toString accessor return a textual representation of the data, skipping over any "holes".

In the main test method, create a "compsci61b.MyDictionary" reference to a "compsci61b.testing.MyHashTable" object, and exercise all of its methods. If you use integer keys and string values, you can create some sample student records with numeric IDs as keys and names as values. Or you can use string keys and integer values in a phonebook-style application. Makeup a few records so store and retrieve, etc -- enough to convince yourself that the class works correctly. But note that collisions will not be handled, and that an attempt to add a value after the 1st value may fail. In your testing, predict collisions and see if you class behaves properly -- do not just add values and use you class to tell you if a collision happened. You test your code -- don't let it test you!

For public interface validation, use the same test.java as in lab 2a's exercise 2, but use these statements to create and validate the data structure object:

compsci61b.MyDictionary<Integer, String> a = new compsci61b.testing.MyHashTable<Integer, String>(100);
...
System.out.println("Should be false: " + (a instanceof compsci61b.MyList));
System.out.println("Should be true: " + (a instanceof compsci61b.MyDictionary));
System.out.println("Should be false: " + (a instanceof compsci61b.MyArrayBasedDataStructure));
System.out.println("Should be true: " + (a instanceof compsci61b.testing.MyHashTable));
System.out.println("Should be true: " + (a instanceof java.io.Serializable));

Compile, jar as lab7a.jar, and run, but do not submit -- the program will be further developed in the following exercises.

EXERCISE 2: A Hash Table With Linear Probing (10 points)
Write MyHashTableLP.java in package compsci61b, extending compsci61b.testing.MyHashTable. Change the methods to implement "linear probing". Here are the specs:

A hole is represented either as a null or as a reference to a "blank" object -- an Entry object whose key and value are their default values (e.g., both null). In this way, a distinction can be made between holes that are original (nulls) and holes that were once filled (references to the "blank" object). Include a private non-static constant to store the the "blank" object. Let array locations reference the blank when their entry is removed, instead of resetting the location to null.
Include a private static constant to store the maximum allowable load factor. This is to be determined by the designer of the class (i.e., you) and is not adjustable or even visible to the programmer using the class. When adding, check the #of stored values, and if it equals the size times maximum allowable load factor, expand the array by doubling it. Be sure to "rehash" by redistributing the already-stored values into the newly expanded array, because their wrapped indexes will all change! Determination of the load factor should be O(1), so that adding maintains its O(1) average behavior.
As in the previous exercise, include a default constructor and another that has one int parameter, used to size the data storage array. If the array size is to be set to a non-prime number, determine the next higher prime number, and use that instead. Use the this(...) and super(...) constructs in the writing of this pair of constructors.
Override the methods that calculate and use a wrapped index based on a modulus of the hash code of the key, in order to deal with collisions. In the event of a "collision", where the key at the wrapped index does not match what you expect the key to be, search the rest of the array in a circular fashion, starting at the wrapped index. Ideally, if the load factor is small enough, the next adjacent array location (wrapped index + 1, or zero if the wrapped index + 1 equals the array size) will be selected. But in any case, always stop searching the array when a null is reached, because nulls occupy spaces that have never been used, so there is no reason to continue linear probing once a null is reached.
The add mutator should add at the first null or blank location that you find at or after the desired location. Duplicate values are not allowed, when the key to be added matches a key already in the hash table, replace its value and call it success. Failure cannot happen due to fullness, because the array size is to be doubled and rehashed when the maximum allowable load factor is exceeded.
In the remove and contains mutators, if the key at the calculated index does not match, search the rest of the array for the key in a circular fashion, starting at the expected index. Fail if the key is not found before traversing to the first null. When removing a value, replace the location in the array with a "blank" instead of a null, in order to prevent linear probing from stopping at the null.
The isFull accessor should return false, always.
The clearmutator should reset the array to the default initial size, modified to the next higher prime number if necessary. If the programmer specified the initial size upon construction, there is no need to remember or use that value.

For public interface validation, use the same test.java as in lab 2a's exercise 2, but use these statements to create and validate the data structure object:

compsci61b.MyDictionary<Integer, String> a = new compsci61b.MyHashTableLP<Integer, String>(100);
...
System.out.println("Should be false: " + (a instanceof compsci61b.MyList));
System.out.println("Should be true: " + (a instanceof compsci61b.MyDictionary));
System.out.println("Should be false: " + (a instanceof compsci61b.MyArrayBasedDataStructure));
System.out.println("Should be true: " + (a instanceof compsci61b.testing.MyHashTable));
System.out.println("Should be true: " + (a instanceof java.io.Serializable));

Use the same "main" testing that you applied to the hash table from the previous exercise. Compile, jar as lab7a.jar, and run, but do not submit -- the program will be further developed in the following exercises.

EXERCISE 3: Solve The Game Of Life (10 points)
Write compsci61b.testing.GameOfLife.java, applying your compsci61b.MyHashTableLP class to implement a simulation of the classic "Game Of Life". Click here for the solution to the Game Of Life, using a hash table. It's incomplete, so you'll have some work to do here -- just follow the pseudocode and the instruction comments. Click here for an online simulation of the Game Of Life.

Here are some additional rules:

There are two places where new hash table objects are created. If you specify a size for the hash tables, do not type the number as a literal integer value in both places. Instead use a global constant.
You may use any character that you like to denote a live cell, instead of 'x', if you wish.
You may choose any method for console input that you like, including BufferedReader and Scanner. The Burns textbook contains sample code for entering a list of integer values all on one line with the BufferedReader. Java SE's Scanner class lets you do this automatically.
You may write the Cell class separate from the public class (that is, above it) and not as a static inner class, if you prefer.

To test your Game Of Life simulation, try these input sequences:

1 1 2 2 3 3 3 4 3 5 -1 -1

1 1 2 2 3 3 4 4 3 4 4 3 5 2 1 4 -1 -1

(If you require the user to enter one value per typed line of input, tell him so!)

Also test your own patterns, comparing to the online simulation referred to above, to make sure your simulation gives the same results.

Compile, jar as lab7a.jar, and run, but do not submit -- the program will be further developed in the following exercise.

EXERCISE 4: Timing Studies For Linear Probing (10 points)
Write compsci61b.testing.LinearProbingTimingTests.java, similar to lab 4's compsci61b.testing.TimingTests.java. Prove that getValue is approximately O(1). You may fill the hash table with randomly-generated Double-Double pairs, because it is easiest to do that. Prompt the user to enter the number of entries for the first iteration (where norm = 1.00), as you have done in past timing tests. It will require repetitions that start and stop the timer and acculumate the elapsed time, because the operation is so fast. In each repetition, generate an different Double key to search for -- it will usually not be found, which is the worst case.

Perform timing tests on each to confirm your determination of the big oh. Use the spec from lab 4's exercises 5-8, with nicely-aligned columns and 2 decimal digits in the last column's values. There may be a slight rising trend in the normalized factor, due to the imperfectness of the big oh for probing.

unsuccessful getValue from a compsci61b.MyHashTableLP, approx O(1)
    n     big oh   elapsed time in millis   elapsed time / big oh  norm
-----  ---------   ----------------------   ---------------------  ----
10000          ?                        ?                       ?  1.00
20000          ?                        ?                       ?     ?
40000          ?                        ?                       ?     ?
80000          ?                        ?                       ?     ?

Compile and run. Create your final version of lab7a.jar, with files from this and all previous labs. Post your lab7a.jar file to your student UNIX account for grading and credit.

[ Home | Contact Prof. Burns ]

CompSci-61B Lab 7aHash Tables With Linear Probing

CompSci-61B Lab 7a
Hash Tables With Linear Probing