CompSci-61B Lab 7b
Hash Tables With Chaining

Instructor: Prof. Robert Burns

Chaining is a technique in hash tables that "stacks" the entries in the locations that match their "wrapped index". This can makes the load factor a bit less important because entries that collide do not have to compete for the same spaces as entries with different wrapped indexes. The data storage area essentially expands to accommodate each newly added entry by adding it to a chain attached to the wrapped index location in the array. So the array becomes an array of heads of linked lists of key-value pair entries, instead of an array of key-value pair entries.

The concept of load factor still exists, but it needs to be interpreted differently -- it's possible for the load factor (#of stored entries / array length) to exceed one! Array doubling is also less important because it's not a matter of running out of array space -- each chain is itself unlimited in size. But the burden of searching long chains affects the O(1) ideal for hash tables, so array doubling (and rehashing) still has a place in chaining.

In this assignment you will use the compsci61b.MyDictionary interface to create one new class: compsci61b.MyHashTableC. This is a full implementation of hashing, using chaining to manage collisions. You will also do big oh analyses to determine how load factor affects big oh.

Another Java construct is introduced for the first time in our lab assignments: setting constant member data values in constructors. If a data member is declared with the final keyword (but not static), you would normally assign its value upon declaration. Any attempt to reassign its value would result in a compiler error. But Java allows such constants to be left unassigned upon their declations, as long as they are assigned a value in the constructor. All constructors must assign values to all oterwise unassigned constants, or compilation will fail.

This assignment is worth 40 points, and consists of 4 exercises. Zero points will be awarded for programs with misspelled filenames, or incorrect case. Zero points will be awarded for programs that do not compile or run on the department's UNIX server. Partial points will be awarded for programs that are run but do not conform to all specifications, at the discretion of the grader. Be sure to complete both exercises and post the lab7b.jar before midnight of the due date, because there is a 5-point-per-day lateness penalty for late work. Check the course outline for due dates for lab assignments.



EXERCISE 1: A Hash Table With Chaining, Without Array Expansion (10 points)
Write MyHashTableC.java in package compsci61b, using compsci61b.testing.MyHashTable and compsci61b.MyHashTableLP as models. Extend nothing and implement compsci61b.MyDictionary<K, V> and java.io.Serializable. Note that this class will be further developed in exercise 2 below.

Leave private members private, or use protected as you wish. For purposes of future lab assignments, this class will not be extended, so you do not need to allow protected access, but you may wish to for your own purposes or for the practice of it.

Use an array of linked lists for data storage. Use either your own compsci61b.MyLinkedList or compsci61b.MySortedLinkedList, or Java's java.util.LinkedList for each element of the array. If you have a good reason to use any other "collection" class instead, you may do so -- just make sure that the big-oh-ness of operations is not adversely affected. For convenience, use a separate integer to track the number of stored entries in the hash table.

Assign an empty linked list to each location in the array. This is to prevent the programming complication of managing the constant attaching and removing of linked lists as entries are added and removed, without an appreciable overhead penalty. Note that there should be no holes in the linked lists themselves, so a "blank" object reference is not needed as it was with probing.

Follow these specs for implementing the methods:

In the main test method, create a "compsci61b.MyDictionary" reference to a "compsci61b.MyHashTableC" object, and exercise all of its methods. If you use integer keys and string values, you can create some sample student records with numeric IDs as keys and names as values. Or you can use string keys and integer values in a phonebook-style application. Makeup a few records so store and retrieve, etc -- enough to convince yourself that the class works correctly. Compile and run, but do not jar or submit -- the program will be further developed in the following exercises.



EXERCISE 2: A Hash Table With Chaining, With Array Expansion (10 points)
Modify MyHashTableC.java in package compsci61b that you wrote in exercise 1. Change the methods to track load factor, and to manage it with array expansion and rehashing. Here are the specs:

For public interface validation, use the same test.java as in lab 2a's exercise 2, but use these statements to create and validate the data structure object:

compsci61b.MyDictionary<Integer, String> a = new compsci61b.MyHashTableC<Integer, String>(100);
...
System.out.println("Should be false: " + (a instanceof compsci61b.MyList));
System.out.println("Should be true: " + (a instanceof compsci61b.MyDictionary));
System.out.println("Should be false: " + (a instanceof compsci61b.MyArrayBasedDataStructure));
System.out.println("Should be false: " + (a instanceof compsci61b.testing.MyHashTable));
System.out.println("Should be true: " + (a instanceof java.io.Serializable));
Use the same "main" testing that you applied to the hash table from the previous exercise. Compile, jar as lab7b.jar, and run, but do not submit -- the program will be further developed in the following exercises.

EXERCISE 3: Investigating Load Factor vs. Performance (10 points)
Write compsci61b.testing.LoadFactorTests.java, similar to lab 4's exercise 1. The results of this exercises would normally be submitted on a written or typed piece of paper. But for our purposes, it's done as the output of a Java program instead.

Express your results as follows:

Time to do 1,000,000 repetitions of MyHashTableC.add
   n   load factor   elapsed time in millis
----  ------------   ----------------------
1000           0.5                        ?
1000           1.0                        ?
1000           1.5                        ?
1000           2.0                        ?

2000           0.5                        ?
2000           1.0                        ?
2000           1.5                        ?
2000           2.0                        ?

4000           0.5                        ?
4000           1.0                        ?
4000           1.5                        ?
4000           2.0                        ?

I observe this about the effect of load factor:...
Use as many repetitions as you need, and start with any value of n that you want. Just be sure to double n twice as shown in the above example.

"Load factor" is not the maximum allowable -- it's the actual load factor. Use the getLoadFactor accessor to determine the load factor of your test runs. Try to avoid array doubling -- start with array sizes large enough that doubling does not affect the results. In the repetitions, generate new keys randomly for each new add, and remove the same key to reset for the next rep. Use enough reps to get meaningful elapsed millis. If your load factors do not match exactly the 0.5, etc., in the table, extrapolate them. Determine your results, and type them into your output statements so that your program prints your report.

Jar in lab7b.jar, but do not submit -- the program will be further developed in the following exercise.



EXERCISE 4: Timing Studies For Chaining (10 points)
Write compsci61b.testing.ChainingTimingTests.java, similar to lab 7a exercise 4's compsci61b.testing.LinearProbingTimingTests.java. Prove that getValue is approximately O(1). You may fill the hash table with randomly-generated Double-Double pairs, because it is easiest to do that. Prompt the user to enter the number of entries for the first iteration (where norm = 1.00), as you have done in past timing tests. It will require repetitions that start and stop the timer and acculumate the elapsed time, because the operation is so fast. In each repetition, generate an different Double key to search for -- it will usually not be found, which is the worst case.

Perform timing tests on each to confirm your determination of the big oh. Use the spec from lab 4's exercises 5-8, with nicely-aligned columns and 2 decimal digits in the last column's values. There may be a slight rising trend in the normalized factor, due to the imperfectness of the big oh for probing.

unsuccessful getValue from a compsci61b.MyHashTableC, approx O(1)
    n     big oh   elapsed time in millis   elapsed time / big oh  norm
-----  ---------   ----------------------   ---------------------  ----
10000          ?                        ?                       ?  1.00
20000          ?                        ?                       ?     ?
40000          ?                        ?                       ?     ?
80000          ?                        ?                       ?     ?
Compile and run. Create your final version of lab7b.jar, with files from this and all previous labs. Post your lab7b.jar file to your student UNIX account for grading and credit.

[ Home | Contact Prof. Burns ]