Lab 9: Project 2, Hash Table Experiments

Submit by Friday, Oct 31 at 11:59 PM (spooooky)

Table of Contents

As usual, there will be a MWotD that should be submitted as "hello.txt". If you don't want to attend lab, submit your work before Wednesday at 2 PM (as usual you may continue resubmissions until Friday at 11:59 PM).

Pre-lab

Make sure you understand the basic ideas behind hash tables. See lecture 25 and/or Chapter 7 of DSIJ.

Introduction to Project 2

For this part of the project, you do not need to hw init proj2, though you're welcome to, if you wish.

Log in to the instructional machines and run the staff version of jump61 (the game that you'll be building for proj2) with the following command:

staff-jump61

A prompt should appear. Type help, and you'll see a bunch of commands. Don't read this yet, let's just try playing a game. Enter the following two commands at the prompts shown.

> start
red> dump

When we enter start at the provided prompt, a game begins pitting you (red) against the computer (blue). The red> prompt on the next line is asking you to input your move for the game. By entering dump, you're asking to see the game board printed out. You should see something like the following:

===
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
===

What this all means: This is a 6x6 board, and currently all of the squares contain one dot and are uncolored (dashes indicate uncolored squares). We'll come back to what colors and dots mean later. Let's try entering a move. At the red> prompt, enter:

red> 1 4

This will place a red dot at position 1 4, and the blue player should respond with an automatic move. Type "dump" again, and you'll see something like:

===
    1- 1- 1- 2r 1- 1-
    1- 1- 2b 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
    1- 1- 1- 1- 1- 1-
===

You'll see that both red and blue control exactly one square, each with two dots. I don't expect that this makes much sense, but don't worry, I'll send you to the spec to learn what this all means in two paragraphs.

Let's restart on a smaller board and set up the game so that it prints out the game board every time you enter a move. Use the following commands:

red> size 3
> verbose
> start

The size command ends the game and creates a newly initialized 3x3 grid, and the verbose command tells the game to show the game state every time you make a move.

Now using the project 2 specification as a guide, try playing through an entire game to get a feeling for how it works. You should find yourself trounced by an uncaring and evil AI.

Writing Tests for Project 2

In project 1, we used two different testing techniques, each of which were good for different purposes: Unit testing (through JUnit) and system testing (using .in and .out files with the tester.py program). The latter were necessary in Project 1 because it would have been quite tricky to write unit tests to do things like testing that the output of some sequence of commands to CommandIntepreter yields the right output, particularly given the need to ignore prompts, whitespace, and ordering of rows.

In Project 2, we'd face similar difficulties if we tried to unit test everything, and thus, as with project 1, we're providing a program to help test your code. This time, we've provided a Python script, test-jump61. Typing 'test-jump61' (with no arguments) will print documentation of this testing program. Normally, one supplies one or more input files. There are two input file formats, which we'll call the 'competitive' and 'basic' forms, shown below on the left and right, respectively:

   # Any number of # comment lines     # Any number of # comment lines  
   COMMAND1                            COMMAND1                         
   COMMAND2                            None                         
   ===#1===                            ===#1===                         
   INPUT1                              INPUT1                           
   ===#2===                            ===#2===                         
   INPUT2                              

The competitive form (on the left) will be used for having two instances of jump61 play against each other, which will allow you to test competing AIs. For verifying program correctness we'll only use the basic form (on the right). At the risk of getting too vague, the exact specification for the files is as follows:

For an example, open proj2tests/test00-1.in. This is an input file of the basic form. The command should mostly look familiar, except for:

Note that input files contain not just lists of commands, but also expected output. This is in contrast to our project 1 testers, where expected output was the exclusive domain of our .out files.

There is one exception: For testing the results of "dump" commands, you'll need to use .out files. For example, open proj2tests/test00-1.out. This tells us the expected output for the corresponding .in file. In this case, test00-1.out gives the output of all dump commands (in this case, just one) for test00-1.in.

More generally, for each input file FOO.in, test-jump61 will take the outputs from the two programs, filter out everything except board dumps (delimited by === and ===), and compare those outputs with the corresponding FOO.out (if it exists).

Writing your own tests

Using everything discussed above, write four tests:
  1. A test (test1.in and test1.out) that the 'set' command works. This test should use 'dump' to check that a certain setup is as expected.
  2. A test (test2.in, test2.out) that several moves of a fully manual game work properly, i.e. test2.in should contain many moves for both red and blue. Again, use 'dump' to check the result.
  3. A test (test3.in) that the staff program can play a small game with both sides automated and come up with a winner. You'll need to look up the appropriate % code in either the ReadMe (which you'll get when you run hw init proj2) or by running test-jump61 with no arguments.
  4. Optional to complete in lab, but highly recommended: A test (test4.in) that one instance of the staff program can play a small game against another instance. The first instance of the staff program will have an automated red player and a manual blue player, and the second will have a manual red play and an automated blue player. The first program will send red moves to the second, and the second will send blue moves to the first. For this test, you'll need to create a test in the competitive form.
Warning: If you include these tests in your proj2 submission (and you should), be sure to substitute your own program execution for staff-jump61! It would be a shame to get tricked into thinking your program is working when it is not.

When submitting these tests for lab, make sure they're in the proj2tests folder created by hw init lab9.

Hash Table Experiments

As discussed in class, the Java library allows any kind of reference object to be stored in a hash table. To make this work, it makes use of the following two methods defined in java.lang.Object:

  public native int hashCode();  
     // "Native" means "implemented in some other language".

  public boolean equals(Object obj) {
      return (this == obj);
  }

Since all Objects have at least these default implementations, all objects may be stored in and retrieved from hash tables.

When defining a new type, you may freely override both methods. However, they are related---when you override one, you should generally override the other, or uses of hash tables will break, as we'll see. You might want to find the documentation of the Java library class java.lang.Object and look up the comments on these two methods.

Throughout this section of this lab, we'll see how Java's built-in hash table implementations respond when storing certain pathological types of objects. In HW7, you'll have a chance to build your own hash table implementation.

Effect of Hash Function on HashTable Performance

The file HashTesting.java contains various routines and classes for testing and timing hash tables.

In this subsection of the lab, we'll consider how hash tables perform when storing objects of type String1. Each String1 object simply contains a String which it returns via the toString() method. Similary, the .equals method is essentially the same as for Strings, returning true if the two String1 objects contain the same string.

Things only get interesting when we move on to the hashCode() method, which is an adaptation of that used in the real String class. Specifically, the constructor for String1 has a parameter that gives you the ability to "tweak" the algorithm by choosing to have the hashCode method look only at some of the characters. Take a look at class String1, and especially at its hashCode method.

The test

  java HashTesting test1 words 1

will read a list of about 230000 words from words (a small dictionary taken from a recent Mac OS X version), store them in a hash table (the Java library class HashSet), and then check that each is in the set. It times these last two steps and reports the time. If you're using your own machine, you can get a words file from this link.

The argument 1 in the test above causes it to use the same hashing function for the strings as Java normally does for java.lang.String.

By contrast, if you run

  java HashTesting test1 words 2

the hash function looks only at every other character, which will make the hash function take less time to compute. For example, if you give it "porcupine" it will use only p, r, u, i, and e to compute the hashCode, halving the time needed to complete execution of the hashCode() method.

Try this command with various values of the second parameter. Explain why the timings change as they do in lab9.txt, question #1.

Effect of Data on HashTable Performance

The command

  java HashTesting test2 N

for an integer N, will time the storage and retrieval of N2 four- letter "words" in a hash table. These words will take on all N2 combinations xxyy, where the character codes for x and y vary from 1 to N, i.e. the first word will be 1111, the next will be 1122, etc. where 1 means the (Character) 1, and likewise for 2.

For example if x=103 and y=441, then the word will be ggƺƺ, since 103 corresponds to 'g' and 441 is 'ƺ'. If you try printing out the words, be warned that x/y less than around 32 will not print anything to the screen (these are reserved for special purposes like spaces, tabs, newlines, beeps, etc.)

Don't run test 2 yet. First, consider test 3:

  java HashTesting test3 N

This does the same thing, but uses a differently generated set of "words". In this case, the words have the form xXyY, where x and y vary as before, and (noting carefully the capitalization):

    X = 216 - 31x - 1,      Y = 216 - 31y - 1

Run these two commands with various values of N (start at 20).

Explain in as much detail as you can the reasons for the relative timing behavior of these tests (that is, why test3 takes longer than test2), putting your answer in lab9.txt, question #2.

Repairing a Faulty Class

The class FoldedString is another wrapper class for Strings (a la String1 from above) whose .equals and .compareTo methods ignore case, e.g., they treat "foo", "Foo", and "FOO" as equivalent. The command

  java HashTesting test4 the quick BROWN fox

will create FoldingStrings for each of the trailing command-line arguments ("the", "quick", etc.), and insert them into a HashSet and (for comparison) a TreeSet, which uses a balanced binary search tree to store its data. The program will then check that it can find the all-upper-case version of each of the command-line arguments in both of the sets. For the example above, it will check that "THE", "QUICK", "BROWN", and "FOX" are all part of both the TreeSet and HashSet. As long as FoldedStrings are correctly implemented, all four should be present.

Try running test4 with the quick brown fox example from above. The HashSet fails to find upper-case versions of the arguments, whereas the TreeSet seems to work just fine. Explain why in lab9.txt, question #3.

The problem is in the FoldedString class. Change it so that java HashTesting test4... works. Try to do so in a reasonable way: make sure that large HashSets full of FoldedStrings will continue to complete operations in a timely manner. You may need to change more than one method.

Optional: List Experiment

The type java.util.LinkedList allows one to create ListIterators on a list, which provide the .previous() as well the .next() methods. The provision of .previous() suggests that LinkedList is, in fact, a doubly linked list (indeed, the documentation says so.)

Fill in the program ListTesting.java to provide a demonstration that LinkedList is (or is not) indeed doubly linked. That is, your program should perform some kind of measurement that evidences double linking. You might find useful bits of code in HashTesting. Explain your demonstration in lab9.txt.