Lab 9: Starting Project 2 and a Little Hashing

A. Starting Project 2

If you have not already, start your project in the usual way:

git fetch shared
git checkout master
git merge -m "Start project 2" shared/proj2
git push -u origin master

or, if you prefer separate branches:

git fetch shared
git checkout -b proj2 shared/proj2
git push -u origin proj2

If you have already started your project, on the other hand, be sure you have the most recent version of the skeleton (there have been updates since the first release):

git fetch shared
git checkout master
git merge -m "Include modifications since release" shared/proj2
git push

or

git fetch shared
git checkout proj2
git merge -m "Include modifications since release" shared/proj2
git push

if you created a proj2 branch.

If you are using your own machine, also make sure that cs61b-software is up to date, since it too has had some recent changes.

Playing the Game

It would be a good idea to read the project spec first to learn the rules of Lines of Action.

Log in to the instructional machines and run the staff version of Lines of Action (the game that you'll be building for proj2) with the following command:

staff-loa

A prompt should appear. Type help, and you'll see a bunch of commands. Don't read through them all yet; let's just try playing a game. Enter the following commands at the prompts shown.

-> manual white
-> start
b> dump

The manual command causes the second (white) player to take its moves from the terminal, as the black player does by default. When we enter start, a game begins and moves may be entered. The b> prompt on the next line asks you to input a move for Black. The dump command prints the board in a standard format. You should see something like the following:

===
    - b b b b b b - 
    w - - - - - - w 
    w - - - - - - w 
    w - - - - - - w 
    w - - - - - - w 
    w - - - - - - w 
    w - - - - - - w 
    - b b b b b b - 
Next move: black
===
b>

The squares marked - are empty; those marked b are black pieces, and those marked w are white pieces. Try entering a move:

b> d1-d3
w> dump

The result is

===
    - b b b b b b - 
    w - - - - - - w 
    w - - - - - - w 
    w - - - - - - w 
    w - - - - - - w 
    w - - b - - - w 
    w - - - - - - w 
    - b b - b b b - 
Next move: white
===
w>

Now it's time for a white move. First, however, it would be helpful to be reminded of the row and column labels. The staff version provides a somewhat more helpful way to print the board (which you are not required to implement):

w> b

(or board), which prints

8 - b b b b b b - 
7 w - - - - - - w 
6 w - - - - - - w 
5 w - - - - - - w 
4 w - - - - - - w 
3 w - - b - - - w 
2 w - - - - - - w 
1 - b b - b b b - 
  a b c d e f g h   white to move

Now after typing

w> a3-d3
b> b

we see

8 - b b b b b b - 
7 w - - - - - - w 
6 w - - - - - - w 
5 w - - - - - - w 
4 w - - - - - - w 
3 - - - w - - - w 
2 w - - - - - - w 
1 - b b - b b b - 
  a b c d e f g h   black to move

We can now turn the AI back on to play white against you (an automated white player is the default):

b> auto white
-> start
b> g8-e6

The machine will respond, possibly with

W::a5-c3

indicating that an automated White player moves a5-c3:

b> b
8 - b b b b b - - 
7 w - - - - - - w 
6 w - - - b - - w 
5 - - - - - - - w 
4 w - - - - - - w 
3 - - w w - - - w 
2 w - - - - - - w 
1 - b b - b b b - 
  a b c d e f g h   black to move
b>

Feel free to carry on from here. Or, you can try playing the machine against itself:

b> auto black
-> start

and see what happens. Try the help command to see what commands are available (you are not required to implement all of them).

B. Writing Tests for Project 2

In project 1, we used two different testing techniques, each of which was good for a different purpose: unit testing (through JUnit) and integration testing (using .in and .out files with the tester.py program). The latter was necessary in Project 1 because it would have been quite tricky to write unit tests checking that some sequence of commands to CommandInterpreter produces the right output, particularly given the need to ignore prompts, whitespace, and the ordering of rows.

In Project 2, we'd face similar difficulties if we tried to unit test everything, and thus, as with project 1, we're providing a program to help test your code. This time, we've provided a Python script, test-loa. Typing 'test-loa' (with no arguments) will print documentation of this testing program. Normally, one supplies one or more input files. There are two input file formats, which we'll call the 'competitive' and 'basic' forms, shown below on the left and right, respectively:

# Any number of # comment lines     # Any number of # comment lines  
COMMAND1                            COMMAND1                         
COMMAND2                            None                         
===#1===                            ===#1===                         
INPUT1                              INPUT1                           
===#2===                            ===#2===                         
INPUT2                              

The competitive form (on the left) is used to have two instances of loa play against each other, which will allow you to test competing AIs. The exact specification for these files is part of the documentation that test-loa prints when run with no arguments.

For an example, open simplemoves.in in your proj2/testing directory. This is an input file of the basic form. The first line of the input causes both players to take manual input. This particular test merely makes some moves and dumps the board. The file simplemoves.out contains the expected output from all dump commands, as well as any "White wins." or "Black wins." messages (there are none for this input). The test-loa program will compare the output from your program against this file.

Now take a look at proj2/testing/ai2.in. This time, we set up two machine players and play them against each other. The line %bw is not sent to the program. It tells test-loa to expect the program to output a sequence of moves, starting with a black one, and continuing until someone wins (that is, until there is a "…wins." message).

More generally, for each input file FOO.in, test-loa will take the outputs from the two programs, filter out everything except board dumps (delimited by === and ===) and "wins" messages, and compare those outputs with the corresponding FOO.out (if it exists).

Writing your own tests

Using everything discussed above, write four tests.

When submitting these tests for this lab, make sure they're in the proj2tests folder created by checking out lab9.

C. Hash Table Experiments

As discussed in class, the Java library allows any kind of reference object to be stored in a hash table. To make this work, it makes use of the following two methods defined in java.lang.Object:

public native int hashCode();  
// "Native" means "implemented in some other language".

public boolean equals(Object obj) {
    return (this == obj);
}

Since all Objects have at least these default implementations, all objects may be stored in and retrieved from hash tables.

When defining a new type, you may freely override both methods. However, they are related: when you override one, you should generally override the other, or uses of hash tables will break, as we'll see. You might want to find the documentation of the Java library class java.lang.Object and look up the comments on these two methods.
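
For example, here is a minimal sketch of a type that overrides both methods consistently, so that two equal objects always produce the same hash code. The class Point is not part of this lab; it is only for illustration.

import java.util.Objects;

/** A sample class whose equals and hashCode agree: two Points with the
 *  same coordinates are equal and produce the same hash code, so they
 *  behave correctly as HashSet elements or HashMap keys. */
class Point {
    private final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Point)) {
            return false;
        }
        Point p = (Point) obj;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal Points yield equal hash codes.
        return Objects.hash(x, y);
    }
}

Objects.hash here is just a convenient library helper; any hash computation works as long as equal objects get equal codes.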

Throughout this section of this lab, we'll see how Java's built-in hash table implementations respond when storing certain pathological types of objects. In HW7, you'll have a chance to build your own hash table implementation.

Effect of Hash Function on HashTable Performance

The file HashTesting.java contains various routines and classes for testing and timing hash tables.

In this subsection of the lab, we'll consider how hash tables perform when storing objects of type String1. Each String1 object simply contains a String, which it returns via the toString() method. Similarly, the .equals method is essentially the same as for Strings, returning true if the two String1 objects contain the same string.

Things only get interesting when we move on to the hashCode() method, which is an adaptation of that used in the real String class. Specifically, the constructor for String1 has a parameter that gives you the ability to "tweak" the algorithm by choosing to have the hashCode method look only at some of the characters. Take a look at class String1, and especially at its hashCode method.
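
In outline, that tweakable hash looks something like the following sketch (illustrative only: the field names contents and skip are assumptions, not necessarily the actual names in String1).

/** Sketch of a String-style hash that examines only every SKIP-th
 *  character of CONTENTS.  With skip == 1 it matches java.lang.String's
 *  usual hash; with skip == 2 it looks at every other character. */
@Override
public int hashCode() {
    int h = 0;
    for (int i = 0; i < contents.length(); i += skip) {
        h = 31 * h + contents.charAt(i);
    }
    return h;
}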

The test

java HashTesting test1 ~cs61b/lib/words 1

will read a list of about 230,000 words from words (a small dictionary taken from a recent Mac OS X version), store them in a hash table (the Java library class HashSet), and then check that each is in the set. It times these last two steps and reports the time.
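
Concretely, the timed portion amounts to something like the following sketch (illustrative only: timeHashSet is a made-up name, and the actual code in HashTesting.java differs in its details).

import java.util.HashSet;
import java.util.List;

/** Sketch: add every word to a HashSet, then look each one up again,
 *  reporting the total elapsed time for both passes. */
static void timeHashSet(List<String1> words) {
    long start = System.currentTimeMillis();
    HashSet<String1> set = new HashSet<>();
    for (String1 w : words) {
        set.add(w);
    }
    for (String1 w : words) {
        if (!set.contains(w)) {
            System.err.println("Missing: " + w);
        }
    }
    System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
}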

The argument 1 in the test above causes it to use the same hashing function for the strings as Java normally does for java.lang.String.

By contrast, if you run

java HashTesting test1 words 2

the hash function looks only at every other character, which makes it take less time to compute. For example, given "porcupine", it will use only p, r, u, i, and e to compute the hashCode, roughly halving the time needed to execute the hashCode() method.

Try this command with various values of the second parameter. Explain why the timings change as they do in lab9.txt, question #1.

Effect of Data on HashTable Performance

The command

java HashTesting test2 N

for an integer N, will time the storage and retrieval of N² four-letter "words" in a hash table. These words take on all N² combinations xxyy, where the character codes of x and y each vary from 1 to N. For example, when x=103 and y=52, the word will be "gg44", since 103 is the character code for 'g' and 52 is the code for '4'. Many of the words will not be printable; values of x or y less than 33, for example, will not produce anything visible on the screen (those codes are reserved for special purposes like blanks, tabs, newlines, beeps, etc.).
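
The generation of these words amounts to something like this sketch (illustrative only; test2Words is a made-up name, and the real code is in HashTesting.java).

import java.util.ArrayList;
import java.util.List;

/** Sketch: build all N*N "words" of the form xxyy, where the character
 *  codes of x and y each run from 1 to N. */
static List<String> test2Words(int N) {
    List<String> words = new ArrayList<>();
    for (int x = 1; x <= N; x += 1) {
        for (int y = 1; y <= N; y += 1) {
            char cx = (char) x, cy = (char) y;
            words.add("" + cx + cx + cy + cy);
        }
    }
    return words;
}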

Don't run test 2 yet. First, consider test 3:

java HashTesting test3 N

This does the same thing, but uses a differently generated set of "words". In this case, the words have the form xXyY, where x and y vary as before, and where (noting carefully the capitalization):

Run these two commands with various values of N (start at 20).

Explain in as much detail as you can the reasons for the relative timing behavior of these tests (that is, why test3 takes longer than test2), putting your answer in lab9.txt, question #2.

D. Repairing a Faulty Class

The class FoldedString is another wrapper class for Strings (in the style of String1 above) whose .equals and .compareTo methods ignore case; for example, they treat "foo", "Foo", and "FOO" as equivalent. The command

java HashTesting test4 the quick BROWN fox

will create a FoldedString for each of the trailing command-line arguments ("the", "quick", etc.) and insert them into a HashSet and (for comparison) a TreeSet, which uses a balanced binary search tree to store its data. The program will then check that it can find the all-upper-case version of each of the command-line arguments in both sets. For the example above, it will check that "THE", "QUICK", "BROWN", and "FOX" are all present in both the TreeSet and the HashSet. As long as FoldedString is correctly implemented, all four should be.
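
As a reminder of what "ignoring case" means for equality, a FoldedString-style .equals is along these lines (a sketch only: the field name contents is an assumption, and this is not necessarily the skeleton's exact code).

/** Sketch: case-insensitive equality for a String wrapper, where
 *  contents holds the wrapped String. */
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof FoldedString)) {
        return false;
    }
    return contents.equalsIgnoreCase(((FoldedString) obj).contents);
}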

Try running test4 with the quick brown fox example from above. The HashSet fails to find upper-case versions of the arguments, whereas the TreeSet seems to work just fine. Explain why in lab9.txt, question #3.

The problem is in the FoldedString class. Change it so that java HashTesting test4... works. Try to do so in a reasonable way: make sure that large HashSets full of FoldedStrings will continue to complete operations in a timely manner. You may need to change more than one method.