THIS ASSIGNMENT IS OUTDATED, AND SHOULD NOT BE USED AFTER SPRING 2007. THE NEW VERSION IS PRESENT IN ~/sp07/assignments/a6. I spent plenty of time working on that assignment, so please use the new version!

Assigned Tuesday, March 13th
Due Tuesday, April 10th, by 11:59pm - submit electronically.

In this assignment, you will implement, in Java, a system that learns right-regular grammars with a simplified version of the model merging algorithm. This involves several steps; make sure to read through the entire assignment before you begin coding.
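
For orientation only, here is a minimal sketch of what a right-regular grammar and a single merge step can look like. It is not the starter code and not the algorithm as specified in the PDF: the class and method names (RightRegularSketch, Rule, incorporate, merge, accepts) are made up for illustration, the start symbol is assumed to be "S", and the cost function that decides which merges to keep (the alpha in the example output) is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration; none of these names come from the assignment.
class RightRegularSketch {

    // A right-regular rule: head -> terminal next, or head -> terminal when next is null.
    static final class Rule {
        final String head, terminal, next;
        Rule(String head, String terminal, String next) {
            this.head = head; this.terminal = terminal; this.next = next;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Rule)) return false;
            Rule r = (Rule) o;
            return head.equals(r.head) && terminal.equals(r.terminal)
                    && Objects.equals(next, r.next);
        }
        @Override public int hashCode() { return Objects.hash(head, terminal, next); }
        @Override public String toString() {
            return head + " -> " + terminal + (next == null ? "" : " " + next);
        }
    }

    // Memorize each sample as its own chain of fresh nonterminals starting at S.
    static Set<Rule> incorporate(List<List<String>> samples) {
        Set<Rule> rules = new LinkedHashSet<>();
        int fresh = 0;
        for (List<String> sample : samples) {
            String head = "S";
            for (int i = 0; i < sample.size(); i++) {
                boolean last = (i == sample.size() - 1);
                String next = last ? null : "N" + (fresh++);
                rules.add(new Rule(head, sample.get(i), next));
                head = next;
            }
        }
        return rules;
    }

    // Merge nonterminal `from` into `into` by renaming; the set drops duplicate rules.
    static Set<Rule> merge(Set<Rule> rules, String into, String from) {
        Set<Rule> merged = new LinkedHashSet<>();
        for (Rule r : rules) {
            String head = r.head.equals(from) ? into : r.head;
            String next = from.equals(r.next) ? into : r.next;
            merged.add(new Rule(head, r.terminal, next));
        }
        return merged;
    }

    // Check whether the grammar derives the sentence, tracking all reachable nonterminals.
    static boolean accepts(Set<Rule> rules, List<String> sentence) {
        Set<String> current = new HashSet<>(Arrays.asList("S"));
        for (int i = 0; i < sentence.size(); i++) {
            boolean last = (i == sentence.size() - 1);
            Set<String> reachable = new HashSet<>();
            for (Rule r : rules) {
                if (!current.contains(r.head) || !r.terminal.equals(sentence.get(i))) continue;
                if (last && r.next == null) return true;
                if (!last && r.next != null) reachable.add(r.next);
            }
            current = reachable;
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> samples = new ArrayList<>();
        samples.add(Arrays.asList("the", "dog", "barks"));
        samples.add(Arrays.asList("a", "cat", "barks"));
        Set<Rule> grammar = incorporate(samples);
        Set<Rule> merged = merge(grammar, "N0", "N2");
        for (Rule r : merged) System.out.println(r);
        System.out.println(accepts(merged, Arrays.asList("a", "dog", "barks")));
    }
}
```

In this sketch, merging N0 and N2 lets the grammar derive "a dog barks", a sentence that was not in the training samples. That kind of generalization is what merging buys you; deciding whether a given merge is worth keeping is the part the assignment itself defines.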

--> THE ASSIGNMENT: [PDF] <-- (right... open up the PDF file...)

The starter code: [.tar.gz] [.zip]

New: example output: a6-example-output.txt (alpha=1) and a6-example-output-alpha5.txt (alpha=5)

If you have a lot of trouble with the classes and methods in the starter code, post to the newsgroup. We think they are necessary, but perhaps not sufficient, for the assignment, so let us know if something is missing.

Please check back to this page for possible updates to the assignment.
As noted in the assignment, you should show your algorithm working on some sample data.
Here is an example of how the algorithm should run.
Here is help based on last year's reactions, including links to javadoc and a description of the tree data structure you will want to use.
You should feel free to make up your own data set, or use the provided sample training data.
As usual, if you encounter problems in getting your algorithm to work, you should try to identify where the problems arise. If you are having problems debugging something, it's better to turn in a run with debugging print statements that expose/describe the problem than to turn in something that either can't run or runs without working.
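
One way to keep such debugging output easy to switch on and off is a small helper like the one below. This is only a suggestion: MergeDebug, enabled, describe, and log are hypothetical names that do not appear in the starter code, and the messages shown are invented examples.

```java
// Hypothetical helper; not part of the starter code.
final class MergeDebug {
    // Flip to false to silence all debugging output without deleting the calls.
    static boolean enabled = true;

    // Build one labeled line describing the state at a given step.
    static String describe(String step, Object state) {
        return "[debug] " + step + ": " + state;
    }

    // Print to stderr so debugging lines stay separate from regular output.
    static void log(String step, Object state) {
        if (enabled) {
            System.err.println(describe(step, state));
        }
    }

    public static void main(String[] args) {
        log("before merge", "6 rules");
        log("after merging N0 and N2", "6 rules");
    }
}
```

Labeling each line with the step it came from makes it much easier for a grader to see where a run goes wrong.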

What to submit
You should submit your assignment using the submit program available on the EECS instructional system. Instructions for doing this can be found here. Be sure to include all files mentioned in the assignment (Java files, answers.txt, test.txt).

Readings
Reader: O2 (Bailey et al.)

Some links that may be of interest:

Wikipedia's articles on right-regular grammars and linear grammars (which I edited). They will make more sense if you also read Wikipedia's article on formal grammars.
Info on finite-state technology
A longer technical paper on model merging, Inducing Probabilistic Grammars by Bayesian Model Merging (Stolcke and Omohundro, 1994). This paper describes model merging for hidden Markov models and stochastic (probabilistic) context-free grammars. Note that both of these applications differ from the deterministic finite-state grammars to be learned in this assignment, but the paper may be of interest anyway.

Note again: If you cheat, we will catch you. Don't cheat!