----------------------------------------- MODEL MERGING ASSIGNMENT HINTS/DISCUSSION ----------------------------------------- Here is some useful information that is pulled from the 2006 newsgroup. Using the newsgroup for questions benefits everyone. This document contains information on: a link for the java api for those of you that need help with java, a reminder to make the alpha value available in your AffixTree/AffixTreeNode and more description on how the AffixTreeNode works. -------------- JAVA API LINKS -------------- Just in case you don't know about this, you can find information on the built-in Java classes here: http://java.sun.com/j2se/1.5.0/docs/api/ So if you don't know all the methods on HashMap, for instance, you can go to the api and find how to use the hashtable and how to get all the keys and values out of it. -------------------------- MORE INFO ON AFFIXTREENODE -------------------------- Consider the following rules (numbered for convenience) 1. Q -> a b c 2. Q -> a b d 3. Q -> a b c Z 4. Q -> b g 5. Z -> b g 6. Z -> b h One could make a prefix tree from these rules that looked like this: root-node | | Q-> Z-> / \ \ Q->a Q->b Z->b / | | \ Q->ab Q->bg Z->bg Z->bh / \ Q->abc Q->abd / Q->abcZ The notation in this picture gives the nodes an ID made with the LHS, an arrow, and the prefix so far. For some of the nodes in the tree, I will show what the values of its fields would be if it was represented by the AffixTreeNode class: root-node depth = 0 parent = null affixWord = "" rules = (1,2,3,4,5,6) terminatingRules = () children "Q-" -> Q- "Z-" -> Z- Q- depth = 1 parent = root-node affixWord = "Q" rules = (1,2,3,4) terminatingRules = () children "a" -> Q-a "b" -> Q-b Z- depth = 1 parent = root-node affixWord = "Z" rules = (5,6) terminatingRules = () children "b" -> Z-b Q-a depth = 2 parent = Q- affixWord = "a" rules = (1,2,3) terminatingRules = () children "b" -> Q-ab Q-b depth = 2 parent = Q- affixWord = "b" rules = (4) terminatingRules = () children "g" -> Q-bg Q-bg depth = 3 parent = Q-b affixWord = "g" rules = (4) terminatingRules = (4) children //none Q-ab depth = 4 parent = Q-a affixWord = "b" rules = (1,2,3) terminatingRules = () children "c" -> Q-abc "d" -> Q-abd Q-abc depth = 5 parent = Q-ab affixWord = "c" rules = (1,3) terminatingRules = (1) children "Z" -> Q-abcZ Q-abd depth = 4 parent = Q-ab affixWord = "d" rules = (2) terminatingRules = (2) children //none Z-b depth = 2 parent = Z- affixWord = "b" rules = (5,6) children "g" -> Z-bg "h" -> Z-bh ... and so on. 1. In these examples, rules was a list of IDs, but in the starter code, it is a vector of Rule instances. 2. HashMaps map keys (in this case a string) to a value (in this case an AffixTreeNode). 3. The HashMap notation I adopted for the children uses "->" to indicate the mapping from a string (key) to a node. Do not confuse the arrows used under the children as Grammar productions. 4. While the nodes have IDs similar to the string with which they get hashed, this is just an artifact of writing this down. You will be manipulating java instances in the code. 5. The left hand side (LHS) of the rule is added to the key of a prefix tree to differentiate prefixes of different left hand sides. (Remember that the prefix merge only works for rules with the same LHS). The suffix tree wouldn't have the lhs in the key; instead, it has an extra word, "DUMMY", where the LHS symbol would go (i.e. the first element of the affix tree). 6. If you have worked through the handout, you know that the information associated with each AffixTreeNode is nearly enough to calculate the cost of a merge at that node. (You of course need alpha also.)