----------------------------------------- MODEL MERGING ASSIGNMENT HINTS/DISCUSSION ----------------------------------------- Here is some useful information that is pulled from the newsgroup. Using the newsgroup for questions benefits everyone. This document contains information on: a link for the java api for those of you that need help with java, a reminder to make the alpha value available in your AffixTree/AffixTreeNode and more description on how the AffixTreeNode works. -------------- JAVA API LINKS -------------- Just in case you don't know about this: http://java.sun.com/j2se/1.3/docs/api/index.html This is the url of the java api for java 1.3. It's not the most recent API, but that's intentional. The starter code was written before newer java "features" like generics were added, so the older API will be less confusing. So if you don't know all the methods on Hashtable, for instance, you can go to the api and find how to use the hashtable and how to get all the keys and values out of it. -------------------- ALPHA AND AFFIXTREES -------------------- Make sure that the alpha value is passed into AffixTree. One can't find the best merge without it. Depending on how you design the code, you may or may not want to also pass it into AffixTreeNode. It depends on where you want the calculation of the cost of a merge to take place. -------------------------- MORE INFO ON AFFIXTREENODE -------------------------- Consider the following rules (numbered for convenience) 1. Q -> a b c 2. Q -> a b d 3. Q -> a b c Z 4. Q -> b g 5. Z -> b g 6. Z -> b h One could make a prefix tree from these rules that looked like this: root-node / \ \ Q-a Q-b Z-b / | | \ Q-ab Q-bg Z-bg Z-bh / \ Q-abc Q-abd / Q-abcZ The notation in this picture gives the nodes an ID made with the LHS a dash and the prefixsofar. For some of the nodes in the tree, I will show what the values of its fields would be if it was represented by the AffixTreeNode class: root-node height = 0 parent = null affix = "" rules = (1,2,3,4,5,6) children "Q-a" -> Q-a "Q-b" -> Q-b "Z-b" -> Z-b Q-a height = 1 parent = root-node affix = "a" rules = (1,2,3) children "Q-ab" -> Q-ab Q-ab height = 2 parent = Q-a affix = "ab" rules = (1,2,3) children "Q-abc" -> Q-abc "Q-abd" -> Q-abd Q-abc height = 3 parent = Q-ab affix = "abc" rules = (1,3) children "Q-abcZ" -> Q-abcZ Q-abd height = 3 parent = Q-ab affix = "abd" rules = (2) children //none Z-b height = 1 parent = root-node affix = "b" rules = (5,6) children "Z-bg" -> Z-bg "Z-bh" -> Z-bh 1. In these examples, rules was a list of IDs, but in your program, you should make it a vector of Rule instances. 2. Hashtables map keys (in this case a string) to a value (in this case an AffixTreeNode). 3. The Hashtable notation I adopted for the children uses "->" to indicate the mapping from a string (key) to a node. Do not confuse the arrows used under the children as Grammar productions. 4. While the nodes have IDs similar to the string with which they get hashed, this is just an artifact of writing this down. You will be manipulating java instances in the code. 5. The left hand side (LHS) of the rule is added to the key of a prefix tree to differentiate prefixes of different left hand sides. (Remember that the prefix merge only works for rules with the same LHS). The suffix tree wouldn't have the lhs in the key 6. If you have worked through the handout, you know that the information associated with each AffixTreeNode is enough to calculate the cost of a merge at that node. (Don't forget to pass alpha in to the node's constructor.) 7. Remember that this is only one possible way of arranging your prefix tree. An alternative is to make the children of the root hold the LHS symbol, as in the slides from section 10.