2-3 TREES =========== A 2-3 tree is a perfectly balanced tree. It has a big advantage over regular binary search trees: because the tree is perfectly balanced, find, insert, and remove operations take O(log n) time, even in the worst case. 2-3 trees are thus named because every node has 2 or 3 children, except leaves, which are all at the bottom level of the tree. Each node stores 1 or 2 entries, which determine how other entries are distributed among its children's subtrees. Each internal (non-leaf) node has one more child than entries. For example, a node with keys [20, 40] has three children. Eack key k in the subtree rooted at the first child satisfies k <= 20; at the second child, 20 <= k <= 40; and at the third child, k >= 40. Some valid 2-3 trees: |1| |1 2| |2| |2 5| |3| / \ / | \ / \ |1| |3| |1| |4| |7 9| |1 2| |5 9| Some invalid 2-3 trees: |1 2 3| |2| |2| |2 3| |3 5| / / \ / \ / | \ |1| |3| |4| |1| |5| |1| |4| |7 9| / \ / | \ |0| |2| |6| |8| |10| 2-3 trees are a type of B-tree, which you may learn about someday in connection with fast disk access for database systems. B-trees on disks usually use the top-down insertion/deletion algorithms, because accessing a disk track is slow, so you'd rather not revisit it multiple times. [1] Entry find(Object k); Finding an entry is straightforward. ======= Start at the root. At each node, +40 50+ check for the key k; if it's not /----=======-----\ present, move down to the / / \ \ appropriate child chosen by ---- ---- ======= comparing k against the keys. |32| |43| +70 79+ Continue until k is found, ---- ---- ======= or k is not found at a / \ / \ / | \ leaf node. For example, ---- ---- ---- ---- ------- ==== ---- find(74) visits the |25| |33| |42| |47| |57 66| +74+ |81| double-lined boxes at right. ---- ---- ---- ---- ------- ==== ---- Incidentally, you can define an inorder traversal on 2-3 trees analogous to that on binary trees, and it visits the keys in sorted order. [2] Entry insert(Object k); When inserting into a 2-3 tree, you always add to a leaf node. That means you must traverse all the way down to the bottom of the tree, similar to the traversal for find(). Then you add the node to the proper place in the leaf. For inserting into a leaf with one value, this is straightforward. Let's add 7 to the following nodes: |4| -> (add 7) -> |4 7| |9| -> (add 7) -> |7 9| Either addition creates a three-node, so this is fine. For inserting into a leaf with two values, this becomes more complicated. A node cannot hold more than 2 values, so one must be kicked out of the leaf. To do this, first create a temporary four-node (not allowed in 2-3 trees), then push the middle node up. Let's add to a three-node: |5| / \ |4 7| -> (add 5) -> |4 5 7| -> (push up middle) -> |4| |7| What happens when we add to a bigger tree? A student might try to do something like this: |5| |5| |5| |7| / \ -> (add 7) -> / \ -> (push middle) -> / \ / \ |4| |6 8| |4| |6 7 8| |4| |6| |8| (THIS IS INCORRECT) This is incorrect because the tree must have a root. If push nodes up like this from anywhere except the top of the tree, it will create a second "root", which is not allowed. Instead, when the node is kicked out it gets pushed in to the node above it. Here is the correct way to insert: |5| |5| |5 7| / \ -> (add 7) -> / \ -> (push middle) -> / | \ |4| |6 8| |4| |6 7 8| |4| |6| |8| Note that the steps still contain a temporary 4-node. The end result this time is a valid tree because there is only one root, and no nodes contain more than 2 values. What happens with a more complicated tree? |3 8| |3 8| |3 5 8| / | \ -> (add 6) -> / | \ -> (push up 5) -> / | | \ |1 2| |4 5| |9| |1 2| |4 5 6| |9| |1 2| |4| |6| |9| |5| / \ |3 5 8| |3| |8| / | | \ -> (push up 5) -> / \ / \ |1 2| |4| |6| |9| |1 2| |4| |6| |9| We still push up the middle node if a 4-node is created. The middle node is not always the newly added element, it's purely coincidental that 6 was pushed both times. The splitting process is applied recursively going back up the tree, possibly as high as the root. At most, there are log(n) splits. Runtime ------- First we need to find which leaf the value will be added to. Because 2-3 trees are balanced, there are exactly log(n) levels to travese. After that, any 4-nodes must be broken, which will push nodes up through at most log(n)+1 levels of the tree, where n is the number of items in the tree.