AVL Trees (by Patrick Lutz)
===========================

The third type of balanced binary search tree that we will look at is called an
AVL Tree (for its inventors, Adelson-Velskii and Landis). AVL trees maintain the
following invariant: for any node, the heights of the left and right subtrees
can differ by at most one. Because of this invariant, AVL trees have a
worst-case run time of O(log(n)) for searching, inserting and removing (and
unlike splay trees, this is not amortized time -- i.e. every insert and remove
operation takes O(log(n))).

Examples of AVL Trees:

          6                     9
         / \                   / \
        3   8                 5   17
       / \   \               /      \
      1   5   9             3       1003
     / \
    0   2

Nonexamples of AVL Trees:

          6
         / \
        3   8
       / \   \           This tree is not a Binary Search Tree,
      1   5   9          so it is not an AVL Tree.
     / \       \
    0   2       7

            7
          /   \
         4     9         This tree is a Binary Search Tree, but it does not
        /     / \        maintain the other invariant of AVL trees since 4's
       2     8   10      left subtree has a height of 2 while 4's right
      /            \     subtree has a height of 0.
     0            2014

Similarly to splay trees, AVL trees stay balanced through the use of rotations
while inserting and removing from the tree.

AVL Tree Operations
-------------------

[1] Find

The find() operation in an AVL tree works exactly the same as in an ordinary,
unremarkable Binary Search Tree.

[2] Insert

Inserting a node in an AVL tree begins the same way as a normal insertion into
a Binary Search Tree. One or more nodes of the tree are then rotated in order
to maintain the AVL tree's invariant. Before we get down to the nitty-gritty of
how this works, we are going to define a few things.

If a node's left subtree is taller than its right subtree (i.e. the height of
the left subtree is greater), we will say that the node leans left. If the
right subtree is taller, we will say that the node leans right. If the AVL tree
invariant is not satisfied at some node (i.e. the heights of the node's left
and right subtrees differ by more than one), we will say that the node is
unbalanced.

With that in mind, let's look at this process in a little more detail:

1. The first step is to perform an insertion in exactly the same way we would
   on a regular BST -- insert the new node as a leaf node such that the tree
   remains a BST.

2. Starting with the parent of the newly inserted node, look for the lowest
   (farthest from the root) unbalanced node in the tree (note that we only
   have to look at ancestors of the newly inserted node since only those nodes
   could have become unbalanced by the insertion). If no such node exists, the
   insertion is done.

3. Let A be the lowest node that is unbalanced. There are 4 possible cases
   (h denotes a subtree of height h):

   i)   Left Right case: A leans left and its left subtree leans right.
   ii)  Left Left case: A leans left and its left subtree leans left.
   iii) Right Left case: A leans right and its right subtree leans left.
   iv)  Right Right case: A leans right and its right subtree leans right.

  Left Right Case        Left Left Case         Right Left Case        Right Right Case
 -----------------      ----------------       -----------------      ------------------
        A                      A                     A                       A
       / \                    / \                   / \                     / \
      B   h                  B   h                 h   B                   h   B
     / \                    / \                       / \                     / \
    h   C                 h+1  h                     C   h                   h  h+1
       / \                                          / \
      h  h-1                                      h-1  h

  (new node inserted     (new node inserted     (new node inserted      (new node inserted
   in the left subtree    in the left subtree    in the right subtree    in the right subtree
   of C)                  of B)                  of C)                    of B)
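Before walking through how each case is fixed, it may help to see the kind of
bookkeeping the fixes rely on. Below is a minimal sketch in Python (not part of
the original notes) of a node that caches its own height, plus the two
rotations; the names Node, height, update_height, balance_factor, rotate_left
and rotate_right are just illustrative choices, not a required interface.

    def height(node):
        # Height convention from these notes: an empty tree has height 0,
        # so a single node has height 1.
        return 0 if node is None else node.height

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right
            self.height = 1 + max(height(left), height(right))

    def update_height(node):
        node.height = 1 + max(height(node.left), height(node.right))

    def balance_factor(node):
        # Positive: the node leans left.  Negative: the node leans right.
        return height(node.left) - height(node.right)

    def rotate_right(a):
        # a's left child b becomes the root of this subtree; a adopts b's old
        # right subtree as its new left subtree.
        b = a.left
        a.left = b.right
        b.right = a
        update_height(a)
        update_height(b)
        return b

    def rotate_left(a):
        # Mirror image of rotate_right.
        b = a.right
        a.right = b.left
        b.left = a
        update_height(a)
        update_height(b)
        return b

Note that a rotation relinks two nodes and recomputes two cached heights, which
is why a single rotation takes constant time.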
In the first case, we will rotate B left to reduce it to the same situation as
the second case.

        A                        A
       / \                      / \
      B   h        -->         C   h
     / \                      / \
    h   C                    B   h-1
       / \                  / \
      h   h-1              h   h

In the second case, we rotate A right so that the entire tree becomes balanced
again:

        A                        B
       / \                      / \
      B   h        -->       h+1   A
     / \                          / \
   h+1  h                        h   h

The third and fourth cases are handled similarly since they are just the mirror
images of the first two cases. After performing these rotations we go back to
step 2 until there are no more unbalanced nodes.

Let's look at an example:

Insert 7:

  Tree before insertion:     After step 1:          Step 2: No nodes are unbalanced,
                                                    so we are done.
        3                         3
       / \                       / \
      1   9                     1   9
     / \   \                   / \ / \
    0   2   11                0  2 7  11

Let's see what would have happened if we inserted 12 instead.

Insert 12:

  Tree before insertion:     After step 1:          Step 2: The lowest unbalanced node
                                                    is 9 since its left subtree has
        3                         3                 height 0, but its right subtree
       / \                       / \                has height 2.
      1   9                     1   9
     / \   \                   / \   \
    0   2   11                0   2   11
                                        \
                                         12

  After step 3:

        3                       3
       / \                     / \
      1   9         -->       1   11         The unbalanced node leans right since the
     / \   \                 / \  / \        right subtree has a greater height than the
    0   2   11              0  2 9   12      left. Since the right subtree also leans
              \                              right, this is an example of the fourth
               12                            case. So we have to rotate 9 left.

Let's look at one final example:

Insert 10:

  Tree before insertion:     After step 1:          Step 2: The lowest unbalanced node
                                                    is 9 since its left subtree has
        3                         3                 height 0, but its right subtree
       / \                       / \                has height 2.
      1   9                     1   9
     / \   \                   / \   \
    0   2   11                0   2   11
                                      /
                                     10

  After step 3:

        3                    3                     3            The unbalanced node leans
       / \                  / \                   / \           right and its right subtree
      1   9       -->      1   9        -->      1   10         leans left, so this is an
     / \   \              / \   \               / \  / \        example of the third case.
    0   2   11           0   2   10            0  2 9   11      So we have to rotate 11
           /                       \                            right and then 9 left.
          10                        11

In these examples, the resulting tree (after insertion) was maximally balanced.
THIS WILL NOT ALWAYS BE THE CASE.

[3] Remove

Removal in an AVL tree is very similar to insertion. First we remove the node
as we would in a boring old regular BST and then, starting at the position of
the deleted node, we travel towards the root, along the way applying rotations
to fix unbalanced nodes. Once again, let's look at this process in a little
more detail.

1. If the node to be removed has zero or one children, then we simply remove
   it.

   One child:   .            .        No children:   .            .
                 \            \                       \            \
                  A    -->     B                       B    -->     B
                   \          / \                     / \          /
                    B        .   .                   .   A        .
                   / \
                  .   .

   If the node has two children, switch it with its inorder successor (which
   will always have zero or one children) and then remove it.

2. Starting in the position of the removed node, look for the lowest
   unbalanced node and apply the appropriate rotations to fix it, in the exact
   same way as when inserting a node. Keep doing this until all nodes satisfy
   the AVL tree invariant. Just like when inserting a node, the only nodes
   that could be unbalanced are the ancestors of the deleted node, so these
   are the only nodes that we have to examine.

There is only one extra case (other than the four cases outlined in the
section on insertion). That is when a node is unbalanced, but its taller
subtree doesn't lean left or right. In this case, we can simply rotate the
unbalanced node towards the shorter subtree. For example, say we try to delete
9 from the following AVL tree:

        7                         7                         4
       / \                       / \                       / \
      4   8          -->        4   8          -->        3   7
     / \   \                   / \                       /   / \
    3   5   9                 3   5                     2   5   8
   /     \                   /     \                         \
  2       6                 2       6                         6

  9 is a leaf, so we        7 is now unbalanced. 7        To fix it, we simply
  can just remove it.       leans left, but 4 doesn't     rotate 7 right.
                            lean left or right, so it
                            is not one of the four
                            cases from insert.
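Putting the pieces together, here is one possible way to code the rebalancing
step and the insert and remove procedures described above. This sketch
continues the earlier Python snippet (it reuses Node, height, update_height,
balance_factor, rotate_left and rotate_right from there, so it is not
standalone), and the function names are again just illustrative. Note that
rebalance also covers the extra case from this section: when the taller child
doesn't lean either way, the same single rotation towards the shorter subtree
is applied.

    def rebalance(node):
        # Fix the AVL invariant at node, assuming both of its subtrees
        # already satisfy it, and return the new root of this subtree.
        update_height(node)
        if balance_factor(node) > 1:                   # node leans too far left
            if balance_factor(node.left) < 0:          # Left Right case: rotate the
                node.left = rotate_left(node.left)     # left child left first
            return rotate_right(node)                  # Left Left, reduced Left Right,
                                                       # or the extra case from removal
        if balance_factor(node) < -1:                  # node leans too far right
            if balance_factor(node.right) > 0:         # Right Left case: rotate the
                node.right = rotate_right(node.right)  # right child right first
            return rotate_left(node)                   # Right Right, reduced Right Left,
                                                       # or the extra case from removal
        return node

    def insert(node, key):
        # Ordinary BST insertion, then rebalance every ancestor on the way
        # back up to the root.
        if node is None:
            return Node(key)
        if key < node.key:
            node.left = insert(node.left, key)
        elif key > node.key:
            node.right = insert(node.right, key)
        return rebalance(node)

    def remove(node, key):
        # Ordinary BST removal, then rebalance every ancestor on the way back
        # up to the root.
        if node is None:
            return None
        if key < node.key:
            node.left = remove(node.left, key)
        elif key > node.key:
            node.right = remove(node.right, key)
        elif node.left is None:                    # zero or one child:
            return node.right                      # splice the node out
        elif node.right is None:
            return node.left
        else:                                      # two children: copy in the
            successor = node.right                 # inorder successor, then
            while successor.left is not None:      # remove the successor from
                successor = successor.left         # the right subtree
            node.key = successor.key
            node.right = remove(node.right, successor.key)
        return rebalance(node)

    # The removal example above: building the tree and then deleting 9
    # rotates 7 right, so 4 ends up as the new root.
    root = None
    for key in (7, 4, 8, 3, 5, 9, 2, 6):
        root = insert(root, key)
    root = remove(root, 9)
    assert root.key == 4 and root.left.key == 3 and root.right.key == 7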
Because of the invariant that AVL trees satisfy, these three operations all run
in O(log(n)) time in the worst case (where n is the number of nodes in the
tree).

Note: we often refer to the difference between the heights of the left and
right subtrees as the *balance factor* of a node. In an AVL tree, no node has a
balance factor of less than -1 or greater than 1. Keeping track of this balance
factor for each node can help improve efficiency and make it easier to tell
when a rotation is required (and what type of rotation to perform).

Lagniappe: The Maximum Height of an AVL Tree
--------------------------------------------

It may not be clear from all this why the invariant of an AVL tree implies that
the basic operations take at most O(log(n)) time. To see why this is, we are
going to have to find some sort of bound for the height of an AVL tree. As long
as this bound is proportional to log(n), then we are done (since rotations take
constant time and for each operation at worst we have to examine the balance
factor of a single node on each level of the tree -- i.e. each ancestor of the
node we inserted/removed).

To figure out a bound on the height of an AVL tree with n nodes, we will first
consider the minimum number of nodes in an AVL tree of height h. Let A_h be an
AVL tree of height h with the fewest possible nodes. Since A_h has height h,
one of its subtrees must have height h-1. We want A_h to have as few nodes as
possible, so its other subtree must have height h-2 (since shorter trees can
have fewer nodes). In particular, since A_h has the minimum possible number of
nodes, the subtree of height h-1 must have the fewest possible nodes of any AVL
tree of height h-1, and the subtree of height h-2 must have the fewest possible
nodes of any AVL tree of height h-2.

Let N_h be the minimum number of nodes of any AVL tree of height h. As we have
just seen, N_h = N_(h-1) + N_(h-2) + 1 (the number of nodes in the two subtrees
plus the root). In addition, N_0 = 0 and N_1 = 1 since trees of height 0 always
have 0 nodes and trees of height 1 always have 1 node.

This relationship (usually called a recurrence relation) may seem a little
familiar. Let's see what the first few values are:

  N_0 = 0    N_2 = 1 + 0 + 1 = 2    N_4 = 4 + 2 + 1 = 7     N_6 = 12 + 7 + 1 = 20
  N_1 = 1    N_3 = 2 + 1 + 1 = 4    N_5 = 7 + 4 + 1 = 12    N_7 = 20 + 12 + 1 = 33

You may have heard of the Fibonacci numbers. They are defined by F_0 = 1,
F_1 = 1, F_n = F_(n-1) + F_(n-2). The first few Fibonacci numbers are:

  F_0 = 1    F_2 = 2    F_4 = 5    F_6 = 13    F_8 = 34
  F_1 = 1    F_3 = 3    F_5 = 8    F_7 = 21

You may have noticed that the Fibonacci numbers are kind of similar to N_h. In
fact, examining the first few values closely, it seems like N_h = F_(h+1) - 1.
Is this a coincidence? NO!! Look at the relationship between the N_h again. If
we add one to both sides, we get:

  N_h + 1 = (N_(h-1) + 1) + (N_(h-2) + 1)

This is exactly the Fibonacci relationship! Since N_0 + 1 = 1 = F_1 and
N_1 + 1 = 2 = F_2, we have actually just proved that N_h = F_(h+1) - 1 for any
h!

It is well known that the closed form expression for the Fibonacci numbers
(with the indexing used here) is:

  F_n = (phi^(n+1) - psi^(n+1))/(phi - psi)

where phi = (1 + sqrt(5))/2 and psi = -1/phi. So an AVL tree of height h has at
least (phi^(h+2) - psi^(h+2))/(phi - psi) - 1 nodes. We now note that this is
at least phi^(h-1) for all h >= 1 (which you can check by induction using the
recurrence for N_h). Thus, for any AVL tree, n >= phi^(h-1), where n is the
number of nodes and h is the height. Therefore, for any AVL tree,
log_phi(n) >= h - 1. So h <= log_phi(n) + 1, which is O(log(n)). Thus the basic
operations of an AVL tree take worst-case O(log(n)) time.
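If you would like to double check the two facts used in this argument without
doing an induction by hand, here is a quick numeric sanity check in Python (not
part of the original notes) of N_h = F_(h+1) - 1 and N_h >= phi^(h-1) for the
first thirty heights.

    from math import log, sqrt

    phi = (1 + sqrt(5)) / 2

    N = [0, 1]     # N_h: fewest nodes in an AVL tree of height h
    F = [1, 1]     # F_n: Fibonacci numbers, with the indexing used above
    for i in range(2, 32):
        N.append(N[i - 1] + N[i - 2] + 1)
        F.append(F[i - 1] + F[i - 2])

    for h in range(31):
        assert N[h] == F[h + 1] - 1              # N_h = F_(h+1) - 1
        if h >= 1:
            assert N[h] >= phi ** (h - 1)        # so n >= phi^(h-1) ...
            assert h <= log(N[h], phi) + 1       # ... and h <= log_phi(n) + 1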
Note: A more precise bound on the height is h < log_phi(sqrt(5)*(n + 2)) - 2.
Also, phi often goes by another name, the "golden ratio", and it shows up
surprisingly often in math.
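For the curious, here is a similar quick check in Python (again not from the
original notes) of that sharper bound. It is enough to test it on the minimal
AVL trees, i.e. n = N_h, because for a fixed height the bound only gets easier
to satisfy as n grows.

    from math import log, sqrt

    phi = (1 + sqrt(5)) / 2

    N = [0, 1]     # fewest possible nodes in an AVL tree of height h
    for h in range(2, 31):
        N.append(N[h - 1] + N[h - 2] + 1)

    for h, n in enumerate(N):
        assert h < log(sqrt(5) * (n + 2), phi) - 2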