Trees

This week we talked about two-dimensional data structures, specifically trees. This is one of the subjects students have the most difficulty with on MT2, so it's worth your time to work through these practice problems!

We've seen four kinds of trees by now; let's recap:

	Branches have data	Branches do not have data
N children	"Capital-T" Trees `MAKE-TREE DATUM CHILDREN LEAF?` (names from CS61A) No empty Trees; "base case" is rarely necessary.	Deep lists (SICP "trees") `CONS LIST APPEND CAR CDR NULL? LIST?` etc. Base cases are the empty list (no more children) and other not-pairs ("atoms"), which are data.
2 children	Binary trees `MAKE-BINARY-TREE ENTRY LEFT-BRANCH RIGHT-BRANCH NULL?` (names from SICP) Base case is the "empty binary tree" (empty list).	`car`/`cdr` recursion `CONS CAR CDR NULL? PAIR?` Base case is any not-pair ("atom"), though sometimes the empty list is treated specially.

Pairwise recursion is the one that trips most people up, but it's basically how you do list recursion most of the time anyway (do something to the car, then to the cdr). Refer to the lecture notes in the Volume II reader for a good review.

The problems we did in section mostly focused on Trees:

Write flatten, a procedure that takes in a Tree and returns (in any order) a list containing every datum in the tree.

Extra: Does your version of flatten return the elements in preorder or postorder? Or does it use some other order?

(show answer)
First let's think about domain and range. flatten is going to take in a tree and return a list, so every one of its cases has to do that.

Let's try doing this using mutual recursion.
```
(define (flatten tree)
  (cons (datum tree) (forest-flatten (children tree))) )
```
Looks good so far…we just need to write forest-flatten. But why don't we have a base case? Well, our base case would be "the empty Tree", but capital-T Trees don't have an "empty Tree" case. There's a tree with no children (a leaf), but forest-flatten should take care of that.
```
(define (forest-flatten forest)
  (if (null? forest)
      '()
      (append (flatten (car forest)) (forest-flatten (cdr forest))) ))
```
This time we have a base case: when we run out of children. Note that this handles the case when there were no children to begin with. The range of forest-flatten is also lists, so we know we're on the right track. This works!

Because in flatten we cons the current datum to the list of descendent data, this will give us a preorder list.

If you want, this can also be written very compactly using higher-order procedures.
```
(define (flatten tree)
  (accumulate append
              (list (datum tree))
              (map flatten (children tree)) ))
```
There's a lot going on here, so let's review accumulate first.
```
              +------------+
              |            V
(accumulate - 0 '(1 2 3))  0
                  | | \   /
                  | |  \ /
                  | |   3
                  | \  /
                  \  -1
                   \ /
                    2
```
accumulate works from right-to-left, combining elements until there's only one left. So (accumulate append '() '((1 2) (3 4) (5 6))) combines lists using append, until there's only one left.

But what's that (list datum) doing there? Well, we have to get the current Tree's datum in there somehow—otherwise it'll just be a list of the children's data. (Actually, since this is a recursive procedure, no data will end up in the result list! But remember, we assume a procedure works correctly when we make a recursive call to it.)

Do we need a base case? Well, map takes care of stopping once every child has been handled. What if tree is a leaf node? Well, then map will just get an empty list of children, and return an empty list to accumulate, which will then use the provided "base case" value of (list datum). Exactly what we wanted. So, no, we don't have to explicitly check for a base case here.

In this case, because accumulate puts the base case at the end, we'll get a postorder list of the data.

You don't have to use HOPs to solve tree problems. This is just a very nice compact representation of what we're doing: "start with my datum, flatten all my children, and put all the lists together using append". Simple if you think about it that way, huh?

(hide answer)

Here's the main group problem I posed in class: given a tree where all data are numbers, find a node which has a given number as its datum and return it. Return #f if no such node exists. Can you solve it...

...for Trees?

(show answer)
The easier way is to use a custom FOREST-SEARCH, which does the custom combining for you:
```
(define (tree-search tree num)
  (if (= num (datum tree))
      tree
      (forest-search (children tree) num) ))

(define (forest-search forest num)
  (if (null? forest)
      #f
      (or (tree-search (car forest) num)
          (forest-search (cdr forest) num) )))
```
Try reading it in concepts. tree-search says "if this tree's datum is what we're looking for, we found it! Otherwise, let's search the children." forest-searh says "if I'm out of trees, it's not here, so return false. Otherwise, see if the first tree has what we're looking for, or if the rest of the forest has what we're looking for." The last line relies on the fact that or in Scheme returns the first true argument you give it, and #f if none of the arguments are true, which in this case is exactly what we want.

You can also use higher-order procedures to do this, but in this particular case it doesn't make it that much better.
```
(define (tree-search tree num)
  (if (= num (datum tree))
      tree
      (accumulate (lambda (child-result result-so-far)
                    (or child-result result-so-far)))
                  #f
                  (map (lambda (child) (tree-search child num))
                       (children tree) )))
```
This isn't really much cleaner. In particular, it's extra ugly because we have to wrap or in a function to use it for accumulate. Why? Because or isn't a function, it's a special form…and you can't pass special forms as functions.

(hide answer)
...for binary trees?

(show answer)
```
(define (binary-tree-search tree num)
  (cond ((empty-binary-tree? tree) #f)
        ((= num (entry tree)) tree)
        (else (or (binary-tree-search (left-branch tree) num)
                  (binary-tree-search (right-branch tree) num) ))))
```
Same basic idea as tree-search, but we only have two branches to deal with. Because of this, we do away with HOPs and just check the two branches manually. Again, the last line relies on the fact that or in Scheme returns the first true argument you give it, and #f if none of the arguments are true, which in this case is exactly what we want.

What is the empty-binary-tree? procedure? Because every binary tree node has exactly two children, there has to be some kind of placeholder that means "no left child" or "no right child". Often this placeholder is nil (the empty list), and empty-binary-tree? will just be null?. SICP just uses null? inline—a DAV, but a small, relatively common, and reasonably acceptable one.

(hide answer)
...for deep lists? (Just return #t or #f)

(show answer)
```
(define (deep-search list-or-element num)
  (if (number? list-or-element)
      (= num list-or-element)
      (let ((child-results
             (filter (lambda (child) (deep-search child num))
                     list-or-element )))
        (not (null? child-results)) )))
```
The recursive case is kind of tricky. Let's see what we do in order.
1. Filter out any children that do not have the number somewhere in their descendents.
2. See if there are any results left, i.e. #t results from children. If so, we'll return #t as well.
As for the base case, it's when the current element is not a list. In this case we just have to check if it's the number we're looking for.

(hide answer)
...for arbitrary pair structures? (Again, just return #t or #f)

(show answer)
```
(define (pair-search pair-or-atom num)
  (cond ((null? pair-or-atom) #f)
        ((pair? pair-or-atom)
         (or (pair-search (car pair-or-atom) num)
             (pair-search (cdr pair-or-atom) num) ))
        (else (= num pair-or-atom)) ))
```
Again, this looks a lot like binary-tree-search. And like deep-search, we can't return anything more useful than the number itself, because only leaf nodes have data. So we'll just return #t or #f.

Like binary-tree-search, we have an explicit null? check for nodes at the bottom of the tree. Although you might be uncomfortable with using the empty list as a sentinel in a pair structure, remember that lists and pairs are sort of a shared data type already, so we can allow it. (To properly follow the original problem, every car or cdr would be either a number or another pair, but lots of times we use car/cdr recursion for deep-list-type structures.)

(hide answer)

We did a few more practice problems with Trees in class: tree-map and leaf?. Both are also in the lecture notes (page 300 in my copy of the Vol. II reader), so I'm not going to discuss them here.

Finally, here's a quick thought experiment (that we didn't get to in class). What if we used Trees in the calculator program, rather than deep lists? What do we have to change in calc-eval to make this work?

```
; Original CALC-EVAL, without error handling
(define (calc-eval exp)
  (if (number? exp)
      exp
      (calc-apply (car exp) (map calc-eval (cdr exp))) ))

(define (calc-apply fn-name args)
  (cond ((eq? fn '+) (accumulate + 0 args))
        ;; ... other COND clauses omitted here
        (else (error "Calc: bad operator: " fn)) ))
```
(show answer)
```
(define (tree-eval tree)
  (if (leaf? tree)
      (datum tree)
      (calc-apply (datum tree) (map tree-eval (children tree))) ))
```
We didn't change much! Our base case test is now leaf?, and our "number" case returns the datum of the tree instead of the tree itself. The trickier case is the call to calc-apply. The procedure name is the datum of the tree, and again, to do proper applicative order, we have to evaluate all the children before calling calc-apply. Not so bad after all.

Many people might be tempted to use tree-map instead of map. But think about the range of tree-map. It takes a tree and returns a new tree, with all the data modified. We want to take a tree and return a number (evaluating the subexpression represented by a child tree). Then we pass the list of numbers to calc-apply. So we're doing it right after all.

Again, you always have to think about domain and range. If you're working with forests, use list procedures. If you're working with trees, use tree procedures. If your function returns a list, it needs to do that in the base case and the recursive case. If your function returns a tree, it needs to do that in the base case and the recursive case.

(hide answer)

Back to Jordy's home page | Course home page