Lab 5: Trees, Dictionaries, Mutable Values

Due at 11:59pm on 02/25/2015.

Starter Files

Download lab05.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

Submission

By the end of this lab, you should have submitted the lab with python3 ok --submit. You may submit more than once before the deadline; only the final submission will be graded.

To receive credit for this lab, you must complete Questions 1, 2, 3, and 4 in lab05.py and submit through OK.
Questions 6, 7, 8, and 9 are extra practice. They can be found in the lab05_extra.py file. It is recommended that you complete these problems on your own time.

Trees

Terminology
Implementation
Question 1
Question 2

Dictionaries

Question 3

Mutable Values

Question 4

Extra Questions
More Trees

Question 5
Question 6
Question 7

Dictionaries + Shakespeare

Question 8
Question 9
Putting it all together

Trees

Trees are a way to represent a hierarchy of information. A file directory is a good example of a tree structure. There is a root folder that contains several other folders — home, bin, user, etc. — and within each of these there exists a similar hierarchy.

The name "tree" comes from the branching structure, like real trees in nature — except that CS trees are drawn with the root at the top and the leaves at the bottom.

Terminology

node: a point in the tree. In these pictures, each node includes a datum (value at each node).
root: the node at the top of a tree. Every tree has one root node.
branches: the branches of a node are the trees directly beneath the node.
leaf: a node that has no branches.
depth: the depth of a node is the number of levels below the root node of the tree. The depth of the root is 0, and the depth of its branches is 1.

Implementation

For this lab, we will be using trees according to the following specification: a tree consists of a root and a list of branches. Each of these branches is itself a tree. A leaf is represented as a tree whose list of branches is an empty list.

Our implementation of trees can be found in lab05.py, though since it is an Data Abstraction, the implementation is not important. You can treat the object returned as a tree-type object, no matter what its actual type is. The interface for our trees consists of the following functions:

Constructor
- tree(root, branches=[]): Creates a tree object with the given data.
Selectors
- root(tree): Returns the value at the root of the tree.
- branches(tree): Returns a list of tree objects that are the branches of the given tree.

For example, the tree generated by

t = tree(1, [tree(2),
             tree(3, [tree(4), tree(5)]),
             tree(6, [tree(7)])])

would look like this:

It may be easier to visualize this translation by formatting the code like this:

t = tree(1,
         [tree(2),
          tree(3,
               [tree(4),
                tree(5)]),
          tree(6,
               [tree(7)])])

To extract the number 3 from this tree, we would do this:

root(branches(t)[1])

The print_tree function prints out a tree in a human-readable form. The exact form follows the pattern illustrated above, where the root is unindented, and each of its branches is indented one level further.

def print_tree(t, indent=0):
    """Print a representation of this tree in which each node is
    indented by two spaces times its depth from the root.

    >>> print_tree(tree(1))
    1
    >>> print_tree(tree(1, [tree(2)]))
    1
      2
    >>> print_tree(numbers)
    1
      2
      3
        4
        5
      6
        7
    """
    print('  ' * indent + str(root(t)))
    for branch in branches(t):
        print_tree(branch, indent + 1)

Question 1

Define the function countdown_tree so that it returns the specific tree below. Make sure to use the tree constructor from the Data Abstraction!

The doctest below shows the print_tree representation:

def countdown_tree():
    """Return a tree that has the following structure.

    >>> print_tree(countdown_tree())
    10
      9
        8
      7
        6
          5
    """
    "*** YOUR CODE HERE ***"
    return tree(10, [tree(9, [tree(8)]),
                    tree(7, [tree(6, [tree(5)])])])

Question 2

Define the function size_of_tree, which takes in a tree as an argument and returns the number of entries in the tree.

def size_of_tree(t):
    """Return the number of entries in the tree.

    >>> print_tree(numbers)
    1
      2
      3
        4
        5
      6
        7
    >>> size_of_tree(numbers)
    7
    """
    "*** YOUR CODE HERE ***"
    return 1 + sum([size_of_tree(t) for t in branches(t)])

# Alternate solution
def size_of_tree(t):
    branches_sum = 0
    for branch in branches(t):
        branches_sum += size_of_tree(branch)
    return 1 + branches_sum

Dictionaries

Dictionaries are unordered sets of key-value pairs. To create a dictionary, use the following syntax:

>>> singers = { 'Iggy Azalea': 'Fancy', 'Beyonce': 'Flawless', 'Adam Levine': 'Maps'}

The curly braces denote the key-value pairs in your dictionary. Each key-value pair is separated by a comma. For each pair, the key appears to the left of the colon and the value appears to the right of the colon. You can retrieve values from your dictionary by "indexing" using the key:

>>> singers['Beyonce']
'Flawless'
>>> singers['Iggy Azalea']
'Fancy'

You can modify an entry for an existing key in the dictionary using the following syntax. Adding a new key follows identical syntax!

>>> singers['Beyonce'] = 'Survivor'
>>> singers['Beyonce']
'Survivor'
>>> singers['Nicki Minaj'] = 'Anaconda' # new entry!
>>> singers['Nicki Minaj']
'Anaconda'

You can also check for membership of keys!

>>> 'Adam Levine' in singers
True

Question 3

Implement the function counter which takes in a string of words, and returns a dictionary where each key is a word in the message, and each value is the number of times that word is present in the original string.

def counter(message):
    """ Returns a dictionary of each word in message mapped
    to the number of times it appears in the input string.

    >>> x = counter('to be or not to be')
    >>> x['to']
    2
    >>> x['be']
    2
    >>> x['not']
    1
    >>> y = counter('run forrest run')
    >>> y['run']
    2
    >>> y['forrest']
    1
    """
    word_list = message.split()
    "*** YOUR CODE HERE ***"
    result_dict = {}
    for word in word_list:
        if word in result_dict:
            result_dict[word] += 1
        else:
            result_dict[word] = 1
    return result_dict

Mutable Values

Question 4

What does Python print? Think about these before typing it into an interpreter!

>>> lst = [1, 2, 3, 4, 5, 6]
>>> lst[4] = 1
>>> lst
______
[1, 2, 3, 4, 1, 6]
>>> lst[2:4] = [9, 8]
>>> lst
______
[1, 2, 9, 8, 1, 6]
>>> lst[3] = ['hi', 'bye']
>>> lst
______
[1, 2, 9, ['hi', 'bye'], 1, 6]

>>> lst[3:] = ['oski', 'bear']
>>> lst
______
[1, 2, 9, 'oski', 'bear']
>>> lst[1:3] = [2, 3, 4, 5, 6, 7, 8]
>>> lst
______
[1, 2, 3, 4, 5, 6, 7, 8, 'oski', 'bear']

>>> lst == lst[:]
______
True
>>> lst is lst[:]
______
False
>>> a = lst[:]
>>> a[0] = 'oogly'
>>> lst
______
[1, 2, 3, 4, 5, 6, 7, 8, 'oski', 'bear']

>>> lst = [1, 2, 3, 4]
>>> b = ['foo', 'bar']
>>> lst[0] = b
>>> lst
______
[['foo', 'bar'], 2, 3, 4]
>>> b[1] = 'ply'
>>> lst
______
[['foo', 'ply'], 2, 3, 4]
>>> b = ['farply', 'garply']
>>> lst
______
[['foo', 'ply'], 2, 3, 4]
>>> lst[0] = lst
>>> lst
______
[[...], 2, 3, 4]
>>> lst[0][0][0][0][0]
______
[[...], 2, 3, 4]

Extra Questions

The following questions are for extra practice — they can be found in the the lab05_extra.py file. It is recommended that you complete these problems on your own time.

More Trees

Question 5

Define the function height, which takes in a tree as an argument and returns the depth of the deepest node in the tree.

def height(t):
    """Return the depth of the deepest node in the tree.

    >>> height(tree(1))
    0
    >>> height(tree(1, [tree(2), tree(3)]))
    1
    >>> print_tree(numbers)
    1
      2
      3
        4
        5
      6
        7
    >>> height(numbers)
    2
    """
    "*** YOUR CODE HERE ***"
    if is_leaf(t):
        return 0
    deepest = 0
    for branch in branches(t):
        deepest = max(deepest, height(branch))
    return deepest + 1

# Alternate solution
def height(t):
    if is_leaf(t):
        return 0
    return 1 + max([height(branch) for branch in branches(t)])

# Alternate solution 2
from functools import reduce

def height(t):
    if is_leaf(t):
        return 0
    return reduce(max, [height(branch) for branch in branches(t)], 0)

Question 6

The squirrels on campus need your help! There are a lot of trees on campus and the squirrels would like to know which ones contain acorns. Define the function acorn_finder, which takes in a tree and returns True if the tree contains a node with the value 'acorn' and False otherwise.

def acorn_finder(t):
    """Returns True if t contains a node with the value 'acorn' and
    False otherwise.

    >>> sproul = tree('roots', [tree('branch1', [tree('leaf'), tree('acorn')]), tree('branch2')])
    >>> acorn_finder(sproul)
    True
    >>> acorn_finder(numbers)
    False
    """
    "*** YOUR CODE HERE ***"
    if root(t) == 'acorn':
      return True
    for branch in branches(t):
      if acorn_finder(branch) == True:
        return True
    return False

Question 7

Define the function preorder, which takes in a tree as an argument and returns a list of all the entries in the tree in the order that print_tree would print them. This ordering of the nodes in a tree is called a preorder traversal (you will learn about more orders of traversing a tree in CS 61B).

def preorder(t):
    """Return a list of the entries in this tree in the order that they
    would be visited by a preorder traversal (see problem description).

    >>> preorder(numbers)
    [1, 2, 3, 4, 5, 6, 7]
    >>> preorder(tree(2, [tree(4, [tree(6)])]))
    [2, 4, 6]
    """
    "*** YOUR CODE HERE ***"
    if branches(t) == []:
        return [root(t)]
    flattened_branches = []
    for branch in branches(t):
        flattened_branches += preorder(branch)
    return [root(t)] + flattened_branches

# Alternate solution
def preorder(t):
    return reduce(add, [preorder(branch) for branch in branches(t)], [root(t)])

Dictionaries + Shakespeare

We will use dictionaries to approximate the entire works of Shakespeare! We're going to use a bigram language model. Here's the idea: We start with some word — we'll use "The" as an example. Then we look through all of the texts of Shakespeare and for every instance of "The" we record the word that follows "The" and add it to a list, known as the successors of "The". Now suppose we've done this for every word Shakespeare has used, ever.

Let's go back to "The". Now, we randomly choose a word from this list, say "cat". Then we look up the successors of "cat" and randomly choose a word from that list, and we continue this process. This eventually will terminate in a period (".") and we will have generated a Shakespearean sentence!

The object that we'll be looking things up in is called a "successor table", although really it's just a dictionary. The keys in this dictionary are words, and the values are lists of successors to those words.

Question 8

Here's an incomplete definition of the build_successors_table function. The input is a list of words (corresponding to a Shakespearean text), and the output is a successors table. (By default, the first word is a successor to "."). See the example below:

def build_successors_table(tokens):
    """Return a dictionary: keys are words; values are lists of
    successors.

    >>> text = ['We', 'came', 'to', 'investigate', ',', 'catch', 'bad', 'guys', 'and', 'to', 'eat', 'pie', '.']
    >>> table = build_successors_table(text)
    >>> sorted(table)
    [',', '.', 'We', 'and', 'bad', 'came', 'catch', 'eat', 'guys', 'investigate', 'pie', 'to']
    >>> table['to']
    ['investigate', 'eat']
    >>> table['pie']
    ['.']
    >>> table['.']
    ['We']
    """
    table = {}
    prev = '.'
    for word in tokens:
        if prev not in table:
            "*** YOUR CODE HERE ***"
            table[prev] = []
        "*** YOUR CODE HERE ***"
        table[prev].append(word)        prev = word
    return table

Question 9

Let's generate some sentences! Suppose we're given a starting word. We can look up this word in our table to find its list of successors, and then randomly select a word from this list to be the next word in the sentence. Then we just repeat until we reach some ending punctuation.

Hint: to randomly select from a list, first make sure you import the Python random library with import random and then use the expression random.choice(my_list))

This might not be a bad time to play around with adding strings together as well. Let's fill in the construct_sent function!

def construct_sent(word, table):
    """Prints a random sentence starting with word, sampling from
    table.
    """
    import random
    result = ' '
    while word not in ['.', '!', '?']:
        "*** YOUR CODE HERE ***"
        result += word + ' '
        word = random.choice(table[word])
    return result + word

Putting it all together

Great! Now all that's left is to run our functions with some actual code. The following snippet included in the skeleton code will return a list containing the words in all of the works of Shakespeare.

Warning: do NOT try to print the return result of this function.

def shakespeare_tokens(path='shakespeare.txt', url='http://goo.gl/SztLfX'):
    """Return the words of Shakespeare's plays as a list."""
    import os
    from urllib.request import urlopen
    if os.path.exists(path):
        return open('shakespeare.txt', encoding='ascii').read().split()
    else:
        shakespeare = urlopen(url)
        return shakespeare.read().decode(encoding='ascii').split()

Next, we probably want an easy way to refer to our list of tokens and our successors table. Let's make the following assignments (Note: the following lines are commented in the provided file. Uncomment them before procceding)

tokens = shakespeare_tokens()
table = build_successors_table(tokens)

Finally, let's define an easy to call utility function:

>>> def sent():
...     return construct_sent('The', table)
>>> sent()
" The plebeians have done us must be news-cramm'd "

>>> sent()
" The ravish'd thee , with the mercy of beauty "

>>> sent()
" The bird of Tunis , or two white and plucker down with better ; that's God's sake "

Notice that all the sentences start with the word "The". With a few modications, we can make our sentences start with a random word. The following random_sent function (defined in your starter file) will do the trick:

def random_sent():
    import random
    return construct_sent(random.choice(table['.']), table)

Go ahead and load your file into Python (be sure to use the -i flag). You can now call the random_sent function to generate random Shakespearean sentences!

>>> random_sent()
' Long live by thy name , then , Dost thou more angel , good Master Deep-vow , And tak'st more ado but following her , my sight Of speaking false !'

>>> random_sent()
' Yes , why blame him , as is as I shall find a case , That plays at the public weal or the ghost .'