61A Homework 11

Due by 5pm on Friday, 11/16

Submission. See the online submission instructions. We have provided a starter file for the questions below.

Readings. Section 3 of the online lecture notes.

This homework consists of additional recursion problems, to be completed in Python.

Q1. Given a set of numbers s and a target difference d, how many different ways are there to split s into two subsets such that the sum of the first is within d of the sum of the second? The number of elements in each subset can differ:

def num_splits(s, d):
    """Return the number of ways in which s can be partitioned into two
    sublists that have sums within d of each other.

    >>> num_splits({1, 5, 4}, 0)  # splits to {1, 4} and {5}
    1
    >>> num_splits({6, 1, 3}, 1)  # no split possible
    0
    >>> num_splits({-2, 1, 3}, 2) # {-2, 3} {1} and {-2, 1, 3} {}
    2
    >>> num_splits({1, 4, 6, 8, 2, 9, 5}, 3)
    12

    Hint: You can split a set s into one arbitrary element and the rest with:

    k = s.pop()
    rest = set(s)
    s.add(k)

    After which s is unchanged, and {k}.union(rest) == s
    """
    "*** YOUR CODE HERE ***"

Q2. How many different possible full binary tree (each node has two branches or 0, but never 1) structures exist that have exactly n leaves?

For those interested in combinatorics, this problem does have a closed form solution):

def num_trees(n):
    """How many full binary trees have exactly n leaves? E.g.,

    1   2        3       3    ...
    *   *        *       *
       / \      / \     / \
      *   *    *   *   *   *
              / \         / \
             *   *       *   *

    >>> num_trees(1)
    1
    >>> num_trees(2)
    1
    >>> num_trees(3)
    2
    >>> num_trees(8)
    429

    """
    "*** YOUR CODE HERE ***"

Q3. Mario needs to jump over a sequence of Piranha plants, represented as a string of spaces (no plant) and P's (plant!). He only moves forward, and he can either step (move forward one space) or jump (move forward two spaces) from each position. How many different ways can Mario traverse a level without stepping or jumping into a Piranha plant? Assume that every level begins with a space (where Mario starts) and ends with a space (where Mario must end up):

def mario_number(level):
    """Return the number of ways that Mario can perform a sequence of steps
    or jumps to reach the end of the level without ever landing in a Piranha
    plant. Assume that every level begins and ends with a space.

    >>> mario_number(' P P ')   # jump, jump
    1
    >>> mario_number(' P P  ')   # jump, jump, step
    1
    >>> mario_number('  P P ')  # step, jump, jump
    1
    >>> mario_number('   P P ') # step, step, jump, jump or jump, jump, jump
    2
    >>> mario_number(' P PP ')  # Mario cannot jump two plants
    0
    >>> mario_number('    ')    # step, jump ; jump, step ; step, step, step
    3
    >>> mario_number('    P    ')
    9
    >>> mario_number('   P    P P   P  P P    P     P ')
    180
    """
    "*** YOUR CODE HERE ***"

Q4. (Extra for experts) Print the solution to a text maze consisting of spaces (open) and stars (blocked). You may assume that the maze has a solution, that the entrance is in the bottom left, and that the exit is in the top right. All mazes are rectangular and defined by strings:

maze1 = """
******** *
* *      *
* * ******
* *   *  *
* *** *  *
*   * *  *
*** * *  *
*        *
* ********
"""

maze2 = """
********** *
*      *   *
*** **** * *
*      * * *
* **** * * *
*      *** *
****** *   *
*        * *
* **********
"""

def print_path(maze):
    """Print a path through a maze.

    >>> print_path(maze1)
    ********.*
    * *......*
    * *.******
    * *...*  *
    * ***.*  *
    *   *.*  *
    *** *.*  *
    *.....   *
    *.********

    >>> print_path(maze2)
    **********.*
    *      *  .*
    *** **** *.*
    *      * *.*
    * **** * *.*
    *      ***.*
    ****** *...*
    *........* *
    *.**********
    """
    "*** YOUR CODE HERE ***"

Huffman Encoding Trees (Extra for Experts)

This problem is marked extra for experts because it requires a fair amount of reading. However, all students should be able to solve it.

We consider the problem of representing text as a sequence of ones and zeros (bits). For example, the ASCII standard code used to represent text in computers encodes each character as a sequence of seven bits, and 128 characters are encoded in total. In general, if we want to distinguish n different symbols, we will need to use log base 2 of n bits per symbol.

As a simpler example, consider encoding only A, B, C, D, E, F, G, and H. We can choose a code with three bits per character, for example:

A 000   C 010   E 100   G 110

B 001   D 011   F 101   H 111

With this code, the message:

BACADAEAFABBAAAGAH

is encoded as the string of 54 bits:

001000010000011000100000101000001001000000000110000111

Codes such as ASCII and the A-through-H code above are known as fixed-length codes, because they represent each symbol in the message with the same number of bits. It is sometimes advantageous to use variable-length codes, in which different symbols may be represented by different numbers of bits. If our messages are such that some symbols appear very frequently and some very rarely, we can encode data more efficiently (i.e., using fewer bits per message) if we assign shorter codes to the frequent symbols. Consider the following alternative code for the letters A through H:

A 0     C 1010   E 1100   G 1110

B 100   D 1011   F 1101   H 1111

With this code, the same message as above is encoded as the string:

100010100101101100011010100100000111001111

This string contains 42 bits, so it saves more than 20% in space in comparison with the fixed-length code shown above.

One of the difficulties of using a variable-length code is knowing when you have reached the end of a symbol in reading a sequence of zeros and ones. One solution is to design the code in such a way that no complete code for any symbol is the beginning (or prefix) of the code for another symbol. Such a code is called a prefix code. In the example above, A is encoded by 0 and B is encoded by 100, so no other symbol can have a code that begins with 0 or with 100.

In general, we can attain significant savings if we use variable-length prefix codes that take advantage of the relative frequencies of the symbols in the messages to be encoded. One particular scheme for doing this is called the Huffman encoding method, after its discoverer, David Huffman. A Huffman code can be represented as a binary tree whose leaves are the symbols that are encoded.

Each symbol at a leaf is assigned a weight (which is its relative frequency), and each non-leaf node contains a weight that is the sum of all the weights of the leaves lying below it. The weights are not used in the encoding or the decoding process. We will see below how they are used to help construct the tree.

The figure below shows the Huffman encoding tree for the A-through-H prefix code given above. The weights at the leaves indicate that the tree was designed for messages in which A appears with relative frequency 8, B with relative frequency 3, and the other letters each with relative frequency 1.

huf.png

Decoding

To decode a bit sequence using a Huffman tree, we begin at the root and use the successive zeros and ones of the bit sequence to determine whether to move down the left or the right branch. Each time we come to a leaf, we have generated a new symbol in the message, at which point we start over from the root of the tree to find the next symbol.

For example, suppose we are given the tree above and the sequence 10001010

Starting at the root, we move down the right branch, (since the first bit of the string is 1), then down the left branch (since the second bit is 0), then down the left branch (since the third bit is also 0). This brings us to the leaf for B, so the first symbol of the decoded message is B.

Now we start again at the root, and we make a left move because the next bit in the string is 0. This brings us to the leaf for A. Then we start again at the root with the rest of the string 1010, so we move right, left, right, left and reach C. Thus, the entire message is BAC.

Encoding

Given a Huffman tree, we can enumerate the codes of all letters by traversing the tree. That is, we can write a function that takes as input the tree and returns a list of letter-code pairs that gives the code for each letter. Then, encoding a message involves concatenating the code for each letter in the message.

Generating Huffman Encoding Trees

Given an alphabet of symbols and their relative frequencies, how do we construct the tree that will encode messages with the fewest bits? Huffman gave an algorithm for doing this and showed that the resulting code is indeed the best variable-length code for messages where the relative frequency of the symbols matches the frequencies with which the code was constructed.

The algorithm for generating a Huffman tree is very simple. The idea is to arrange the tree so that the symbols with the lowest frequency appear farthest away from the root. Begin with the set of leaf nodes, containing symbols and their frequencies, as determined by the initial data from which the code is to be constructed. Now find two leaves with the lowest weights and merge them to produce a node that has these two nodes as its left and right branches. The weight of the new node is the sum of the two weights. Remove the two leaves from the original set and replace them by this new node.

Now continue this process. At each step, merge two nodes with the smallest weights, removing them from the set and replacing them with a node that has these two as its left and right branches. The process stops when there is only one node left, which is the root of the entire tree. Here is how the previous example Huffman tree is generated, where trees are described by the set of letter they contain, along with their weight:

Initial leaves:

{(A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (G 1) (H 1)}

{(A 8) (B 3) ({C D} 2) (E 1) (F 1) (G 1) (H 1)}

{(A 8) (B 3) ({C D} 2) ({E F} 2) (G 1) (H 1)}

{(A 8) (B 3) ({C D} 2) ({E F} 2) ({G H} 2)}

{(A 8) (B 3) ({C D} 2) ({E F G H} 4)}

{(A 8) ({B C D} 5) ({E F G H} 4)}

{(A 8) ({B C D E F G H} 9)}

{({A B C D E F G H} 17)}

The algorithm does not always specify a unique tree, because there may not be unique smallest-weight nodes at each step. Also, the choice of the order in which the two nodes are merged (i.e., which will be the right branch and which will be the left branch) is arbitrary.

Trees are represented by the following two classes:

class Leaf(object):
    def __init__(self, letter, weight):
        self.letter = letter
        self.weight = weight

class Tree(object):
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.weight = left.weight + right.weight

A = Leaf('a', 8)
BCD = Tree(Leaf('b', 3), Tree(Leaf('c', 1), Leaf('d', 1)))
EFGH = Tree(Tree(Leaf('e', 1), Leaf('f', 1)),
            Tree(Leaf('g', 1), Leaf('h', 1)))
AH = Tree(A, Tree(BCD, EFGH))

Q5. (Extra for experts) Implement decode_one, which takes as arguments a Tree instance and a list of 0's and 1's (bits) and returns the first encoded letter in the code, removing its corresponding bits:

def decode(tree, code):
    """Decode a list of 0's and 1's using the Huffman encoding tree.

    >>> decode(AH, [1, 0, 0, 0, 1, 0, 1, 0])
    'bac'
    """
    word = ''
    while code != []:
        word += decode_one(tree, code)
    return word

def decode_one(tree, code):
    """Decode and remove the first letter in code, using tree.

    >>> code = [1, 0, 0, 0, 1, 0, 1, 0]
    >>> decode_one(AH, code)
    'b'
    >>> code
    [0, 1, 0, 1, 0]
    """
    "*** YOUR CODE HERE ***"

Q6. (Extra for experts) Implement encodings, which returns the encoding for each letter in a Tree as a dictionary:

def encodings(tree):
    """Return all encodings in a tree as a dict from letters to bit lists.

    >>> e = encodings(AH)
    >>> e['a']
    [0]
    >>> e['c']
    [1, 0, 1, 0]
    >>> e['h']
    [1, 1, 1, 1]
    """
    "*** YOUR CODE HERE ***"

Q7. (Extra for experts) Implement huffman, a procedure that takes a list of Tree or Leaf instances and outputs an optimal Huffman encoding tree. Assume that the elements of the input list are sorted from lowest to highest weight:

def huffman(elements):
    """Return an optimal Huffman encoding tree of elements, a sorted lists.

    >>> h = huffman([Leaf('c', 1), Leaf('d', 1), Leaf('b', 3), Leaf('a', 8)])
    >>> for letter, code in sorted(encodings(h).items()): print(letter + ':', code)
    a: [1]
    b: [0, 1]
    c: [0, 0, 0]
    d: [0, 0, 1]
    """
    "*** YOUR CODE HERE ***"