Due at 11:59pm on 03/18/2016.

Starter Files

Download lab08.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

Submission

By the end of this lab, you should have submitted the lab with python3 ok --submit. You may submit more than once before the deadline; only the final submission will be graded.

  • To receive credit for this lab, you must complete questions 1, 2, and 3 in lab08.py and submit through OK.
  • Questions 4 through 8 are extra practice. They can be found in the lab08_extra.py file. It is recommended that you complete these problems on your own time.

Sets

A set is an unordered collection of distinct objects that supports membership testing, union, intersection, and adjunction. The main difference between sets and lists are 1. they are unordered, and 2. there are no duplicates. Other than that almost everything is the same.

>>> a = [1, 1, 2, 2, 3, 3]
>>> a = set(a)
>>> a  # No duplicates
{1, 2, 3}
>>> a = {3, 1, 2}
>>> a  # Not necessarily in same order
{1, 2, 3}

The Python documentation on sets has more details.

Question 1: WWPP: Sets

Use OK to test your knowledge with the following "What Would Python Print?" questions:

python3 ok -q sets -u

If you get stuck, try loading lab08.py into an interpreter. To load file xxx.py into the interpreter, use python3 -i xxx.py command. Using the interpreter is highly recommended because sets are unordered.

>>> a = [1, 1, 2, 2, 3, 3]
>>> a = set(a)
>>> len(a)
______
3
>>> sorted(a)
______
[1, 2, 3]
>>> a.add(4) >>> a.add(4) >>> a.remove(4) >>> 4 in a
______
False
>>> a = {1, 4, 12, 1000} >>> sum(a)
______
1017
>>> b = {1, 2, 4} >>> sorted(a.intersection(b))
______
[1, 4]
>>> sorted(a & b)
______
[1, 4]
>>> sorted(a.union(b))
______
[1, 2, 4, 12, 1000]
>>> sorted(a | b)
______
[1, 2, 4, 12, 1000]
>>> sorted(a - b)
______
[12, 1000]

Sets + Order of Growth

One really convenient thing about Python sets is that many operations on sets (adding elements, removing elements, checking membership) run in O(1) (constant) time (usually).

Some of the problems use a utility method called timeit, which takes a parameterless function as argument, executes it, and returns the time required to do so. It's a variation on the function timeit.timeit function in the Python3 library.

Question 2: Find duplicates

Write the following function so it runs in O(n) time.

def find_duplicates(lst):
    """Returns True if lst has any duplicates and False if it does not.

    >>> find_duplicates([1, 2, 3, 4, 5])
    False
    >>> find_duplicates([1, 2, 3, 4, 2])
    True
    >>> find_duplicates(list(range(100000)) + [-1]) # make sure you have linear time
    False
    """
"*** YOUR CODE HERE ***"
return len(set(lst)) != len(lst)

Use OK to test your code:

python3 ok -q find_duplicates

Question 3: Pow

Write the following function so it runs in O(log k) time.

Hint: this can be done using a procedure called repeated squaring.

def pow(n,k):
    """Computes n^k.

    >>> pow(2, 3)
    8
    >>> pow(4, 2)
    16
    >>> a = pow(2, 100000000) # make sure you have log time
    """
"*** YOUR CODE HERE ***"
if k == 1: return n if k % 2 == 0: return pow(n*n,k//2) else: return n * pow(n*n, k//2)

Use OK to test your code:

python3 ok -q pow

Extra Questions

Question 4: Prime

Write a function that returns whether a number is prime or not in O(sqrt(n)) time, where sqrt means square root. You can assume n >= 2.

Hint: you don't need to check whether every single number that is smaller than n divides n

from math import sqrt
def is_prime_sqrt(n):
    """Tests whether a number N is prime or not. Implement this function
    in O(sqrt(n)) time. You can assume n >= 2

    >>> is_prime_sqrt(2)
    True
    >>> is_prime_sqrt(67092481)
    False
    >>> is_prime_sqrt(524287)
    True
    >>> is_prime_sqrt(2251748274470911)
    False
    >>> is_prime_sqrt(6700417)
    True
    >>> is_prime_sqrt(44895587973889)
    False
    >>> is_prime_sqrt(2147483647)
    True
    >>> is_prime_sqrt(67280421310721)
    True
    """
    # sqrt(k) will give the square root of k as a floating point (decimal)
"*** YOUR CODE HERE ***"
if sqrt(n) < 2: return True for i in range(2, int(sqrt(n)) + 1): if n % i == 0: return False return True

Use OK to test your code:

python3 ok -q is_prime_sqrt

Question 5: Find duplicates k

Write the following function so it runs in O(n) time.

Hint: Sets have an add and a remove method.

def find_duplicates_k(k, lst):
    """Returns True if there are any duplicates in lst that are within k
    indices apart.

    >>> find_duplicates_k(3, [1, 2, 3, 4, 1])
    False
    >>> find_duplicates_k(4, [1, 2, 3, 4, 1])
    True
    >>> find_duplicates_k(4, [1, 1, 2, 3, 3])
    True
    """
"*** YOUR CODE HERE ***"
prev_set = set() for i, elem in enumerate(lst): if elem in prev_set: return True prev_set.add(elem) if i - k >= 0: prev_set.remove(lst[i - k]) return False

Use OK to test your code:

python3 ok -q find_duplicates_k

The following problems deal with binary trees, in which each node has at most two children. We further modify the definition of tree to allow empty trees, which we haven't dealt with up to now. So here, we define a binary tree as either

  • An empty tree, or
  • A label and two children, called the left and right child respectively, both of which are binary trees. The left and right children, therefore, are the roots of the left and right subtrees.

As a result of this definition, most recursion on binary trees uses the empty tree as a base case.

Our implementation of binary trees is the following:

# Tree definition
class BinTree:
    empty = ()

    def __init__(self, label, left=empty, right=empty):
        self.label = label
        self.left = left
        self.right = right

    def __repr__(self):
        if self.left == self.empty and self.right == self.empty:
            return 'BinTree({})'.format(repr(self.label))
        left = 'BinTree.empty' if self.left == self.empty else repr(self.left)
        right = 'BinTree.empty' if self.right == self.empty else repr(self.right)
        return 'BinTree({}, {}, {})'.format(repr(self.label), left, right)

def binTree(tupleTree):
    """A convenience method for succinctly constructing binary trees.  The
    empty tuple or list stands for BinTree.empty; (V,) or [V] stands
    for BinTree(V); (V, T1, T2) or [V, T1, T2] stands for
    BinTree(V, binTree(T1), binTree(T2)).
    >>> binTree((3,
    ...          (1, (), [2]),
    ...          (6, [5], [7])))
    BinTree(3, BinTree(1, BinTree.empty, BinTree(2)), BinTree(6, BinTree(5), BinTree(7)))
    """
    if len(tupleTree) == 0:
        return BinTree.empty
    elif len(tupleTree) == 1:
        return BinTree(tupleTree[0])
    else:
        return BinTree(tupleTree[0], binTree(tupleTree[1]), binTree(tupleTree[2]))

We also included a function print_bintree, which prints out a string representation of a tree:

>>> print_bintree(binTree( (1, (3, (), [2]), (4, [5], [6])) ))
 -1-
/   \
3   4
 \ / \
 2 5 6

Question 6: Size

Define the function size which takes in a BinTree as an argument and returns the number of entries in the tree.

def size(tree):
    """ Return the number of entries in the binary tree b.

    >>> b = BinTree(4,
    ...         BinTree(2,
    ...             BinTree(1)),
    ...         BinTree(6,
    ...             BinTree(5)))
    >>> size(b)
    5
    """
"*** YOUR CODE HERE ***"
if tree is BinTree.empty: return 0 return 1 + size(tree.left) + size(tree.right)

Use OK to test your code:

python3 ok -q size

Question 7: Tree to Ordered Link

Implement tree_to_ordered_link, which converts a binary search tree into a set represented as an ordered Link. If you can, implement this function so that it executes in O(n) time, where n is the number of elements in the tree:

def tree_to_ordered_link(s):
    """Return an ordered Link containing the elements of tree set s.

    Assume that s is a well-formed binary search tree.

    >>> b = binTree([7, [1, [], [4]], [9]])
    >>> tree_to_ordered_link(b)
    Link(1, Link(4, Link(7, Link(9))))
    """
"*** YOUR CODE HERE ***"
def ttos_iter(s, r): if s is BinTree.empty: return r return ttos_iter(s.left, Link(s.label, ttos_iter(s.right, r))) return ttos_iter(s, Link.empty)

Use OK to test your code:

python3 ok -q tree_to_ordered_link

Question 8: Next Smallest Element

Write a function next_element that takes in a tree T and a number N. It returns the smallest element that is larger than N. For example, if you have this tree:

  --8---   
 /      \  
 3-    10- 
/  \      \
1  6     14
  / \   /  
  4 7  13  

The next element of 3 is 4, the next element of 6 is 7, the next element of 7 is 8.

def next_element(t, n):
    """This function takes in a binary search tree T and a number N and
    it returns the smallest element that is greater than N. None if it has
    no such element, or t does not contain n

    >>> t = binTree([8, [3, [1], [6, [4], [7]]], [10, [], [14, [13], []]]])
    >>> next_element(t, 1)
    3
    >>> next_element(t, 3)
    4
    >>> next_element(t, 5)
    >>> next_element(t, 7)
    8
    >>> next_element(t, 10)
    13
    >>> next_element(t, 14)
    >>> result = [1]
    >>> a = next_element(t, 1)
    >>> while a:
    ...   result += [a]
    ...   a = next_element(t, a)
    >>> result
    [1, 3, 4, 6, 7, 8, 10, 13, 14]
    """
"*** YOUR CODE HERE ***"
def helper(t, n, root): if t is BinTree.empty: return None if t.label == n: if t.right is BinTree.empty: return root t = t.right while t.left is not BinTree.empty: t = t.left return t.label if t.label > n: return helper(t.left, n, t.label) return helper(t.right, n, root) return helper(t, n, None)

Use OK to test your code:

python3 ok -q next_element

How is Set Implemented?

This part is not covered in an exam. In case you are interested please read on. As you might have noticed, set is very different from list. Here's a quick table on their major differences:

Set List
Indexed? No Yes
Ordered? No Yes
Membership O(1) O(n)

To understand these differences, we need to look (briefly) at how the set is implemented. If the way a set stores items is like putting wine in the liquor section and putting shortbreads in the snack section in a supermarket, the way a list stores items is like putting every item in the supermarket on one big shelf, ordered by the time it arrives at the supermarket.

With a list, you would need to go through every item in that store before you can be convinced that it ran out of your favorite chips. Horrible, eh? The good news is that you can easily tell which item is the fifth to arrive, but that's probably not useful at all when finding a pack of chips takes more than one hour.

The way set actually works, is that instead of adding items in order, it assigns a position for them, so when one needs to check the membership, set can just look at the assigned position to see if it's there.

Suppose we want a set to keep track of all the one digit numbers, we can use a list of booleans initialized to be False.

0  1  2  3  4  5  6  7  8  9
F  F  F  F  F  F  F  F  F  F

Now if we want to add the number 1, the set simply marks the boolean at 1 to be True.

0  1  2  3  4  5  6  7  8  9
F  T  F  F  F  F  F  F  F  F

And when we want to see if digit 2 is in the set, we can check if index 2 is True or not.

But what if we want to keep track of all the numbers below 2^32? Obviously we can't have a list of length 2^32. What we can do, though, is modding the number with the length of the list. Let's start with a list of length 10 first, and _ means nothing is there:

0  1  2  3  4  5  6  7  8  9
_  _  _  _  _  _  _  _  _  _

Now if we want to add number 5, the set simply computes 5%10 = 5, and decides to put number 5 at index 5.

0  1  2  3  4  5  6  7  8  9
_  _  _  _  _  5  _  _  _  _

Now if we want to add number 321, the set simply computes 321%10 = 1, and decides to put number 321 at index 1.

0  1  2  3  4  5  6  7  8  9
_ 321  _  _  _  5  _  _  _  _

And when we want to see if number 321 is in the set, the set will compute 321%10 = 1 and decides that the number we are looking for might be at index 1, then go to index 1 to see that it's actually there.

However, since we are not only dealing with numbers in Python, we need a way to represent these items with number, and that process is called hashing. In Python, you can check the hash values for all the objects by calling hash on them:

>>> hash(1)
1
>>> hash('abc')
6015317455955025372

More interesting details, like dealing collision and keeping efficiency when there are many collisions can be found here, and will be throughly covered in CS61B.

We've also prepared a quick script for you to witness the difference. You can run it by typing

python3 timing.py

in the terminal in your lab folder.