Lab 8: Sets and Orders of Growth + Extra BST
Due at 11:59pm on 03/18/2016.
Starter Files
Download lab08.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.
Submission
By the end of this lab, you should have submitted the lab with
python3 ok --submit
. You may submit more than once before the
deadline; only the final submission will be graded.
- To receive credit for this lab, you must complete questions 1, 2, and 3 in lab08.py and submit through OK.
- Questions 4 through 8 are extra practice. They can be found in the lab08_extra.py file. It is recommended that you complete these problems on your own time.
Sets
A set is an unordered collection of distinct objects that supports membership testing, union, intersection, and adjunction. The main difference between sets and lists are 1. they are unordered, and 2. there are no duplicates. Other than that almost everything is the same.
>>> a = [1, 1, 2, 2, 3, 3]
>>> a = set(a)
>>> a # No duplicates
{1, 2, 3}
>>> a = {3, 1, 2}
>>> a # Not necessarily in same order
{1, 2, 3}
The Python documentation on sets has more details.
Question 1: WWPP: Sets
Use OK to test your knowledge with the following "What Would Python Print?" questions:
python3 ok -q sets -u
If you get stuck, try loading lab08.py into an interpreter. To load file
xxx.py
into the interpreter, usepython3 -i xxx.py
command. Using the interpreter is highly recommended because sets are unordered.
>>> a = [1, 1, 2, 2, 3, 3]
>>> a = set(a)
>>> len(a)
______3
>>> sorted(a)
______[1, 2, 3]
>>> a.add(4)
>>> a.add(4)
>>> a.remove(4)
>>> 4 in a
______False
>>> a = {1, 4, 12, 1000}
>>> sum(a)
______1017
>>> b = {1, 2, 4}
>>> sorted(a.intersection(b))
______[1, 4]
>>> sorted(a & b)
______[1, 4]
>>> sorted(a.union(b))
______[1, 2, 4, 12, 1000]
>>> sorted(a | b)
______[1, 2, 4, 12, 1000]
>>> sorted(a - b)
______[12, 1000]
Sets + Order of Growth
One really convenient thing about Python sets is that many operations on sets (adding elements, removing elements, checking membership) run in O(1) (constant) time (usually).
Some of the problems use a utility method called
timeit
, which takes a parameterless function as argument, executes it,
and returns the time required to do so. It's a variation on the function
timeit.timeit
function in the Python3 library.
Question 2: Find duplicates
Write the following function so it runs in O(n) time.
def find_duplicates(lst):
"""Returns True if lst has any duplicates and False if it does not.
>>> find_duplicates([1, 2, 3, 4, 5])
False
>>> find_duplicates([1, 2, 3, 4, 2])
True
>>> find_duplicates(list(range(100000)) + [-1]) # make sure you have linear time
False
"""
"*** YOUR CODE HERE ***"
return len(set(lst)) != len(lst)
Use OK to test your code:
python3 ok -q find_duplicates
Question 3: Pow
Write the following function so it runs in O(log k) time.
Hint: this can be done using a procedure called repeated squaring.
def pow(n,k):
"""Computes n^k.
>>> pow(2, 3)
8
>>> pow(4, 2)
16
>>> a = pow(2, 100000000) # make sure you have log time
"""
"*** YOUR CODE HERE ***"
if k == 1:
return n
if k % 2 == 0:
return pow(n*n,k//2)
else:
return n * pow(n*n, k//2)
Use OK to test your code:
python3 ok -q pow
Extra Questions
Question 4: Prime
Write a function that returns whether a number is prime or not in O(sqrt(n)) time, where sqrt means square root. You can assume n >= 2.
Hint: you don't need to check whether every single number that is smaller than n divides n
from math import sqrt
def is_prime_sqrt(n):
"""Tests whether a number N is prime or not. Implement this function
in O(sqrt(n)) time. You can assume n >= 2
>>> is_prime_sqrt(2)
True
>>> is_prime_sqrt(67092481)
False
>>> is_prime_sqrt(524287)
True
>>> is_prime_sqrt(2251748274470911)
False
>>> is_prime_sqrt(6700417)
True
>>> is_prime_sqrt(44895587973889)
False
>>> is_prime_sqrt(2147483647)
True
>>> is_prime_sqrt(67280421310721)
True
"""
# sqrt(k) will give the square root of k as a floating point (decimal)
"*** YOUR CODE HERE ***"
if sqrt(n) < 2:
return True
for i in range(2, int(sqrt(n)) + 1):
if n % i == 0:
return False
return True
Use OK to test your code:
python3 ok -q is_prime_sqrt
Question 5: Find duplicates k
Write the following function so it runs in O(n) time.
Hint: Sets have an
add
and aremove
method.
def find_duplicates_k(k, lst):
"""Returns True if there are any duplicates in lst that are within k
indices apart.
>>> find_duplicates_k(3, [1, 2, 3, 4, 1])
False
>>> find_duplicates_k(4, [1, 2, 3, 4, 1])
True
>>> find_duplicates_k(4, [1, 1, 2, 3, 3])
True
"""
"*** YOUR CODE HERE ***"
prev_set = set()
for i, elem in enumerate(lst):
if elem in prev_set:
return True
prev_set.add(elem)
if i - k >= 0:
prev_set.remove(lst[i - k])
return False
Use OK to test your code:
python3 ok -q find_duplicates_k
The following problems deal with binary trees, in which each node has at most two children. We further modify the definition of tree to allow empty trees, which we haven't dealt with up to now. So here, we define a binary tree as either
- An empty tree, or
- A label and two children, called the left and right child respectively, both of which are binary trees. The left and right children, therefore, are the roots of the left and right subtrees.
As a result of this definition, most recursion on binary trees uses the empty tree as a base case.
Our implementation of binary trees is the following:
# Tree definition
class BinTree:
empty = ()
def __init__(self, label, left=empty, right=empty):
self.label = label
self.left = left
self.right = right
def __repr__(self):
if self.left == self.empty and self.right == self.empty:
return 'BinTree({})'.format(repr(self.label))
left = 'BinTree.empty' if self.left == self.empty else repr(self.left)
right = 'BinTree.empty' if self.right == self.empty else repr(self.right)
return 'BinTree({}, {}, {})'.format(repr(self.label), left, right)
def binTree(tupleTree):
"""A convenience method for succinctly constructing binary trees. The
empty tuple or list stands for BinTree.empty; (V,) or [V] stands
for BinTree(V); (V, T1, T2) or [V, T1, T2] stands for
BinTree(V, binTree(T1), binTree(T2)).
>>> binTree((3,
... (1, (), [2]),
... (6, [5], [7])))
BinTree(3, BinTree(1, BinTree.empty, BinTree(2)), BinTree(6, BinTree(5), BinTree(7)))
"""
if len(tupleTree) == 0:
return BinTree.empty
elif len(tupleTree) == 1:
return BinTree(tupleTree[0])
else:
return BinTree(tupleTree[0], binTree(tupleTree[1]), binTree(tupleTree[2]))
We also included a function print_bintree
, which prints out a string
representation of a tree:
>>> print_bintree(binTree( (1, (3, (), [2]), (4, [5], [6])) ))
-1-
/ \
3 4
\ / \
2 5 6
Question 6: Size
Define the function size
which takes in a BinTree
as an argument and
returns the number of entries in the tree.
def size(tree):
""" Return the number of entries in the binary tree b.
>>> b = BinTree(4,
... BinTree(2,
... BinTree(1)),
... BinTree(6,
... BinTree(5)))
>>> size(b)
5
"""
"*** YOUR CODE HERE ***"
if tree is BinTree.empty:
return 0
return 1 + size(tree.left) + size(tree.right)
Use OK to test your code:
python3 ok -q size
Question 7: Tree to Ordered Link
Implement tree_to_ordered_link
, which converts a binary search tree into a
set represented as an ordered Link
. If you can, implement this function so
that it executes in O(n) time, where n is the number of elements in the tree:
def tree_to_ordered_link(s):
"""Return an ordered Link containing the elements of tree set s.
Assume that s is a well-formed binary search tree.
>>> b = binTree([7, [1, [], [4]], [9]])
>>> tree_to_ordered_link(b)
Link(1, Link(4, Link(7, Link(9))))
"""
"*** YOUR CODE HERE ***"
def ttos_iter(s, r):
if s is BinTree.empty:
return r
return ttos_iter(s.left, Link(s.label, ttos_iter(s.right, r)))
return ttos_iter(s, Link.empty)
Use OK to test your code:
python3 ok -q tree_to_ordered_link
Question 8: Next Smallest Element
Write a function next_element
that takes in a tree T and a number N. It returns the
smallest element that is larger than N. For example, if you have this tree:
--8---
/ \
3- 10-
/ \ \
1 6 14
/ \ /
4 7 13
The next element of 3 is 4, the next element of 6 is 7, the next element of 7 is 8.
def next_element(t, n):
"""This function takes in a binary search tree T and a number N and
it returns the smallest element that is greater than N. None if it has
no such element, or t does not contain n
>>> t = binTree([8, [3, [1], [6, [4], [7]]], [10, [], [14, [13], []]]])
>>> next_element(t, 1)
3
>>> next_element(t, 3)
4
>>> next_element(t, 5)
>>> next_element(t, 7)
8
>>> next_element(t, 10)
13
>>> next_element(t, 14)
>>> result = [1]
>>> a = next_element(t, 1)
>>> while a:
... result += [a]
... a = next_element(t, a)
>>> result
[1, 3, 4, 6, 7, 8, 10, 13, 14]
"""
"*** YOUR CODE HERE ***"
def helper(t, n, root):
if t is BinTree.empty:
return None
if t.label == n:
if t.right is BinTree.empty:
return root
t = t.right
while t.left is not BinTree.empty:
t = t.left
return t.label
if t.label > n:
return helper(t.left, n, t.label)
return helper(t.right, n, root)
return helper(t, n, None)
Use OK to test your code:
python3 ok -q next_element
How is Set Implemented?
This part is not covered in an exam. In case you are interested please
read on.
As you might have noticed, set
is very different from list
. Here's
a quick table on their major differences:
Set | List | |
---|---|---|
Indexed? | No | Yes |
Ordered? | No | Yes |
Membership | O(1) | O(n) |
To understand these differences, we need to look (briefly) at how the set is implemented. If the way a set stores items is like putting wine in the liquor section and putting shortbreads in the snack section in a supermarket, the way a list stores items is like putting every item in the supermarket on one big shelf, ordered by the time it arrives at the supermarket.
With a list, you would need to go through every item in that store before you can be convinced that it ran out of your favorite chips. Horrible, eh? The good news is that you can easily tell which item is the fifth to arrive, but that's probably not useful at all when finding a pack of chips takes more than one hour.
The way set
actually works, is that instead of adding items in
order, it assigns a position for them, so when one needs to check the
membership, set can just look at the assigned position to see if it's
there.
Suppose we want a set to keep track of all the one digit numbers, we can use a list of booleans initialized to be False.
0 1 2 3 4 5 6 7 8 9
F F F F F F F F F F
Now if we want to add the number 1, the set simply marks the boolean at 1 to be True.
0 1 2 3 4 5 6 7 8 9
F T F F F F F F F F
And when we want to see if digit 2 is in the set, we can check if index 2 is True or not.
But what if we want to keep track of all the numbers below 2^32?
Obviously we can't have a list of length 2^32. What we can do, though,
is modding the number with the length of the list. Let's start with a
list of length 10 first, and _
means nothing is there:
0 1 2 3 4 5 6 7 8 9
_ _ _ _ _ _ _ _ _ _
Now if we want to add number 5, the set simply computes 5%10 = 5
, and
decides to put number 5 at index 5.
0 1 2 3 4 5 6 7 8 9
_ _ _ _ _ 5 _ _ _ _
Now if we want to add number 321, the set simply computes 321%10 = 1
, and
decides to put number 321 at index 1.
0 1 2 3 4 5 6 7 8 9
_ 321 _ _ _ 5 _ _ _ _
And when we want to see if number 321 is in the set, the set will
compute 321%10 = 1
and decides that the number we are looking for might
be at index 1, then go to index 1 to see that it's actually there.
However, since we are not only dealing with numbers in Python, we need
a way to represent these items with number, and that process is called
hashing. In Python, you
can check the hash values for all the objects by calling hash
on them:
>>> hash(1)
1
>>> hash('abc')
6015317455955025372
More interesting details, like dealing collision and keeping efficiency when there are many collisions can be found here, and will be throughly covered in CS61B.
We've also prepared a quick script for you to witness the difference. You can run it by typing
python3 timing.py
in the terminal in your lab folder.