# Lab 8: Sets and Orders of Growth + Extra BST

*Due at 11:59pm on 03/18/2016.*

## Starter Files

Download lab08.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

## Submission

By the end of this lab, you should have submitted the lab with
`python3 ok --submit`

. You may submit more than once before the
deadline; only the final submission will be graded.

- To receive credit for this lab, you must complete questions 1, 2, and 3 in lab08.py and submit through OK.
- Questions 4 through 8 are
*extra practice*. They can be found in the lab08_extra.py file. It is recommended that you complete these problems on your own time.

## Sets

A set is an **unordered** collection of **distinct** objects that supports
membership testing, union, intersection, and adjunction. The main difference between
sets and lists are 1. they are unordered, and 2. there are no duplicates. Other than
that almost everything is the same.

```
>>> a = [1, 1, 2, 2, 3, 3]
>>> a = set(a)
>>> a # No duplicates
{1, 2, 3}
>>> a = {3, 1, 2}
>>> a # Not necessarily in same order
{1, 2, 3}
```

The Python documentation on sets has more details.

### Question 1: WWPP: Sets

Use OK to test your knowledge with the following "What Would Python Print?" questions:

`python3 ok -q sets -u`

If you get stuck, try loading lab08.py into an interpreter. To load file

`xxx.py`

into the interpreter, use`python3 -i xxx.py`

command. Using the interpreter is highly recommended because sets areunordered.

```
>>> a = [1, 1, 2, 2, 3, 3]
>>> a = set(a)
>>> len(a)
______3
>>> sorted(a)
______[1, 2, 3]
>>> a.add(4)
>>> a.add(4)
>>> a.remove(4)
>>> 4 in a
______False
>>> a = {1, 4, 12, 1000}
>>> sum(a)
______1017
>>> b = {1, 2, 4}
>>> sorted(a.intersection(b))
______[1, 4]
>>> sorted(a & b)
______[1, 4]
>>> sorted(a.union(b))
______[1, 2, 4, 12, 1000]
>>> sorted(a | b)
______[1, 2, 4, 12, 1000]
>>> sorted(a - b)
______[12, 1000]
```

## Sets + Order of Growth

One really convenient thing about Python sets is that many operations on sets (adding elements, removing elements, checking membership) run in O(1) (constant) time (usually).

Some of the problems use a utility method called
`timeit`

, which takes a parameterless function as argument, executes it,
and returns the time required to do so. It's a variation on the function
`timeit.timeit`

function in the Python3 library.

### Question 2: Find duplicates

Write the following function so it runs in O(n) time.

```
def find_duplicates(lst):
"""Returns True if lst has any duplicates and False if it does not.
>>> find_duplicates([1, 2, 3, 4, 5])
False
>>> find_duplicates([1, 2, 3, 4, 2])
True
>>> find_duplicates(list(range(100000)) + [-1]) # make sure you have linear time
False
"""
"*** YOUR CODE HERE ***"
return len(set(lst)) != len(lst)
```

Use OK to test your code:

`python3 ok -q find_duplicates`

### Question 3: Pow

Write the following function so it runs in O(log k) time.

Hint:this can be done using a procedure called repeated squaring.

```
def pow(n,k):
"""Computes n^k.
>>> pow(2, 3)
8
>>> pow(4, 2)
16
>>> a = pow(2, 100000000) # make sure you have log time
"""
"*** YOUR CODE HERE ***"
if k == 1:
return n
if k % 2 == 0:
return pow(n*n,k//2)
else:
return n * pow(n*n, k//2)
```

Use OK to test your code:

`python3 ok -q pow`

## Extra Questions

### Question 4: Prime

Write a function that returns whether a number is prime or not in O(sqrt(n)) time, where sqrt means square root. You can assume n >= 2.

Hint:you don't need to check whether every single number that is smaller than n divides n

```
from math import sqrt
def is_prime_sqrt(n):
"""Tests whether a number N is prime or not. Implement this function
in O(sqrt(n)) time. You can assume n >= 2
>>> is_prime_sqrt(2)
True
>>> is_prime_sqrt(67092481)
False
>>> is_prime_sqrt(524287)
True
>>> is_prime_sqrt(2251748274470911)
False
>>> is_prime_sqrt(6700417)
True
>>> is_prime_sqrt(44895587973889)
False
>>> is_prime_sqrt(2147483647)
True
>>> is_prime_sqrt(67280421310721)
True
"""
# sqrt(k) will give the square root of k as a floating point (decimal)
"*** YOUR CODE HERE ***"
if sqrt(n) < 2:
return True
for i in range(2, int(sqrt(n)) + 1):
if n % i == 0:
return False
return True
```

Use OK to test your code:

`python3 ok -q is_prime_sqrt`

### Question 5: Find duplicates k

Write the following function so it runs in O(n) time.

Hint:Sets have an`add`

and a`remove`

method.

```
def find_duplicates_k(k, lst):
"""Returns True if there are any duplicates in lst that are within k
indices apart.
>>> find_duplicates_k(3, [1, 2, 3, 4, 1])
False
>>> find_duplicates_k(4, [1, 2, 3, 4, 1])
True
>>> find_duplicates_k(4, [1, 1, 2, 3, 3])
True
"""
"*** YOUR CODE HERE ***"
prev_set = set()
for i, elem in enumerate(lst):
if elem in prev_set:
return True
prev_set.add(elem)
if i - k >= 0:
prev_set.remove(lst[i - k])
return False
```

Use OK to test your code:

`python3 ok -q find_duplicates_k`

The following problems deal with *binary trees*, in which each node has
at most two children. We further modify the definition of tree to allow
*empty trees*, which we haven't dealt with up to now. So here, we define
a binary tree as either

- An empty tree, or
- A label and two children, called the left and right child respectively, both of which are binary trees. The left and right children, therefore, are the roots of the left and right subtrees.

As a result of this definition, most recursion on binary trees uses the empty tree as a base case.

Our implementation of binary trees is the following:

```
# Tree definition
class BinTree:
empty = ()
def __init__(self, label, left=empty, right=empty):
self.label = label
self.left = left
self.right = right
def __repr__(self):
if self.left == self.empty and self.right == self.empty:
return 'BinTree({})'.format(repr(self.label))
left = 'BinTree.empty' if self.left == self.empty else repr(self.left)
right = 'BinTree.empty' if self.right == self.empty else repr(self.right)
return 'BinTree({}, {}, {})'.format(repr(self.label), left, right)
def binTree(tupleTree):
"""A convenience method for succinctly constructing binary trees. The
empty tuple or list stands for BinTree.empty; (V,) or [V] stands
for BinTree(V); (V, T1, T2) or [V, T1, T2] stands for
BinTree(V, binTree(T1), binTree(T2)).
>>> binTree((3,
... (1, (), [2]),
... (6, [5], [7])))
BinTree(3, BinTree(1, BinTree.empty, BinTree(2)), BinTree(6, BinTree(5), BinTree(7)))
"""
if len(tupleTree) == 0:
return BinTree.empty
elif len(tupleTree) == 1:
return BinTree(tupleTree[0])
else:
return BinTree(tupleTree[0], binTree(tupleTree[1]), binTree(tupleTree[2]))
```

We also included a function `print_bintree`

, which prints out a string
representation of a tree:

```
>>> print_bintree(binTree( (1, (3, (), [2]), (4, [5], [6])) ))
-1-
/ \
3 4
\ / \
2 5 6
```

### Question 6: Size

Define the function `size`

which takes in a `BinTree`

as an argument and
returns the number of entries in the tree.

```
def size(tree):
""" Return the number of entries in the binary tree b.
>>> b = BinTree(4,
... BinTree(2,
... BinTree(1)),
... BinTree(6,
... BinTree(5)))
>>> size(b)
5
"""
"*** YOUR CODE HERE ***"
if tree is BinTree.empty:
return 0
return 1 + size(tree.left) + size(tree.right)
```

Use OK to test your code:

`python3 ok -q size`

### Question 7: Tree to Ordered Link

Implement `tree_to_ordered_link`

, which converts a **binary search tree** into a
set represented as an *ordered* `Link`

. If you can, implement this function so
that it executes in O(n) time, where n is the number of elements in the tree:

```
def tree_to_ordered_link(s):
"""Return an ordered Link containing the elements of tree set s.
Assume that s is a well-formed binary search tree.
>>> b = binTree([7, [1, [], [4]], [9]])
>>> tree_to_ordered_link(b)
Link(1, Link(4, Link(7, Link(9))))
"""
"*** YOUR CODE HERE ***"
def ttos_iter(s, r):
if s is BinTree.empty:
return r
return ttos_iter(s.left, Link(s.label, ttos_iter(s.right, r)))
return ttos_iter(s, Link.empty)
```

Use OK to test your code:

`python3 ok -q tree_to_ordered_link`

### Question 8: Next Smallest Element

Write a function `next_element`

that takes in a tree T and a number N. It returns the
smallest element that is larger than N. For example, if you have this tree:

```
--8---
/ \
3- 10-
/ \ \
1 6 14
/ \ /
4 7 13
```

The next element of 3 is 4, the next element of 6 is 7, the next element of 7 is 8.

```
def next_element(t, n):
"""This function takes in a binary search tree T and a number N and
it returns the smallest element that is greater than N. None if it has
no such element, or t does not contain n
>>> t = binTree([8, [3, [1], [6, [4], [7]]], [10, [], [14, [13], []]]])
>>> next_element(t, 1)
3
>>> next_element(t, 3)
4
>>> next_element(t, 5)
>>> next_element(t, 7)
8
>>> next_element(t, 10)
13
>>> next_element(t, 14)
>>> result = [1]
>>> a = next_element(t, 1)
>>> while a:
... result += [a]
... a = next_element(t, a)
>>> result
[1, 3, 4, 6, 7, 8, 10, 13, 14]
"""
"*** YOUR CODE HERE ***"
def helper(t, n, root):
if t is BinTree.empty:
return None
if t.label == n:
if t.right is BinTree.empty:
return root
t = t.right
while t.left is not BinTree.empty:
t = t.left
return t.label
if t.label > n:
return helper(t.left, n, t.label)
return helper(t.right, n, root)
return helper(t, n, None)
```

Use OK to test your code:

`python3 ok -q next_element`

### How is Set Implemented?

This part is not covered in an exam. In case you are interested please
read on.
As you might have noticed, `set`

is very different from `list`

. Here's
a quick table on their major differences:

Set | List | |
---|---|---|

Indexed? | No | Yes |

Ordered? | No | Yes |

Membership | O(1) | O(n) |

To understand these differences, we need to look (briefly) at how the set is implemented. If the way a set stores items is like putting wine in the liquor section and putting shortbreads in the snack section in a supermarket, the way a list stores items is like putting every item in the supermarket on one big shelf, ordered by the time it arrives at the supermarket.

With a list, you would need to go through **every** item in that store
before you can be convinced that it ran out of your favorite chips.
Horrible, eh? The good news is that you can easily tell which item
is the fifth to arrive, but that's probably not useful at all when
finding a pack of chips takes more than one hour.

The way `set`

actually works, is that **instead of adding items in
order, it assigns a position for them**, so when one needs to check the
membership, set can just look at the assigned position to see if it's
there.

Suppose we want a set to keep track of all the one digit numbers, we can use a list of booleans initialized to be False.

```
0 1 2 3 4 5 6 7 8 9
F F F F F F F F F F
```

Now if we want to add the number 1, the set simply marks the boolean at 1 to be True.

```
0 1 2 3 4 5 6 7 8 9
F T F F F F F F F F
```

And when we want to see if digit 2 is in the set, we can check if index 2 is True or not.

But what if we want to keep track of all the numbers below 2^32?
Obviously we can't have a list of length 2^32. What we can do, though,
is modding the number with the length of the list. Let's start with a
list of length 10 first, and `_`

means nothing is there:

```
0 1 2 3 4 5 6 7 8 9
_ _ _ _ _ _ _ _ _ _
```

Now if we want to add number 5, the set simply computes `5%10 = 5`

, and
decides to put number 5 at index 5.

```
0 1 2 3 4 5 6 7 8 9
_ _ _ _ _ 5 _ _ _ _
```

Now if we want to add number 321, the set simply computes `321%10 = 1`

, and
decides to put number 321 at index 1.

```
0 1 2 3 4 5 6 7 8 9
_ 321 _ _ _ 5 _ _ _ _
```

And when we want to see if number 321 is in the set, the set will
compute `321%10 = 1`

and decides that the number we are looking for might
be at index 1, then go to index 1 to see that it's actually there.

However, since we are not only dealing with numbers in Python, we need
a way to represent these items with number, and that process is called
hashing. In Python, you
can check the hash values for all the objects by calling `hash`

on them:

```
>>> hash(1)
1
>>> hash('abc')
6015317455955025372
```

More interesting details, like dealing collision and keeping efficiency when there are many collisions can be found here, and will be throughly covered in CS61B.

We've also prepared a quick script for you to witness the difference. You can run it by typing

python3 timing.py

in the terminal in your lab folder.