CS61A Lab 10: Sets

October 24-26, 2012

We will be using the set implementations introduced in lecture. The code for those implementations can be found here, or you can copy the code from your terminal:

cp ~cs61a/public_html/fa12/labs/lab10/sets.py .

You can also copy the starter code for the lab questions here, or from the terminal:

cp ~cs61a/public_html/fa12/labs/lab10/lab10.py .

Don't forget the '.' at the end!

Built-in Sets

What would Python print?

>>> s = {1, 2, 3, 2, 1}
>>> s
______
>>> s.add(4)  # adjunction
>>> s
______
>>> s.add(2)
>>> s
______
>>> s.remove(3)
>>> s
______
>>> s.remove(100)
______

>>> 3 in s
______
>>> 7 not in s
______
>>> len(s)
______
>>> s[1]
______
>>> {1, 2, 4} == {1, 2, 4, 2, 4}
______

>>> s = {1, 2, 3}
>>> r = {1, 2, 4}
>>> s.union(r)
______
>>> s
______
>>> r
______
>>> s | r  # not the letter 'l' -- the pipe character
           # elements in s or r 
______

>>> s.intersection(r)
______
>>> s & r  # elements in s and r 
______

If you're interested about Python built-in sets, take a look here

Set Implementations

In lecture, we learned three different ways to implement our own sets:

  1. unordered Rlists
  2. ordered Rlist
  3. binary Trees

Sets as Unordered Rlist

1) Using the functions defined in lecture, implement a function called is_subset(set1, set2), which returns True if set2 is a subset of set1 (i.e. all the elements in set2 are contained in set1).

def is_subset(set1, set2):
    """Returns True if set2 is a subset of set1.

    >>> set1 = Rlist(1, Rlist(4, Rlist(2, Rlist(5))))
    >>> set2 = Rlist(1, Rlist(2, Rlist(4)))
    >>> is_subset(set1, set2)
    True
    >>> set3 = Rlist(1, Rlist(2, Rlist(3, Rlist(4))))
    >>> is_subset(set1, set3)
    False
    """

2) Often, it is useful to convert a normal sequence (in our case, an Rlist) into a set. Implement a function rlist_to_set, which takes an unordered Rlist and returns a new set (represented as an unordered Rlist).

def rlist_to_set(rlist):
    """Returns a set that contains unique elements in the given rlist.

    >>> r = Rlist(1, Rlist(2, Rlist(2, Rlist(1))))
    >>> rlist_to_set(r)
    Rlist(2, Rlist(1))
    """

3) Implement a function rlist_to_set_mut, which does the same thing as the function in question 1, but instead mutates the original rlist. In addition, rlist_to_set_mut should NOT return anything.

def rlist_to_set_mut(rlist):
    """Mutates the original Rlist by removing duplicate items.

    >>> r = Rlist(1, Rlist(2, Rlist(2, Rlist(1))))
    >>> rlist_to_set_mut(r)
    >>> r
    Rlist(2, Rlist(1))
    """

4) What is the run time of rlist_to_set (in terms of big Theta notation)?

Sets as Ordered Rlist

5) Implement a function exclusive_or that takes two sets (implemented as ordered Rlists) and returns a set containing elements that appear in either sets, but not elements that appear in both sets.

def exclusive_or(set1, set2):
    """Returns a set containing elements in set1 or set2, but not
    elements that appear in both.

    >>> set1 = Rlist(1, Rlist(2, Rlist(3)))
    >>> set2 = Rlist(1, Rlist(3, Rlist(4)))
    >>> exclusive_or(set1, set2)
    Rlist(2, Rlist(4))
    """

6) What is the runtime of the exclusive_or algorithm (in terms of a big Theta notation)?

Sets as Binary Search Trees

7) Sets represented as binary search trees are usually faster than sets represented as Rlists. This is because the average search time for a binary tree is Θ(log(n)), whereas the search time for an Rlist is Θ(n).

However, the search time for a binary tree is not always in Θ(log(n)). To understand why this is the case, complete the following two problems:

  1. Propose a sequence of 7 adjunctions (i.e. adds to a tree) that will result in a Θ(log(n)) search time.
  2. Propose a sequence of 7 adjunctions (i.e. adds to a tree) that will result in a Θ(n) search time.

Hint: you should consider what each adjunction does to the structure of the tree. What is the best case scenario for a binary search tree? What is the worst case scenario?

8) Implement a function called max_set that finds the largest element in a set that is implemented as a binary tree.

def max_set(s):
    """Finds the largest element in a set that is implemented as a 
    binary search tree.

    >>> s = Tree(4, 
    ...            Tree(2, 
    ...                   Tree(1)), 
    ...            Tree(5, 
    ...                   None,
    ...                   Tree(6)))
    >>> max_set(s)
    6
    """