Lab 5: Trees, Data Abstraction

lab05.zip

Due by 11:59pm on Wednesday, September 27.

Starter Files

Download lab05.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the Ok autograder.

Required Questions

Getting Started Videos

These videos may provide some helpful direction for tackling the coding problems on this assignment.

To see these videos, you should be logged into your berkeley.edu email.

YouTube link

Dictionaries

Consult the drop-down if you need a refresher on dictionaries. It's okay to skip directly to the questions and refer back here should you get stuck.

Lists are ordered collections that allow us to access values by index. Another type of collection, called a dictionary, allows us to store and access values that correspond to given keys.

For example, the following code constructs a dictionary with three entries and assigns it to the name artists. Note that you could write the same code all on one line.

>>> artists = {
...    'Ariana Grande': 'No Tears Left To Cry',
...    'Marshmello': 'FRIENDS',
...    'Migos': ['Stir Fry', 'Walk It Talk It']
... }

We say that a dictionary is an unordered collection of key-value pairs. Each key-value pair is separated by a comma. For each pair, the key appears to the left of the colon and the value appears to the right of the colon. To retrieve values from the dictionary, use bracket notation with the corresponding key:

>>> artists['Ariana Grande']
'No Tears Left To Cry'
>>> songs = artists['Migos']
>>> songs[0]
'Stir Fry'

Keys/values in dictionaries do not all have to be the same type:

>>> d = {1: "hello", "hi": 3, (3, 5): 2}
>>> d[(3,5)]
2

However, keys can only be immutable types (e.g. strings, numbers, tuples). For example, if a list is used as a key, an error is raised:

>>> d = {[1]: "test"}
TypeError

Of course, keys must be unique in a dictionary. While Python does not raise an error when you do this, duplicate keys make it impossible to access parts of your data:

>>> d = {"fun": 1, "fun": 2}
>>> d["fun"] # Is 1 or 2 output?

You can check whether a dictionary has a key using the in operator, like we can do with lists.

>>> 'Migos' in artists
True
>>> 'Wolves' in artists  # Not True for values!
False

Finally, here are some useful dictionary functions:

dict.keys() will return a sequence of keys.

>>> list(artists.keys()) # We use list() to turn the sequence into a list
['Ariana Grande', 'Marshmello', 'Migos', 'Charlie Puth']

dict.values() will return a sequence of values.

>>> list(artists.values())
['No Tears Left To Cry', 'Wolves', ['Stir Fry', 'Walk It Talk It'], 'Attention']

dict.items() will return a sequence of key-value tuples.

>>> list(artists.items())
[('Ariana Grande', 'No Tears Left To Cry'),
 ('Marshmello', 'Wolves'),
 ('Migos', ['Stir Fry', 'Walk It Talk It']),
 ('Charlie Puth', 'Attention')]

Q1: Dictionaries

Use Ok to test your knowledge with the following "What Would Python Display?" questions:
python3 ok -q pokemon -u

>>> pokemon = {'pikachu': 25, 'dragonair': 148, 'mew': 151}
>>> pokemon['pikachu']
______
25
>>> len(pokemon)
______
3
>>> 'mewtwo' in pokemon
______
False
>>> 'pikachu' in pokemon
______
True
>>> 25 in pokemon
______
False
>>> 148 in pokemon.values()
______
True
>>> 151 in pokemon.keys()
______
False
>>> 'mew' in pokemon.keys()
______
True

Trees

Consult the drop-down if you need a refresher on trees. It's okay to skip directly to the questions and refer back here should you get stuck.

A tree is a data structure that represents a hierarchy of information. A file system is a good example of a tree structure. For example, within your cs61a folder, you have folders separating your projects, lab assignments, and homework. The next level is folders that separate different assignments, hw01, lab01, hog, etc., and inside those are the files themselves, including the starter files and ok. Below is an incomplete diagram of what your cs61a directory might look like.

cs61a_tree

As you can see, unlike trees in nature, the tree abstract data type is drawn with the root at the top and the leaves at the bottom.

Tree ADT

Our tree abstract data type consists of a root and a list of its branches. To create a tree and access its root value and branches, use the following interface of constructor and selectors:

Constructor
- tree(label, branches=[]): creates a tree object with the given label value at its root node and list of branches. Notice that the second argument to this constructor, branches, is optional — if you want to make a tree with no branches, leave this argument empty.
Selectors
- label(tree): returns the value in the root node of tree.
- branches(tree): returns the list of branches of the given tree (each of which is also a tree)
Convenience function
- is_leaf(tree): returns True if tree's list of branches is empty, and False otherwise.

For example, the tree generated by

number_tree = tree(1,
         [tree(2),
          tree(3,
               [tree(4),
                tree(5)]),
          tree(6,
               [tree(7)])])

would look like this:

To extract the number 3 from this tree, which is the label of the root of its second branch, we would do this:

label(branches(number_tree)[1])

The lab file contains the implementation of the tree ADT, if you would like to view it. However, as with any data abstraction, we should only concern ourselves with what our functions do rather than their specific implementation! You do not need to look at the implementation of the tree ADT for this problem, and your solutions should not depend on the specifics of our implementation beyond what is specified in the interface above.

Q2: Finding Berries!

The squirrels on campus need your help! There are a lot of trees on campus and the squirrels would like to know which ones contain berries. Define the function berry_finder, which takes in a tree and returns True if the tree contains a node with the value 'berry' and False otherwise.

Hint: To iterate through each of the branches of a particular tree, you can consider using a for loop to get each branch.

def berry_finder(t):
    """Returns True if t contains a node with the value 'berry' and 
    False otherwise.

    >>> scrat = tree('berry')
    >>> berry_finder(scrat)
    True
    >>> sproul = tree('roots', [tree('branch1', [tree('leaf'), tree('berry')]), tree('branch2')])
    >>> berry_finder(sproul)
    True
    >>> numbers = tree(1, [tree(2), tree(3, [tree(4), tree(5)]), tree(6, [tree(7)])])
    >>> berry_finder(numbers)
    False
    >>> t = tree(1, [tree('berry',[tree('not berry')])])
    >>> berry_finder(t)
    True
    """
    "*** YOUR CODE HERE ***"

Use Ok to test your code:

python3 ok -q berry_finder

Q3: Replace Loki at Leaf

Define replace_loki_at_leaf, which takes a tree t and a value lokis_replacement. replace_loki_at_leaf returns a new tree that's the same as t except that every leaf label equal to "loki" has been replaced with lokis_replacement.

If you want to learn about the Norse mythology referenced in this problem, you can read about it here.

  def replace_loki_at_leaf(t, lokis_replacement):
      """Returns a new tree where every leaf value equal to "loki" has
      been replaced with lokis_replacement.

      >>> yggdrasil = tree('odin',
      ...                  [tree('balder',
      ...                        [tree('loki'),
      ...                         tree('freya')]),
      ...                   tree('frigg',
      ...                        [tree('loki')]),
      ...                   tree('loki',
      ...                        [tree('sif'),
      ...                         tree('loki')]),
      ...                   tree('loki')])
      >>> laerad = copy_tree(yggdrasil) # copy yggdrasil for testing purposes
      >>> print_tree(replace_loki_at_leaf(yggdrasil, 'freya'))
      odin
        balder
          freya
          freya
        frigg
          freya
        loki
          sif
          freya
        freya
      >>> laerad == yggdrasil # Make sure original tree is unmodified
      True
      """
      "*** YOUR CODE HERE ***"

Use Ok to test your code:

python3 ok -q replace_loki_at_leaf

Data Abstraction

Consult the drop-down if you need a refresher on data abstraction. It's okay to skip directly to the questions and refer back here should you get stuck.

So far, we have encountered several data types: numbers, strings, booleans, lists, tuples, and dictionaries. But what if we want to represent more complicated things — such as the objects we encounter in real life?

Enter the idea of data abstraction, which allows us to build and use objects in code that represent a compound value.

An abstract data type consists of two types of functions:

Constructors: functions that build instances of the abstract data type.
Selectors: functions that retrieve information from instances of the ADT.

An integral part of data abstraction is the concept of abstraction. Users of the abstract data type have access to an interface of defined actions (i.e. constructors and selectors) they can take with the ADT. The user of the ADT does not need to know how the constructors and selectors are implemented. The nature of abstraction allows whoever uses them to assume that the functions have been written correctly and work as described.

This is called the abstraction barrier!

In fact, interacting within an ADT outside of its interface of constructors and selectors is called "violating the abstraction barrier" and is strongly discouraged (even when it doesn't produce an error).

In this way, data abstraction mimics how we think about the world. For example, a car is a complicated machine, but it has a simple interface by which we interact with it: the wheel, the gas pedal, the gear shift. If you want to drive a car, you don't need to know how the engine was built or what kind of material the tires are made of. It's easy to drive a car because of the suppression of complexity that abstraction provides us.

For the car, the principle of the abstraction barrier also applies. If we want to start the car, we should use the provided interface of turning the ignition key, rather than getting out, popping the hood, and messing around with the engine. While you could likely do that if you had enough knowledge, it is unnecessarily costly and highly dependent on how exactly the car was constructed. What are you going to do if the car is electric?

When you're writing code using ADTs that have been provided for you, make sure that you're never assuming anything about the ADT other than that it can be constructed with its constructor and its information can be accessed by selectors. Your code should never depend on the specific implementation of the ADT under the hood.

City ADT

Say we have an abstract data type for cities. A city has a name, a latitude coordinate, and a longitude coordinate.

Our data abstraction has one constructor:

make_city(name, lat, lon): Creates a city object with the given name, latitude, and longitude.

We also have the following selectors in order to get the information for each city:

get_name(city): Returns the city's name
get_lat(city): Returns the city's latitude
get_lon(city): Returns the city's longitude

Here is how we would use the constructor and selectors to create cities and extract their information:

>>> berkeley = make_city('Berkeley', 122, 37)
>>> get_name(berkeley)
'Berkeley'
>>> get_lat(berkeley)
122
>>> new_york = make_city('New York City', 74, 40)
>>> get_lon(new_york)
40

All of the selector and constructor functions can be found in the lab file if you are curious to see how they are implemented. However, the point of data abstraction is that — as users of the city ADT — we do not need to know how an the ADT is implemented. You do not need to look at the implementation of the city ADT for this problem, and your solutions should not depend on the specifics of our implementation beyond what is specified in the interface above.

Q4: Distance

We will now implement the function distance, which computes the distance between two city objects. Recall that the distance between two coordinate pairs (x1, y1) and (x2, y2) can be found by calculating the sqrt of (x1 - x2)**2 + (y1 - y2)**2. We have already imported sqrt for your convenience. Use the latitude and longitude of a city as its coordinates; you'll need to use the selectors to access this info!

from math import sqrt
def distance(city_a, city_b):
    """
    >>> city_a = make_city('city_a', 0, 1)
    >>> city_b = make_city('city_b', 0, 2)
    >>> distance(city_a, city_b)
    1.0
    >>> city_c = make_city('city_c', 6.5, 12)
    >>> city_d = make_city('city_d', 2.5, 15)
    >>> distance(city_c, city_d)
    5.0
    """
    "*** YOUR CODE HERE ***"

Use Ok to test your code:

python3 ok -q distance

Q5: Closer City

Next, implement closer_city, a function that takes a latitude, longitude, and two cities, and returns the name of the city that is closer to the provided latitude and longitude.

You may only use the selectors and constructors introduced above and the distance function you just defined for this question.

Hint: How can you use your distance function to find the distance between the given location and each of the given cities?

def closer_city(lat, lon, city_a, city_b):
    """
    Returns the name of either city_a or city_b, whichever is closest to
    coordinate (lat, lon). If the two cities are the same distance away
    from the coordinate, consider city_b to be the closer city.

    >>> berkeley = make_city('Berkeley', 37.87, 112.26)
    >>> stanford = make_city('Stanford', 34.05, 118.25)
    >>> closer_city(38.33, 121.44, berkeley, stanford)
    'Stanford'
    >>> bucharest = make_city('Bucharest', 44.43, 26.10)
    >>> vienna = make_city('Vienna', 48.20, 16.37)
    >>> closer_city(41.29, 174.78, bucharest, vienna)
    'Bucharest'
    """
    "*** YOUR CODE HERE ***"

Use Ok to test your code:

python3 ok -q closer_city

Q6: Don't violate the abstraction barrier!

Note: this question has no code-writing component (if you implemented the previous two questions correctly).

When writing functions that use a data abstraction, we should use the constructor(s) and selector(s) whenever possible instead of assuming the data abstraction's implementation. Relying on a data abstraction's underlying implementation is known as violating the abstraction barrier, and we never want to do this!

It's possible that you passed the doctests for the previous questions even if you violated the abstraction barrier. To check whether or not you did so, run the following command:

Use Ok to test your code:

python3 ok -q check_city_abstraction

The check_city_abstraction function exists only for the doctest, which swaps out the implementations of the original abstraction with something else, runs the tests from the previous two parts, then restores the original abstraction.

The nature of the abstraction barrier guarantees that changing the implementation of an data abstraction shouldn't affect the functionality of any programs that use that data abstraction, as long as the constructors and selectors were used properly.

If you passed the Ok tests for the previous questions but not this one, the fix is simple! Just replace any code that violates the abstraction barrier with the appropriate constructor or selector.

Make sure that your functions pass the tests with both the first and the second implementations of the data abstraction and that you understand why they should work for both before moving on.

Check Your Score Locally

You can locally check your score on each question of this assignment by running

python3 ok --score

This does NOT submit the assignment! When you are satisfied with your score, submit the assignment to Gradescope to receive credit for it.

Submit

Make sure to submit this assignment by uploading any files you've edited to the appropriate Gradescope assignment. For a refresher on how to do this, refer to Lab 00.

Optional Questions

These questions are optional, but you must complete them in order to be checked off before the end of the lab period. They are also useful practice!

Trees

Q7: Recursion on Trees

Define a function dejavu, which takes in a tree of numbers t and a number n. It returns True if there is a path from the root to a leaf such that the sum of the numbers along that path is n and False otherwise.

def dejavu(t, n):
    """
    >>> my_tree = tree(2, [tree(3, [tree(5), tree(7)]), tree(4)])
    >>> dejavu(my_tree, 12) # 2 -> 3 -> 7
    True
    >>> dejavu(my_tree, 5) # Sums of partial paths like 2 -> 3 don ’t count
    False
    """
    "*** YOUR CODE HERE ***"

Use Ok to test your code:

python3 ok -q dejavu

Q8: Hailstone Tree

We can represent the hailstone sequence as a tree in the figure below, showing the route different numbers take to reach 1. Remember that a hailstone sequence starts with a number n, continuing to n/2 if n is even or 3n+1 if n is odd, ending with 1. Write a function hailstone_tree(n, h) which generates a tree of height h, containing hailstone numbers that will reach n.

Hint: A node of a hailstone tree will always have at least one, and at most two branches (which are also hailstone trees). Under what conditions do you add the second branch?

def hailstone_tree(n, h):
    """Generates a tree of hailstone numbers that will reach N, with height H.
    >>> print_tree(hailstone_tree(1, 0))
    1
    >>> print_tree(hailstone_tree(1, 4))
    1
      2
        4
          8
            16
    >>> print_tree(hailstone_tree(8, 3))
    8
      16
        32
          64
        5
          10
    """
    if _________________________________:
        return _________________________________
    branches = _________________________________
    if ___________ and ___________ and ___________:
        branches += _________________________________
    return tree(n, branches)

Use Ok to test your code:

python3 ok -q hailstone_tree