61A Homework 6

Due by 11:59pm on Thursday, 7/17

Readings: You might find the following references useful:

This is a review of all of the material you have seen so far, so there are no new readings. If you haven't already, you should do the readings for hierarchical data structures:

Section 2.3.4

Submission: See the online submission instructions. We have provided a hw6.py starter file for the questions below.

Required Files
Introduction
Calculator Language

Question 1

Tokenizing
Parsing

Question 2
Question 3

Evaluation

Question 4
Question 5

Required Files

In addition to the hw6.py starter file, you will also need the ucb.py file, although you will only need to edit and submit hw6.py.

Introduction

We're going to build a calculator! It will use the same syntax as Scheme, the language that we will learn in a few weeks, but for now it will just be able to do basic arithmetic.

Once you are done, you can interact with the calculator language:

python3 hw6.py
minicalc> 1
1
minicalc> (+ 1 3)
4

To exit out of the Calculator language, type Ctrl-C or Ctrl-D.

The goal of this homework is to review the material so far and introduce you to the idea of interpretation, that is, a program which can understand other programs. This is of course not a very useful program, because we could just use Python itself as a calculator. However, it serves as a nice toy example of an interpreter and introduces you to the ideas you will need in order to do project 4 (in which we will build a interpreter for Scheme).

The input to Calculator is a program, in the form of a string. The first step is to take this string and break it up into tokens - this is done by tokenize, which is written for you.

Then we parse those tokens to create an AST (abstract syntax tree), which we represent using a deep linked list. This is the job of read_exp and read_until_close. The purpose of this stage is to expose the structure of the parentheses. For example, given the tokens ['(', '+', '(', '-', '2', ')', '3', ')'], read_exp will make sure to put the ( - 2 ) as a smaller linked list - this is how the structure of the parentheses is shown in the AST.

Then, we evaluate the AST in order to get the answer, which is then printed. This is done by calc_eval, which uses calc_apply.

Calculator Language

The syntax of the Calculator language is the same as for the Scheme programming language. The key design principle for Scheme is simplicity and elegance. While Python has lots of syntax for many different cases, Scheme has just one form of syntax - the function call notation.

In Python, the function call notation is:

function(arg1, arg2, ...)

The only difference in Scheme is that the parenthesis comes earlier and there are no commas:

(function arg1 arg2 ...)

Most math expressions you've seen are in what's called infix notation, where the operators appear between the numbers they act on. These expressions look like 3 + 4 or 27 / 3.

However, Scheme and Calculator both use prefix notation, where the operators come before the expressions they act upon. Examples are (+ 3 4) and (/ 27 3).

Of course, these expressions can be nested inside of each other. The following expression uses infix notation and has a nested expression:

(12 / 3) * 5

In the programs you'll be working with in this homework, the same expression would look like:

(* (/ 12 3) 5)

Question 1

Write the function scheme_exp which returns a string containing the prefix notation for the expression 48/(2*(9+3)) if n is 1, and the prefix notation for (48/2)*(9+3) otherwise.

 def scheme_exp(n):
     """If n == 1, returns a Scheme program that computes 48/(2*(9+3))
     Otherwise, returns a Scheme program that computes (48/2)*(9+3)

     NOTE: These doctests use eval_string, which will not work until
     you have completed the entire homework.

     >>> first = scheme_exp(1)
     >>> assert '/' in first and '*' in first and '+' in first
     >>> assert '48' in first and '2' in first
     >>> assert '9' in first and '3' in first
     >>> eval_string(first)
     2.0
     >>> second = scheme_exp(2)
     >>> assert '/' in second and '*' in second and '+' in second
     >>> assert '48' in second and '2' in second
     >>> assert '9' in second and '3' in second
     >>> eval_string(second)
     288.0
     """
     if n == 1:
         "*** YOUR CODE HERE ***"
     else:
         "*** YOUR CODE HERE ***"

Tokenizing

The first step is to take the program (which will be a simple string), and to tokenize it. This just means to identify the various "things" in the program. In Calculator, the "things" are parentheses (such as '(' and ')'), numbers (such as '123' and '74.2'), and operators (such as '+' and '*').

We will use a tokenize function, which will take a program (represented as a string) and return a list of tokens. Since this is very similar to extract_words from the Trends project, we have filled it out for you:

def tokenize(s):
    """Splits the provided string into a list of tokens.

    >>> tokenize('(* (+ 12 3) 5)')
    ['(', '*', '(', '+', '12', '3', ')', '5', ')']
    """
    s = s.replace('(', ' ( ')
    s = s.replace(')', ' ) ')
    return s.split()

Parsing

The next step is to take this list of tokens and turn it into a structure that is easier to understand. This structure is called an Abstract Syntax Tree (AST). In particular, in Calculator, the parentheses show us the structure of the program. We want to take in the list of tokens, and produce a deep linked list that shows the structure of the program. Each subexpression in the Calculator expression should correspond to a smaller linked list in the deep linked list.

Question 2

First, we will write a helper method, that will convert strings to numbers when appropriate. You will need to understand exceptions in order to do this part. The documentation may be helpful. In particular, section 8.3 covers how to handle exceptions, which you will need to do for this question.

def numberize(atomic_exp):
    """Converts atomic_exp to a number if possible, otherwise leaves
    it alone.  atomic_exp is guaranteed to be a single token, and will
    not be a parenthesis.

    >>> numberize('123')
    123
    >>> numberize('3.14159')
    3.14159
    >>> numberize('+')
    '+'
    """
    # Consider doing int(atomic_exp) and float(atomic_exp).
    # What does int('123') do?  What does int('+') do?
    "*** YOUR CODE HERE ***"

Question 3

Write read_exp, a function which parses the program for our calculator as described above.

read_exp takes in a list of tokens, and returns the first calculator expression encountered along with the unevaluated tokens.

The first calculator expression could be a combination expression, an operator, or simply a number. If the first expression is a combination expression (that is, the first token is '('), then it returns a (possibly deep) linked list representing that expression. If the first token represents a number, that is a single Calculator expression and so it just returns that number. Note that tokens such as '2' are converted into numbers in the returned AST. If the first token is an operator, it just returns that operator.

read_exp also returns the rest of the tokens not included in the first calculator expression (see doctests for examples). This will be helpful for the recursion.

To accomplish this task, read_exp mutually recurses with the helper function read_until_close. This means that when implementing read_exp, you may want to use read_until_close, and when you implement read_until_close, you may want to use read_exp, and so you should not start writing code until you understand what both functions are supposed to do. read_until_close also takes in a list of tokens, but only reads up to and includes the first mismatched closed parenthesis. read_until_close returns a linked list of all the expressions read up to that point, as well as the rest of the tokens not read (the tokens following the first unmatched parentheses).

You've already been given some code in the beginning of read_exp. Read this code carefully to understand what constitutes an illegal input to read_exp. Notice the keyword raise - this is how we create our own errors. See the documentation for more details.

Make sure you read and understand all the doctests for both functions before you start writing code.

def read_exp(tokens):
    """Given a list of tokens, returns the first calculator expression
    (either a number, operator, or combination), and the rest of the
    expression that has not been turned into an expression.

    >>> exp, unevaled = read_exp(['(', '+', '2', '3', '6', ')'])
    >>> print_linked_list(exp)
    < '+' 2 3 6 >
    >>> unevaled
    []
    >>> exp, unevaled = read_exp(['2', '3'])
    >>> exp
    2
    >>> unevaled
    ['3']
    >>> exp, unevaled = read_exp(['(', '/', '6', '2', ')', '(', '-', '2', ')'])
    >>> print_linked_list(exp)
    < '/' 6 2 >
    >>> unevaled
    ['(', '-', '2', ')']
    >>> exp, unevaled = read_exp(['(','*','4','(','-','12','8',')',')','3','2'])
    >>> print_linked_list(exp)
    < '*' 4 < '-' 12 8 > >
    >>> unevaled
    ['3', '2']
    """
    if tokens == []:
        raise SyntaxError('unexpected end of input')
    token, rest = tokens[0], tokens[1:]
    if token == ')':
        raise SyntaxError('unexpected )')
    elif token == '(':
        if rest == []:
            raise SyntaxError('mismatched parentheses')
        elif rest[0] == ')':
            raise SyntaxError('empty combination')
        "*** YOUR CODE HERE ***"
    else:
        "*** YOUR CODE HERE ***"

def read_until_close(tokens):
    """Reads up to and including the first mismatched close
    parenthesis, then forms a combination out all of the values read
    up to that point.

    >>> exp, unevaled = read_until_close(['+', '2', '3', ')', '4', ')'])
    >>> print_linked_list(exp)
    < '+' 2 3 >
    >>> unevaled
    ['4', ')']
    >>> exp, unevaled = read_until_close(['+', '(', '-', '2', ')', '3', ')', ')'])
    >>> print_linked_list(exp)
    < '+' < '-' 2 > 3 >
    >>> unevaled
    [')']
    """
    "*** YOUR CODE HERE ***"

Evaluation

Now that we have our AST, we can turn to the problem of evaluating the AST, that is, we figure out what the answer to the program is.

First, we need to understand calc_apply. calc_apply takes an operator and a linked list of numbers. Note that the linked list is not a deep linked list. calc_apply applies the operator to those numbers. It checks what the operator is, and based on that, it calls a function that can do that specific operation.

def calc_apply(op, args):
    """Applies an operator to a linked list of arguments.

    >>> calc_apply('+', link(12, link(34, empty)))
    46
    >>> calc_apply('-', link(10, link(2, link(3, link(4, empty)))))
    1
    >>> calc_apply('-', link(3, empty))
    -3
    >>> calc_apply('*', empty)
    1
    """
    if op == '+':
        return do_addition(args)
    elif op == '*':
        return do_multiplication(args)
    elif op == '-':
        return do_subtraction(args)
    elif op == '/':
        return do_division(args)
    else:
        raise NameError('unknown operator {}'.format(op))

Question 4

Now, we need to implement the functions that do specific operations. Addition, multiplication, and subtraction have been implemented for you. They use the reduce_linked_list function which is defined near the end of the starter file. Your job is to define do_division, which does the division operation.

When an operation op is given more than two arguments, it is left associative. For example,

(op a b c d)

would do that same thing as

(op (op (op a b) c) d)

So, (/ 60 2 3 5) is equivalent to (/ (/ (/ 60 2) 3) 5) which is equivalent to (/ 60 (* 2 3 5))

def do_division(args):
    """Applies the division operation to args.
    args must have a length of at least 1, as in do_subtraction.

    >>> do_division(link(4, empty))
    0.25
    >>> do_division(link(7, link(2, empty)))
    3.5
    >>> do_division(link(60, link(2, link(3, link(5, empty)))))
    2.0
    """
    "*** YOUR CODE HERE ***"

Question 5

Now that we've written calc_apply, let's figure out how to use calc_apply. We'll call calc_apply inside of calc_eval. calc_eval takes in a calculator expression. A calculator expression can either be a number or a linked list that can contain multiple items where the first item is a operator and rest are more calculator expressions. Note that this linked list is a deep linked list.

Remember that we have two different types of calculator expressions - numbers and function calls. For which type would we want to use calc_apply? Don't forget that calc_apply takes in a linked list of numbers, but the subexpressions of exp can themselves be linked lists - how could you turn these subexpressions into numbers? (Hint: Trust the recursion! Domain and range will be helpful here.) You may find the map_linked_list function helpful. It is defined near the end of the starter file.

The other type of expressions is just plain numbers. What should we do for these expressions?

def calc_eval(exp):
    """Evaluates a calculator expression.

    >>> calc_eval(5)
    5
    >>> calc_eval(link('+', link(12, link(3, empty))))
    15
    >>> subexp1 = link('*', link(3, link(4, empty)))
    >>> subexp2 = link('-', link(12, link(9, empty)))
    >>> exp = link('+', link(subexp1, link(subexp2, empty)))
    >>> print_linked_list(exp)
    < '+' < '*' 3 4 > < '-' 12 9 > >
    >>> calc_eval(exp)
    15
    """
    "*** YOUR CODE HERE ***"