Due by 11:59pm on Thursday, 7/17
Readings: You might find the following references useful:
This is a review of all of the material you have seen so far, so there are no new readings. If you haven't already, you should do the readings for hierarchical data structures:
Submission: See the online submission instructions. We have provided a hw6.py starter file for the questions below.
In addition to the hw6.py starter file, you will also need the ucb.py file, although you will only need to edit and submit hw6.py.
We're going to build a calculator! It will use the same syntax as Scheme, the language that we will learn in a few weeks, but for now it will just be able to do basic arithmetic.
Once you are done, you can interact with the calculator language:
python3 hw6.py
minicalc> 1
1
minicalc> (+ 1 3)
4
To exit out of the Calculator language, type Ctrl-C or Ctrl-D.
The goal of this homework is to review the material so far and introduce you to the idea of interpretation, that is, a program which can understand other programs. This is of course not a very useful program, because we could just use Python itself as a calculator. However, it serves as a nice toy example of an interpreter and introduces you to the ideas you will need in order to do project 4 (in which we will build a interpreter for Scheme).
The input to Calculator is a program, in the form of a string. The
first step is to take this string and break it up into tokens -
this is done by tokenize
, which is written for you.
Then we parse those tokens to create an AST (abstract syntax
tree), which we represent using a deep linked list. This is the job of
read_exp
and read_until_close
. The purpose of this stage is to
expose the structure of the parentheses. For example, given the
tokens ['(', '+', '(', '-', '2', ')', '3', ')']
, read_exp
will
make sure to put the ( - 2 )
as a smaller linked list - this is how
the structure of the parentheses is shown in the AST.
Then, we evaluate the AST in order to get the answer, which is
then printed. This is done by calc_eval
, which uses calc_apply
.
The syntax of the Calculator language is the same as for the Scheme programming language. The key design principle for Scheme is simplicity and elegance. While Python has lots of syntax for many different cases, Scheme has just one form of syntax - the function call notation.
In Python, the function call notation is:
function(arg1, arg2, ...)
The only difference in Scheme is that the parenthesis comes earlier and there are no commas:
(function arg1 arg2 ...)
Most math expressions you've seen are in what's called infix notation,
where the operators appear between the numbers they act on. These
expressions look like 3 + 4
or 27 / 3
.
However, Scheme and Calculator both use prefix notation, where the
operators come before the expressions they act upon. Examples are (+
3 4)
and (/ 27 3)
.
Of course, these expressions can be nested inside of each other. The following expression uses infix notation and has a nested expression:
(12 / 3) * 5
In the programs you'll be working with in this homework, the same expression would look like:
(* (/ 12 3) 5)
scheme_exp
which returns a string containing the
prefix notation for the expression 48/(2*(9+3))
if n is 1, and the
prefix notation for (48/2)*(9+3)
otherwise.
def scheme_exp(n):
"""If n == 1, returns a Scheme program that computes 48/(2*(9+3))
Otherwise, returns a Scheme program that computes (48/2)*(9+3)
NOTE: These doctests use eval_string, which will not work until
you have completed the entire homework.
>>> first = scheme_exp(1)
>>> assert '/' in first and '*' in first and '+' in first
>>> assert '48' in first and '2' in first
>>> assert '9' in first and '3' in first
>>> eval_string(first)
2.0
>>> second = scheme_exp(2)
>>> assert '/' in second and '*' in second and '+' in second
>>> assert '48' in second and '2' in second
>>> assert '9' in second and '3' in second
>>> eval_string(second)
288.0
"""
if n == 1:
return "(/ 48 (* 2 (+ 9 3)))"
else:
return "(* (/ 48 2) (+ 9 3))"
The first step is to take the program (which will be a simple string),
and to tokenize it. This just means to identify the various
"things" in the program. In Calculator, the "things" are parentheses
(such as '('
and ')'
), numbers (such as '123'
and '74.2'
), and
operators (such as '+'
and '*'
).
We will use a tokenize
function, which will take a program
(represented as a string) and return a list of tokens. Since this is
very similar to extract_words
from the Trends project, we have
filled it out for you:
def tokenize(s):
"""Splits the provided string into a list of tokens.
>>> tokenize('(* (+ 12 3) 5)')
['(', '*', '(', '+', '12', '3', ')', '5', ')']
"""
s = s.replace('(', ' ( ')
s = s.replace(')', ' ) ')
return s.split()
The next step is to take this list of tokens and turn it into a structure that is easier to understand. This structure is called an Abstract Syntax Tree (AST). In particular, in Calculator, the parentheses show us the structure of the program. We want to take in the list of tokens, and produce a deep linked list that shows the structure of the program. Each subexpression in the Calculator expression should correspond to a smaller linked list in the deep linked list.
def numberize(atomic_exp):
"""Converts atomic_exp to a number if possible, otherwise leaves
it alone. atomic_exp is guaranteed to be a single token, and will
not be a parenthesis.
>>> numberize('123')
123
>>> numberize('3.14159')
3.14159
>>> numberize('+')
'+'
"""
# Consider doing int(atomic_exp) and float(atomic_exp).
# What does int('123') do? What does int('+') do?
try:
return int(atomic_exp)
except ValueError:
try:
return float(atomic_exp)
except ValueError:
return atomic_exp
Write read_exp
, a function which parses the program for our
calculator as described above.
read_exp
takes in a list of tokens, and returns the first calculator
expression encountered along with the unevaluated tokens.
The first calculator expression could be a combination expression, an
operator, or simply a number. If the first expression is a
combination expression (that is, the first token is '('
), then it
returns a (possibly deep) linked list representing that expression.
If the first token represents a number, that is a single Calculator
expression and so it just returns that number. Note that tokens such
as '2'
are converted into numbers in the returned AST. If the first
token is an operator, it just returns that operator.
read_exp
also returns the rest of the tokens not included in the
first calculator expression (see doctests for examples). This will be
helpful for the recursion.
To accomplish this task, read_exp
mutually recurses with the
helper function read_until_close
. This means that when
implementing read_exp
, you may want to use read_until_close
, and
when you implement read_until_close
, you may want to use read_exp
,
and so you should not start writing code until you understand what
both functions are supposed to do. read_until_close
also takes
in a list of tokens, but only reads up to and includes the first
mismatched closed parenthesis. read_until_close
returns a linked
list of all the expressions read up to that point, as well as the rest
of the tokens not read (the tokens following the first unmatched
parentheses).
You've already been given some code in the beginning of
read_exp
. Read this code carefully to understand what constitutes an
illegal input to read_exp
. Notice the keyword raise
- this is how
we create our own errors. See the
documentation for
more details.
Make sure you read and understand all the doctests for both functions before you start writing code.
def read_exp(tokens):
"""Given a list of tokens, returns the first calculator expression
(either a number, operator, or combination), and the rest of the
expression that has not been turned into an expression.
>>> exp, unevaled = read_exp(['(', '+', '2', '3', '6', ')'])
>>> print_linked_list(exp)
< '+' 2 3 6 >
>>> unevaled
[]
>>> exp, unevaled = read_exp(['2', '3'])
>>> exp
2
>>> unevaled
['3']
>>> exp, unevaled = read_exp(['(', '/', '6', '2', ')', '(', '-', '2', ')'])
>>> print_linked_list(exp)
< '/' 6 2 >
>>> unevaled
['(', '-', '2', ')']
>>> exp, unevaled = read_exp(['(','*','4','(','-','12','8',')',')','3','2'])
>>> print_linked_list(exp)
< '*' 4 < '-' 12 8 > >
>>> unevaled
['3', '2']
"""
if tokens == []:
raise SyntaxError('unexpected end of input')
token, rest = tokens[0], tokens[1:]
if token == ')':
raise SyntaxError('unexpected )')
elif token == '(':
if rest == []:
raise SyntaxError('mismatched parentheses')
elif rest[0] == ')':
raise SyntaxError('empty combination')
return read_until_close(rest)
else:
return numberize(token), rest
def read_until_close(tokens):
"""Reads up to and including the first mismatched close
parenthesis, then forms a combination out all of the values read
up to that point.
>>> exp, unevaled = read_until_close(['+', '2', '3', ')', '4', ')'])
>>> print_linked_list(exp)
< '+' 2 3 >
>>> unevaled
['4', ')']
>>> exp, unevaled = read_until_close(['+', '(', '-', '2', ')', '3', ')', ')'])
>>> print_linked_list(exp)
< '+' < '-' 2 > 3 >
>>> unevaled
[')']
"""
if tokens[0] == ')':
return empty, tokens[1:]
first, remaining = read_exp(tokens)
rest, remaining = read_until_close(remaining)
return link(first, rest), remaining
Now that we have our AST, we can turn to the problem of evaluating the AST, that is, we figure out what the answer to the program is.
First, we need to understand calc_apply
. calc_apply
takes an
operator and a linked list of numbers. Note that the linked list is
not a deep linked list. calc_apply
applies the operator to
those numbers. It checks what the operator is, and based on that, it
calls a function that can do that specific operation.
def calc_apply(op, args):
"""Applies an operator to a linked list of arguments.
>>> calc_apply('+', link(12, link(34, empty)))
46
>>> calc_apply('-', link(10, link(2, link(3, link(4, empty)))))
1
>>> calc_apply('-', link(3, empty))
-3
>>> calc_apply('*', empty)
1
"""
if op == '+':
return do_addition(args)
elif op == '*':
return do_multiplication(args)
elif op == '-':
return do_subtraction(args)
elif op == '/':
return do_division(args)
else:
raise NameError('unknown operator {}'.format(op))
Now, we need to implement the functions that do specific
operations. Addition, multiplication, and subtraction have been
implemented for you. They use the reduce_linked_list
function which
is defined near the end of the starter file. Your job is to define
do_division
, which does the division operation.
When an operation op
is given more than two arguments, it is left
associative. For example,
(op a b c d)
would do that same thing as
(op (op (op a b) c) d)
So, (/ 60 2 3 5)
is equivalent to (/ (/ (/ 60 2) 3) 5)
which is
equivalent to (/ 60 (* 2 3 5))
def do_division(args):
"""Applies the division operation to args.
args must have a length of at least 1, as in do_subtraction.
>>> do_division(link(4, empty))
0.25
>>> do_division(link(7, link(2, empty)))
3.5
>>> do_division(link(60, link(2, link(3, link(5, empty)))))
2.0
"""
length = len_linked_list(args)
if length == 0:
raise TypeError('not enough arguments')
elif length == 1:
return 1 / first(args)
else:
return first(args) / reduce_linked_list(mul, 1, rest(args))
calc_apply
, let's figure out how to use
calc_apply
. We'll call calc_apply
inside of
calc_eval
. calc_eval
takes in a calculator expression. A
calculator expression can either be a number or a linked list that can
contain multiple items where the first item is a operator and rest are
more calculator expressions. Note that this linked list is a deep
linked list.
Remember that we have two different types of calculator expressions -
numbers and function calls. For which type would we want to use
calc_apply
? Don't forget that calc_apply
takes in a linked list
of numbers, but the subexpressions of exp
can themselves be linked
lists - how could you turn these subexpressions into numbers? (Hint:
Trust the recursion! Domain and range will be helpful here.) You may
find the map_linked_list
function helpful. It is defined near the
end of the starter file.
The other type of expressions is just plain numbers. What should we do for these expressions?
def calc_eval(exp):
"""Evaluates a calculator expression.
>>> calc_eval(5)
5
>>> calc_eval(link('+', link(12, link(3, empty))))
15
>>> subexp1 = link('*', link(3, link(4, empty)))
>>> subexp2 = link('-', link(12, link(9, empty)))
>>> exp = link('+', link(subexp1, link(subexp2, empty)))
>>> print_linked_list(exp)
< '+' < '*' 3 4 > < '-' 12 9 > >
>>> calc_eval(exp)
15
"""
if is_linked_list(exp):
return calc_apply(first(exp), map_linked_list(calc_eval, rest(exp)))
return exp