Study Guide: Interpreters

Instructions

This is a study guide with links to past lectures, assignments, and handouts, as well as additional practice problems to assist you in learning the concepts.

Assignments

Important: For solutions to these assignments once they have been released, see the main website

Handouts

Lectures

Readings

Guides

How does the computer make sense of the questions we ask in the interpreter?

scm> (+ 1 2)
3

When we type the characters, (+ 1 2), how does a computer understand the meaning of each character: the open parenthesis (, the plus symbol, +, the space character , and so forth?

Interpreters

An interpreter is a program which can directly execute instructions in a programming language like Python or Scheme. To do this, the interpreter needs to understand how the rules of a language work, and to do that, the interpreter needs to be designed (by a human, usually!) to follow those rules. In CS 61A, we'd like to learn how the interpreter program works and understand the ideas that go into an interpreter to reveal the magic behind how a computer makes sense of (+ 1 2) in Scheme and 1 + 2 in Python.

REPL

The interpreters we study in this course are designed around a Read-Eval-Print-Loop.

The interpreter waits for our input, looping until we type in a string of characters, (+ 1 2).

  1. In the Read stage, we take the string of characters and convert those characters into data structures that can be understood by the interpreter. In the Scheme interpreter, we'd like to take the string of characters (+ 1 2) and convert it into a Pair structure, Pair('+', Pair(1, Pair(2, nil))).

    • The Lexer takes the string (+ 1 2) and splits into meaningful tokens wherever there is whitespace or a syntax character like the open parenthesis (. The lexer returns a structure that kind of resembles a list of individual characters like ['(', '+', 1, 2, ')'].
    • The Parser takes the list of smaller strings and converts them into a data structure that understands the nesting in the language. It's not too tricky to convert the example above from ['(', '+', '1', '2', ')'] to Pair('+', Pair(1, Pair(2, nil))). But the pair structure is much more useful when we want to work on a more nested example like (= (+ 1 2) 3) which the lexer converts into the flat list ['(', '=', '(', '+', 1, 2, ')', 3, ')']. As the expressions get more and more complicated, using a nested data structure helps the computer make sense of the expression in chunks, much like how we humans make sense of a Scheme expression as well.
  2. In the Eval stage, we evaluate the nested data structure to a value.

    • If the expression is simple, like a number or a boolean, we can just return the number or boolean itself without any further evaluation. Evaluating 1 in the global frame returns the number 1. If we pass in a name like 'x', we'll try to evaluate the name 'x' in the current environment, looking up through the parent frames as necessary.
    • If we pass a call expression like Pair('+', Pair(1, Pair(2, nil))) into the evaluator, we'll follow the call expression evaluation rules to determine the value of the value of the expression. We'll evaluate the operator, '+', to the built-in procedure which adds values, and each operand, 1 and 2, before applying the procedure to the arguments.
    • if the expression is a special form like Pair('define', Pair('x', Pair(1, nil))) (after parsing (define x 1)), then we'll follow the special form evaluation rules defined in the Scheme interpreter. In this example, we don't need to evaluate the name 'x' (it's just a name!) but we do need to evaluate the value, the number 1.
  3. In the Print stage, we take the value we determined in the evaluation stage and figure out how to display the value to the user. Even if an expression evaluates to a number, the computer actually stores that number in its memory as a bunch of 0 and 1 binary digits! We take that number and convert it back to a more readable representation like '3'!
  4. Once we've displayed the value, we can wait in a Loop until the user asks another question to the interpreter.

There are some special rules in the parser to handle scenarios like the quote operator '. Run python3 scheme_reader.py to explore its behavior.

read> '1
str : (quote 1)
repr: Pair('quote', Pair(1, nil))
read> '(1 2 3)
str : (quote (1 2 3))
repr: Pair('quote', Pair(Pair(1, Pair(2, Pair(3, nil))), nil))

Likewise, the evaluator also needs to handle the behavior of special forms differently from standard call expressions. The Scheme Specification contains more information, and the special form evaluation rules will also be briefly described by example here.

Counting Calls

One way of understanding what occurs in the Scheme interpreter is by counting the number of calls to scheme_eval that occur when evaluating an expression.

Atomic data types like numbers, booleans, strings, symbols, etc. only take a single step to evaluate. Primitive expressions like names evaluate to a value in one step as well.

Expression: 1
Eval:       -       (1 scheme_eval)

Expression: #t
Eval:       --      (1 scheme_eval)

Expression: 'cs61a
Eval:       ------  (1 scheme_eval)

Call Expressions

Call expressions follow the call expression evaluation procedure.

Expression: (+ 1 2)
Eval:       -------
             - - -  (4 scheme_eval)
Apply:      ~~~~~~~ (1 scheme_apply)

Expression: (= (quotient 5 1) (+ 1 2))
Eval:       --------------------------
             - --------------
                -------- - -
Apply:         ~~~~~~~~~~~~~~
Eval:                         -------
                               - - -   (10 scheme_eval)
Apply:                        ~~~~~~~
            ~~~~~~~~~~~~~~~~~~~~~~~~~~ (3 scheme_apply)

Special Forms

Special forms follow different evaluation rules. A more complete description of their behaviors is described in the Scheme Project and the Scheme Specification.

Unlike call expressions where we need to evaluate the operator, special forms do not need to evaluate the first term. In scheme_eval, the logic looks like this.

SPECIAL_FORMS = {
    'and': do_and_form,
    'begin': do_begin_form,
    'cond': do_cond_form,
    'define': do_define_form,
    'if': do_if_form,
    'lambda': do_lambda_form,
    'let': do_let_form,
    'or': do_or_form,
    'quote': do_quote_form,
}

first, rest = expr.first, expr.second
if scheme_symbolp(first) and first in SPECIAL_FORMS:
    return SPECIAL_FORMS[first](rest, env)

Instead of continuing with the call expression evaluation procedure, we handle the special form with its own set of rules.

Expression: (if (= 1 1) #t #f)
Eval:       ------------------
                -------        (note that if is not evaluated)
                 - - -
Apply:          ~~~~~~~        (1 scheme_apply)
Eval:                   --     (6 scheme_eval)
                               (note that #f is not evaluated)

We can also trace through how our Scheme interpreter handles another special form, cond.

Expression: (cond ((= 1 2) #f) (else #t))
Eval:       -----------------------------
                   -------
                    - - -
Apply:             ~~~~~~~               (1 scheme_apply)
Eval:                                --  (6 scheme_eval)

Practice Problems

Q1: Eval/Apply

This question is from the CS 61A Summer 2017 Final Exam.

With your Project 4 implementation, how many total calls to scheme_eval and scheme_apply would be made when evaluating the following two expressions? Assume that you are not using the tail call optimized scheme_eval_optimized function.

Suppose we have already evaluated the following definition in the current environment.

(define lst (list 1 2 3))

How many calls are made to scheme_eval and scheme_apply?

(+ (car lst) (- 5 (car (cdr lst))))
13 eval calls, 5 apply calls

How many calls are made to scheme_eval and scheme_apply?

((if (or (null? lst) (null? (cdr lst)))
     (lambda (s) 0)
     (lambda (s) (car (cdr s))))
 lst)
18 eval calls, 6 apply calls