Summer 2005 CS61C Project 1: A Scheme Interpreter

Due 23:59:59 on Friday 2005-07-08

TA In Charge: Dominic

Problem

A tiny Scheme-subset interpreter written in C is available in the file ~cs61c/lib/microscheme.c. Your job is to extend the interpreter. Part of the object of this project is for you to be sure to learn any C details needed for the project that you haven't learned already, so ask questions if necessary.

One of the niftiest features of Scheme is the way numbers are dealt with. Instead of having various number types such as int, unsigned, long, double, etc., numbers are just plain numbers. There is no need to worry about limits to the size or precision of the data; Scheme numbers can handle everything. Unfortunately, the code in microscheme.c can only handle signed 32 bit integers. The goal of this project will be to extend it to deal with arbitrarily large integers and complex numbers.

You will be required to do the following:

1) Modify the I/O of the interpreter to handle bigints: The current interpreter merely assumes that a number can be represented by a signed 32-bit integer. Extend this such that it can handle arbitrarily large integers, i.e. 1234567890123456789. "Arbitrary" has its limits: our test numbers will not exceed 2^32-1 decimal digits. You will also need to modify the output functions to recognize this representation. The output text is described below. You are advised to make a new number representation in the interpreter to hold the larger values and complex numbers.

2) Implement math functions for big integers: The current addition function takes in what it expects to be signed 32 bit integers, except that is clearly no longer the case. Modify it to accept big integers. Also implement subtraction, multiplication, and division, which do not exist in the framework code that we have provided you.

3) Add I/O for complex numbers: Now that big integers can be stored, add in numbers of the form "realpart+imagpart*i" (and "realpart-imagpart*i) where realpart and imagpart can be big integers themselves. For example, the following will be valid inputs:

   12345678901234567890
   1+8*i
   -2-400000000000000000000000000000000000*i

4) Add math for complex numbers: Just like for bignums, implement addition, subtraction, multiplication, and division for complex numbers. You should NOT duplicate any of the bigint math code from part 2. If you need to multiply two bigints here, then call you bigint multiplication code.

5) Variable argument math functions: To make things a bit more like real Scheme, we'll let the math operators take in a variable number of arguments- as opposed to just two. Once this is all complete, the interpreter should be able to evaluate

    (+ 858101931 1+3*i (* 7858282+812*i 9999999999999999999999999999999999999) 42)
6) Have reasonable running time: You should do your computations in a logical fashion that takes into account the fact that numbers can be very large. In general, we don't want to see your program take half a minute to compute the above example. An example of a poor algorithm would be to add two numbers together by repeatedly adding 1 over and over again, i.e.
    (+ 99999999999 9999999999999999999999999999999999999999999)
shouldn't have to perform 9999999999999999999999999999999999999999999 additions.

7) Good file organization: You should organize the code into multiple files to prevent microscheme.c from growing large and unwieldy. Write header file(s) to facilitate this. At the very minimum, much of your work with math functions and the new number representations should be in seperate file(s).

8) Follow restrictions: You may only use libraries specified in the appendix of K&R.

You will NOT be required to do the following (but read this, since there are subsets of them that need to be done)

1) All out memory management: The framework code we provide does not call free(), and we will not require you to fix this (very leaky!) code. However, you should use good memory management skills and ensure that any intermediate memory that you malloc() is free()'ed before your functions complete. In other words, you do not have to free() the arguments or return values of your math operators.

2) Input error checking: We guarantee that input will be valid Scheme syntax according to Dr. Scheme, plus our additional complex number syntax. Your program will only need to handle this subset of the inputs - you do NOT need to check the input for validity. Of course, the inputs can be extremely large nasty complex numbers. Your computation should gracefully error if an invalid math operation occurs, i.e. division by zero.

Input

The program will take input from STDIN once it is run. The current code already does this. As stated before, all inputs will be statements recognized as legal Scheme by Dr. Scheme plus the complex numbers. In particular, legal numbers will only be of the form /-?[0-9]+([\+\-][0-9]+\*i)?/ (perl). To translate- there is a possible leading negative sign and some digits. An optional + or - may follow in conjunction with more digits and a *i.

Output

After each expression is completely typed in, it should be evaluated and the result printed out followed by a prompt. Since all outputs in this limited form of Scheme will be numbers, this can be narrowed down to only several cases. Integers should be printed just as Dr. Scheme does. Complex numbers should be printed in the form "realpart+imagpart*i".

Sample Interaction

A basic example (light on the long numbers): example0.transcript

Helpful files:
example0.in
example0.out

Using these, you can run
    cat example0.in | ./microscheme > example0.myout
    diff example0.myout example0.out 
The first command will print outputs to the file named example0.myout. After that, the second command compares example0.myout to example0.out. If there are differences they will be printed out, otherwise, nothing will be shown on the screen.

Submission

Submit any number of files by creating a directory called proj1 with all your source files in it, as well as a Makefile. From within this directory run "submit proj1". Be certain that your program works as specified and compiles on the instructional machines. You may wish to use the unix diff command (man diff for more info) to compare your output to the example output. For more info on Makefiles, consult The GNU Make Manual. From within your proj1 directory, one should only have to type "make ; ./microscheme" on nova.cs to compile and run your program.

Optional

FOR FUN, NOT FOR CREDIT! DON'T DO THIS UNTIL THE REST OF THE PROJECT IS COMPLETED DO NOT SUBMIT THIS WITH YOUR MAIN PROJECT- MAKE A SEPERATE VERSION AND KEEP THE ORIGINAL PROJECT

Add user-defined procedures and symbol tables so variables can be defined. This will require several steps and knowledge from 61A:

1) Implement symbol tables (make sure that there can be more than 1 of these). Given a symbol, it should return the associated value. You should also create some means off adding symbol/value pairs into a table.

2) Create the global envrionment and the define special form. An environment is just a list of symbol tables. The global enviroment is a 1-element list. Calling the define special form should add a symbol/value pair to the correct symbol table.

3) Modify EVAL so that the operator (first) subexpression of a compound expression is evaluated recursively, as the argument subexpressions already are. Invent a primitive-procedure data type; put the primitives in the global environment. [The interpreter as it exists now doesn't look up the primitives in the symbol table; it has their names built in, as if they were special forms. But you'll change this, so that procedures are first-class values, as in Scheme.]

4) Add user-defined procedures. Create a new USERPROC type that includes the procedure's text and its defining environment. Add a LAMBDA special form to create user procedures. The text of the procedure is just the LAMBDA expression (or its cdr, if you prefer, leaving out the word LAMBDA itself). Modify the symbol lookup routine to handle a list of symbol tables. Then you have to write APPLY so that for user-defined procedures it extends the procedure's defining environment with a new frame in which the procedure's formal parameters are bound to the calling expression's actual argument values, and you have to write EVAL-SEQUENCE that evaluates the procedure's body. (Actually, since we have no primitives with side effects --- no mutation, for example --- you could get away with only allowing one expression in the body.) Basically you are redoing the metacircular evaluator from chapter 4 of SICP, but writing in C instead of in Scheme. This really isn't hard if you remember 61A!