Functions
Today we’re adding functions to our language! A very exciting day for us!
Here are some programs we’ll be able to support after today:
(define (id x) x) (print (id 4))
(define (f x y) (+ x y)) (define (g x) (f x x)) (print (f 4 5))
(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2))))) (print (fib (read-num)))
(define (even n) (if (zero? n) true (odd (sub1 n)))) (define (odd n) (if (zero? n) false (even (sub1 n)))) (print (even (read-num)))
You may notice some patterns that will constrain our implementation and others that will make it easier. Here are a few interesting patterns:
- We can have multiple expressions now.
- Functions can be recursive (and mutually recursive).
- Function definitions are always at the top level—we don’t have function definitions nested inside other function definitions or inside let expressions.
- Function definitions appear before the body of the program.
- We’re not using functions as values.
You might also notice there’s a regular structure to each of those function definitions. They each seem to have a name, which is a string; a list of arguments, which is a list of strings; and a body, which is any s-expression.
To support us in implementing functions in both the interpreter and the
compiler, we’re going to introduce a new type, defn
.
We’ll use this type to represent function definitions, and it will
include those same pieces of information we noticed in each of our
function definitions: name, arguments, and body.
type defn = {name: string; args: string list; body: s_exp}
We’ll also have a helper function, defns_and_body
,
which will just take care of retrieving definitions and the program body
out of the list of s-expressions we get from parsing our programs.
Functions in the Interpreter:
To kick things off, we’ll have to start by changing our
interp
function. Previously it called
parse
to parse a single s-expression in each input
program. Now we’ll use parse_many
to retrieve a
list of s-expressions instead. Once we’ve used our
defns_and_body
helper to grab out the definitions and
body, we’re ready to modify our call to
interp_exp
. We’ll probably want
body
to be the thing we’re interpreting, but how
should we communicate the definitions? We’ll go ahead and add
those as an additional argument to interp_exp
.
let rec interp_exp (defns : defn list) (env : value symtab) (exp : s_exp) : value = (* ... *) let interp (program : string) : unit = let defns, body = parse_many program |> get_defns_and_body in interp_exp defns Symtab.empty body |> ignore
Now that we’ve changed the type signature of
interp_exp
, we’ll have to do a bit of bookkeeping
to make sure all our old calls to interp_exp
pass those
defns
around too. Since the program always has the same
set of definitions available (remember, all definitions appear at
the top level for now), we won’t need to update them anywhere.
We’ll just pass them around.
Ok, now that we have our function definitions available, what
do we do once we spot a function call? We have an interesting
decision to make about where we want to put our function call case
relative to other cases in compile_exp
’s
top-level match. Say the programmer defines a function named
+
, and say we put our function call case before our
addition case. That means the programmer will be able to change what
+
does in our language. We’ll go ahead and put
the function call case right at the top of our match.
Here we also want to think about how to make the arguments available
when we’re evaluating the function. Fortunately we already
know one way to make values available by names! We’ve been
using our environment symtab
to keep track of let-bound
items for a while now. We’ll use that same mechanism to make
function arguments available while we’re evaluating the body
of a function.
Here’s what we’ll add to our
interp_exp
function:
let rec interp_exp (defns : defn list) (env : value symtab) (exp : s_exp) : value = (* ... *) | Lst (Sym f :: args) when is_defn defns f -> let defn = get_defn defns f in if List.length args = List.length defn.args then let vals = List.map (interp_exp defns env) args in let fenv = (List.combine defn.args vals) |> Symtab.of_list in interp_exp defns fenv defn.body else raise (BadExpression exp)
Notice what we do to make sure this is really a valid function call.
We use when is_defn defns
to make sure that the symbol
f
is actually the name of one of the functions the
programmer defined. Then we make sure that the number of arguments
that appear in the function call matches the number of arguments the
function definition accepts. If both of those conditions are met,
we’re good to go.
We also want to notice what’s going on with the environments
we’re passing into our interp_exp
calls. When
we’re evaluating the expressions in the argument list, we use
env
, the environment we received. When we’re
evaluating the body of the function, we discard
everything from env
. Instead we make a new
environment that contains only the argument names, mapped to outputs
from evaluating the argument expressions.
The choice we made about the environment determines how scoping
works in our language. If we’d decided to make a new symtab
that included both the original env
names and
the function arguments, we’d have produced
dynamic scope. This was once a fairly popular approach to
scoping, but it turns out to be very difficult for programmers to
reason about. These days, it’s rare to see languages
implementing dynamic scope. Instead we chose to provide an
environment that included only the function’s arguments. This
means our language has lexical scope. With lexical scope, we
can refer to names only within certain textual regions of the
program. For example, say we define a function f
:
(define (f x y) (+ (g x) (g y)))
. With lexical scope,
we can refer to x
and y
in the body of
f
, but we can’t refer to them once we’re in
the body of g
, even though g
is called by
f
.
Functions in the Compiler:
Now that we’ve seen how it works in the interpreter, let’s try to do the same thing with the compiler.
Unfortunately, one part of our interpreter approach really won’t work here. In the interpreter, whenever we encountered a function call, we grabbed the body of the function and evaluated it. Say we tried to do the same thing here. Every time we encounter a function call, we could try to emit the assembly directives for executing the function body. What would happen if we tried to do this for the third and fourth example programs above?
Since we can’t take that naive approach, we’ll have to find a way to put the function bodies into the assembly code once, then jump to them whenever we need to execute the function. Forunately we’ve already had some good practice with calling functions—we’ve already been calling the C functions we defined in our runtime. We’ll be able to do a similar process for functions defined in our own input programs.
We’ll start with doing some of the same bookkeeping we did in
the interpreter—calling parse_many
, changing
compile
to accept a list of s-expressions instead of
just one, using defns_and_body
to get the definitions
and body.
let compile (program : s_exp list) : string = let defns, body = defns_and_body program in [ Global "entry" ; Extern "error" ; Extern "read_num" ; Extern "print_value" ; Extern "print_newline" ] @ [Label "entry"] @ compile_exp defns Symtab.empty (-8) body @ [Ret] @ List.concat_map (compile_defn defns) defns |> List.map string_of_directive |> String.concat "\n" let compile_to_file (program : string) : unit = let file = open_out "program.s" in output_string file (compile (parse_many program)) ; close_out file
You can see we also made some other changes here. First, we passed
the definitions as input to compile_exp
. As in the
interpreter, we’ll thread those definitions through all our
calls to compile_exp
, without changing what’s in
the defns
list. But we’ve also done something a
bit weird. We’ve added additional directives after our
Ret
. What are we putting after Ret
?
We’re calling a new helper function
compile_defn
on each of our function definitions, and
adding the resulting directives to the end of our assembly code.
Let’s take a look at what compile_defn
is doing
to make the function definitions available in our assembly programs.
let compile_defn defns {name; args; body} = let ftab = args |> List.mapi (fun i arg -> (arg, -8 * (i + 1))) |> Symtab.of_list in [Label (defn_label name)] @ compile_exp defns ftab (-8 * (List.length args + 1)) body @ [Ret]
Our overall goal here is to put each function body at its own label.
When we were calling C functions, C was in charge of producing the labels for the functions and putting the appropriate assembly instructions after. For programmer-defined functions, we’re in charge of doing the same process. When we’re compiling the definition, we don’t know what values will be passed as arguments. In fact, if we call the function multiple times in the program, we may be passing in many different arguments. Since we can’t make the argument values available while we’re compiling the definition, we’ll have to take care of making the arguments available when we’re compiling the call. However, since we’re compiling the body right now, we still need to know where the body should look to find those values. We’ll go ahead and decide where the arguments will live, even though the call will be in charge of putting the appropriate values into those slots.
Here we’ve decided to put the first argument at the stack
pointer minus 8, the second argument at the stack pointer minus 16,
and so on. Remember that the compiler’s symtab maps from a
name to the offset from rsp
where it can find the value
bound to that name. So if a function has n arguments,
ftab
maps the arguments to the first n slots above the
stack pointer. Once we’re ready to compile the function calls,
that’s where we’ll want to put the argument values.
In the meantime, we now have enough information to finish compiling
the definitions. Since we know how much space will be taken by the n
arguments, we can pass an apprioriate stack index to our
compile_exp
call that compiles the body. We’ll
tell the body not to use any of the slots that are already populated
with arguments, but anything above that is fair game. And as in our
compiler, we’ve designed ftab
so that only the
function arguments are available within the function body.
Finally, we emit the list of directives. We add a label based on the
function name. The defn_label
function takes care of
generating an appropriate label—this is important because
some strings are allowable function names in our language but not
allowable labels in the assembly. After the label, we emit the
directives for the body itself, and lastly the Ret
.
Now let’s move to compile_exp
itself and
compiling function calls.
let rec compile_exp (defns : defn list) (tab : int symtab) (stack_index : int) (exp : s_exp) : directive list = (* ... *) | Lst (Sym f :: args) when is_defn defns f -> let defn = get_defn defns f in if List.length args = List.length defn.args then let stack_base = align_stack_index (stack_index + 8) in let compiled_args = args |> List.mapi (fun i arg -> compile_exp defns tab (stack_base - ((i + 2) * 8)) arg @ [Mov (stack_address (stack_base - ((i + 2) * 8)), Reg Rax)]) |> List.concat in compiled_args @ [ Add (Reg Rsp, Imm stack_base) ; Call (defn_label f) ; Sub (Reg Rsp, Imm stack_base) ] else raise (BadExpression exp)
In some ways, this looks a lot like our C calls. We’re still
doing the Add
to point rsp
to the last
used slot (or 8 bytes beyond that, if we need the buffer to keep
everything 16-byte aligned). We’re still doing the
Call
to pass execution to the funtion we’re
calling. And we’re still doing the Sub
to point
rsp
back to where it pointed before the call.
One imporant thing we’re not doing: saving
rdi
. Since we control the assembly in the function that
we’re calling, we know none of the code we generate will
clobber rdi
. In fact, if it does update
rdi
because it puts something on the heap, that’s
something we need to know about! So we don’t need to save and
restore rdi
.
The other big change compared to our C calls: we have to put those
arguments onto the stack, right where the function definition is
expecting to find them. Remember, Call
increments the
stack pointer by -8. stack_base + 1*-8
thus points to
the slot where our return address will end up. And
stack_base + 2*-8
points to the slot where we want our
first argument to live. So we know what offset to use for our
Mov
instruction, for each argument. But what stack
index should we pass as input when we call compile_exp
?
We can actually use the exact same offset we use for the
Mov
. When we’re populating the arguments in the
stack, we slot them in from the bottom up. If the next available
slot is at offset x
, then we can use any slot at offset
x
or above to calculate the value we’ll
eventually place in x
.
And with that, we have functions! Open up dune utop
and
try it out!