Functions

Today we’re adding functions to our language! A very exciting day for us!

Here are some programs we’ll be able to support after today:

(define (id x) x)
(print (id 4))
(define (f x y) (+ x y))
(define (g x) (f x x))
(print (f 4 5))
(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
(print (fib (read-num)))
(define (even n) (if (zero? n) true (odd (sub1 n))))
(define (odd n) (if (zero? n) false (even (sub1 n))))
(print (even (read-num)))

You may notice some patterns that will constrain our implementation and others that will make it easier. Here are a few interesting patterns:

You might also notice there’s a regular structure to each of those function definitions. They each seem to have a name, which is a string; a list of arguments, which is a list of strings; and a body, which is any s-expression.

To support us in implementing functions in both the interpreter and the compiler, we’re going to introduce a new type, defn. We’ll use this type to represent function definitions, and it will include those same pieces of information we noticed in each of our function definitions: name, arguments, and body.

type defn = {name: string; args: string list; body: s_exp}

We’ll also have a helper function, defns_and_body, which will just take care of retrieving definitions and the program body out of the list of s-expressions we get from parsing our programs.

Functions in the Interpreter:

To kick things off, we’ll have to start by changing our interp function. Previously it called parse to parse a single s-expression in each input program. Now we’ll use parse_many to retrieve a list of s-expressions instead. Once we’ve used our defns_and_body helper to grab out the definitions and body, we’re ready to modify our call to interp_exp. We’ll probably want body to be the thing we’re interpreting, but how should we communicate the definitions? We’ll go ahead and add those as an additional argument to interp_exp.

let rec interp_exp (defns : defn list) (env : value symtab) (exp : s_exp) :
    value =
  (* ... *)

let interp (program : string) : unit =
  let defns, body = parse_many program |> get_defns_and_body in
  interp_exp defns Symtab.empty body |> ignore

Now that we’ve changed the type signature of interp_exp, we’ll have to do a bit of bookkeeping to make sure all our old calls to interp_exp pass those defns around too. Since the program always has the same set of definitions available (remember, all definitions appear at the top level for now), we won’t need to update them anywhere. We’ll just pass them around.

Ok, now that we have our function definitions available, what do we do once we spot a function call? We have an interesting decision to make about where we want to put our function call case relative to other cases in compile_exp’s top-level match. Say the programmer defines a function named +, and say we put our function call case before our addition case. That means the programmer will be able to change what + does in our language. We’ll go ahead and put the function call case right at the top of our match.

Here we also want to think about how to make the arguments available when we’re evaluating the function. Fortunately we already know one way to make values available by names! We’ve been using our environment symtab to keep track of let-bound items for a while now. We’ll use that same mechanism to make function arguments available while we’re evaluating the body of a function.

Here’s what we’ll add to our interp_exp function:

let rec interp_exp (defns : defn list) (env : value symtab) (exp : s_exp) :
    value =
  (* ... *)
  | Lst (Sym f :: args) when is_defn defns f ->
      let defn = get_defn defns f in
      if List.length args = List.length defn.args then
	let vals = List.map (interp_exp defns env) args in
	let fenv = (List.combine defn.args vals) |> Symtab.of_list in
	interp_exp defns fenv defn.body
      else raise (BadExpression exp)

Notice what we do to make sure this is really a valid function call. We use when is_defn defns to make sure that the symbol f is actually the name of one of the functions the programmer defined. Then we make sure that the number of arguments that appear in the function call matches the number of arguments the function definition accepts. If both of those conditions are met, we’re good to go.

We also want to notice what’s going on with the environments we’re passing into our interp_exp calls. When we’re evaluating the expressions in the argument list, we use env, the environment we received. When we’re evaluating the body of the function, we discard everything from env. Instead we make a new environment that contains only the argument names, mapped to outputs from evaluating the argument expressions.

The choice we made about the environment determines how scoping works in our language. If we’d decided to make a new symtab that included both the original env names and the function arguments, we’d have produced dynamic scope. This was once a fairly popular approach to scoping, but it turns out to be very difficult for programmers to reason about. These days, it’s rare to see languages implementing dynamic scope. Instead we chose to provide an environment that included only the function’s arguments. This means our language has lexical scope. With lexical scope, we can refer to names only within certain textual regions of the program. For example, say we define a function f: (define (f x y) (+ (g x) (g y))). With lexical scope, we can refer to x and y in the body of f, but we can’t refer to them once we’re in the body of g, even though g is called by f.

Functions in the Compiler:

Now that we’ve seen how it works in the interpreter, let’s try to do the same thing with the compiler.

Unfortunately, one part of our interpreter approach really won’t work here. In the interpreter, whenever we encountered a function call, we grabbed the body of the function and evaluated it. Say we tried to do the same thing here. Every time we encounter a function call, we could try to emit the assembly directives for executing the function body. What would happen if we tried to do this for the third and fourth example programs above?

Since we can’t take that naive approach, we’ll have to find a way to put the function bodies into the assembly code once, then jump to them whenever we need to execute the function. Forunately we’ve already had some good practice with calling functions—we’ve already been calling the C functions we defined in our runtime. We’ll be able to do a similar process for functions defined in our own input programs.

We’ll start with doing some of the same bookkeeping we did in the interpreter—calling parse_many, changing compile to accept a list of s-expressions instead of just one, using defns_and_body to get the definitions and body.

let compile (program : s_exp list) : string =
  let defns, body = defns_and_body program in
  [ Global "entry"
  ; Extern "error"
  ; Extern "read_num"
  ; Extern "print_value"
  ; Extern "print_newline" ]
  @ [Label "entry"]
  @ compile_exp defns Symtab.empty (-8) body
  @ [Ret]
  @ List.concat_map (compile_defn defns) defns
  |> List.map string_of_directive
  |> String.concat "\n"

let compile_to_file (program : string) : unit =
  let file = open_out "program.s" in
  output_string file (compile (parse_many program)) ;
  close_out file

You can see we also made some other changes here. First, we passed the definitions as input to compile_exp. As in the interpreter, we’ll thread those definitions through all our calls to compile_exp, without changing what’s in the defns list. But we’ve also done something a bit weird. We’ve added additional directives after our Ret. What are we putting after Ret? We’re calling a new helper function compile_defn on each of our function definitions, and adding the resulting directives to the end of our assembly code. Let’s take a look at what compile_defn is doing to make the function definitions available in our assembly programs.

let compile_defn defns {name; args; body} =
  let ftab =
    args
    |> List.mapi (fun i arg -> (arg, -8 * (i + 1)))
    |> Symtab.of_list
  in
  [Label (defn_label name)]
  @ compile_exp defns ftab (-8 * (List.length args + 1)) body
  @ [Ret]

Our overall goal here is to put each function body at its own label.

When we were calling C functions, C was in charge of producing the labels for the functions and putting the appropriate assembly instructions after. For programmer-defined functions, we’re in charge of doing the same process. When we’re compiling the definition, we don’t know what values will be passed as arguments. In fact, if we call the function multiple times in the program, we may be passing in many different arguments. Since we can’t make the argument values available while we’re compiling the definition, we’ll have to take care of making the arguments available when we’re compiling the call. However, since we’re compiling the body right now, we still need to know where the body should look to find those values. We’ll go ahead and decide where the arguments will live, even though the call will be in charge of putting the appropriate values into those slots.

Here we’ve decided to put the first argument at the stack pointer minus 8, the second argument at the stack pointer minus 16, and so on. Remember that the compiler’s symtab maps from a name to the offset from rsp where it can find the value bound to that name. So if a function has n arguments, ftab maps the arguments to the first n slots above the stack pointer. Once we’re ready to compile the function calls, that’s where we’ll want to put the argument values.

In the meantime, we now have enough information to finish compiling the definitions. Since we know how much space will be taken by the n arguments, we can pass an apprioriate stack index to our compile_exp call that compiles the body. We’ll tell the body not to use any of the slots that are already populated with arguments, but anything above that is fair game. And as in our compiler, we’ve designed ftab so that only the function arguments are available within the function body.

Finally, we emit the list of directives. We add a label based on the function name. The defn_label function takes care of generating an appropriate label—this is important because some strings are allowable function names in our language but not allowable labels in the assembly. After the label, we emit the directives for the body itself, and lastly the Ret.

Now let’s move to compile_exp itself and compiling function calls.

let rec compile_exp (defns : defn list) (tab : int symtab) (stack_index : int)
    (exp : s_exp) : directive list =
  (* ... *)
  | Lst (Sym f :: args) when is_defn defns f ->
      let defn = get_defn defns f in
      if List.length args = List.length defn.args then
	let stack_base = align_stack_index (stack_index + 8) in
	let compiled_args =
	  args
	  |> List.mapi (fun i arg ->
		 compile_exp defns tab (stack_base - ((i + 2) * 8)) arg
		 @ [Mov (stack_address (stack_base - ((i + 2) * 8)), Reg Rax)])
	  |> List.concat
	in
	compiled_args
	@ [ Add (Reg Rsp, Imm stack_base)
	  ; Call (defn_label f)
	  ; Sub (Reg Rsp, Imm stack_base) ]
      else raise (BadExpression exp)

In some ways, this looks a lot like our C calls. We’re still doing the Add to point rsp to the last used slot (or 8 bytes beyond that, if we need the buffer to keep everything 16-byte aligned). We’re still doing the Call to pass execution to the funtion we’re calling. And we’re still doing the Sub to point rsp back to where it pointed before the call.

One imporant thing we’re not doing: saving rdi. Since we control the assembly in the function that we’re calling, we know none of the code we generate will clobber rdi. In fact, if it does update rdi because it puts something on the heap, that’s something we need to know about! So we don’t need to save and restore rdi.

The other big change compared to our C calls: we have to put those arguments onto the stack, right where the function definition is expecting to find them. Remember, Call increments the stack pointer by -8. stack_base + 1*-8 thus points to the slot where our return address will end up. And stack_base + 2*-8 points to the slot where we want our first argument to live. So we know what offset to use for our Mov instruction, for each argument. But what stack index should we pass as input when we call compile_exp? We can actually use the exact same offset we use for the Mov. When we’re populating the arguments in the stack, we slot them in from the bottom up. If the next available slot is at offset x, then we can use any slot at offset x or above to calculate the value we’ll eventually place in x.

And with that, we have functions! Open up dune utop and try it out!