# Functions

Today we’re adding functions to our language! A very exciting day for us!

Here are some programs we’ll be able to support after today:

```(define (id x) x)
(print (id 4))
```
```(define (f x y) (+ x y))
(define (g x) (f x x))
(print (f 4 5))
```
```(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
```
```(define (even n) (if (zero? n) true (odd (sub1 n))))
(define (odd n) (if (zero? n) false (even (sub1 n))))
```

You may notice some patterns that will constrain our implementation and others that will make it easier. Here are a few interesting patterns:

• We can have multiple expressions now.
• Functions can be recursive (and mutually recursive).
• Function definitions are always at the top level—we don’t have function definitions nested inside other function definitions or inside let expressions.
• Function definitions appear before the body of the program.
• We’re not using functions as values.

You might also notice there’s a regular structure to each of those function definitions. They each seem to have a name, which is a string; a list of arguments, which is a list of strings; and a body, which is any s-expression.

To support us in implementing functions in both the interpreter and the compiler, we’re going to introduce a new type, `defn`. We’ll use this type to represent function definitions, and it will include those same pieces of information we noticed in each of our function definitions: name, arguments, and body.

```type defn = {name: string; args: string list; body: s_exp}
```

We’ll also have a helper function, `defns_and_body`, which will just take care of retrieving definitions and the program body out of the list of s-expressions we get from parsing our programs.

## Functions in the Interpreter:

To kick things off, we’ll have to start by changing our `interp` function. Previously it called `parse` to parse a single s-expression in each input program. Now we’ll use `parse_many` to retrieve a list of s-expressions instead. Once we’ve used our `defns_and_body` helper to grab out the definitions and body, we’re ready to modify our call to `interp_exp`. We’ll probably want `body` to be the thing we’re interpreting, but how should we communicate the definitions? We’ll go ahead and add those as an additional argument to `interp_exp`.

```let rec interp_exp (defns : defn list) (env : value symtab) (exp : s_exp) :
value =
(* ... *)

let interp (program : string) : unit =
let defns, body = parse_many program |> get_defns_and_body in
interp_exp defns Symtab.empty body |> ignore
```

Now that we’ve changed the type signature of `interp_exp`, we’ll have to do a bit of bookkeeping to make sure all our old calls to `interp_exp` pass those `defns` around too. Since the program always has the same set of definitions available (remember, all definitions appear at the top level for now), we won’t need to update them anywhere. We’ll just pass them around.

Ok, now that we have our function definitions available, what do we do once we spot a function call? We have an interesting decision to make about where we want to put our function call case relative to other cases in `compile_exp`’s top-level match. Say the programmer defines a function named `+`, and say we put our function call case before our addition case. That means the programmer will be able to change what `+` does in our language. We’ll go ahead and put the function call case right at the top of our match.

Here we also want to think about how to make the arguments available when we’re evaluating the function. Fortunately we already know one way to make values available by names! We’ve been using our environment `symtab` to keep track of let-bound items for a while now. We’ll use that same mechanism to make function arguments available while we’re evaluating the body of a function.

Here’s what we’ll add to our `interp_exp` function:

```let rec interp_exp (defns : defn list) (env : value symtab) (exp : s_exp) :
value =
(* ... *)
| Lst (Sym f :: args) when is_defn defns f ->
let defn = get_defn defns f in
if List.length args = List.length defn.args then
let vals = List.map (interp_exp defns env) args in
let fenv = (List.combine defn.args vals) |> Symtab.of_list in
interp_exp defns fenv defn.body
```

Notice what we do to make sure this is really a valid function call. We use `when is_defn defns` to make sure that the symbol `f` is actually the name of one of the functions the programmer defined. Then we make sure that the number of arguments that appear in the function call matches the number of arguments the function definition accepts. If both of those conditions are met, we’re good to go.

We also want to notice what’s going on with the environments we’re passing into our `interp_exp` calls. When we’re evaluating the expressions in the argument list, we use `env`, the environment we received. When we’re evaluating the body of the function, we discard everything from `env`. Instead we make a new environment that contains only the argument names, mapped to outputs from evaluating the argument expressions.

The choice we made about the environment determines how scoping works in our language. If we’d decided to make a new symtab that included both the original `env` names and the function arguments, we’d have produced dynamic scope. This was once a fairly popular approach to scoping, but it turns out to be very difficult for programmers to reason about. These days, it’s rare to see languages implementing dynamic scope. Instead we chose to provide an environment that included only the function’s arguments. This means our language has lexical scope. With lexical scope, we can refer to names only within certain textual regions of the program. For example, say we define a function `f`: `(define (f x y) (+ (g x) (g y)))`. With lexical scope, we can refer to `x` and `y` in the body of `f`, but we can’t refer to them once we’re in the body of `g`, even though `g` is called by `f`.

## Functions in the Compiler:

Now that we’ve seen how it works in the interpreter, let’s try to do the same thing with the compiler.

Unfortunately, one part of our interpreter approach really won’t work here. In the interpreter, whenever we encountered a function call, we grabbed the body of the function and evaluated it. Say we tried to do the same thing here. Every time we encounter a function call, we could try to emit the assembly directives for executing the function body. What would happen if we tried to do this for the third and fourth example programs above?

Since we can’t take that naive approach, we’ll have to find a way to put the function bodies into the assembly code once, then jump to them whenever we need to execute the function. Forunately we’ve already had some good practice with calling functions—we’ve already been calling the C functions we defined in our runtime. We’ll be able to do a similar process for functions defined in our own input programs.

We’ll start with doing some of the same bookkeeping we did in the interpreter—calling `parse_many`, changing `compile` to accept a list of s-expressions instead of just one, using `defns_and_body` to get the definitions and body.

```let compile (program : s_exp list) : string =
let defns, body = defns_and_body program in
[ Global "entry"
; Extern "error"
; Extern "print_value"
; Extern "print_newline" ]
@ [Label "entry"]
@ compile_exp defns Symtab.empty (-8) body
@ [Ret]
@ List.concat_map (compile_defn defns) defns
|> List.map string_of_directive
|> String.concat "\n"

let compile_to_file (program : string) : unit =
let file = open_out "program.s" in
output_string file (compile (parse_many program)) ;
close_out file
```

You can see we also made some other changes here. First, we passed the definitions as input to `compile_exp`. As in the interpreter, we’ll thread those definitions through all our calls to `compile_exp`, without changing what’s in the `defns` list. But we’ve also done something a bit weird. We’ve added additional directives after our `Ret`. What are we putting after `Ret`? We’re calling a new helper function `compile_defn` on each of our function definitions, and adding the resulting directives to the end of our assembly code. Let’s take a look at what `compile_defn` is doing to make the function definitions available in our assembly programs.

```let compile_defn defns {name; args; body} =
let ftab =
args
|> List.mapi (fun i arg -> (arg, -8 * (i + 1)))
|> Symtab.of_list
in
[Label (defn_label name)]
@ compile_exp defns ftab (-8 * (List.length args + 1)) body
@ [Ret]

```

Our overall goal here is to put each function body at its own label.

When we were calling C functions, C was in charge of producing the labels for the functions and putting the appropriate assembly instructions after. For programmer-defined functions, we’re in charge of doing the same process. When we’re compiling the definition, we don’t know what values will be passed as arguments. In fact, if we call the function multiple times in the program, we may be passing in many different arguments. Since we can’t make the argument values available while we’re compiling the definition, we’ll have to take care of making the arguments available when we’re compiling the call. However, since we’re compiling the body right now, we still need to know where the body should look to find those values. We’ll go ahead and decide where the arguments will live, even though the call will be in charge of putting the appropriate values into those slots.

Here we’ve decided to put the first argument at the stack pointer minus 8, the second argument at the stack pointer minus 16, and so on. Remember that the compiler’s symtab maps from a name to the offset from `rsp` where it can find the value bound to that name. So if a function has n arguments, `ftab` maps the arguments to the first n slots above the stack pointer. Once we’re ready to compile the function calls, that’s where we’ll want to put the argument values.

In the meantime, we now have enough information to finish compiling the definitions. Since we know how much space will be taken by the n arguments, we can pass an apprioriate stack index to our `compile_exp` call that compiles the body. We’ll tell the body not to use any of the slots that are already populated with arguments, but anything above that is fair game. And as in our compiler, we’ve designed `ftab` so that only the function arguments are available within the function body.

Finally, we emit the list of directives. We add a label based on the function name. The `defn_label` function takes care of generating an appropriate label—this is important because some strings are allowable function names in our language but not allowable labels in the assembly. After the label, we emit the directives for the body itself, and lastly the `Ret`.

Now let’s move to `compile_exp` itself and compiling function calls.

```let rec compile_exp (defns : defn list) (tab : int symtab) (stack_index : int)
(exp : s_exp) : directive list =
(* ... *)
| Lst (Sym f :: args) when is_defn defns f ->
let defn = get_defn defns f in
if List.length args = List.length defn.args then
let stack_base = align_stack_index (stack_index + 8) in
let compiled_args =
args
|> List.mapi (fun i arg ->
compile_exp defns tab (stack_base - ((i + 2) * 8)) arg
@ [Mov (stack_address (stack_base - ((i + 2) * 8)), Reg Rax)])
|> List.concat
in
compiled_args
@ [ Add (Reg Rsp, Imm stack_base)
; Call (defn_label f)
; Sub (Reg Rsp, Imm stack_base) ]
In some ways, this looks a lot like our C calls. We’re still doing the `Add` to point `rsp` to the last used slot (or 8 bytes beyond that, if we need the buffer to keep everything 16-byte aligned). We’re still doing the `Call` to pass execution to the funtion we’re calling. And we’re still doing the `Sub` to point `rsp` back to where it pointed before the call.
One imporant thing we’re not doing: saving `rdi`. Since we control the assembly in the function that we’re calling, we know none of the code we generate will clobber `rdi`. In fact, if it does update `rdi` because it puts something on the heap, that’s something we need to know about! So we don’t need to save and restore `rdi`.
The other big change compared to our C calls: we have to put those arguments onto the stack, right where the function definition is expecting to find them. Remember, `Call` increments the stack pointer by -8. `stack_base + 1*-8` thus points to the slot where our return address will end up. And `stack_base + 2*-8` points to the slot where we want our first argument to live. So we know what offset to use for our `Mov` instruction, for each argument. But what stack index should we pass as input when we call `compile_exp`? We can actually use the exact same offset we use for the `Mov`. When we’re populating the arguments in the stack, we slot them in from the bottom up. If the next available slot is at offset `x`, then we can use any slot at offset `x` or above to calculate the value we’ll eventually place in `x`.
And with that, we have functions! Open up `dune utop` and try it out!