Booleans II

Recall that we’re adding support for booleans to our interpreter and compiler. Booleans I covered adding them to our interpreter. Now we’ll turn to the compiler. Specifically, we’re going to add support for these expressions:

true and false, the two boolean values
(not e), a unary operation which evaluates to true on the boolean value false and false otherwise
(num? e), a unary opertion which evaluates to true if e is a number and false otherwise
(zero? e), a unary opertion which evaluates to true if e is the number 0 and false otherwise

Types in the compiler

Now that we have an interpreter to test against, we can extend our compiler to support our new operations!

When our interpreter is executing a program, values of expressions are instances of the value datatype we just defined. We won’t be able to do that in the compiler–we can’t define new datatypes in x86-64! Remember that when our program is executing, its values live in registers (actually, just rax). Registers store 64-bit integers. Right now the values in our program are all integers, so this works fine. But how will we add booleans? Take a second and think about how you might implement this.

Well, we know that all of our values need to be represented, at runtime, as 64-bit integers. So instead of representing integers as themselves:

0 -> 0b00
1 -> 0b01
2 -> 0b10
3 -> 0b11
...

We’re going to represent the integer x as x << 2 (shifted left by two bits):

0 -> 0b0000
1 -> 0b0100
2 -> 0b1000
3 -> 0b1100

This is exactly equivalent to representing each integer x as x * 4.

This means our integers have to fit in 62 bits instead of 64. So our minimum integer is now -(2**61) and our maximum integer is (2**61) - 1.

This also means there are a bunch of 64-bit integers (how many?) that are no longer being used to represent values! All of our integer values now end with 00. So anything that ends with a different pair of bits won’t be used to represent a number. This means we can use some of them to represent booleans, and other types!

First, though, let’s update our compiler to use this new representation for integers. Integer constants will be easy–we’ll just shift them left. How will we handle add1 and sub1? Well, remember that our runtime representations are the values multiplied by 4. Since multiplication distributes over addition (and subtraction), we can just add (or subtract) 4 instead of 1! So:

Listing 1: compiler.ml

let num_shift = 2
let num_mask = 0b11
let num_tag = 0b00

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  | Num n ->
    [Mov (Reg Rax, Imm (n lsl num_shift))]
  | Lst [Sym "add1"; arg] ->
    compile_exp arg @
    [Add (Reg Rax, Imm (1 lsl num_shift))]
  | Lst [Sym "sub1"; arg] ->
    compile_exp arg @
    [Sub (Reg Rax, Imm (1 lsl num_shift))]
  | e -> raise (BadExpression e)

(lsl is “logical shift left.” We could also just multiply by 4, but it’s clearer this way.)

What happens if we run a program now?

>>> compile_and_run "(add1 4)"
20

This makes sense–we’re printing out the runtime representation! We’ll need to fix that. We’ll edit our C runtime:

Listing 2: runtime.c

#include <stdio.h>
#include <inttypes.h>

#define num_shift 2
#define num_mask 0b11
#define num_tag 0b00

extern uint64_t entry();

void print_value(uint64_t value) {
  if ((value & num_mask) == num_tag) {
    int64_t ivalue = (int64_t)value;
    printf("%" PRIi64, ivalue >> num_shift);
  } else {
    printf("BAD VALUE %" PRIu64, value);
  }
}

int main(int argc, char **argv) {
  print_value(entry());
  return 0;
}

Boolean support in the runtime

While we’re editing the runtime, let’s also add support for booleans.

Listing 3: runtime.c

#include <stdio.h>
#include <inttypes.h>

#define num_shift 2
#define num_mask  0b11
#define num_tag   0b00

#define bool_shift 7
#define bool_mask  0b1111111
#define bool_tag   0b0011111

extern uint64_t entry();

void print_value(uint64_t value) {
  if ((value & num_mask) == num_tag) {
    int64_t ivalue = (int64_t)value;
    printf("%" PRIi64, ivalue >> num_shift);
  } else if ((value & bool_mask) == bool_tag) {
    if (value >> bool_shift) {
      printf("true");
    } else {
      printf("false");
    }
  } else {
    printf("BAD VALUE %" PRIu64, value);
  }
}

int main(int argc, char **argv) {
  print_value(entry());
  return 0;
}

We’ll need to recompile the runtime:

$ gcc -c runtime.c -o runtime.o

Boolean support in the compiler

We can now add support for true and false pretty easily:

let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" -> [Mov (Reg Rax, Imm ((1 lsl bool_shift) lor bool_tag))]
  | Sym "false" -> [Mov (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag))]

Handling our other operations will be a little trickier. Let’s start with not. As a reminder, not should evaluate to true (i.e., should put the runtime representation of true into rax!) when its argument is false; otherwise, it should evaluate to false.

It seems like we need a way to compare the runtime representations of values. For this, we’ll use the x86-64 instruction cmp. cmp X,Y compares X to Y. It then sets processor flags based on the result. There are a bunch of flags, and we’ll talk about more of them later in the class; for now, we just need to know that cmp sets the flag ZF to 1 if its arguments are the same and 0 otherwise.

Flags aren’t like registers–we don’t access them directly in assembly code¹. These flags then modify the behavior of subsequent instructions. We’ll see more examples of this next lecture when we talk about conditionals. For now, we’re going to use another instruction, setz, in order to access ZF. setz takes a register² and sets the last byte of that register to 1 (i.e., 0b00000001) if ZF is set and 0 if ZF is not set.

In pseudo-assembly, how we’re going to implement the not operator:

not:    
        cmp rax, 0b00011111
        mov rax, 0
        setz rax        
        shl rax, 7
        or rax, 0b0011111

So, now we can implement not:

let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" -> [Mov (Reg Rax, Imm ((1 lsl bool_shift) lor bool_tag))]
  | Sym "false" -> [Mov (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag))]
  | Lst [Sym "not"; arg] ->
    compile_exp arg @
    [ Cmp (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag)) (* compare rax to false *)
    ; Mov (Reg Rax, Imm 0) (* zero out rax *)
    ; Setz (Reg Rax) (* 1 if ZF is set (meaning rax contained false), 0 otherwise *)
    ; Shl (Reg Rax, Imm bool_shift) (* rax << bool_shift *)
    ; Or (Reg Rax, Imm bool_tag) (* tag rax as a boolean: rax = rax | bool_tag *)
    ]

There’s some duplicate logic here. We’re going to make a helper function called operand_of_bool, which makes an instruction operand from a boolean using shift and or:

let operand_of_bool (b : bool) : operand =
  Imm (((if b then 1 else 0) lsl bool_shift) lor bool_tag)

We can do the same thing for numbers:

let operand_of_num (x : int) : operand =
  Imm ((x lsl num_shift) lor num_tag)

(We include lor num_tag here to be symmetric with operand_to_bool, but everything would work if we left it off–why?)

Lastly, we’re going to re-use the code to convert ZF to a boolean:

let zf_to_bool: directive list =
  [Mov (Reg Rax, Imm 0) (* zero out rax *)
  ; Setz (Reg Rax) (* 1 if ZF is set, 0 otherwise *)
  ; Shl (Reg Rax, Imm bool_shift) (* rax << bool_shift *)
  ; Or (Reg Rax, Imm bool_tag) (* tag rax as a boolean: rax = rax | bool_tag *)
  ]

zf_to_bool is a list, not a function. How is that possible? Won’t it depend on the value we’re trying to convert? It does not! This is a list of instructions that set rax to the runtime representation of true if ZF is set and to the runtime representation of false otherwise.

Now we can implement zero? easily:

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" -> [Mov (Reg Rax, operand_of_bool true)]
  | Sym "false" -> [Mov (Reg Rax, operand_of_bool false)]
  | Lst [Sym "not"; arg] ->
    compile_exp arg @
    [ Cmp (Reg Rax, operand_of_bool false) ]
    @ zf_to_bool
  | Lst [Sym "zero?"; arg] ->
    compile_exp arg @
    [ Cmp (Reg Rax, operand_of_num 0) ]
    @ zf_to_bool

Lastly, we can implement num?. We can detect if a value is a number by looking at the last two bits and seeing if they are both zero. We can do that like this:

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Lst [Sym "num?"; arg] ->
    compile_exp arg @
    [ And (Reg Rax, Imm num_mask); Cmp (Reg Rax, Imm num_tag) ]
    @ zf_to_bool

Footnotes:

Actually, all of the flags are packed together in the same special RFLAGS register

It actually just takes the lower byte of a register, which are notated differently in assembly–for instance, the lower byte of rax is written al. Our assembly library takes care of this, so we won’t talk about it too much in class.