CS164: Assorted Notes on Project 3, Spring 2009

  1. Attribute code generation
  2. Warning: Initialization in C++
  3. C++ style: placement of function/method bodies
  4. Assignment and copying of VMs.
  5. Addresses of labels
  6. A couple of debugging pointers

1. Attribute code generation


> I get confused about how  to implement the cgen for attributeref node.
> What is  supposed to return from  the ID node? The  biggest question I
> have is how to  decide which spot to no up in  the virtual table since
> in the  run time we don't have  the prototype of the  virtual table to
> lookup on.

Presumably, there are three things you might do with an attributeref:
generate code to get the value, generate code to assign a value, or
generate code to call the value.  Strictly speaking, the first and
last cases are the same, but I said in an earlier post that we're not
going to worry about cases like

   a = x.f

where f is a method of the object pointed to by x.  Thus, if the value
of x.f is not being called, it may be assumed to be an attribute other
than a declared method for the purposes of this project.  If it IS being
called, it may be either a declared method or an attribute whose value
is a function, for example:

   def g ():
       ...

   x.a = g

   x.a (3)

When you've done OOP for a while, this sort of situation will scream
for three distinct codegen methods.  Looking just at the value-getting
problem, it's clear that what you need to generate code is the operand,
r, that contains the receiver (i.e., the value of the expression to the
left of the dot) and the Decl, d, of the id to the right of the dot.
If d is an unknown decl, then the left side had better be an object
(we're not looking at <class>.<id> for this project), and you generate
code to 

     t1 = the vtbl pointer from r
     t2 = attribute offset from the 'a' entry in vtbl t1
     call error if t2 == 0
     t3 = value at t2 + r

How do you know where to find the 'a' entry in a vtbl?  Your program has
to find all method & other attribute names and decide where each
goes---you choose where to put the 'a' entry, in other words.  If d is
an instance decl, then you can just generate

     t1 = the vtbl pointer from r
     t3 = value at K + r

where K is the offset of the instance variable 'a' within the object
(NOT the same as the offset of the 'a' entry in the vtbl!).  Again, your
program is supposed to decide what this offset is (and to have arranged
to put it into the appropriate vtables).
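
To make the two cases concrete, here is a toy C++ model of what the
generated code does at run time.  It is only a sketch, not the project's
actual layout: it assumes that word 0 of an object is reserved for the
vtbl pointer (so that an offset of 0 can mean "no such attribute"), and
the names AttrVtbl, AttrObject, getAttrDynamic, getAttrStatic, and
attrDemo are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model only: one table entry per attribute name the program mentions.
struct AttrVtbl {
    // attrOffset[slot] is the word offset of that attribute within an
    // instance, or 0 if instances of this class lack the attribute.
    std::vector<std::size_t> attrOffset;
};

struct AttrObject {
    const AttrVtbl* vtbl;       // the vtbl pointer (conceptually word 0)
    std::vector<long> words;    // words[0] is reserved; attributes start at 1
};

// Unknown decl: find the offset through the vtbl at run time.
long getAttrDynamic (const AttrObject& r, std::size_t slot)
{
    const AttrVtbl* t1 = r.vtbl;
    std::size_t t2 = t1->attrOffset.at (slot);
    if (t2 == 0)
        throw std::runtime_error ("no such attribute");
    return r.words.at (t2);     // t3 = value at t2 + r
}

// Instance decl: the offset K was fixed at compile time.
long getAttrStatic (const AttrObject& r, std::size_t K)
{
    return r.words.at (K);      // t3 = value at K + r
}

// Small usage check: slot 0 lives at word 2; slot 1 is absent.
void attrDemo ()
{
    AttrVtbl v;
    v.attrOffset.push_back (2);
    v.attrOffset.push_back (0);
    AttrObject x;
    x.vtbl = &v;
    x.words.push_back (0);
    x.words.push_back (10);
    x.words.push_back (20);
    assert (getAttrDynamic (x, 0) == 20);
    assert (getAttrStatic (x, 2) == 20);
}
```

The "slot" here plays the role of the position you chose for the 'a'
entry in every vtbl; the compiler knows it, the running program does not
need to.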

The value-setting problem is similar (but obviously differs in the final
assignment). 

The case x.a(...) is a bit different:

1. If d is a methoddecl, this is easy: generate code to fetch the method 
   pointer from r's vtable and call it.  

2. If d is an unknowndecl, things are messy: x.a could be either an
   instance variable that's been assigned a function value, or it could
   be a method.  To do this right, you'd have to generate code that does 
   something like

     t1 = the vtbl pointer from r
     t2 = method address from the 'a' entry in vtbl t1
     if t2 == 0 goto Attr
     call t2 as a method
     goto End
 Attr:
     t2 = attribute offset from the 'a' entry in vtbl
     call error if t2 == 0
     t3 = value at t2 + r
     proceed as if t3 is a function value (i.e., a pointer to an object
       containing the special function vtbl, a code address, and a static
       link)

Yuch.  So I'll tell you what.  Let's just say we won't worry about the
case x.a(...) if a is not a method, so that your code can be

     t1 = the vtbl pointer from r
     t2 = method address from the 'a' entry in vtbl t1
     call error if t2 == 0 
     call t2 as a method

Let's also say that the standard __xxx__ names always name declared
methods, if they are defined, so that in translating (e.g.) x+y, you
need not worry whether x.__add__ (if defined at all) is an attribute
other than a method.
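
The simplified method-call sequence can be modeled in C++ along the
following lines.  Again this is a sketch under assumptions of my own:
the vtbl is taken to hold one code pointer per method name, with a null
pointer meaning "not defined for this class", and MethVtbl, MethObject,
Method, callMethod, addImpl, and methodDemo are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct MethObject;

// A method takes the receiver explicitly, plus (here) one argument.
typedef long (*Method) (MethObject& self, long arg);

struct MethVtbl {
    // method[slot] is the code address for that method name, or a null
    // pointer if the class does not define it.
    std::vector<Method> method;
};

struct MethObject {
    const MethVtbl* vtbl;
    long value;
};

// x.a(arg), where 'a' has been assigned position 'slot' in every vtbl.
long callMethod (MethObject& r, std::size_t slot, long arg)
{
    const MethVtbl* t1 = r.vtbl;
    Method t2 = t1->method.at (slot);
    if (t2 == 0)
        throw std::runtime_error ("no such method");
    return t2 (r, arg);         // call t2 as a method
}

// A sample method body: adds its argument to the receiver's value.
long addImpl (MethObject& self, long arg)
{
    return self.value + arg;
}

// Small usage check: slot 0 is defined, slot 1 is not.
void methodDemo ()
{
    MethVtbl v;
    v.method.push_back (addImpl);
    v.method.push_back (0);
    MethObject x;
    x.vtbl = &v;
    x.value = 40;
    assert (callMethod (x, 0, 2) == 42);
}
```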

2. Warning: Initialization in C++

I've seen a couple of instances of the following common problem, and
expect more, so I thought I'd mention it.

Variables (local, instance, array members) are not automatically
initialized in C++ unless 

1. They have global, namespace, or static scope, in which case they are 
   initialized to "the default value for their type", or 

2. You arrange to have them initialized in a constructor or at the
   declaration site.

Thus, in 

      class Foo : public Bar {
      public:
         Foo () : Bar () { }

      private:

         Foo* nextFoo;

      };

the 'nextFoo' field of a newly allocated Foo does not in general have a
defined value.  

This error takes many forms, usually resulting in segfaults, bus errors,
or the like, or in downright weird behavior generally.  It will
typically have different results on different systems.
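
One way to avoid the problem is to initialize every field in the
constructor's initializer list.  The sketch below does that for the
example above; the empty Bar class, the next() accessor, and initDemo
are additions of mine, there only to make the fragment compile on its
own, and nullptr assumes a C++11 compiler (write 0 or NULL otherwise).

```cpp
#include <cassert>

// Stand-in base class so that the example is self-contained.
class Bar {
public:
    Bar () { }
};

class Foo : public Bar {
public:
    // nextFoo is explicitly initialized, so every new Foo starts out
    // with a well-defined (null) link.
    Foo () : Bar (), nextFoo (nullptr) { }

    Foo* next () const { return nextFoo; }

private:
    Foo* nextFoo;
};

// Small usage check: both stack and heap instances start with a null link.
void initDemo ()
{
    Foo local;
    assert (local.next () == nullptr);
    Foo* heap = new Foo;
    assert (heap->next () == nullptr);
    delete heap;
}
```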

3. C++ style: placement of function/method bodies

In at least one student's file, I've seen large method definitions in .h
files.  This is bad C++ in general (and I DO give points for "style").
The definitions of functions and methods ("definitions" include bodies,
as opposed to "declarations", which don't) do NOT generally belong in .h
files, but rather in .cc files.  The DECLARATIONS of such methods do
belong. Thus, one writes

.h file:

        extern void foo (int x);

        class Glorp {
        public:
            void bar (int z);
            ...
        };

.cc file:

        void
        foo (int x) 
        { 
           ...
        }

        void
        Glorp::bar (int z)
        {
            ...
        }

When defining non-public subtypes in .cc files, I don't mind your putting the
bodies inside the class (a la Java).  Likewise, very small functions intended
for inlining have to be defined in .h files to get the inlining effect for
most compilers:

     class Glorp {
     public:
          int getThud () const {
              return thud;
          }

     private:
          int thud;
     };

     static inline int bump (int x) {
         return x + 1;
     } 

(This last is problematic, since it produces a warning with g++ -Wall,
but you can be really fussy and write

     static inline int bump (int x) __attribute__ ((unused));
     static inline int bump (int x) {
         return x + 1;
     } 

)

But this kind of thing applies only to REALLY SMALL functions.

Another reason to avoid bodies in .h files is that it screws up
debugging.  When a body appears in a .h file, there can be multiple
copies of its code generated.  That makes it difficult to set
breakpoints properly.

Bottom line: keep your .h files clean.  Executable code belongs in .cc
files for the most part.

4. Assignment and copying of VMs.

Be careful to pass around VMs as pointers or references, as in the skeleton.
Don't introduce new classes like this:

      class Something {
      public:

         Something (VM& vm, ...) {
	    _vm = vm;
	    ...
	 }

      private:
         VM _vm;
      };

Instead use one of these:

      class Something {
      public:

         Something (VM& vm, ...) : _vm (vm), ... {
	    ...
	 }

      private:
         VM& _vm;
      };

OR 

      class Something {
      public:

         Something (VM* vm, ...) {
	    _vm = vm;
	    ...
	 }

      private:
         VM* _vm;
      };
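
To see what goes wrong with the by-value version, consider this toy
model.  ToyVM is a made-up stand-in, not the project's VM class, and
ByValue, ByReference, and copyDemo are likewise hypothetical; the point
is only that a by-value member is a separate copy, so anything emitted
through it never reaches the VM the caller passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Made-up stand-in for a VM: it just records emitted instructions.
struct ToyVM {
    std::vector<std::string> code;
    void emit (const std::string& inst) { code.push_back (inst); }
};

// The pattern to avoid: _vm is a copy of the caller's VM.
class ByValue {
public:
    ByValue (ToyVM& vm) { _vm = vm; }
    void gen () { _vm.emit ("nop"); }   // goes into the private copy
private:
    ToyVM _vm;
};

// The recommended pattern: _vm refers to the caller's VM.
class ByReference {
public:
    ByReference (ToyVM& vm) : _vm (vm) { }
    void gen () { _vm.emit ("nop"); }   // goes into the caller's VM
private:
    ToyVM& _vm;
};

// Small usage check: only the by-reference holder affects the original.
void copyDemo ()
{
    ToyVM vm;
    ByValue a (vm);
    a.gen ();
    assert (vm.code.size () == 0);
    ByReference b (vm);
    b.gen ();
    assert (vm.code.size () == 1);
}
```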

5. Addresses of labels

To create an external label, you use an expression like

      vm->externLabel ("FOO", address_of_FOO)

There has been some confusion as to what address_of_FOO should be and what
such an operand looks like in ia32 code.  

The address argument is intended for use in the interpreter only, and should
be the address named by the label FOO in the runtime code executed by
the interpreter.  Thus,

      vm->externLabel ("__pyInitRuntime", (void*) &__pyInitRuntime);

      vm->externLabel ("__pyIntVtbl", (void*) &__pyIntVtbl);
    or
      vm->externLabel ("__pyIntVtbl", (void*) __pyIntVtblPtr);

[why are the last two equivalent?].  

The IL (Intermediate Language) instruction generated by

      emitInst (VM::MOVE, r1, IMM(externLabel ("FOO", ADDR)));

loads ADDR into r1 in the interpreter.  In ia32, however, it should
translate to

      movl  $FOO, <whatever r1 is>

and ADDR should not be mentioned, since that particular address
is meaningless in the assembled code.  Similarly, 

      emitInst (VM::MOVE, r1, MEM(externLabel ("FOO", ADDR)));

loads the current value of *(long*) ADDR in the interpreter into r1.  In 
assembly code, it becomes

     movl   FOO, <whatever r1 is>
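
As a sketch of the distinction, the following hypothetical helper formats
the two kinds of MOVE for ia32 output.  OperandKind, ia32Move, and the
idea that the destination is already available as a register name are
assumptions for illustration, not part of the skeleton; note that the
interpreter address never appears in the output.

```cpp
#include <cassert>
#include <string>

// Hypothetical tags for the two ways a label can appear as a source.
enum OperandKind { IMM_LABEL, MEM_LABEL };

// Produce the assembly text for moving a label operand into 'dest'.
// An immediate label is written $FOO (the address itself); a memory
// label is written FOO (the contents at that address).
std::string ia32Move (OperandKind kind, const std::string& label,
                      const std::string& dest)
{
    std::string src = (kind == IMM_LABEL) ? "$" + label : label;
    return "movl " + src + ", " + dest;
}

// Small usage check of both forms.
void moveDemo ()
{
    assert (ia32Move (IMM_LABEL, "FOO", "%eax") == "movl $FOO, %eax");
    assert (ia32Move (MEM_LABEL, "FOO", "%eax") == "movl FOO, %eax");
}
```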

For statement labels, your code should generate names (GCC uses things like
.LCn) unique to each newLabel().  These are what you output when generating
ia32 code. 
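
A minimal way to get unique names is a counter, as in the sketch below.
LabelNamer, newName, and the ".L" prefix are choices made here for
illustration; how you attach the name to whatever newLabel() returns is
up to your design.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hands out .L0, .L1, .L2, ... so that no two labels share a name.
class LabelNamer {
public:
    LabelNamer () : next (0) { }

    std::string newName ()
    {
        std::ostringstream out;
        out << ".L" << next;
        next += 1;
        return out.str ();
    }

private:
    int next;   // number to use for the next label
};

// Small usage check: successive names differ.
void labelDemo ()
{
    LabelNamer namer;
    std::string a = namer.newName ();
    std::string b = namer.newName ();
    assert (a == ".L0");
    assert (b == ".L1");
}
```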

6. A couple of debugging pointers

1. I hope everyone has noticed by now that we have a method VM::dump()
   for dumping 3-address code.  You can call this by inserting a
   statement in your program (which had better not execute in your submitted
   version!), or by breaking in (e.g.) VM::execute and calling it from
   your debugger.  If your debugger does not support the calling of
   functions, get a new one; yours is useless.

2. In gdb, issuing the instruction

       set print object on

   will cause the debugger to display the dynamic types of values, not
   just their declared (static) types.