CS164: Assorted Notes on Project 3, Spring 2009
- Attribute code generation
- Warning: Initialization in C++
- C++ style: placement of function/method bodies
- Assignment and copying of VMs
- Addresses of labels
- A couple of debugging pointers
> I get confused about how to implement the cgen for attributeref node.
> What is supposed to return from the ID node? The biggest question I
> have is how to decide which spot to look up in the virtual table since
> in the run time we don't have the prototype of the virtual table to
> lookup on.
Presumably, there are three things you might do with an attributeref:
generate code to get the value, generate code to assign a value, or
generate code to call the value. Strictly speaking, the first and
last cases are the same, but I said in an earlier post that we're not
going to worry about cases like
a = x.f
where f is a method of the object pointed to by x. Thus, if the value
of x.f is not being called, it may be assumed to be an attribute other
than a declared method for the purposes of this project. If it IS being
called, it may either be a declared method or an attribute (whose value
is a function, for example:
    def g ():
        ...
    x.a = g
    x.a (3)
) When you've done OOP for a while, this sort of situation will scream
for three distinct codegen methods. Looking just at the value-getting
problem, it's clear that what you need to generate code is the operand,
r, that contains the receiver (i.e., the value of the expression to the
left of the dot) and the Decl, d, of the id to the right of the dot.
If d is an unknown decl, then the left side had better be an object
(we're not looking at <class>.<id> for this project), and you generate
code to
    t1 = the vtbl pointer from r
    t2 = attribute offset from the 'a' entry in vtbl t1
    call error if t2 == 0
    t3 = value at t2 + r
How do you know where to find the 'a' entry in a vtbl? Your program has
to find all method & other attribute names and decide where each
goes---you choose where to put the 'a' entry, in other words. If d is
an instance decl, then you can just generate
    t1 = the vtbl pointer from r
    t3 = value at K + r
where K is the offset of 'a' within the instance (NOT the same as
the offset of the 'a' entry in the vtbl!). Again, your program is
supposed to decide what this offset is (and to have arranged to put it
into the appropriate vtables).
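As a rough illustration of what the generated code does at run time, here is a small C++ simulation of the two value-getting cases. Everything about the layout is an assumption made for the sketch, not the project's required design: the names Word, Vtbl, AttrSlot, getAttrUnknown, and getAttrKnown are hypothetical, offsets are counted in words rather than bytes, and word 0 of every object is assumed to hold the vtbl pointer (which is why an offset of 0 can safely mean "no such attribute").

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical layout: each vtbl has one slot per attribute name used
// anywhere in the program.  A slot holds the word offset of that attribute
// within an instance, or 0 if the class has no such attribute.
enum AttrSlot { SLOT_a = 0, SLOT_b = 1, NUM_SLOTS = 2 };

struct Vtbl {
    std::size_t attrOffset[NUM_SLOTS];
};

// One word of an object.  Word 0 is assumed to hold the vtbl pointer.
union Word {
    const Vtbl* vtbl;
    long value;
};

// Unknown decl: the offset must be looked up in the vtbl at run time.
long getAttrUnknown (const Word* r, AttrSlot slot) {
    assert (r != 0);
    const Vtbl* t1 = r[0].vtbl;               // t1 = the vtbl pointer from r
    std::size_t t2 = t1->attrOffset[slot];    // t2 = attribute offset from vtbl t1
    if (t2 == 0)                              // call error if t2 == 0
        throw std::runtime_error ("no such attribute");
    return r[t2].value;                       // t3 = value at t2 + r
}

// Instance decl: the compiler already knows the offset K.
long getAttrKnown (const Word* r, std::size_t K) {
    assert (r != 0);
    return r[K].value;                        // t3 = value at K + r
}
```

Note that SLOT_a (a position in the vtbl) and the offset stored in that slot (a position in the instance) are different numbers, which is the distinction drawn above.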
The value-setting problem is similar (but obviously differs in the final
assignment).
The case x.a(...) is a bit different:
1. If d is a methoddecl, this is easy: generate code to fetch the method
pointer from r's vtable and call it.
2. If d is an unknowndecl, things are messy: x.a could be either an
instance variable that's been assigned a function value, or it could
be a method. To do this right, you'd have to generate code that does
something like
    t1 = the vtbl pointer from r
    t2 = method address from the 'a' entry in vtbl t1
    if t2 == 0 goto Attr
    call t2 as a method
    goto End
  Attr:
    t2 = attribute offset from the 'a' entry in vtbl t1
    call error if t2 == 0
    t3 = value at t2 + r
    proceed as if t3 is a function value (i.e., a pointer to an object
    containing the special function vtbl, a code address, and static link)
  End:
Yuch. So I'll tell you what. Let's just say we won't worry about the
case x.a(...) if a is not a method, so that your code can be
    t1 = the vtbl pointer from r
    t2 = method address from the 'a' entry in vtbl t1
    call error if t2 == 0
    call t2 as a method
Let's also say that the standard __xxx__ names always name declared
methods, if they are defined, so that in translating (e.g.) x+y, you
need not worry whether x.__add__ (if defined at all) is an attribute
other than a method.
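The simplified calling rule can be pictured with the following C++ sketch, which simulates what the generated code would do. The representation is purely an assumption for illustration: Object, Vtbl, MethodSlot, callMethod, and addPayload are hypothetical names, methods are modeled as plain function pointers taking the receiver and one argument, and a null slot stands for "this class has no such method".

```cpp
#include <cassert>
#include <stdexcept>

struct Object;
typedef long (*Method) (Object* self, long arg);

// Hypothetical: one vtbl slot per method name appearing in the program.
enum MethodSlot { SLOT_a = 0, SLOT_add = 1, NUM_SLOTS = 2 };

struct Vtbl {
    Method method[NUM_SLOTS];     // 0 means the method is not defined
};

struct Object {
    const Vtbl* vtbl;
    long payload;
};

// Simulates the code generated for x.a(arg) under the simplified rule.
long callMethod (Object* r, MethodSlot slot, long arg) {
    assert (r != 0);
    const Vtbl* t1 = r->vtbl;             // t1 = the vtbl pointer from r
    Method t2 = t1->method[slot];         // t2 = method address from vtbl t1
    if (t2 == 0)                          // call error if t2 == 0
        throw std::runtime_error ("no such method");
    return t2 (r, arg);                   // call t2 as a method
}

// A sample method body used to fill a slot.
static long addPayload (Object* self, long arg) {
    return self->payload + arg;
}
```

For example, with a vtbl whose SLOT_a entry is addPayload and an object whose payload is 4, callMethod (&x, SLOT_a, 3) yields 7, while asking for SLOT_add raises the error.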
I've seen a couple of instances of the following common problem, and
expect more, so I thought I'd mention it.
Variables (local, instance, array members) are not automatically
initialized in C++ unless
1. They have global, namespace, or static scope, in which case they are
initialized to "the default value for their type", or
2. You arrange to have them initialized in a constructor or at the
declaration site.
Thus, in
    class Foo : public Bar {
    public:
        Foo () : Bar () { }
    private:
        Foo* nextFoo;
    };
the 'nextFoo' field of a newly allocated Foo does not in general have a
defined value.
This error takes many forms, usually resulting in segfaults, bus errors,
or the like, or in downright weird behavior generally. It will
typically have different results on different systems.
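One way to avoid the problem is to give every field a value in the constructor's member-initializer list. The sketch below is a variant of the Foo example written for illustration; the empty Bar class, the next() accessor, and the checkFreshFoo function are additions that exist only to make the example self-contained.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in base class so that the example compiles on its own.
class Bar {
public:
    Bar () { }
};

class Foo : public Bar {
public:
    // Every field gets an explicit initial value here.
    Foo () : Bar (), nextFoo (NULL) { }

    Foo* next () const { return nextFoo; }
private:
    Foo* nextFoo;
};

// Usage: a freshly constructed Foo now always has a null link,
// whether it lives on the stack or on the heap.
void checkFreshFoo () {
    Foo f;
    assert (f.next () == NULL);
}
```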
In at least one student's file, I've seen large method definitions in .h
files. This is bad C++ in general (and I DO give points for "style").
The definitions of functions and methods ("definitions" include bodies,
as opposed to "declarations", which don't) do NOT generally belong in .h
files, but rather in .cc files. The DECLARATIONS of such methods do
belong. Thus, one writes
.h file:
    extern void foo (int x);
    class Glorp {
    public:
        void bar (int z);
        ...
    };
.cc file:
    void
    foo (int x)
    {
        ...
    }
    void
    Glorp::bar (int z)
    {
        ...
    }
When defining non-public subtypes in .cc files, I don't mind your putting
the bodies in (a la Java). Likewise, very small functions intended for
inlining have to be defined in .h files to get the inlining effect with
most compilers:
    class Glorp {
    public:
        int getThud () const {
            return thud;
        }
    private:
        int thud;
    };
    static inline int bump (int x) {
        return x + 1;
    }
(This last is problematic, since it produces a warning with g++ -Wall,
but you can be really fussy and write
    static inline int bump (int x) __attribute__ ((unused));
    static inline int bump (int x) {
        return x + 1;
    }
)
But this kind of thing applies only to REALLY SMALL functions.
Another reason to avoid bodies in .h files is that it screws up
debugging. When a body appears in a .h file, there can be multiple
copies of its code generated. That makes it difficult to set
breakpoints properly.
Bottom line: keep your .h files clean. Executable code belongs in .cc
files for the most part.
Be careful to pass around VMs as pointers or references, as in the skeleton.
Don't introduce new classes like this:
    class Something {
    public:
        Something (VM& vm, ...) {
            _vm = vm;
            ...
        }
    private:
        VM _vm;
    };
Instead use one of these:
    class Something {
    public:
        Something (VM& vm, ...) : _vm (vm), ... {
            ...
        }
    private:
        VM& _vm;
    };
OR
    class Something {
    public:
        Something (VM* vm, ...) {
            _vm = vm;
            ...
        }
    private:
        VM* _vm;
    };
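To see why the first form is a problem, consider this small self-contained sketch. The VM class here is only a stand-in invented for the example (the skeleton's real VM has a different interface), and ByCopy, ByRef, emit, size, and demoCopyVersusReference are likewise hypothetical names. The point is that a VM member held by value is a separate copy, so instructions emitted through it never reach the VM the caller passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the skeleton's VM: it just records emitted instructions.
class VM {
public:
    void emit (const std::string& inst) { code.push_back (inst); }
    std::size_t size () const { return code.size (); }
private:
    std::vector<std::string> code;
};

class ByCopy {
public:
    ByCopy (VM& vm) { _vm = vm; }       // copies the whole VM
    void gen () { _vm.emit ("nop"); }
private:
    VM _vm;                             // private copy, not the caller's VM
};

class ByRef {
public:
    ByRef (VM& vm) : _vm (vm) { }       // binds to the caller's VM
    void gen () { _vm.emit ("nop"); }
private:
    VM& _vm;
};

// Usage: only the reference version affects the original VM.
void demoCopyVersusReference () {
    VM vm;
    ByCopy c (vm);
    c.gen ();
    assert (vm.size () == 0);           // instruction went into the copy
    ByRef r (vm);
    r.gen ();
    assert (vm.size () == 1);           // instruction reached the real VM
}
```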
To create an external label, you use an expression like
vm->externLabel ("FOO", address_of_FOO)
There has been some confusion as to what address_of_FOO should be and what
such an operand looks like in ia32 code.
The address argument is intended for use in the interpreter only, and should
be the address named by the label FOO in the runtime code executed by
the interpreter. Thus,
vm->externLabel ("__pyInitRuntime", (void*) &__pyInitRuntime);
vm->externLabel ("__pyIntVtbl", (void*) &__pyIntVtbl);
or
vm->externLabel ("__pyIntVtbl", (void*) __pyIntVtblPtr);
[why are the last two equivalent?].
The IL (Intermediate Language) instruction generated by
emitInst (VM::MOVE, r1, IMM(externLabel ("FOO", ADDR)));
loads ADDR into r1 in the interpreter. In ia32, however, it should
translate to
movl $FOO, <whatever r1 is>
and ADDR should not be mentioned, since that particular address
is meaningless in the assembled code. Similarly,
emitInst (VM::MOVE, r1, MEM(externLabel ("FOO", ADDR)));
loads the current value of *(long*) ADDR into r1 in the interpreter. In
assembly code, it becomes
movl FOO, <whatever r1 is>
For statement labels, your code should generate names (GCC uses things like
.LCn) unique to each newLabel(). These are what you output when generating
ia32 code.
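As a minimal sketch of the ia32 side of this, the following C++ fragment shows one way to hand out unique statement-label names and to format the two kinds of label operand. None of these names come from the skeleton: LabelNamer, moveImm, and moveMem are hypothetical helpers, the ".Ln" naming scheme is just one possible choice, and the register is passed as ready-made assembler text.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: produces a fresh assembler name (.L0, .L1, ...)
// each time a new statement label is created.
class LabelNamer {
public:
    LabelNamer () : next (0) { }
    std::string newName () {
        std::ostringstream out;
        out << ".L" << next;
        next += 1;
        return out.str ();
    }
private:
    int next;
};

// ia32 text for MOVE r1, IMM(label): load the address named by the label.
// Only the label's name appears; the interpreter's ADDR is never printed.
std::string moveImm (const std::string& label, const std::string& reg) {
    assert (!label.empty ());
    return "movl $" + label + ", " + reg;
}

// ia32 text for MOVE r1, MEM(label): load the word stored at the label.
std::string moveMem (const std::string& label, const std::string& reg) {
    assert (!label.empty ());
    return "movl " + label + ", " + reg;
}
```

With these, moveImm ("FOO", "%eax") gives "movl $FOO, %eax" and moveMem ("FOO", "%eax") gives "movl FOO, %eax", matching the two translations described above.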
1. I hope everyone has noticed by now that we have a method VM::dump()
for dumping 3-address code. You can call this by inserting a
statement in your program (which had better not execute in your submitted
version!), or by breaking in (e.g.) VM::execute and calling it from
your debugger. If your debugger does not support the calling of
functions, get a new one; yours is useless.
2. In gdb, issuing the command
set print object on
will cause the debugger to display the dynamic types of values, not
just their declared (static) types.