CS 164 Section 11: Inheritance and the Visitor Pattern

Inheritance

Inheritance is a way of making new classes by using existing ones and is a key component of object-oriented programming. In this section, we will discuss some features and uses of inheritance.

Consider this Java program. What will it print out? The answer is ABB. The first two letters are relatively simple: the first call is on an object of type A and the second on an object of type b, so they call their respective methods. But what about the third call? Does it use the actual type of the object or its declared type?

This is the difference between dynamic and static dispatch. With dynamic dispatch, a method is called based on the actual (dynamic) type of an object. With static dispatch, the method is called based on the declared (static) type of an object. Java, it turns out, uses dynamic dispatch, as do most other modern languages. C++, however, uses static dispatch (although you can have specific methods use dynamic dispatch with the virtual keyword).

What are the advantages of these two types of dispatch? Which would you prefer? Dynamic dispatch is much more useful and makes object-oriented programming much more powerful. For example, say you're writing a graphics program, and you have a generic Shape class with subclasses Square, Triangle, and so forth. Each class has its own draw method that displays it on the screen. If we have a list of all the objects that exist and want to draw them all, we will take our list of Shapes (since we do not know the type of each one) and call draw on each of them. With dynamic dispatch, this will work as intended and call the method for the specific subtype, but with static dispatch we will not be able to draw squares or triangles. On the other hand, static dispatch might be simpler, since you can read the code and figure out exactly which method is being called. It is also more efficient. With dynamic dispatch, a function call will require looking at a class to see if it has a method, then checking its superclass, then its superclass, and so forth, all at runtime. Languages with static dispatch can do this work at compile-time and call the correct method directly at runtime.

Visitor pattern

Image we're writing a compiler (which is a reasonable thing to do in a compilers course). If it's going to be a large compiler that's really used, we probably wouldn't want to write our AST with tuples. Instead, we'd probably want to use objects, since everyone likes object-oriented programming. So we'd probably have a hierarchy something like the following:

class Expr { ... }
class Number extends Expr { ... }
class Identifier extends Expr { ... }
class Plus extends Expr { ... }

We could then create this AST in our grammar with code like the following.

E -> E '+' E  %{ return new Plus(n1.val, n3.val) $}

Now that we have our nice AST of objects, we want to use it. We want to be able to print out our AST for debugging, we want to generate bytecode for each node, we might want to optimize our AST by removing redundant nodes, and so forth. How can we do all of this? One way is to add one method for each of these to each class in our AST. So each subclass of Expr gets a print method, a codeGen method, and so forth. This would work, but assume that we don't want to or can't do this. Perhaps the structure of the AST is given to us as a library and we can't modify it. Or perhaps we want to put all of the code for one step (e.g. code generation) in one place, rather than spreading it out and having each part in a different class or file. How would we do this?

Ideally, we would want to write code like the following.

class Printer {
    int indentLevel;
    void print(Expr n) { ... }
    void print(Number n) { /* Print a number */ }
    void print(Identifier n) { /* Print a variable */ }
    void print(Plus n) { /* Print a plus expr */ }
}
// Client code
Expr e = parser.parse(...);
Printer.print(e);

This puts all of the print code in one place and is pretty.

Unfortunately, this code will not work. It turns out that the outermost call to Printer.print calls void print(Expr n), which can't do the correct thing (e.g. if the parse returns a Plus).

This leads us to the difference between overriding and overloading. In the first example on this page, we considered overriding: when a class and its superclass both define the same method, the one for the subclass gets called. But our code here is slightly different: one class define multiple versions of the same method. This is called overloading, and Java dispatches it statically. For example, consider this file. It is similar to the first example, but the third call will now call the A method. This isn't just Java being stupid: most programming languages you've used do the same thing.

So we can't write the code the way we would want to. How else can we write it? We could have some big if/else block, where we check the type of the Expr and dispatch it accordingly (e.g. using Java's instanceof keyword). This would work, but it's not very object oriented. So how else can we figure out the dynamic type of an object? Well, we only know one way: call a method on it. We can thus modify our code to be something like the following.

abstract class Expr { abstract void ???(???);  }
class Number extends Expr {
    void ???(???) { ??? }
}
class Identifier extends Expr {
    void ???(???) { ??? }
}
class Plus extends Expr {
    void ???(???) { ??? }
}

class Printer {
    int indentLevel;
    void print(Expr n) { ???(???); }
    void print(Number n) { /* Print a number */ }
    void print(Identifier n) { /* Print a variable */ }
    void print(Plus n) { /* Print a plus expr */ }
}
// Client code
Expr e = parser.parse(...);
Printer.print(e);

This mystery ??? method will be dispatched dynamically (since the method is overriden by subclasses), so it will call the correct version of the method. But what do we put in its body? We could put the print code, but we already said we didn't want to split the code up and put it in the body of the nodes of the AST. But we can instead have it call back to the printer as follows.

abstract class Expr { abstract void visit(Visitor v);  }
class Number extends Expr {
    void accept(Visitor v) { v.visit(this); }
}
class Identifier extends Expr {
    void accept(Visitor v) { v.visit(this); }
}
class Plus extends Expr {
    public Expr left, right;
    void accept(Visitor v) { v.visit(this); }
}

class Printer extends Visitor {
    int indentLevel;  // The Visitor can keep some state
    void visit(Expr n) { n.accept(this); }
    void visit(Number n) { /* Print a number */ }
    void visit(Identifier n) { /* Print a variable */ }
    void visit(Plus n) { n.left.accept(this); Console.out.print(" + "); n.right.accept(this); }
}
// Client code
Expr e = parser.parse(...);
Printer.print(e);

This code will now work the way we want it to. The outermost call will go to Printer.print(Expr), which will call the accept method of the actual type of its argument (perhaps Plus). This method will call right back to the print method of its specific type. For example, the accept method of Plus knows statically that the type of this is Plus -- after all, it's inside the Plus class. So when it calls the overloaded visit method, it will call the one that takes in a Plus, as desired. For a full example, see this file, which comes from a midterm from a previous semester. Try tracing through it and seeing what exactly it prints.

Now, this may seem like cheating. After all, I said a while back that we couldn't add methods to the AST classes, but we just did that. But now consider that we want to add another visitor, perhaps to generate code. What will it look like?

class CodeGen extends Visitor {
    void visit(Expr n) { n.accept(this); }
    void visit(Number n) { /* Gen code for a number */ }
    void visit(Identifier n) { /* Gen code for an Identifier */ }
    void visit(Plus n) { /* Gen code for a Plus */ }
}

Note that we didn't have to add any more methods to our AST classes: the existing accept methods sufficed. We thus only need to add one such method to each class, and we can then write as many visitors as we want without modifying them again.

This is the visitor pattern. It is widely used in practice. The Java file linked above gives one example, and the Wikipedia page presents another.

Visitor pattern in Python

Now let's go back to Python. How would we write the above code that uses a visitor pattern in Python? Python supports classes, inheritance, and dynamic dispatch, so we could easily translate the above Java code to Python. But what would break down?

Python doesn't have static types, so it doesn't have a sense of overloaded methods. Specifically, we can't have our visitor define one print method for each type of AST node, since Python won't let them declare the type. So how can we get around this and write the code in Python?

One way would be to write a single visit method and have it do lots of if checks on the type of its argument. But again, this isn't very object-oriented. So instead, we can do the following.

class Expr:
class Number(Expr):
    def accept(visitor):
        visitor.visitNumber(self)
class identifier(Expr):
    def accept(visitor):
        visitor.visitidentifier(self)
class Plus(Expr):
    def accept(visitor):
        visitor.visitPlus(self)

class Printer(Visitor):
    def visitNumber(n):
        # Print a number
    def visitIdentifier(n):
        # Print an identifier
    def visitPlus(n):
        # Print a Plus

We are simulating the overloaded methods by giving them different names and modifying the accept methods to call the proper visit method. This code is still generic, so it would still work for multiple visitors. Here is a Python version of the Java code linked above.

Multiple dispatch

The Visitor pattern is nice, but the double calls are annoying and confusing. Is there any way to get rid of them? Not easily. After all, the problem is that Java (and other languages) can only dispatch based on the type of the receiver (i.e. call an overriden method), not the types of any of the arguments. But if you were designing a language, could you imagine changing this? Would it be possible to dispatch a method based on the type of all of its arguments?

It is possible, and it even has a name: multiple dispatch (or multimethods: it apparently has two names). As the name suggests, with multiple dispatch functions are dispatched based on the types of more than one of their arguments. This would allow us to write the visitor pattern the way we first attempted it:

class Printer {
    int indentLevel;
    void print(Expr n) { ... }
    void print(Number n) { /* Print a number */ }
    void print(Identifier n) { /* Print a variable */ }
    void print(Plus n) { /* Print a plus expr */ }
}
// Client code
Expr e = parser.parse(...);
Printer.print(e);

With multiple dispatch, this code will work correctly, as the call to print will be dispatched dynamically based on the type of its argument. We can thus write the clean version of the code.

We can also imagine how we might implement multiple dispatch. For single-argument functions, we could actually implement it with the visitor pattern, by adding fake visit and accept methods. For more complicated examples with multiple arguments, the compiler could insert lots of if checks that lookup the dynamic type of each object. This is still a bit tricky, since the order of the calls matters (we have to check if it's a Plus before we check if it's an Expr, for example), but it shows one possible strategy.

Pattern matching

With multiple dispatch, we've seen that it's both possible and useful to dispatch methods based on the types of their arguments. Going one step further, we could consider dispatching methods based on the values of their arguments. Is this implementable? Would it be useful?

The answer to the first question is yes. After all, it's easy to check the value of a variable: we can just use ==. But the usefulness is less obvious. Perhaps some example will help.

Consider the following code (which uses some fake syntax).

int fib(0) = return 0;
int fib(1) = return 1;
int fib(n) = return fib(n-1) + fib(n-2);

This code defines three versions of the fib function, two of which only work on specific values of its argument, and the third of which covers the rest of the cases. Not necessarily useful, perhaps, but it is fairly nice syntax.

Going a little further, if we're allowed to match on the values of arguments, we can make those values whatever we want. Consider the following code, which attempts to optimize an AST, as mentioned above.

def optimize(n) = match n with
    | Plus(Number(n1), Number(n2)) -> return Number(n1 + n2)
    | Or(True, _) -> True
    | _ -> n

This syntax (inspired by OCaml, by the way, although it's not syntactically valid), might be a bit confusing. But it's matching the parameter n against various patterns. Here, you can consider the patterns to be constructor calls with the fields of the class. So the first pattern matches Plus objects whose left field is a Number with value n1 and whose right field is a Number with value n2. In this case, where we're adding two constants, we can simplify the AST by returning the constant representing n1+n2. The underscore denotes a wildcard, and can stand for anything. The or of the constant true and anything is true, so we can also short-circuit there. We now have the beginnings of an optimizing compiler!

This technique is known as pattern matching, as you can match arguments based on their types and values. You can match multiple arguments as well. It is supported by many functional languages, and it can in fact be useful.