Building an AST without a Parser

The goal of this section is to develop a method to build a data structure that would normally be built by a parser using syntax-directed translation. For example, consider this page. The input is HTML, which is then run through a parser (e.g. your BAH parser), which builds an AST (the DOM, in this case). Say, however, that we don't want to build a parser. As you know from the previous project, building a parser isn't entirely trivial. This may come up for your final project: you might want to design a language (like BAH) but not write a parser for it. What are other ways of building an AST without writing a parser?

One way is to simply build the AST by hand. For instance, using the 164 you've developed (with the addition of array literals, which you can likely imagine adding without too much difficulty), we could write code like the following.

def root = {
  name: "VerticalBox",
  children = {
    0: {
      name = "Text"
      text = "Hello"
    },
...
Evaluating this expression in 164 will create the DOM. This achieves our goal: we don't need a specialized parser (you could write this code e.g. in Python). However, this code is difficult to read. It also fixes the internal representation of the AST data structure, so if we decided to change it later, we'd have to change our source. This isn't very good.

Method Chaining

Another idea is to use method chaining, which is used by Protovis and jQuery. Method chaining consists of simply writing a series of method calls, such as f().g().h(). Each call modifies a property and returns some result, and this can be used to build up an AST. For example, we might write code like the following.

Text("Hello").color("blue")
Text("world").color("red").font("Sans Serif")

This raises one immediate question: what do these function calls return? From the example code above, we can see that each call must return the receiver. For example, the second line above makes some Text object and calls its color method to set its color. This function must then return the same Text object (modified to be red) so that we can change its font. So the first call creates a Text object with the text "world", the second call sets its color to red, and the third changes its font.

But what about a more complicated example where we want to add multiple elements as children of another? For example, consider adding both of these Text nodes as children of some Box node, as below.

new Box()
  .add(Text).text("Hello").color("blue")
  .add(Text).text("world").color("red").font("Sans Serif")
Unfortunately, this will not work, as the second call to add will act on the Text object returned by the call to color, not on the Box. We thus have to have some way of navigating up between parents and children. For example, we could write the above code as
new Box()
  .add(Text).text("Hello").color("blue")
  .parent.add(Text).text("world").color("red").font("Sans Serif")
There are other alternatives to this strategy of using a parent pointer. If we imagine this as building up a stack, we might change the syntax slightly and use a push/pop metaphor. We could also consider having add instead return the parent, not the child, and modifying the children inside the argument to add:
new Box()
  .add(Text.text("Hello").color("blue"))
  .add(Text.text("world").color("red").font("Sans Serif"))
Both of these methods will work. The latter is perhaps more useful when we expect a deep hierarchy, but which we prefer might depend on our domain.

Implementation

I have written up a simple example of method chaining based on the above examples and Protovis. The HTML source (which draws the same page as the HTML page above) simple sets up an empty page and calls into the JavaScript source. The JavaScript code contains both the library code and the client code for simplicity of presentation in section, but these two would likely be separated in practice.

The code uses inheritance to define the different objects used by the client. The root of the hierarchy is Object164. It defines some fields that all objects have (parent, children, and so forth) and methods that work on all objects (e.g. width and height). Its subclasses include Text (which has text properties) and _Container, which can hold other elements and has subclasses such as Box and List.

Each object contains a _dom field that stores the JavaScript DOM element used to render that object. There are thus two different trees in this code: the object hierarchy that the client sees and the DOM hierarchy that is used behind the scenes to implement things. This separates the implementation from the client code, so we could switch the implementation (e.g. to SVG or canvas) and the client code would still work.

The code follows the Protovis semantics of having add return the child. All of the property-setting methods set the corresponding property and then return either the receiver or (for add methods) the child. Each object has a parent field and an up method to traverse the hierarchy. With this, we can write code like that shown above and at the bottom of the JavaScript source.

There are a few JavaScript tricks in the code. In JavaScript, objects are just functions, and so all the objects in our hierarchy are declared as functions. But then what does a new call do? It turns out that in JavaScript, the code new E() is equivalent to

(function() {
  var o = new Object();
  E.call(o);
  return o;
})()
What does the call method do? E is a JavaScript object, which means it is a function. In JavaScript, functions have a call method that calls them with the given argument as their this parameter. This this code makes a new empty object and calls the E function using the new object as its this parameter. Thus this code, as we would expect, uses E as a constructor.

JavaScript uses prototype-based inheritance. Specifically, if someone tries to access a field or method that an object does not contain, JavaScript will see if it has a prototype and, if so, recursively check it for the field or method. In our code here, we set each object's prototype to point to its parent. For example, we say Text.prototype = _extend(Object164), where _extend returns an object whose prototype points to Object164's prototype. We then add all methods to prototypes (e.g. Text.prototype.italics = ...) and the inheritance works.

Try taking a look through the code and tracing through how it works and builds up an AST.