Fixing Grammars for Python Print Statements

(from discussion sections on 2/22/10)

A first attempt at a grammar for print statements in Python might look like this:

Grammar P1

stmt : "print" exprs0
     | "print" exprs1 ','
     ;
exprs0 : /* empty */
       | exprs1
       ;
exprs1 : expr
       | expr ',' exprs1
       ;

(The choice of the second production for exprs1 is motivated by the fact that the AST format in the project spec looks like Lisp, so we might build up expression lists by consing onto the front.)

What is Bison's response to this grammar?

Shift/reduce conflict. When the top symbol on the stack is expr and the next token is a comma, the parser could shift the comma in hopes of reducing to exprs1 later, or it could reduce the expr to exprs1 immediately so that the comma is treated as the trailing comma for the print statement.

To fix the conflict, can we just accept Bison's default behavior (prefer shifting to reducing)?

No. When the comma is at the end of a line, reducing is the right thing to do.

To fix this conflict, we need more lookahead than just the comma. Bison will only give us one token of lookahead, so we need to modify the grammar to get us past the comma. In this case, we can simply change the associativity of the second exprs1 production to left-associative.

Grammar P2

stmt : "print" exprs0
     | "print" exprs1 ','
     ;
exprs0 : /* empty */
       | exprs1
       ;
exprs1 : expr
       | exprs1 ',' expr
       ;

Now the sequence <exprs1 ','> appears in both the stmt and exprs1 productions, so the parser can always shift when the lookahead token is a comma. After the comma is shifted, the next token will differentiate between the two cases (NEWLINE means reduce to stmt, otherwise reduce to exprs1).