Project #1 Notes, Spring 2008

This page is an accumulation of (we hope) helpful notes, hints, etc., that might be useful to you in doing Project #1.

  1. Various Changes from the Original Handouts and Skeletons
  2. Postponed error checks
  3. Debugging a grammar
  4. Assorted Questions and Answers


Various Changes from the Original Handouts and Skeletons

  1. We changed the project spec to require that the entire program and file list be surrounded by a program node.
  2. Typo correction in the project 1 spec:
    Thus, what appears in a program as
    
        "Input file: C:\\FOO\40contains\t\"Hello, w\137rld!\"\n"
    
    gets written out as 
    
        "Input file: C:\134FOO contains\011\042Hello, world!\042\012"
    
    That second line should read
        "Input file: C:\\FOO\40contains\t\"Hello, w\157rld!\"\n"
    
  3. We have restricted the left sides of augmented assignments (such as +=) to be simple variables.
  4. Augmented assignments such as x += E are now just syntactic sugar for x = x + E. They don't use the special functions such as __iadd.
  5. The control variable of a for statement must be a single simple variable.
  6. We corrected an error in the getIndentation function.
  7. The C++ skeleton originally contained files parser.y and Parser.tab.cc, which we removed (the parser file is Parser.y; Parser.tab.cc is a generated file, not part of the original source).
  8. We modified the sample string literals in the project description that contained octal escapes, since Pyth doesn't have them this semester. However, you still use octal escapes in the ASTs (and the unparser will use them until I fix it).
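The escape-translation example in item 2 can be checked mechanically. Here is a small Python sketch (Python happens to share the \ooo, \t, \n, \\ and \" escape notations): it decodes the corrected source literal, then re-encodes backslashes, double quotes, and control characters as three-digit octal escapes, which is the form the unparser emits. The helper name octal_encode is ours, not part of the skeleton.

```python
# The corrected source literal, written with the shared escape notation.
decoded = "Input file: C:\\FOO\40contains\t\"Hello, w\157rld!\"\n"

def octal_encode(s):
    """Re-encode backslashes, double quotes, and non-printable
    characters as three-digit octal escapes (\\134, \\042, ...)."""
    out = []
    for ch in s:
        if ch in '\\"' or not ch.isprintable():
            out.append('\\%03o' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)

print(octal_encode(decoded))
# -> Input file: C:\134FOO contains\011\042Hello, world!\042\012
```

The output matches the second line of item 2, which confirms that \157 (and not \137) decodes to the 'o' in "world".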

Postponed error checks

There are several erroneous cases that could be handled either in the parser or in static semantics. To allow us to use test cases properly, we have to decide how to handle them. Detection of the following errors can wait until static semantic analysis. You are free to check for them during parsing, but be warned: we are postponing them because they present various difficulties for our grammar tools and yet are easy to handle during static semantics.

  1. Misplaced break, continue, and return: These statements may occur only in certain contexts (the first two in loops, the last in functions). However, enforcing this in the parser requires either duplicating a bunch of grammar rules or using various kludges involving global variables or the like. So for our purposes, just allow these three statements anywhere that a simple statement is allowed.
  2. Class method definitions: Likewise, class methods may occur only immediately within classes, but let's allow them anywhere, and catch the error in the static semantics phase. On the other hand, just to be inconsistent, classes may only appear at the outer level of the program. This is easy to enforce in the parser, so do forbid class definitions except at the outer level.
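To see why the misplaced break/continue/return check is easy during static semantics, note that a single tree walk can carry "am I inside a loop?" and "am I inside a function?" flags. The Node class and kind strings below are hypothetical stand-ins for illustration, not the project's actual AST classes:

```python
class Node:
    """Hypothetical AST node: a kind tag plus child nodes."""
    def __init__(self, kind, *children):
        self.kind = kind
        self.children = children

def check(node, in_loop=False, in_func=False, errors=None):
    """Collect misplaced break/continue/return statements."""
    if errors is None:
        errors = []
    if node.kind in ("break", "continue") and not in_loop:
        errors.append("misplaced " + node.kind)
    if node.kind == "return" and not in_func:
        errors.append("misplaced return")
    if node.kind == "def":
        # A function body starts fresh: a break inside it
        # needs a loop of its own.
        in_loop, in_func = False, True
    elif node.kind in ("while", "for"):
        in_loop = True
    for child in node.children:
        check(child, in_loop, in_func, errors)
    return errors

print(check(Node("program", Node("break"))))                 # -> ['misplaced break']
print(check(Node("program", Node("while", Node("break")))))  # -> []
```

Contrast this with the parser, where the same information would have to be threaded through duplicated grammar rules or global variables.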

Debugging a Grammar

Be sure that your finished parser does not report unresolved shift/reduce or reduce/reduce conflicts. These generally mean you've done something wrong.

With the -v switch, bison and jbison produce a table (in a file with suffix .output) showing the LALR(1) machine that they produce. You can use this as shown in lecture to help find some problems.

We have set up the ParseTest commands in the skeletons to allow a --debug switch, which will cause the parser to output information about the parse it is performing. This information includes the shifts and reductions it performs and the tokens it gets from the lexer.

It's also possible to test the lexer alone. Putting the declaration

    %debug

in the lexer specification for JFlex, or

    %option debug

for flex, will cause the lexer to output each rule it matches and the value it returns. Unfortunately, the returned syntactic categories are integers, which are, perhaps, not the most useful things to look at, and the information returned does not include the accompanying lexemes themselves.

So a better approach to testing the lexer is to set up a dummy version of the parser, containing %term declarations for all tokens you expect from the lexer (the lexer needs the parser's values for these anyway):

   %term IDENT STRING_LITERAL INDENT ...
   %term "and" "or" "if" ...
   %term "==" "!=" ...
and a dummy grammar that uses all of them:
   prog : /* EMPTY */
        | prog token      { System.out.printf ("Token value: %s%n", $2); }
        ;

   token : IDENT | STRING_LITERAL | INDENT | ...
        ;

Is the welter of resulting output hard to read? Consider writing a tiny program to format it better!

Assorted Questions and Answers

Q: How should we treat "elif"? It isn't in the spec.

A: There is no construct for it, because elif is the same as else if:

    if FOO:
       S1
    elif BAR:
       S2
    else:
       S3

is equivalent to

    if FOO:
       S1
    else:
       if BAR:
          S2
       else:
          S3
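Accordingly, a parser can desugar an if/elif/.../else chain by folding from the last clause backwards. A hypothetical sketch, using ("if", cond, then, else) tuples in place of real AST nodes:

```python
def desugar_if(clauses, else_body=None):
    """clauses: (condition, body) pairs -- the first is the 'if',
    the rest are 'elif's. else_body is the optional trailing 'else'.
    Folds right-to-left into nested two-way ifs."""
    node = else_body
    for cond, body in reversed(clauses):
        node = ("if", cond, body, node)
    return node

print(desugar_if([("FOO", "S1"), ("BAR", "S2")], "S3"))
# -> ('if', 'FOO', 'S1', ('if', 'BAR', 'S2', 'S3'))
```

The result is exactly the nested form shown above, so no separate elif node is needed in the AST.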

Q: Are we free to allow (or disallow) statements such as:

    x *= y *= 2

A: Augmented assignments are STATEMENTS, not expressions, and may not be used as expressions. Thus the statement above is illegal.

Q: In dictionaries, can keys and values only be expressions? If not, what syntactic categories can they be?

A: They may only be expressions.

Page was last modified on Thu Feb 28 03:26:06 2008.
Address comments and questions to cs164@eecs.berkeley.edu