Welcome to CS 164! Announcements: PA1 due Friday 11:59pm (early autograder Thurs 11:59pm) WA1/PA2 out Thursday ---------------------------------------------------------------------------- 1. Spot the Cool bugs: class main { /* Create a new stack. */ stack : Stack <- new Stack(); main() : Object let command <- "" in { command = prompt(); stack.push(command); out_string(str); if not isQuit(command) then main() else abort(); } } } How is Cool different from Java or C++? Any PA1 questions? ---------------------------------------------------------------------------- 2. Lexical Analysis Lexical analysis is the first stage of the compiler. It converts raw input into a stream of token-lexeme pairs suitable for parsing. What is the input and output for our compiler? Example: { if (section_interesting == 1) then listen() else sleep(); depart(); } --------------------------------------------------------------------------- 3. Regular Languages What is an alphabet? What is a language? A regular language is a kind of language that has some additional structure. Specifically, a regular language can be described by a regular expression. We care because we will use regular expressions to construct a lexer in PA2. What are the five constructs used to define regular expressions/languages? Examples: - Binary numbers. - Binary numbers except "10". - Text strings in a programming language (e.g., "Hello, world!\n" in C). - Strings that contain "ab". - Strings with balanced parentheses (e.g., ((())) ). [Trick question: This is not a regular language!] --------------------------------------------------------------------------- 4. Finite Automata Finite automata can be used to implement regular languages. Theorem: A language is regular if and only if it is accepted by a finite automaton. Formal definition: (\Sigma, S, n, F, \delta) \Sigma is the alphabet S is the set of states n is the start state (n \in S) F is the set of accepting states (F \subseteq S) \delta is the transition function (\delta : state -> state) Miscellaneous questions: - How does an NFA differ from a DFA? - When does a DFA accept its input? An NFA? - Which is more powerful, a DFA or an NFA? [Trick question: They're equally powerful!] - Why would you choose one over the other? Construct DFA for: - (a|b)*c - Strings that contain "ab". - All identifiers except "if" and "fi". --------------------------------------------------------------------------- 5. Flex and JLex Flex and JLex are automatic tools that will spit out C or Java code for a scanner (lexer) for you, given a bunch of regular expressions. Flex and JLex take regular expressions, convert them interally to a DFA, and then emit code for that DFA. Flex syntax for regular expressions: 'a' literal 'a' a* 0 or more a's a+ 1 or more a's a? 0 or 1 a's [ab] (a|b) [^ab] all characters not equal to a or b [a-z] all characters between a and z [a-zA-Z_] all letters plus the underscore [^a-z0-9] all characters not between a-z or 0-9 . all character not equal to \n Flex input format: %{ [headers for lexer, such as #includes] %} [definitions for useful names and delcarations for start conditions] %% [lexer rules] %% [user code to be included with lexer] Flex Example (cribbed from manual) %{ /* We need math.h for the atof() call below. */ #include %} %x COMMENT DIGIT [0-9] ID [a-z][a-z0-9]* %% [^*]* /* skip */ "*"+[^*/]* /* skip */ "*"+"/" BEGIN(INITIAL); "/*" BEGIN(COMMENT); {DIGIT}+ printf("int: %d\n", atoi(yytext)); {DIGIT}+"."{DIGIT}* printf("float: %g\n", atof(yytext)); if|then|else|fi printf("keyword: %s\n", yytext); {ID} printf("identifier: %s\n", yytext); "+"|"-"|"*"|"/" printf("operator: %s\n", yytext); [ \t\n]+ /* eat up whitespace */ . printf("unrecognized character: %s\n", yytext); %% int main(int argc, char **argv) { ++argv; --argc; /* skip over program name */ if (argc > 0) { yyin = fopen(argv[0], "r"); } else { yyin = stdin; } yylex(); } This lexer works! Give it a try. % flex -otoy.c toy.l % gcc -o toy toy.c -lfl Some notes: - That funny COMMENT section defines a start condition--that is, a special context for defining a separate set of rules. Here, we switch into the COMMENT context when consuming comment text. - Be careful! Your lexer will need to handle nested comments. - Skim the Flex manual for more detail on all of these features. Questions: - How do we handle ambiguity? - How do we handle errors?