CSCE 434 Lecture 11
« previous | Wednesday, September 18, 2013 | next »
The Parser
Scanner produces lexemes, and the Parser takes these lexemes to produce the IL
"It takes the words and puts them into sentences"
- Perform context-free syntax analysis (we'll add limited context later)
- Guide context-sensitive analysis
- Like the scanner, it should output meaningful error messages
- Can attempt error-correction
It's even possible to compile high-level language back into a high-level language
SUIF - Stanford University Intermediate Format
- reads in C and spits out C
- Useful for research, but impractical
Context-Free Syntax
- Terminal Symbols: the stuff retured by flex
- Nonterminal Symbols: syntactic, structural "variables" that represent (sub)strings in the language
- Start Nonterminal: Denotes the entire set of strings in the language specified by the language; can never appear on the RHS of a production rule
- Production Rules: transformations in following format: single nonterminal on LHS, any combination of terminals and nonterminals on the RHS.
Grammars expressed in Backus-Naur format:
1. <goal> ::= <expr> 2. <expr> ::= <expr> <op> <expr> 3. | number 4. | id 5. <op> ::= + 6. | - 7. | * 8. | /
Why context-free? They're easy to specify and easy to convert to DFA
Syntax-directed Translation:
- If language is well-behaved (i.e. no goto statements), program can be represented with only a stack.
- Otherwise, construction requires a graph.
CFGs are used to count:
- open/close brackets
- begin-end
- if-then-else
Grammar for cc has 188 productions
The Parse
Recognizing x + 2 * y:
<goal> -> <expr> -> <expr> <op> <expr> -> <expr> <op> <expr> <op> <expr> -> <id,x> <op> <expr> <op> <expr> -> <id,x> + <expr> <op> <expr> -> <id,x> + <num,2> <op> <expr> -> <id,x> + <num,2> * <expr> -> <id,x> + <num,2> * <id,y>
We denote this as <goal> ⇒* x + 2 * y ("x+2*y
" is a valid sentence in ), and the sequence of rules that gets us there (1,2,2,4,5,3,7,4) is called a parse
Two types of derivation:
- leftmost: only the leftmost non-terminal symbol is replaced at each step (as in this example)
- rightmost: only the rightmost non-terminal symbol is replaced at each step
Regular Grammars
Only rules of the following form are allowed:
<A> ::= a<A> <A> ::= a
These are called type 3 grammars (according to Chomsky hierarchy)
Complexity
Regular Grammars | DFAs | |
LR Grammars | Knuth's Algorithm | |
Arbitrary CFGs | Early's algorithm | |
Arbitrary CSGs | Ibas | P-SPACE COMPLETE |
Parse Tree
Start is always the root
leaves are terminal symbols
Precedence
Previous example would interpret x + 2 * y as (x+2)*y, which is wrong.
The following grammar is a corrected version that respects mathematical precedence:
1. <goal> ::= <expr> 2. <expr> ::= <expr> + <term> 3. | <expr> - <term> 4. | <term> 5. <term> ::= <term> * <factor> 6. | <term> / <factor> 7. | <factor> 8. <factor> ::= number 9. | id
Syntax tree for x + 2 * y is now:
goal `-- expr |-- expr | `-- term | |-- factor | | `-- <id,x> | |-- * | `-- factor | `-- <num,2> |-- + `-- term `-- factor `-- <id,y>