CSCE 434 Lecture 15
« previous | Friday, September 27, 2013 | next »
Substitute: TA Gabriel Foust
Top-Down Parsers
- Start at start symbol and then expand with transform rules to match tokens.
- May require backtracking
- If we can look ahead, we can use predictive parsing to prevent backtracking
Repeat following steps:
- At node , select production with on LHS and for each symbol on RHS, construct appropriate child
- When terminal is added to "fringe" (leaves) that doesn't match the input, then backtrack
- Repeat for remaining RHS options
Example:
1 <goal> ::= <expr> 2 <expr> ::= <expr> + <term> 3 | <expr> - <term> 4 | <term> 5 <term> ::= <term> * <factor> 6 | <term> / <factor> 7 | <factor> 8 <factor> ::= number 9 | id
Sample parse of x - 2 * y
Prod'n | Representation | Input |
---|---|---|
— | <goal> | ↑ x - 2 * y |
1L1 | <expr> | ↑ x - 2 * y |
2 | <expr> + <term> | ↑ x - 2 * y |
4 | <term> + >term> | ↑ x - 2 * y |
7 | lt;factor> + <term> | ↑ x - 2 * y |
9 | id + <term> | ↑ x - 2 * y |
— | 'x' + <term> | x ↑ - 2 * y |
— | oops! didn't expect -; backtrack to L1 | x - ↑ 2 * y |
3 | <expr> - <term> | ↑ x - 2 * y |
4 | <term> - <term> | ↑ x - 2 * y |
7 | <factor> - <term> | ↑ x - 2 * y |
9 | id - <term> | ↑ x - 2 * y |
— | 'x' - <term> | x ↑ - 2 * y |
— | 'x' '-' <term> | x - ↑ 2 * y |
5 | 'x' '-' <term> * <factor> | x - ↑ 2 * y |
7 | 'x' '-' <factor> * <factor> | x - ↑ 2 * y |
8 | 'x' '-' num * <factor> | x - ↑ 2 * y |
— | 'x' '-' '2' * <factor> | x - 2 ↑ * y |
— | 'x' '-' '2' '*' <factor> | x - 2 * ↑ y |
9 | 'x' '-' '2' '*' id | x - 2 * ↑ y |
— | 'x' '-' '2' '*' 'y' | x - 2 * y ↑ |
Left Recursion
Left recursive grammar could lead to following rule expansion:
<expr> <expr> + <term> <expr> + <term> + <term> <expr> + <term> + <term> + <term> ...
Therefore left recursive grammars do not work with top-down parsers:
Definition. Any rule (or any number of rules) of form for any nonterminal .
Right Recursion
Solves problem with left recursion (i.e. works in a top-down parser)
However, left-associative operators are not possible
1 <goal> ::= <expr> 2 <expr> ::= <term> + <expr> 3 | <term> - <expr> 4 <term> ::= <factor> * <term> 5 | <factor> / <term> 6 <factor> ::= number 7 | id
Now 1 - 2 - 3 would be interpreted as , which is wrong.
Eliminating Left Recursion
Consider
<foo> ::= <foo> α | β
Where and do not start with <foo>
We can rewrite it as
<foo> ::= β <bar> <bar> ::= α <bar> | ε
This transformation could be proven to not change the language.
Now our example grammar becomes
1 <goal> ::= <expr> 2 <expr> ::= <term> <expr'> 3 <expr'> ::= + <term> <expr'> 4 | - <term> <expr'> 5 | ε 6 <term> ::= <factor> <term'> 7 <term'> ::= * <factor> <term'> 8 | / <factor> <term'> 9 | ε 10 <factor> ::= number 11 | id
And a parse of 3 * x + 7 is:
expr |-- term | |-- factor | | `-- <num,3> | `-- term' | |-- * | |-- factor | | `-- <id,x> | `-- term' | `-- ε `-- expr' |-- + |-- term | |-- factor | | `-- <num, 7> | `-- term' | `-- ε `-- expr' `-- ε
Lookahead
We want to be able to make a good decision and prevent backtracking.
In general, we need arbitrary lookahead, but fortunately, large subclasses of context-free grammars can be parsed with limited lookahead:
- LL(1): Left-to-right; leftmost derivation; 1 lookahead
- LR(1): Left-to-right; rightmost derivation; 1 lookahead
leftmost/rightmost derivation defines which side of the parse tree it expands first. In the previous example tree,
- A leftmost parser would expand term before expr'
- A rightmost parser would expand expr' before term
Predictive Parsing
Define for a RHS sequence to be the set of all terminal symbols that appear as the first symbol in some string derived from .
In other words, of all the srings that could be formed by expanding , is the set of all first non-terminals of these strings.
A grammar is LL(1) if and only if the intersection of all choices in a production rule is empty:
Bottom-Up Parsers
Start at tokens in state for valid and work way to goal symbol (same as start symbol in top-down)