CSCE 434 Lecture 15

From Notes
Jump to navigation Jump to search

« previous | Friday, September 27, 2013 | next »

Lecture Slides

Substitute: TA Gabriel Foust

Top-Down Parsers

  • Start at start symbol and then expand with transform rules to match tokens.
  • May require backtracking
  • If we can look ahead, we can use predictive parsing to prevent backtracking

Repeat following steps:

  1. At node , select production with on LHS and for each symbol on RHS, construct appropriate child
  2. When terminal is added to "fringe" (leaves) that doesn't match the input, then backtrack
  3. Repeat for remaining RHS options


Example:

1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3          | <expr> - <term>
4          | <term>
5 <term> ::= <term> * <factor>
6          | <term> / <factor>
7          | <factor>
8 <factor> ::= number
9            | id

Sample parse of x - 2 * y

Prod'n Representation Input
<goal> ↑ x - 2 * y
1L1 <expr> ↑ x - 2 * y
2 <expr> + <term> ↑ x - 2 * y
4 <term> + >term> ↑ x - 2 * y
7 lt;factor> + <term> ↑ x - 2 * y
9 id + <term> ↑ x - 2 * y
'x' + <term> x ↑ - 2 * y
oops! didn't expect -; backtrack to L1 x - ↑ 2 * y
3 <expr> - <term> ↑ x - 2 * y
4 <term> - <term> ↑ x - 2 * y
7 <factor> - <term> ↑ x - 2 * y
9 id - <term> ↑ x - 2 * y
'x' - <term> x ↑ - 2 * y
'x' '-' <term> x - ↑ 2 * y
5 'x' '-' <term> * <factor> x - ↑ 2 * y
7 'x' '-' <factor> * <factor> x - ↑ 2 * y
8 'x' '-' num * <factor> x - ↑ 2 * y
'x' '-' '2' * <factor> x - 2 ↑ * y
'x' '-' '2' '*' <factor> x - 2 * ↑ y
9 'x' '-' '2' '*' id x - 2 * ↑ y
'x' '-' '2' '*' 'y' x - 2 * y ↑

Left Recursion

Left recursive grammar could lead to following rule expansion:

<expr>
<expr> + <term>
<expr> + <term> + <term>
<expr> + <term> + <term> + <term>
...

Therefore left recursive grammars do not work with top-down parsers:

Definition. Any rule (or any number of rules) of form for any nonterminal .

Right Recursion

Solves problem with left recursion (i.e. works in a top-down parser)

However, left-associative operators are not possible

1 <goal> ::= <expr>
2 <expr> ::= <term> + <expr>
3          | <term> - <expr>
4 <term> ::= <factor> * <term>
5          | <factor> / <term>
6 <factor> ::= number
7            | id

Now 1 - 2 - 3 would be interpreted as , which is wrong.

Eliminating Left Recursion

Consider

<foo> ::= <foo> α
        | β

Where and do not start with <foo>

We can rewrite it as

<foo> ::= β <bar>
<bar> ::= α <bar>
        | ε

This transformation could be proven to not change the language.

Note: In general, it is undecidable whether two grammars accept the same languages

Now our example grammar becomes

1  <goal> ::= <expr>
2  <expr> ::= <term> <expr'>
3  <expr'> ::= + <term> <expr'>
4            | - <term> <expr'>
5            | ε
6  <term> ::= <factor> <term'>
7  <term'> ::= * <factor> <term'>
8            | / <factor> <term'>
9            | ε
10 <factor> ::= number
11            | id

And a parse of 3 * x + 7 is:

expr
|-- term
|   |-- factor
|   |   `-- <num,3>
|   `-- term'
|       |-- *
|       |-- factor
|       |   `-- <id,x>
|       `-- term'
|           `-- ε
`-- expr'
    |-- +
    |-- term
    |   |-- factor
    |   |   `-- <num, 7>
    |   `-- term'
    |       `-- ε
    `-- expr'
        `-- ε

Lookahead

We want to be able to make a good decision and prevent backtracking.

In general, we need arbitrary lookahead, but fortunately, large subclasses of context-free grammars can be parsed with limited lookahead:

  • LL(1): Left-to-right; leftmost derivation; 1 lookahead
  • LR(1): Left-to-right; rightmost derivation; 1 lookahead

leftmost/rightmost derivation defines which side of the parse tree it expands first. In the previous example tree,

  • A leftmost parser would expand term before expr'
  • A rightmost parser would expand expr' before term

Predictive Parsing

Define for a RHS sequence to be the set of all terminal symbols that appear as the first symbol in some string derived from .

In other words, of all the srings that could be formed by expanding , is the set of all first non-terminals of these strings.

A grammar is LL(1) if and only if the intersection of all choices in a production rule is empty:


Bottom-Up Parsers

Start at tokens in state for valid and work way to goal symbol (same as start symbol in top-down)