CSCE 434 Lecture 15

« previous | Friday, September 27, 2013 | next »

Lecture Slides

Substitute: TA Gabriel Foust

Top-Down Parsers

Start at start symbol and then expand with transform rules to match tokens.
May require backtracking
If we can look ahead, we can use predictive parsing to prevent backtracking

Repeat following steps:

At node $A$ , select production with $A$ on LHS and for each symbol on RHS, construct appropriate child
When terminal is added to "fringe" (leaves) that doesn't match the input, then backtrack
Repeat for remaining RHS options

Example:

1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3          | <expr> - <term>
4          | <term>
5 <term> ::= <term> * <factor>
6          | <term> / <factor>
7          | <factor>
8 <factor> ::= number
9            | id

Sample parse of x - 2 * y

Prod'n	Representation	Input
—	`<goal>`	`↑ x - 2 * y`
1^L1	`<expr>`	`↑ x - 2 * y`
2	`<expr> + <term>`	`↑ x - 2 * y`
4	`<term> + >term>`	`↑ x - 2 * y`
7	`lt;factor> + <term>`	`↑ x - 2 * y`
9	`id + <term>`	`↑ x - 2 * y`
—	`'x' + <term>`	`x ↑ - 2 * y`
—	oops! didn't expect `-`; backtrack to ^L1	`x - ↑ 2 * y`
3	`<expr> - <term>`	`↑ x - 2 * y`
4	`<term> - <term>`	`↑ x - 2 * y`
7	`<factor> - <term>`	`↑ x - 2 * y`
9	`id - <term>`	`↑ x - 2 * y`
—	`'x' - <term>`	`x ↑ - 2 * y`
—	`'x' '-' <term>`	`x - ↑ 2 * y`
5	`'x' '-' <term> * <factor>`	`x - ↑ 2 * y`
7	`'x' '-' <factor> * <factor>`	`x - ↑ 2 * y`
8	`'x' '-' num * <factor>`	`x - ↑ 2 * y`
—	`'x' '-' '2' * <factor>`	`x - 2 ↑ * y`
—	`'x' '-' '2' '*' <factor>`	`x - 2 * ↑ y`
9	`'x' '-' '2' '*' id`	`x - 2 * ↑ y`
—	`'x' '-' '2' '*' 'y'`	`x - 2 * y ↑`

Left Recursion

Left recursive grammar could lead to following rule expansion:

<expr>
<expr> + <term>
<expr> + <term> + <term>
<expr> + <term> + <term> + <term>
...

Therefore left recursive grammars do not work with top-down parsers:

Definition. Any rule (or any number of rules) of form $A\Rightarrow ^{+}A\alpha$ for any nonterminal $A$ .

Right Recursion

Solves problem with left recursion (i.e. works in a top-down parser)

However, left-associative operators are not possible

1 <goal> ::= <expr>
2 <expr> ::= <term> + <expr>
3          | <term> - <expr>
4 <term> ::= <factor> * <term>
5          | <factor> / <term>
6 <factor> ::= number
7            | id

Now 1 - 2 - 3 would be interpreted as $1-(2-3)$ , which is wrong.

Eliminating Left Recursion

Consider

<foo> ::= <foo> α
        | β

Where $\alpha$ and $\beta$ do not start with <foo>

We can rewrite it as

<foo> ::= β <bar>
<bar> ::= α <bar>
        | ε

This transformation could be proven to not change the language.

Note: In general, it is undecidable whether two grammars accept the same languages

Now our example grammar becomes

1  <goal> ::= <expr>
2  <expr> ::= <term> <expr'>
3  <expr'> ::= + <term> <expr'>
4            | - <term> <expr'>
5            | ε
6  <term> ::= <factor> <term'>
7  <term'> ::= * <factor> <term'>
8            | / <factor> <term'>
9            | ε
10 <factor> ::= number
11            | id

And a parse of 3 * x + 7 is:

expr
|-- term
|   |-- factor
|   |   `-- <num,3>
|   `-- term'
|       |-- *
|       |-- factor
|       |   `-- <id,x>
|       `-- term'
|           `-- ε
`-- expr'
    |-- +
    |-- term
    |   |-- factor
    |   |   `-- <num, 7>
    |   `-- term'
    |       `-- ε
    `-- expr'
        `-- ε

Lookahead

We want to be able to make a good decision and prevent backtracking.

In general, we need arbitrary lookahead, but fortunately, large subclasses of context-free grammars can be parsed with limited lookahead:

LL(1): Left-to-right; leftmost derivation; 1 lookahead
LR(1): Left-to-right; rightmost derivation; 1 lookahead

leftmost/rightmost derivation defines which side of the parse tree it expands first. In the previous example tree,

A leftmost parser would expand term before expr'
A rightmost parser would expand expr' before term

Predictive Parsing

Define $\mathrm {first} (\alpha )$ for a RHS sequence $\alpha$ to be the set of all terminal symbols that appear as the first symbol in some string derived from $\alpha$ .

x\in \mathrm {first} (\alpha )\quad \iff \quad \exists \gamma \quad \alpha \rightarrow ^{*}x\gamma

In other words, of all the srings that could be formed by expanding $\alpha$ , $\mathrm {first} (\alpha )$ is the set of all first non-terminals of these strings.

A grammar is LL(1) if and only if the intersection of all choices in a production rule is empty: $\mathrm {first} (\alpha )\cap \mathrm {first} (\beta )=\emptyset$

Bottom-Up Parsers

Start at tokens in state for valid and work way to goal symbol (same as start symbol in top-down)