CSCE 434 Lecture 15

From Notes
Jump to navigation Jump to search

« previous | Friday, September 27, 2013 | next »

Lecture Slides

Substitute: TA Gabriel Foust

Top-Down Parsers

  • Start at start symbol and then expand with transform rules to match tokens.
  • May require backtracking
  • If we can look ahead, we can use predictive parsing to prevent backtracking

Repeat following steps:

  1. At node Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} , select production with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} on LHS and for each symbol on RHS, construct appropriate child
  2. When terminal is added to "fringe" (leaves) that doesn't match the input, then backtrack
  3. Repeat for remaining RHS options


Example:

1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3          | <expr> - <term>
4          | <term>
5 <term> ::= <term> * <factor>
6          | <term> / <factor>
7          | <factor>
8 <factor> ::= number
9            | id

Sample parse of x - 2 * y

Prod'n Representation Input
<goal> ↑ x - 2 * y
1L1 <expr> ↑ x - 2 * y
2 <expr> + <term> ↑ x - 2 * y
4 <term> + >term> ↑ x - 2 * y
7 lt;factor> + <term> ↑ x - 2 * y
9 id + <term> ↑ x - 2 * y
'x' + <term> x ↑ - 2 * y
oops! didn't expect -; backtrack to L1 x - ↑ 2 * y
3 <expr> - <term> ↑ x - 2 * y
4 <term> - <term> ↑ x - 2 * y
7 <factor> - <term> ↑ x - 2 * y
9 id - <term> ↑ x - 2 * y
'x' - <term> x ↑ - 2 * y
'x' '-' <term> x - ↑ 2 * y
5 'x' '-' <term> * <factor> x - ↑ 2 * y
7 'x' '-' <factor> * <factor> x - ↑ 2 * y
8 'x' '-' num * <factor> x - ↑ 2 * y
'x' '-' '2' * <factor> x - 2 ↑ * y
'x' '-' '2' '*' <factor> x - 2 * ↑ y
9 'x' '-' '2' '*' id x - 2 * ↑ y
'x' '-' '2' '*' 'y' x - 2 * y ↑

Left Recursion

Left recursive grammar could lead to following rule expansion:

<expr>
<expr> + <term>
<expr> + <term> + <term>
<expr> + <term> + <term> + <term>
...

Therefore left recursive grammars do not work with top-down parsers:

Definition. Any rule (or any number of rules) of form Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A \Rightarrow^+ A \alpha} for any nonterminal Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} .

Right Recursion

Solves problem with left recursion (i.e. works in a top-down parser)

However, left-associative operators are not possible

1 <goal> ::= <expr>
2 <expr> ::= <term> + <expr>
3          | <term> - <expr>
4 <term> ::= <factor> * <term>
5          | <factor> / <term>
6 <factor> ::= number
7            | id

Now 1 - 2 - 3 would be interpreted as Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle 1-(2-3)} , which is wrong.

Eliminating Left Recursion

Consider

<foo> ::= <foo> α
        | β

Where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta} do not start with <foo>

We can rewrite it as

<foo> ::= β <bar>
<bar> ::= α <bar>
        | ε

This transformation could be proven to not change the language.

Note: In general, it is undecidable whether two grammars accept the same languages

Now our example grammar becomes

1  <goal> ::= <expr>
2  <expr> ::= <term> <expr'>
3  <expr'> ::= + <term> <expr'>
4            | - <term> <expr'>
5            | ε
6  <term> ::= <factor> <term'>
7  <term'> ::= * <factor> <term'>
8            | / <factor> <term'>
9            | ε
10 <factor> ::= number
11            | id

And a parse of 3 * x + 7 is:

expr
|-- term
|   |-- factor
|   |   `-- <num,3>
|   `-- term'
|       |-- *
|       |-- factor
|       |   `-- <id,x>
|       `-- term'
|           `-- ε
`-- expr'
    |-- +
    |-- term
    |   |-- factor
    |   |   `-- <num, 7>
    |   `-- term'
    |       `-- ε
    `-- expr'
        `-- ε

Lookahead

We want to be able to make a good decision and prevent backtracking.

In general, we need arbitrary lookahead, but fortunately, large subclasses of context-free grammars can be parsed with limited lookahead:

  • LL(1): Left-to-right; leftmost derivation; 1 lookahead
  • LR(1): Left-to-right; rightmost derivation; 1 lookahead

leftmost/rightmost derivation defines which side of the parse tree it expands first. In the previous example tree,

  • A leftmost parser would expand term before expr'
  • A rightmost parser would expand expr' before term

Predictive Parsing

Define Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathrm{first}(\alpha)} for a RHS sequence Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} to be the set of all terminal symbols that appear as the first symbol in some string derived from Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} .

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x \in \mathrm{first}(\alpha) \quad \iff \quad \exists \gamma \quad \alpha \rightarrow^* x\gamma}

In other words, of all the srings that could be formed by expanding Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} , Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathrm{first}(\alpha)} is the set of all first non-terminals of these strings.

A grammar is LL(1) if and only if the intersection of all choices in a production rule is empty: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathrm{first}(\alpha) \cap \mathrm{first}(\beta) = \emptyset}


Bottom-Up Parsers

Start at tokens in state for valid and work way to goal symbol (same as start symbol in top-down)