CSCE 434 Lecture 18
« previous | Friday, October 4, 2013 | next »
Review of Top-Down LL(1) Parsing
Grammar is LL(1) if and only if, for all non-terminals , each distinct pair of productions and satisfy the condition
In other words, for a production rule
- all first-sets of all production rules are pairwise disjoint
- If , then , for all , .
Bottom-up (LR) Parsing
- start at leaves and use production rules to reduce to goal
- Key problem: Choosing RHS of production to use
- LR parsers use stack and lookahead to choose production.
- Lookahead is just extra decoration for state; each state has more information.
- LR is generally more powerful than LL
Starting with start symbol , any string such that is called a sentential form
- If contains only terminals, then is a sentence in
- If contains at least one non-terminal, then it is just a sentential form.
Bottom-up parser repeatedly matches RHS of a productino against a substring in the current right-sentential form.
In other words, if our grammar has , and we have Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle a \, b} at the top of our stack, then we can replace the Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle a \, b} with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} .
Example
Consider the grammar
1 <goal> ::= a <A> <B> e 2 <A> ::= <A> b c 3 | b 4 <B> ::= d
and the input sentence abbcde
| Production | Sentential Form | Handle |
|---|---|---|
| — | abbcde |
3, 2 |
| 3 | a<A>bcde |
2, 4 |
| 2 | a<A>de |
4, 3 |
| 4 | a<A><B>e |
1, 4 |
| 1 | <goal> |
— |
Handles
Definition:
- a tuple containing the matching production rule and the position in the sentence where it matches.
- a substring Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} of the current right-sentential form where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha } matches some production Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A ::= \alpha} and reducing Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} to Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} is one step in the reverse rightmost derivation
Invariant: The substring of a handle contains only terminal symbols.
Provable fact: If Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle G} is unambiguous, then every right-sentential form has a unique handle.
Proof. Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle G} is unambiguous, then rightmost derivation is unique. This implies that a unique production Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A ::= \beta} applied to take Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \gamma_{i-1}} to Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \gamma_i} . Next a unique position Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k} at which Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A ::= \beta} is applied. The result is a handle Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (A ::= \beta, k)}
Shift-Reduce Parsing
- Simple to understand
- Have simple, table-driven, shift-reduce skeleton
- grammatical knowledge encoded in tables
Algorithm
- Initialize stack with "$"
- Repeat the following until the top of the stack is the goal symbol and the input token is "end of file"
- Find the handle. If we don't have a handle, shift an input symbol onto the stack
- Prune the handle. If we have a handle Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (A ::= \beta, k)}
on the stack, then reduce
- pop Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle |\beta|} symbols off the stack
- push Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} onto the stack
LR(1)
Everyone's favorite parser
Describe a proper superset of languages recognized by LL (predictive) parsers
Token token = next_token();
while (true) {
Token s = top of stack
if (action[s,token] == "shift s[i]") {
push token;
push s[i];
token = next_token();
} else if (action[s, token] == "reduce A ::= b") {
pop 2 * |b| symbols;
s = top of stack;
push A;
push goto[s,A];
} else if (action[s, token] == "accept") {
return;
} else error();
}