CSCE 434 Lecture 19
« previous | Monday, October 7, 2013 | next »
Tangents
LR Parsing
Three common algorithms to construct LR tables:
- SLR(1) = LR(0) + FOLLOW
- smallest class of grammars
- smallest tables (number of states)
- simple, fast construction
- LR(1)
- full set of LR(1) grammars
- Largest number of states
- Slow, large construction
- LALR(1)
- intermediate sized set of grammars
- Same number of states as SLR(1)
- canonical construction is slow and large
- but better solutions exist
For example, LR(1) parser for ALGOL or PASCAL has 1000s of states, while SLR(1) or LALR(1) only has 100s
Smaller tables are more desirable because it can fit better in the cache.
Viable Prefix
Formally, a prefix of a right-sentential form that:
- does not ontinue past the right end of the rightmost handle of that sentential form [1]
- can appear on the stack of a shift-reduce parser. That is, as long as the prefix represented by the stack is viable, the parser has not seen a detectable error.
A viable prefix is an invariant of shift-reduce parsing: the top of the stack always contains a viable prefix.
SLR(1)
Viable prefix of a right-sentential form:
- contains both terminals and nonterminals
- can be recognized with an NFA or a DFA.
To build SLR,
- start with NFA
- construct DFA
- augment DFA with follow sets to disambiguate reductions
States in NFA are LR(0) items, whereas states in DFA are sets of LR(0) items.
LR(0) Items
LR(0) items are strings denoted as
- is a production from with a cursor • at some position
- the cursor indicates how much of an item we have seen at a given state in the parse.
Examples:
- indicates that the parser is looking for a string that can be derived from
- indicates that the parser has seen a string derived from and is looking for one derivable from
- The production rule generates 4 LR(0) items:
Canonical LR(0) Items
SLR(1) Table construction uses specific sets of LR(0) items (collectively called the canonical collection of sets of LR(0) items for a grammar , or just canonical collection for short)
The Canonical collection represents the set of valid states for the LR parser.
Two classes of items:
- kernel items
- Items where • is not at the left end of the RHS of a production rule
- Add as "identity"
- non-kernel items
- items where • is at the left end of the RHS
Closure
To generate a state, we compute its closure (all possible outcomes of the particular state):
- If , then in state , the parser might next see a string derivable from .
- To form its closure, add all items of the form .
Given an item , its closure contains the item itself and any other items that can generate legal substrings to follow
Therefore, if the parser has a viable prefix on its stack, the input should reduce to (or for some other item in the closure).
Creating the closure of :
closure(I) {
bool new_item;
do {
new_item = false;
for (Item [A ::= a.Bb] in I) {
for (Production "B ::= g" in Gprime) {
if ([B ::= .g] not in I) {
I.add([B ::= .g]);
new_item = true;
}
}
}
} while (new_item != false)
return I;
}
SLR(1) uses lookahead to guide decision whether to shift or reduce
Goto
Let be a set of LR(0) items, and be a grammar symbol.
is the set of all items such that is in .
If is the set of valid items for some viable prefix , then is the set of valid items for the viable prefix .
represents the state after recognizing in state .
goto(I,X) {
J = items [A ::= aX.b] such that [A ::= a.Xb] in I;
J' = closure(J);
return J';
}
Table Construction
Start construction of collection of sets of LR(0) items with , where
- is the start symbol of the augmented grammar
- is the start symbol of
To compute collection of sets of LR(0) items,
def items(G'):
S0 = closure(set([S' ::= . S]))
Items = set(S0)
ToDo = set(S0)
while not ToDo.empty():
remove Si from ToDo
for X in grammar symbols:
Snew = goto(Si, X)
if Snew is a new state:
Items.append(set(Snew))
ToDo.append(set(Snew))
return Items
LR(0) Machines
- states: canonical sets of LR(0) items
- edges: goto transitions
- recognizes all viable prefixes of handles
- no lookahead
To recognize viable prefixes of the language (instead of handles), we must be able to reduce the handles to nonterminals.
Reducing a handle (RHS of production) to a nonterminal can be viewed as
- returning to a state at beginning of handle (must use a stack!)
- making transition on nonterminal
SLR(1) Tables
SLR(1) augments the LR(0) machine by adding FOLLOW information using one token of lookahead.
These are encoded as ACTION and GOTO tables.
ACTION Table:
- for each [state, lookahead] pair,
- have we reached the end of handle?
- if not, "shift".
- If at end of handle, "reduce" (by production rule)
- "accept" and "error" are also valid actions
- use lookahead to guide precision
GOTO Table:
- For each [state, nonterminal] pair,
- pick state togo to after reduction
- look at nonterminal at top of stack.
Algorithm:
- Construct items(G')
- State of the parser is constructed from .
- If (a must be a terminal) and , then set ACTION[i,a] to "shift ".
- If , then set ACTION[i,a] to "reduce " for all a in .
- If , then set ACTION[i, eof] to "accept".
- If , then set GOTO[i,A] to .
- All other entries in ACTION and GOTO are set to "error"
- The initial state of the parser is the state constructed from the set containing the item .
SLR(1) Parser Example
Grammar:
1 E ::= T + E 2 | T 3 T ::= id
Augmented Grammar:
0 S' ::= E 1 E ::= T + E 2 | T 3 T ::= id
Symbol | first | follow |
---|---|---|
S' | { id } | { eof } |
E | { id } | { eof } |
T | { id } | { +, eof } |
States:
GOTO Construction iterations:
- Start
- Iteration 1
- Iteration 2
- Iteration 3
ACTION and GOTO Tables:
ACTION | GOTO | ||||
---|---|---|---|---|---|
id | + | eof | expr | term | |
shift 3 | — | — | 1 | 2 | |
— | — | accept | — | — | |
— | shift 4 | reduce 2 | — | — | |
— | reduce 3 | reduce 3 | — | — | |
shift 3 | — | — | 5 | 2 | |
— | — | reduce 1 | — | — |
Potential Problems
If either of these happen, the grammar is not SLR(1)
Shift/Reduce Conflicts
ambiguous construct in the grammar (parser doesn't know whether to shift or reduce).
- grammar can be modified to eliminate conflict
- resolve in favor of shifting
Classic example: dangling else
Reduce/Reduce Conflicts
Another grammar ambiguity
- often no simple resolution
- parse a nearby language
Classic example: PL/I call and subscript ([])
Usually resolved with context
Footnotes
- ↑ If the grammar is unambiguous, and LR(k) grammars are generally unambiguous, there is a unique rightmost handle.