CSCE 434 Lecture 19

Monday, October 7, 2013

Lecture Slides

Tangents

Threading Building Blocks

LR Parsing

Three common algorithms to construct LR tables:

SLR(1) = LR(0) + FOLLOW
- smallest class of grammars
- smallest tables (number of states)
- simple, fast construction
LR(1)
- full set of LR(1) grammars
- largest number of states
- slow, large construction
LALR(1)
- intermediate sized set of grammars
- same number of states as SLR(1)
- canonical construction is slow and large
- but better solutions exist

For example, an LR(1) parser for ALGOL or Pascal has thousands of states, while an SLR(1) or LALR(1) parser has only hundreds.

Tangent about ALGOL and Pascal; Pascal was designed by Niklaus Wirth.

Smaller tables are more desirable because they fit better in the cache.

Viable Prefix

Formally, a prefix of a right-sentential form that:

does not continue past the right end of the rightmost handle of that sentential form ^[1]
can appear on the stack of a shift-reduce parser. That is, as long as the prefix represented by the stack is viable, the parser has not seen a detectable error.

Viable prefixes give an invariant of shift-reduce parsing: the contents of the stack, read from bottom to top, always form a viable prefix.

SLR(1)

Viable prefix of a right-sentential form:

contains both terminals and nonterminals
can be recognized with an NFA or a DFA.

To build SLR,

start with NFA
construct DFA
augment DFA with follow sets to disambiguate reductions

States in NFA are LR(0) items, whereas states in DFA are sets of LR(0) items.

LR(0) Items

LR(0) items are strings denoted as $[\alpha ]$

$\alpha$ is a production from $G$ with a cursor • at some position in its right-hand side
the cursor indicates how much of the production's right-hand side we have seen at a given state in the parse.

Examples:

$[A::=\cdot XYZ]$ indicates that the parser is looking for a string that can be derived from $XYZ$
$[A::=XY\cdot Z]$ indicates that the parser has seen a string derived from $XY$ and is looking for one derivable from $Z$
The production rule $A::=XYZ$ generates 4 LR(0) items:
1. $[A::=\cdot XYZ]$
2. $[A::=X\cdot YZ]$
3. $[A::=XY\cdot Z]$
4. $[A::=XYZ\cdot ]$
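
As a small illustration of the enumeration above, here is a minimal Python sketch that lists the LR(0) items of one production. The function name lr0_items and the use of "." for the cursor are choices made for this sketch, not notation from the lecture.

```python
def lr0_items(lhs, rhs):
    """Return the LR(0) items of the production lhs ::= rhs as strings.

    A production with n symbols on its right-hand side has n + 1 items,
    one for each possible cursor position.
    """
    items = []
    for dot in range(len(rhs) + 1):
        # Place the cursor "." between rhs[:dot] and rhs[dot:].
        symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
        items.append(f"[{lhs} ::= {' '.join(symbols)}]")
    return items


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(item)
```

The loop runs over len(rhs) + 1 cursor positions, which is why a three-symbol right-hand side yields four items.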

Canonical LR(0) Items

SLR(1) table construction uses specific sets of LR(0) items (collectively called the canonical collection of sets of LR(0) items for a grammar $G$, or just the canonical collection for short).

The canonical collection represents the set of valid states for the LR parser.

Two classes of items:

kernel items: the initial item $[S'::=\cdot S]$, plus every item where • is not at the left end of the RHS of a production rule
non-kernel items: all other items, i.e. those where • is at the left end of the RHS

Closure

To generate a state, we compute its closure (all possible outcomes of the particular state):

If $[A::=\alpha \cdot B\beta ]\in I_{j}$ , then in state $j$ , the parser might next see a string derivable from $B\beta$ .
To form its closure, add the item $[B::=\cdot \gamma ]$ for every production $B::=\gamma$ in $G$ .

Given an item $[A::=\alpha \cdot B\beta ]$ , its closure contains the item itself and any other items that can generate legal substrings to follow $\alpha$

Therefore, if the parser has a viable prefix $\alpha$ on its stack, the input should reduce to $B\beta$ (or $\gamma$ for some other item $[B::=\cdot \gamma ]$ in the closure).

Creating the closure of $I$ :

closure(I) {
  // Repeat until a full pass over I adds no new item.
  bool new_item;
  do {
    new_item = false;
    for (Item [A ::= a.Bb] in I) {            // cursor directly before nonterminal B
      for (Production "B ::= g" in Gprime) {  // every production of B in G'
        if ([B ::= .g] not in I) {
          I.add([B ::= .g]);
          new_item = true;
        }
      }
    }
  } while (new_item);
  return I;
}

SLR(1) uses one token of lookahead to decide whether to shift or reduce.
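
A runnable version of the closure computation can be sketched in Python as follows. Several things here are assumptions of the sketch rather than part of the lecture: an item is represented as a pair (production number, cursor position), the grammar is the small expression grammar from the SLR(1) example in these notes, and a worklist replaces the repeated full passes of the pseudocode.

```python
# Augmented example grammar (assumed): production number -> (lhs, rhs).
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]


def closure(item_set, grammar=GRAMMAR):
    """Return the LR(0) closure of a set of (production, cursor) items."""
    result = set(item_set)
    worklist = list(item_set)
    while worklist:
        prod, dot = worklist.pop()
        rhs = grammar[prod][1]
        if dot == len(rhs):
            continue  # cursor at the right end: nothing to expand
        symbol = rhs[dot]
        # Add [B ::= . g] for every production of the symbol after the cursor.
        # A terminal has no productions, so it adds nothing.
        for index, (lhs, _) in enumerate(grammar):
            if lhs == symbol and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


S0 = closure({(0, 0)})
print(sorted(S0))
```

Returning a frozenset makes each state hashable and comparable, which is convenient when states are later collected and looked up.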

Goto

Let $I$ be a set of LR(0) items, and $X$ be a grammar symbol.

$\mathrm {goto} (I,X)$ is the set of all items $\left[A::=\alpha X\cdot \beta \right]$ such that $\left[A::=\alpha \cdot X\beta \right]$ is in $I$ .

If $I$ is the set of valid items for some viable prefix $\gamma$ , then $\mathrm {goto} (I,X)$ is the set of valid items for the viable prefix $\gamma X$ .

$\mathrm {goto} (I,X)$ represents the state after recognizing $X$ in state $I$ .

goto(I,X) {
  J = items [A ::= aX.b] such that [A ::= a.Xb] in I;
  J' = closure(J);
  return J';
}
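
The goto function can be sketched the same way. As before, the (production number, cursor position) item encoding and the example grammar are assumptions of this sketch; closure is repeated so the block runs on its own.

```python
# Augmented example grammar (assumed): production number -> (lhs, rhs).
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]


def closure(item_set):
    """Return the LR(0) closure of a set of (production, cursor) items."""
    result = set(item_set)
    worklist = list(item_set)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs):
            continue
        for index, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot] and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


def goto(item_set, symbol):
    """Move the cursor over `symbol` in every item that allows it, then close."""
    moved = {
        (prod, dot + 1)
        for prod, dot in item_set
        if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol
    }
    return closure(moved)


S0 = closure({(0, 0)})
S2 = goto(S0, "T")   # items with the cursor moved past T
S4 = goto(S2, "+")   # items with the cursor moved past T +
```

An empty result means there is no transition on that symbol from the given state.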

Table Construction

Start construction of collection of sets of LR(0) items with $[S'::=\cdot S]$ , where

$S'$ is the start symbol of the augmented grammar $G'$
$S$ is the start symbol of $G$

To compute collection of sets of LR(0) items,

def items(G'):
  S0 = closure({[S' ::= . S]})
  Items = [S0]    # canonical collection, in order of discovery
  ToDo = [S0]     # states whose transitions are not yet computed
  while ToDo is not empty:
    remove Si from ToDo
    for X in grammar symbols:
      Snew = goto(Si, X)
      if Snew is not empty and Snew not in Items:
        Items.append(Snew)
        ToDo.append(Snew)
  return Items
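
The worklist above can be turned into a runnable Python sketch. The item encoding (production number, cursor position), the example grammar, and the name canonical_collection are assumptions of this sketch. States are numbered in order of discovery, so the symbol order chosen below determines the numbering.

```python
# Augmented example grammar (assumed): production number -> (lhs, rhs).
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]
SYMBOLS = ["E", "T", "id", "+"]  # nonterminals first, then terminals


def closure(item_set):
    """Return the LR(0) closure of a set of (production, cursor) items."""
    result = set(item_set)
    worklist = list(item_set)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs):
            continue
        for index, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot] and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


def goto(item_set, symbol):
    """Move the cursor over `symbol` wherever possible, then close."""
    moved = {
        (prod, dot + 1)
        for prod, dot in item_set
        if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol
    }
    return closure(moved)


def canonical_collection():
    """Return the list of LR(0) states, starting from closure([S' ::= . S])."""
    states = [closure({(0, 0)})]
    i = 0
    while i < len(states):          # states[i:] is the work still to do
        for symbol in SYMBOLS:
            target = goto(states[i], symbol)
            if target and target not in states:
                states.append(target)
        i += 1
    return states


STATES = canonical_collection()
print(len(STATES))
```

Using the list itself as the queue keeps the state numbers stable: a state's number is simply its index in STATES.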

LR(0) Machines

states: canonical sets of LR(0) items
edges: goto transitions
recognizes all viable prefixes of handles
no lookahead

To recognize viable prefixes of the language (instead of handles), we must be able to reduce the handles to nonterminals.

Reducing a handle (RHS of production) to a nonterminal can be viewed as

returning to a state at beginning of handle (must use a stack!)
making transition on nonterminal

SLR(1) Tables

SLR(1) augments the LR(0) machine by adding FOLLOW information using one token of lookahead.

These are encoded as ACTION and GOTO tables.

ACTION Table:

for each [state, lookahead] pair,
have we reached the end of handle?
if not, "shift".
If at end of handle, "reduce" (by production rule)
"accept" and "error" are also valid actions
use lookahead to guide the decision

GOTO Table:

For each [state, nonterminal] pair,
pick the state to go to after a reduction
look up the state exposed on top of the stack together with the nonterminal just produced.

Algorithm:

1. Construct items(G'), the canonical collection of sets of LR(0) items.
2. State $i$ of the parser is constructed from $I_{i}$ :
- If $[A::=\alpha \cdot {\mathtt {a}}\beta ]\in I_{i}$ (a must be a terminal) and $\mathrm {goto} (I_{i},{\mathtt {a}})=I_{j}$ , then set ACTION[i,a] to "shift $j$ ".
- If $[A::=\alpha \cdot ]\in I_{i}$ and $A\neq S'$ , then set ACTION[i,a] to "reduce $A::=\alpha$ " for all a in $\mathrm {follow} (A)$ .
- If $[S'::=S\cdot ]\in I_{i}$ , then set ACTION[i, eof] to "accept".
3. If $\mathrm {goto} (I_{i},A)=I_{j}$ for a nonterminal $A$ , then set GOTO[i,A] to $j$ .
4. All other entries in ACTION and GOTO are set to "error".
5. The initial state of the parser is the state constructed from the set containing the item $[S'::=\cdot S]$ .
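
The algorithm can be sketched in Python as below. This is a sketch under stated assumptions: the grammar is the small expression grammar E ::= T + E | T, T ::= id, the FOLLOW sets are hard-coded rather than computed, items are (production number, cursor position) pairs, and missing dictionary entries stand for "error". The helper names (build_tables, set_action) are made up for this example.

```python
GRAMMAR = [("S'", ("E",)), ("E", ("T", "+", "E")), ("E", ("T",)), ("T", ("id",))]
NONTERMINALS = ["E", "T"]
TERMINALS = ["id", "+", "eof"]
# Assumption: FOLLOW sets are supplied by hand for this grammar.
FOLLOW = {"S'": {"eof"}, "E": {"eof"}, "T": {"+", "eof"}}


def closure(item_set):
    result, worklist = set(item_set), list(item_set)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs):
            continue
        for index, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot] and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


def goto(item_set, symbol):
    return closure({(p, d + 1) for p, d in item_set
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def canonical_collection():
    states, i = [closure({(0, 0)})], 0
    while i < len(states):
        for symbol in NONTERMINALS + TERMINALS:
            target = goto(states[i], symbol)
            if target and target not in states:
                states.append(target)
        i += 1
    return states


def build_tables():
    """Return (ACTION, GOTO) dictionaries; absent keys mean "error"."""
    states = canonical_collection()
    action, goto_table = {}, {}

    def set_action(state, terminal, entry):
        # Two different entries for one cell mean the grammar is not SLR(1).
        if action.get((state, terminal), entry) != entry:
            raise ValueError(f"conflict in state {state} on {terminal}")
        action[(state, terminal)] = entry

    for i, state in enumerate(states):
        for prod, dot in state:
            lhs, rhs = GRAMMAR[prod]
            if dot < len(rhs):
                if rhs[dot] in TERMINALS:      # cursor before a terminal: shift
                    target = states.index(goto(state, rhs[dot]))
                    set_action(i, rhs[dot], ("shift", target))
            elif prod == 0:                    # [S' ::= S .]: accept on eof
                set_action(i, "eof", ("accept",))
            else:                              # complete item: reduce on FOLLOW(lhs)
                for terminal in FOLLOW[lhs]:
                    set_action(i, terminal, ("reduce", prod))
        for nonterminal in NONTERMINALS:
            target = goto(state, nonterminal)
            if target:
                goto_table[(i, nonterminal)] = states.index(target)
    return action, goto_table


ACTION, GOTO = build_tables()
```

The conflict check in set_action is where a shift/reduce or reduce/reduce conflict would surface for a grammar that is not SLR(1).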

SLR(1) Parser Example

Grammar:

1  E ::= T + E
2      | T
3  T ::= id

Augmented Grammar:

0  S' ::= E
1  E ::= T + E
2      | T
3  T ::= id

Symbol	first	follow
`S'`	`{ id }`	`{ eof }`
`E`	`{ id }`	`{ eof }`
`T`	`{ id }`	`{ +, eof }`
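
The sets in the table can be checked with a short fixed-point computation. The sketch below assumes a grammar with no empty productions, which holds for this example but not in general; the function names first_sets and follow_sets are choices made here.

```python
# Augmented example grammar: (lhs, rhs) pairs.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets(grammar):
    """FIRST sets, assuming no production has an empty right-hand side."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, first, start="S'"):
    """FOLLOW sets, under the same no-empty-production assumption."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("eof")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    # Whatever can start the next symbol can follow this one.
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # At the right end, inherit FOLLOW of the left-hand side.
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, FIRST)
```

Handling empty productions would require tracking nullable symbols, which this sketch deliberately leaves out.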

States:

$S_{0}=\left\{[S'::=\cdot E],\ [E::=\cdot T+E],\ [E::=\cdot T],\ [T::=\cdot {\mathtt {id}}]\right\}$
$S_{1}=\left\{[S'::=E\cdot ]\right\}$
$S_{2}=\left\{[E::=T\cdot +E],\ [E::=T\cdot ]\right\}$
$S_{3}=\left\{[T::=\mathrm {id} \cdot ]\right\}$
$S_{4}=\left\{[E::=T+\cdot E],\ [E::=\cdot T+E],\ [E::=\cdot T],\ [T::=\cdot {\mathtt {id}}]\right\}$
$S_{5}=\left\{[E::=T+E\cdot ]\right\}$

GOTO Construction iterations:

Start: $S_{0}\gets \mathrm {closure} (\{[S'::=\cdot E]\})$
Iteration 1: $\mathrm {goto} (S_{0},E)=S_{1}$; $\mathrm {goto} (S_{0},T)=S_{2}$; $\mathrm {goto} (S_{0},{\mathtt {id}})=S_{3}$
Iteration 2: $\mathrm {goto} (S_{2},+)=S_{4}$
Iteration 3: $\mathrm {goto} (S_{4},{\mathtt {id}})=S_{3}$; $\mathrm {goto} (S_{4},E)=S_{5}$; $\mathrm {goto} (S_{4},T)=S_{2}$

ACTION and GOTO Tables:

	ACTION			GOTO
	`id`	`+`	`eof`	`E`	`T`
$S_{0}$	shift 3	—	—	1	2
$S_{1}$	—	—	accept	—	—
$S_{2}$	—	shift 4	reduce 2	—	—
$S_{3}$	—	reduce 3	reduce 3	—	—
$S_{4}$	shift 3	—	—	5	2
$S_{5}$	—	—	reduce 1	—	—
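
To see the tables in use, here is a minimal sketch of a table-driven shift-reduce driver with the entries above typed in by hand. The dictionary encoding, the PRODUCTIONS helper table, and the function name parse are assumptions of this sketch; a missing ACTION entry plays the role of "error".

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1)}

# ACTION and GOTO entries copied from the tables above.
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "eof"): ("accept",),
    (2, "+"): ("shift", 4),
    (2, "eof"): ("reduce", 2),
    (3, "+"): ("reduce", 3),
    (3, "eof"): ("reduce", 3),
    (4, "id"): ("shift", 3),
    (5, "eof"): ("reduce", 1),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "E"): 5, (4, "T"): 2}


def parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["eof"]
    stack = [0]            # stack of state numbers; state 0 is the initial state
    reductions = []
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if entry[0] == "shift":
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "reduce":
            lhs, length = PRODUCTIONS[entry[1]]
            # Pop one state per right-hand-side symbol, returning to the
            # state at the beginning of the handle, then follow GOTO.
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(entry[1])
        else:              # accept
            return reductions


print(parse(["id", "+", "id"]))
```

For the input id + id the reductions come out as 3, 3, 2, 1, which is a rightmost derivation in reverse.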

Potential Problems

If either of these happens, the grammar is not SLR(1).

Shift/Reduce Conflicts

ambiguous construct in the grammar (parser doesn't know whether to shift or reduce).

grammar can be modified to eliminate conflict
resolve in favor of shifting

Classic example: dangling else

Reduce/Reduce Conflicts

Another grammar ambiguity

often no simple resolution
parse a nearby language

Classic example: PL/I call and subscript, which are both written with parentheses

Usually resolved with context

Footnotes

[1] If the grammar is unambiguous (and every LR(k) grammar is unambiguous), there is a unique rightmost handle.