CSCE 434 Lecture 19

From Notes
Jump to navigation Jump to search

« previous | Monday, October 7, 2013 | next »

Lecture Slides

Tangents

LR Parsing

Three common algorithms to construct LR tables:

  • SLR(1) = LR(0) + FOLLOW
    • smallest class of grammars
    • smallest tables (number of states)
    • simple, fast construction
  • LR(1)
    • full set of LR(1) grammars
    • Largest number of states
    • Slow, large construction
  • LALR(1)
    • intermediate sized set of grammars
    • Same number of states as SLR(1)
    • canonical construction is slow and large
    • but better solutions exist

For example, LR(1) parser for ALGOL or PASCAL has 1000s of states, while SLR(1) or LALR(1) only has 100s

tangent about ALGOL and PASCAL; written by Niklaus Wirth.

Smaller tables are more desirable because it can fit better in the cache.

Viable Prefix

Formally, a prefix of a right-sentential form that:

  1. does not ontinue past the right end of the rightmost handle of that sentential form [1]
  2. can appear on the stack of a shift-reduce parser. That is, as long as the prefix represented by the stack is viable, the parser has not seen a detectable error.

A viable prefix is an invariant of shift-reduce parsing: the top of the stack always contains a viable prefix.


SLR(1)

Viable prefix of a right-sentential form:

  • contains both terminals and nonterminals
  • can be recognized with an NFA or a DFA.

To build SLR,

  • start with NFA
  • construct DFA
  • augment DFA with follow sets to disambiguate reductions

States in NFA are LR(0) items, whereas states in DFA are sets of LR(0) items.

LR(0) Items

LR(0) items are strings denoted as

  • is a production from with a cursor • at some position
  • the cursor indicates how much of an item we have seen at a given state in the parse.

Examples:

  • indicates that the parser is looking for a string that can be derived from
  • indicates that the parser has seen a string derived from and is looking for one derivable from
  • The production rule generates 4 LR(0) items:
Canonical LR(0) Items

SLR(1) Table construction uses specific sets of LR(0) items (collectively called the canonical collection of sets of LR(0) items for a grammar , or just canonical collection for short)

The Canonical collection represents the set of valid states for the LR parser.

Two classes of items:

kernel items
Items where • is not at the left end of the RHS of a production rule
Add as "identity"
non-kernel items
items where • is at the left end of the RHS

Closure

To generate a state, we compute its closure (all possible outcomes of the particular state):

  • If , then in state , the parser might next see a string derivable from .
  • To form its closure, add all items of the form .

Given an item , its closure contains the item itself and any other items that can generate legal substrings to follow

Therefore, if the parser has a viable prefix on its stack, the input should reduce to (or for some other item in the closure).

Creating the closure of :

closure(I) {
  bool new_item;
  do {
    new_item = false;
    for (Item [A ::= a.Bb] in I) {
      for (Production "B ::= g" in Gprime) {
        if ([B ::= .g] not in I) {
          I.add([B ::= .g]);
          new_item = true;
        }
      }
    }
  } while (new_item != false)
  return I;
}


SLR(1) uses lookahead to guide decision whether to shift or reduce

Goto

Let be a set of LR(0) items, and be a grammar symbol.

is the set of all items such that is in .

If is the set of valid items for some viable prefix , then is the set of valid items for the viable prefix .

represents the state after recognizing in state .

goto(I,X) {
  J = items [A ::= aX.b] such that [A ::= a.Xb] in I;
  J' = closure(J);
  return J';
}


Table Construction

Start construction of collection of sets of LR(0) items with , where

  • is the start symbol of the augmented grammar
  • is the start symbol of

To compute collection of sets of LR(0) items,

def items(G'):
  S0 = closure(set([S' ::= . S]))
  Items = set(S0)
  ToDo = set(S0)
  while not ToDo.empty():
    remove Si from ToDo
    for X in grammar symbols:
      Snew = goto(Si, X)
      if Snew is a new state:
        Items.append(set(Snew))
        ToDo.append(set(Snew))
  return Items


LR(0) Machines

  • states: canonical sets of LR(0) items
  • edges: goto transitions
  • recognizes all viable prefixes of handles
  • no lookahead

To recognize viable prefixes of the language (instead of handles), we must be able to reduce the handles to nonterminals.

Reducing a handle (RHS of production) to a nonterminal can be viewed as

  • returning to a state at beginning of handle (must use a stack!)
  • making transition on nonterminal


SLR(1) Tables

SLR(1) augments the LR(0) machine by adding FOLLOW information using one token of lookahead.

These are encoded as ACTION and GOTO tables.

ACTION Table:

  • for each [state, lookahead] pair,
  • have we reached the end of handle?
  • if not, "shift".
  • If at end of handle, "reduce" (by production rule)
  • "accept" and "error" are also valid actions
  • use lookahead to guide precision

GOTO Table:

  • For each [state, nonterminal] pair,
  • pick state togo to after reduction
  • look at nonterminal at top of stack.

Algorithm:

  1. Construct items(G')
  2. State of the parser is constructed from .
    • If (a must be a terminal) and , then set ACTION[i,a] to "shift ".
    • If , then set ACTION[i,a] to "reduce " for all a in .
    • If , then set ACTION[i, eof] to "accept".
  3. If , then set GOTO[i,A] to .
  4. All other entries in ACTION and GOTO are set to "error"
  5. The initial state of the parser is the state constructed from the set containing the item .

SLR(1) Parser Example

Grammar:

1  E ::= T + E
2      | T
3  T ::= id

Augmented Grammar:

0  S' ::= E
1  E ::= T + E
2      | T
3  T ::= id
Symbol first follow
S' { id } { eof }
E { id } { eof }
T { id } { +, eof }

States:

GOTO Construction iterations:

Start
Iteration 1
Iteration 2
Iteration 3

ACTION and GOTO Tables:

  ACTION GOTO
  id + eof expr term
shift 3 1 2
accept
shift 4 reduce 2
reduce 3 reduce 3
shift 3 5 2
reduce 1


Potential Problems

If either of these happen, the grammar is not SLR(1)

Shift/Reduce Conflicts

ambiguous construct in the grammar (parser doesn't know whether to shift or reduce).

  • grammar can be modified to eliminate conflict
  • resolve in favor of shifting

Classic example: dangling else

Reduce/Reduce Conflicts

Another grammar ambiguity

  • often no simple resolution
  • parse a nearby language

Classic example: PL/I call and subscript ([])

Usually resolved with context

Footnotes

  1. If the grammar is unambiguous, and LR(k) grammars are generally unambiguous, there is a unique rightmost handle.