Category:CSCE 434 Exam 1

Review Slides


Location	HRBB 126
Date	Friday, October 25, 2013
Time	11:30–12:20

Lexical Analysis (Scanning)

Syntax Analysis (Parsing)

LL Parsing

Top-down parsers
build parse tree from root to leaves

LR Parsing

Bottom-up parsers
build parse tree from leaves to root

LR(1)

scan input from left to right (L)
build rightmost derivation in reverse (R)
use a single token lookahead to disambiguate (1)
Simple, table-driven, shift-reduce skeleton
grammatical knowledge encoded in (parse lookup) tables
In general, LR parsers are practical, efficient, and easy to build

Skeleton Parser code:

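def lr_parse(toks, action, goto, start_state=0):
    # toks must end with the eof token; action and goto are the parse tables,
    # stored as dicts keyed by (state, symbol).
    # The stack alternates states and grammar symbols, with the start state
    # at the bottom.
    stack = [start_state]
    tokiter = iter(toks)
    tok = next(tokiter)

    while True:
        curr_state = stack[-1]

        # Possible values are:
        #   ("shift", state_id),
        #   ("reduce", production{lhs,rhs}), or
        #   ("accept", None)
        # A missing table entry is a syntax error.
        behavior, payload = action.get((curr_state, tok), ("error", None))

        if behavior == "shift":
            next_state = payload

            stack.append(tok)
            stack.append(next_state)
            tok = next(tokiter)

        elif behavior == "reduce":
            A = payload['lhs']
            b = payload['rhs']

            # Pop one symbol and one state for each symbol of the RHS.
            if b:
                del stack[-2 * len(b):]
            curr_state = stack[-1]
            stack.append(A)
            stack.append(goto[curr_state, A])

        elif behavior == "accept":
            return

        else:
            raise SyntaxError("unexpected token %r in state %r" % (tok, curr_state))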

This has a running time of $\Theta (k+l)$ , where $k$ is the number of shifts (the length of the input string), and $l$ is the number of reduces (depends on the grammar).

Example Tables

The following grammar:

1 <goal>   ::= <expr>
2 <expr>   ::= <term> + <expr>
3            | <term>
4 <term>   ::= <factor> * <term>
5            | <factor>
6 <factor> ::= id

gets translated into the following action and goto tables:

	action				goto
	id	+	*	$ (eof)	$\left\langle \mathrm {expr} \right\rangle$	$\left\langle \mathrm {term} \right\rangle$	$\left\langle \mathrm {factor} \right\rangle$
$S_{0}$	(shift, $S_{4}$ )				$S_{1}$	$S_{2}$	$S_{3}$
$S_{1}$				(accept, _)
$S_{2}$		(shift, $S_{5}$ )		(reduce, $\left\langle \mathrm {expr} \right\rangle ::=\left\langle \mathrm {term} \right\rangle$ )
$S_{3}$		(reduce, $\left\langle \mathrm {term} \right\rangle ::=\left\langle \mathrm {factor} \right\rangle$ )	(shift, $S_{6}$ )	(reduce, $\left\langle \mathrm {term} \right\rangle ::=\left\langle \mathrm {factor} \right\rangle$ )
$S_{4}$		(reduce, $\left\langle \mathrm {factor} \right\rangle ::=\mathrm {id}$ )	(reduce, $\left\langle \mathrm {factor} \right\rangle ::=\mathrm {id}$ )	(reduce, $\left\langle \mathrm {factor} \right\rangle ::=\mathrm {id}$ )
$S_{5}$	(shift, $S_{4}$ )				$S_{7}$	$S_{2}$	$S_{3}$
$S_{6}$	(shift, $S_{4}$ )					$S_{8}$	$S_{3}$
$S_{7}$				(reduce, $\left\langle \mathrm {expr} \right\rangle ::=\left\langle \mathrm {term} \right\rangle +\left\langle \mathrm {expr} \right\rangle$ )
$S_{8}$		(reduce, $\left\langle \mathrm {term} \right\rangle ::=\left\langle \mathrm {factor} \right\rangle *\left\langle \mathrm {term} \right\rangle$ )		(reduce, $\left\langle \mathrm {term} \right\rangle ::=\left\langle \mathrm {factor} \right\rangle *\left\langle \mathrm {term} \right\rangle$ )
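The tables above can be tried out directly. The following is a minimal Python sketch, not the course's reference code: it writes the states $S_{0}$ to $S_{8}$ as the integers 0 to 8, uses "$" for eof, and refers to productions by their numbers in the grammar listing. The names PRODUCTIONS, ACTION, GOTO, and trace_parse are assumptions made for illustration.

```python
# Productions keyed by the numbers used in the grammar listing.
PRODUCTIONS = {
    2: ("expr", ("term", "+", "expr")),
    3: ("expr", ("term",)),
    4: ("term", ("factor", "*", "term")),
    5: ("term", ("factor",)),
    6: ("factor", ("id",)),
}

# ACTION[state, token]; blank table cells are simply absent (syntax errors).
ACTION = {
    (0, "id"): ("shift", 4),
    (1, "$"): ("accept", None),
    (2, "+"): ("shift", 5), (2, "$"): ("reduce", 3),
    (3, "+"): ("reduce", 5), (3, "*"): ("shift", 6), (3, "$"): ("reduce", 5),
    (4, "+"): ("reduce", 6), (4, "*"): ("reduce", 6), (4, "$"): ("reduce", 6),
    (5, "id"): ("shift", 4),
    (6, "id"): ("shift", 4),
    (7, "$"): ("reduce", 2),
    (8, "+"): ("reduce", 4), (8, "$"): ("reduce", 4),
}

# GOTO[state, nonterminal]
GOTO = {
    (0, "expr"): 1, (0, "term"): 2, (0, "factor"): 3,
    (5, "expr"): 7, (5, "term"): 2, (5, "factor"): 3,
    (6, "term"): 8, (6, "factor"): 3,
}


def trace_parse(tokens):
    """Parse tokens (which must end in "$").

    Returns (number of shifts, list of production numbers reduced, in order).
    """
    stack = [0]  # alternates states and symbols; start state at the bottom
    pos, shifts, reductions = 0, 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        if (state, tok) not in ACTION:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        kind, payload = ACTION[state, tok]
        if kind == "shift":
            stack += [tok, payload]
            pos += 1
            shifts += 1
        elif kind == "reduce":
            lhs, rhs = PRODUCTIONS[payload]
            del stack[-2 * len(rhs):]  # every RHS in this grammar is non-empty
            exposed = stack[-1]        # state uncovered by the pops
            stack += [lhs, GOTO[exposed, lhs]]
            reductions.append(payload)
        else:  # accept
            return shifts, reductions


print(trace_parse(["id", "+", "id", "$"]))  # (3, [6, 5, 6, 5, 3, 2])
```

For the input id + id this performs $k=3$ shifts and $l=6$ reduces, and the reductions listed are exactly a rightmost derivation read in reverse.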

First and Follow Sets

Used to build LR(1) tables.

First Set

Given a terminal or non-terminal grammar symbol $\alpha$ , the first set $\mathrm {first} (\alpha )$ is the set of tokens (terminal symbols) that can appear as the first symbol of some string derivable from $\alpha$ .

If $\alpha \Rightarrow ^{*}\epsilon$ , then $\epsilon$ is a member of $\mathrm {first} (\alpha )$ .

Building instructions:

If $X$ is a terminal, then $\mathrm {first} (X)=\left\{X\right\}$ .
If $X::=\epsilon$ , then $\epsilon \in \mathrm {first} (X)$ .
If $X::=Y_{1}\,Y_{2}\,\dots \,Y_{k}$ , then put every member of $\mathrm {first} (Y_{1})$ except $\epsilon$ in $\mathrm {first} (X)$ (i.e. $\mathrm {first} (Y_{1})\setminus \left\{\epsilon \right\}\subseteq \mathrm {first} (X)$ ).
In the rule above, suppose the first $i$ symbols $Y_{1}\,\dots \,Y_{i}$ (with $i<k$ ) all have $\epsilon$ in their first sets. Then every $a\in \mathrm {first} (Y_{i+1})$ other than $\epsilon$ is also a member of $\mathrm {first} (X)$ . If all of $Y_{1}\,\dots \,Y_{k}$ have $\epsilon$ in their first sets, then $\epsilon \in \mathrm {first} (X)$ .

Note: The definition given here is suitable for LR(1) grammars. In general for LR(k), we define $\mathrm {first} _{k}(\alpha )$ as the leading $k$ tokens (not just 1) that begin strings derived from $\alpha$ .
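The building instructions above amount to a fixed-point iteration: keep applying the rules until no first set grows. A minimal Python sketch follows, assuming the grammar is stored as a list of (lhs, rhs) pairs with an empty rhs standing for an $\epsilon$ production. This representation and the names EPS, GRAMMAR, and first_sets are illustrative choices, not part of the slides.

```python
EPS = "ε"  # marker for the empty string

# The example grammar from the "Example Tables" section.
GRAMMAR = [
    ("goal", ["expr"]),
    ("expr", ["term", "+", "expr"]),
    ("expr", ["term"]),
    ("term", ["factor", "*", "term"]),
    ("term", ["factor"]),
    ("factor", ["id"]),
]


def first_sets(grammar):
    """Compute first(X) for every grammar symbol X by fixed-point iteration."""
    nonterminals = {lhs for lhs, _ in grammar}
    symbols = nonterminals | {sym for _, rhs in grammar for sym in rhs}
    # Rule 1: first of a terminal is the terminal itself.
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            before = len(first[lhs])
            all_nullable = True
            for sym in rhs:
                # Rules 3 and 4: copy first(Y_i) minus ε while the prefix is nullable.
                first[lhs] |= first[sym] - {EPS}
                if EPS not in first[sym]:
                    all_nullable = False
                    break
            if all_nullable:
                # Rule 2 (empty rhs) or every Y_i can derive ε.
                first[lhs].add(EPS)
            changed |= len(first[lhs]) != before
    return first


first = first_sets(GRAMMAR)
for name in ("goal", "expr", "term", "factor"):
    print(name, sorted(first[name]))
```

Every nonterminal of the example grammar ends up with the first set {id}, since every derivation must begin with an id token.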

Follow Set

Given a non-terminal grammar symbol $A$ , the follow set $\mathrm {follow} (A)$ is the set of terminals that can appear immediately after $A$ :

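Suppose both $\alpha A\beta$ and $\alpha A\gamma$ appear as valid sentential forms (RHS of some other productions). Then the terminals that can begin $\beta$ and $\gamma$ (the members of $\mathrm {first} (\beta )$ and $\mathrm {first} (\gamma )$ other than $\epsilon$ ) are members of the follow set of $A$ .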

Building instructions:

Place eof in $\mathrm {follow} (\left\langle \mathrm {goal} \right\rangle )$ (also called $\left\langle \mathrm {start} \right\rangle$ ; same concept).
If $A::=\alpha B\beta$ , then all members of $\mathrm {first} (\beta )$ —except $\epsilon$ —are also in $\mathrm {follow} (B)$ .
If $A::=\alpha B$ , then anything that follows $A$ can follow $B$ (i.e. put all members of $\mathrm {follow} (A)$ in $\mathrm {follow} (B)$ )
(A combination of the above two rules) If $A::=\alpha B\beta$ and $\epsilon \in \mathrm {first} (\beta )$ , then all members of $\mathrm {follow} (A)$ are in $\mathrm {follow} (B)$ .
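The follow rules can be sketched the same way, as a fixed-point iteration over the productions. The Python below is a minimal sketch under the same assumptions as before: the grammar is a list of (lhs, rhs) pairs, and the helper names first_of_string, first_sets, and follow_sets are hypothetical, chosen for illustration.

```python
EPS, EOF = "ε", "eof"

# The example grammar from the "Example Tables" section.
GRAMMAR = [
    ("goal", ["expr"]),
    ("expr", ["term", "+", "expr"]),
    ("expr", ["term"]),
    ("term", ["factor", "*", "term"]),
    ("term", ["factor"]),
    ("factor", ["id"]),
]


def first_of_string(symbols, first):
    """first set of a string of symbols; contains EPS if the whole string is nullable."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def first_sets(grammar):
    nonterminals = {lhs for lhs, _ in grammar}
    first = {s: (set() if s in nonterminals else {s})
             for lhs, rhs in grammar for s in [lhs, *rhs]}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {lhs: set() for lhs, _ in grammar}
    follow[start].add(EOF)  # rule 1: eof follows the goal symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in follow:  # follow is only kept for nonterminals
                    continue
                before = len(follow[sym])
                tail = first_of_string(rhs[i + 1:], first)
                follow[sym] |= tail - {EPS}      # rule 2: A ::= α B β
                if EPS in tail:                  # rules 3 and 4: β empty or nullable
                    follow[sym] |= follow[lhs]
                changed |= len(follow[sym]) != before
    return follow


follow = follow_sets(GRAMMAR, "goal")
for name in ("goal", "expr", "term", "factor"):
    print(name, sorted(follow[name]))
```

Run on the example grammar, this gives {eof} for goal and expr, {eof, +} for term, and {eof, +, *} for factor.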

Example

In the grammar above, the first and follow sets of each symbol are:

Symbol	first	follow
$\left\langle \mathrm {goal} \right\rangle$	$\left\{\mathrm {id} \right\}$	$\left\{\mathrm {eof} \right\}$
$\left\langle \mathrm {expr} \right\rangle$	$\left\{\mathrm {id} \right\}$	$\left\{\mathrm {eof} \right\}$
$\left\langle \mathrm {term} \right\rangle$	$\left\{\mathrm {id} \right\}$	$\left\{\mathrm {eof} ,+\right\}$
$\left\langle \mathrm {factor} \right\rangle$	$\left\{\mathrm {id} \right\}$	$\left\{\mathrm {eof} ,+,*\right\}$
$+$	$\left\{+\right\}$	$\emptyset$
$*$	$\left\{*\right\}$	$\emptyset$
$\mathrm {id}$	$\left\{\mathrm {id} \right\}$	$\emptyset$

LR(k) Items

Table construction algorithms use LR(k) items to represent the set of possible states in a parse.

An LR(k) item is a pair $[\alpha ,\beta ]$ , where

$\alpha$ is a production rule from $G$ with a $\cdot$ at some position in the RHS
$\beta$ is a lookahead string containing $k$ symbols (terminals or eof)

Two cases of interest here are $k=0$ and $k=1$ :

LR(0) items play a key role in SLR(1) table construction
LR(1) items play a role in LR(1) and LALR(1) construction
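To make the item notation concrete, here is a minimal Python sketch of LR(0) items for the example grammar, together with the closure and goto operations found in standard textbook table construction. The notes here do not spell those operations out, so the representation (an item as a (lhs, rhs, dot) tuple with no lookahead) and the function names closure, goto, and show are assumptions made for illustration.

```python
# The example grammar; right-hand sides are tuples so items are hashable.
GRAMMAR = [
    ("goal", ("expr",)),
    ("expr", ("term", "+", "expr")),
    ("expr", ("term",)),
    ("term", ("factor", "*", "term")),
    ("term", ("factor",)),
    ("factor", ("id",)),
]


def closure(items, grammar=GRAMMAR):
    """Add [B ::= · γ] for every nonterminal B that appears right after a dot."""
    result = set(items)
    worklist = list(result)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot == len(rhs):
            continue  # dot at the end: nothing left to expand
        for prod_lhs, prod_rhs in grammar:
            item = (prod_lhs, prod_rhs, 0)
            if prod_lhs == rhs[dot] and item not in result:
                result.add(item)
                worklist.append(item)
    return frozenset(result)


def goto(items, symbol, grammar=GRAMMAR):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} ::= {' '.join(rhs[:dot] + ('·',) + rhs[dot:])}"


I0 = closure({("goal", ("expr",), 0)})
for item in sorted(I0):
    print(show(item))
```

The item set reached from the start set on the symbol factor contains term ::= factor · * term and term ::= factor ·, which lines up with state $S_{3}$ in the tables above: shift on * or reduce by the rule for a single factor.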

A sentential form is a partially expanded parse: a string of mixed terminals and nonterminals derivable from the start symbol. In particular, a right-sentential form is a sentential form that occurs in a rightmost derivation, where each step expands the rightmost nonterminal.

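A handle is a substring of a right-sentential form that matches the RHS of a production and whose reduction is one step of a rightmost derivation in reverse. For example, $A::=\beta$ (or simply just $\beta$ ) provides a handle for $\alpha \beta \omega$ since $\beta$ could be reduced to form $\alpha A\omega$ .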

A viable prefix is the prefix of a right sentential form that could potentially appear on the stack of a shift-reduce parser. In other words, it does not continue past the end of the handle for that sentential form. Note that in the example above, $\omega$ must consist only of terminal symbols, and viable prefixes consist of initial substrings of $\alpha \beta$ .
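As a worked illustration using the example grammar (the input id + id is chosen here only for illustration), the rightmost derivation is

$\left\langle \mathrm {goal} \right\rangle \Rightarrow \left\langle \mathrm {expr} \right\rangle \Rightarrow \left\langle \mathrm {term} \right\rangle +\left\langle \mathrm {expr} \right\rangle \Rightarrow \left\langle \mathrm {term} \right\rangle +\left\langle \mathrm {term} \right\rangle \Rightarrow \left\langle \mathrm {term} \right\rangle +\left\langle \mathrm {factor} \right\rangle \Rightarrow \left\langle \mathrm {term} \right\rangle +\mathrm {id} \Rightarrow \left\langle \mathrm {factor} \right\rangle +\mathrm {id} \Rightarrow \mathrm {id} +\mathrm {id}$

Each form in this chain is a right-sentential form, and a shift-reduce parser visits them from right to left. In $\mathrm {id} +\mathrm {id}$ the handle is the first $\mathrm {id}$ (production $\left\langle \mathrm {factor} \right\rangle ::=\mathrm {id}$ ), so $\alpha =\epsilon$ , $\beta =\mathrm {id}$ , $\omega =+\,\mathrm {id}$ , and the viable prefixes are $\epsilon$ and $\mathrm {id}$ . In $\left\langle \mathrm {term} \right\rangle +\mathrm {id}$ the handle is $\mathrm {id}$ again, now with $\alpha =\left\langle \mathrm {term} \right\rangle +$ and $\omega =\epsilon$ , so the viable prefixes are $\epsilon$ , $\left\langle \mathrm {term} \right\rangle$ , $\left\langle \mathrm {term} \right\rangle +$ , and $\left\langle \mathrm {term} \right\rangle +\mathrm {id}$ .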

Table Construction Algorithms

SLR(1)
- grammar space: smallest class of grammars
- table size: smallest number of states
- performance: simple, fast construction
LR(1)
- grammar space: full set of LR(1) grammars
- table size: largest number of states
- performance: slow, large construction
LALR(1)
- grammar space: intermediate sized set of grammars
- table size: same as SLR(1) — small
- performance: canonical construction is slow and large