CSCE 434 Lecture 11

« previous | Wednesday, September 18, 2013 | next »

Lecture Slides

The Parser

Scanner produces lexemes, and the Parser takes these lexemes to produce the IL

"It takes the words and puts them into sentences"

Perform context-free syntax analysis (we'll add limited context later)
Guide context-sensitive analysis
Like the scanner, it should output meaningful error messages
Can attempt error-correction

It's even possible to compile high-level language back into a high-level language

SUIF - Stanford University Intermediate Format

reads in C and spits out C
Useful for research, but impractical

Context-Free Syntax

Terminal Symbols: the stuff retured by flex
Nonterminal Symbols: syntactic, structural "variables" that represent (sub)strings in the language
Start Nonterminal: Denotes the entire set of strings in the language specified by the language; can never appear on the RHS of a production rule
Production Rules: transformations in following format: single nonterminal on LHS, any combination of terminals and nonterminals on the RHS.

Grammars expressed in Backus-Naur format:

1. <goal> ::= <expr>
2. <expr> ::= <expr> <op> <expr>
3.          | number
4.          | id
5. <op> ::= +
6.        | -
7.        | *
8.        | /

Why context-free? They're easy to specify and easy to convert to DFA

Syntax-directed Translation:

If language is well-behaved (i.e. no goto statements), program can be represented with only a stack.
Otherwise, construction requires a $O(n)$ graph.

CFGs are used to count:

open/close brackets
begin-end
if-then-else

Grammar for cc has 188 productions

The Parse

Recognizing x + 2 * y:

<goal> -> <expr>
       -> <expr> <op> <expr>
       -> <expr> <op> <expr> <op> <expr>
       -> <id,x> <op> <expr> <op> <expr>
       -> <id,x> + <expr> <op> <expr>
       -> <id,x> + <num,2> <op> <expr>
       -> <id,x> + <num,2> * <expr>
       -> <id,x> + <num,2> * <id,y>

We denote this as <goal> ⇒^* x + 2 * y ("x+2*y" is a valid sentence in $G$ ), and the sequence of rules that gets us there (1,2,2,4,5,3,7,4) is called a parse

Two types of derivation:

leftmost: only the leftmost non-terminal symbol is replaced at each step (as in this example)
rightmost: only the rightmost non-terminal symbol is replaced at each step

Regular Grammars

Only rules of the following form are allowed:

<A> ::= a<A>
<A> ::= a

These are called type 3 grammars (according to Chomsky hierarchy)

Complexity

Regular Grammars	DFAs	$O(n)$
LR Grammars	Knuth's Algorithm	$O(n)$
Arbitrary CFGs	Early's algorithm	$O(n^{3})$
Arbitrary CSGs	Ibas	P-SPACE COMPLETE

Parse Tree

Start is always the root

leaves are terminal symbols

Precedence

Previous example would interpret x + 2 * y as (x+2)*y, which is wrong.

The following grammar is a corrected version that respects mathematical precedence:

1. <goal> ::= <expr>
2. <expr> ::= <expr> + <term>
3.          | <expr> - <term>
4.          | <term>
5. <term> ::= <term> * <factor>
6.          | <term> / <factor>
7.          | <factor>
8. <factor> ::= number
9.            | id

Syntax tree for x + 2 * y is now:

goal
`-- expr
    |-- expr
    |   `-- term
    |       |-- factor
    |       |   `-- <id,x>
    |       |-- *
    |       `-- factor
    |           `-- <num,2>
    |-- +
    `-- term
        `-- factor
            `-- <id,y>