CSCE 434 Lecture 11

From Notes
Jump to navigation Jump to search

« previous | Wednesday, September 18, 2013 | next »

Lecture Slides

The Parser

Scanner produces lexemes, and the Parser takes these lexemes to produce the IL

"It takes the words and puts them into sentences"

  • Perform context-free syntax analysis (we'll add limited context later)
  • Guide context-sensitive analysis
  • Like the scanner, it should output meaningful error messages
  • Can attempt error-correction

It's even possible to compile high-level language back into a high-level language

SUIF - Stanford University Intermediate Format

  • reads in C and spits out C
  • Useful for research, but impractical

Context-Free Syntax

  • Terminal Symbols: the stuff retured by flex
  • Nonterminal Symbols: syntactic, structural "variables" that represent (sub)strings in the language
  • Start Nonterminal: Denotes the entire set of strings in the language specified by the language; can never appear on the RHS of a production rule
  • Production Rules: transformations in following format: single nonterminal on LHS, any combination of terminals and nonterminals on the RHS.

Grammars expressed in Backus-Naur format:

1. <goal> ::= <expr>
2. <expr> ::= <expr> <op> <expr>
3.          | number
4.          | id
5. <op> ::= +
6.        | -
7.        | *
8.        | /

Why context-free? They're easy to specify and easy to convert to DFA

Syntax-directed Translation:

  • If language is well-behaved (i.e. no goto statements), program can be represented with only a stack.
  • Otherwise, construction requires a graph.

CFGs are used to count:

  • open/close brackets
  • begin-end
  • if-then-else

Grammar for cc has 188 productions

The Parse

Recognizing x + 2 * y:

<goal> -> <expr>
       -> <expr> <op> <expr>
       -> <expr> <op> <expr> <op> <expr>
       -> <id,x> <op> <expr> <op> <expr>
       -> <id,x> + <expr> <op> <expr>
       -> <id,x> + <num,2> <op> <expr>
       -> <id,x> + <num,2> * <expr>
       -> <id,x> + <num,2> * <id,y>


We denote this as <goal> ⇒* x + 2 * y ("x+2*y" is a valid sentence in ), and the sequence of rules that gets us there (1,2,2,4,5,3,7,4) is called a parse

Two types of derivation:

  1. leftmost: only the leftmost non-terminal symbol is replaced at each step (as in this example)
  2. rightmost: only the rightmost non-terminal symbol is replaced at each step

Regular Grammars

Only rules of the following form are allowed:

<A> ::= a<A>
<A> ::= a

These are called type 3 grammars (according to Chomsky hierarchy)


Complexity

Regular Grammars DFAs
LR Grammars Knuth's Algorithm
Arbitrary CFGs Early's algorithm
Arbitrary CSGs Ibas P-SPACE COMPLETE

Parse Tree

Start is always the root

leaves are terminal symbols

Precedence

Previous example would interpret x + 2 * y as (x+2)*y, which is wrong.

The following grammar is a corrected version that respects mathematical precedence:

1. <goal> ::= <expr>
2. <expr> ::= <expr> + <term>
3.          | <expr> - <term>
4.          | <term>
5. <term> ::= <term> * <factor>
6.          | <term> / <factor>
7.          | <factor>
8. <factor> ::= number
9.            | id

Syntax tree for x + 2 * y is now:

goal
`-- expr
    |-- expr
    |   `-- term
    |       |-- factor
    |       |   `-- <id,x>
    |       |-- *
    |       `-- factor
    |           `-- <num,2>
    |-- +
    `-- term
        `-- factor
            `-- <id,y>