CSCE 434 Lecture 9
Jump to navigation
Jump to search
« previous | Friday, September 13, 2013 | next »
TODO: Hennesey, Patterson; Memory Hierarchy
Finite State Machines
Compact way of representing transformation rules of context-free grammar
3 basic states:
- Initial / starting state
- Accept (valid string)
- Reject (errors)
Deterministic Finite Automaton (DFA) code for scanner:
char c = next_char();
state = S0; // start state code
done = false;
token_value = "";
while(not done) {
int class = char_class[c];
int state = next_state[class, state];
switch (state) {
case S1: // building an id
token_value += c;
c = next_char();
break;
case S2: // accept case
token_type = IDENTIFIER;
done = true;
break;
case S3: // error
token_type = ERROR;
done = true;
break;
}
return token_type;
}
Table rows can be used to represent state transformation (input states → output vals and state), as in sequential logic from CSCE 312
We use case statements instead of conditionals here because machine can use JUMP statements (single evaluation) to go directly to a state instead of evaluating a predicate multiple times. Even better, GOTO:
S1: setup()
do_something()
if (condition)
goto S2
else
goto S3
S2: setup()
do_something_else()
if (condition)
goto S1
S3: setup
do_yet_another_thing()
// ...
Problematic features:
- reserved keywords
- significant blanks (whitespace)
- string constants and special chars in strings
- finite closures (limitations on identifier lengths)
Lexical Errors
- Report it?
- Correct it? (it is possible)
- Ignore and Recover?
Techniques
- minimum distance correction (did you mean?)
- hard token recovery (reset state after getting error)
Scanner Generators
- Construct a DFA
- Minimize states (automaton optimization, e.g. merge identical states)
- Emit code/table