CSCE 314 Lecture 14
« previous | Wednesday, September 28, 2011 | next »
Today's topic: Japanese electronics companies...
Haskell Parsing
data Tok = TokIdentifier String | TokIntLit Integer | TokPlus | TokOpenParen | TokCloseParen | ...
data AST = APlus AST AST | AVariable String | AIntLit Integer | ...
The Lexer Takes this:
(4 + i) + 3
and produces this:
[TokOpenParen, TokIntLit 4, TokPlus, TokIdentifier "i", TokCloseParen, TokPlus, TokIntLit 3]
The Parser takes the list of tokens and produces this:
APlus (
APlus (
AIntLit 4
AVariable "i"
)
AIntLit 3
)
Parser Type
At first glance, it would seem that a parser is of the type
type Parser = String -> Tree
However, the parser does not consume the entire string of characters (if there's a synax error, for example)
type Parser = String -> (Tree, String)
To account for ambiguity and test additional parse decisions and if want to return something else besides a string...
type Parser a = String -> [(a, String)]
We parse token streams instead of characters:
type TokenParser b a = [b] -> [(a, [b])]
Parsers are functions and can be applied directly, but it's sometimes useful to have a <tt>parse</tt> function to do this explicitly:
<syntaxhighlight lang="haskell">parse item "parse this"
Choices
What if we need to make choices with the grammar (backtracking; trying one way then trying another way)? ... define a composition of two parsers:
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case parse p inp of
[] -> parse q inp
[(v, out)] -> [(v, out)]
Sequencing
To sequence parsers for <if-stmt> :: if ( <expr> ) then <stmt>
if and then would be their own parsers
(>>=) :: Parser a -> (a -> Parser b -> Parser b
p >>= f = \inp -> case parse p inp of
[] -> []
[(v, out)] -> parse (f v) out
"Bind" operator:
- fails if p fails
- applies f to the result of p
- results in a new parser, which is applied to the remaining characters
Example:
>((failure +++ item) >>= (\_ -> item)) "abc"
[('b',"c")]
This kind of code can get messy quickly, but parsers are normally structured the following way:
p1 >>= \v1 ->
p2 >>= \v2 ->
...
pn >>= \v3 ->
return (f v1 v2 ... vn)
or with syntactic sugar:
do v1 <- p1
v2 <- p2
...
vn <- pn
return (f v1 v2 ... vn)
Examples
Basic parser
Most basic parser consumes no input and always succeeds:
return :: a -> Parser a
return v = \inp -> [(v, inp)]
counterpart is a failure method that alwais fails:
failure :: Parser a
failure = \inp -> []
Examples:
> return 7 "parse this"
[(7, "parse this")]
> failure "parse this"
[]
Item Parser
Extracts the first character and succeeds.
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
Example usage:
> item "blah"
[('b',"lah")]