CSCE 314 Lecture 14

« previous | Wednesday, September 28, 2011 | next »

Today's topic: Japanese electronics companies...

Haskell Parsing

data Tok = TokIdentifier String | TokIntLit Integer | TokPlus | TokOpenParen | TokCloseParen | ...

data AST = APlus AST AST | AVariable String | AIntLit Integer | ...

The Lexer Takes this:

(4 + i) + 3

and produces this:

[TokOpenParen, TokIntLit 4, TokPlus, TokIdentifier "i", TokCloseParen, TokPlus, TokIntLit 3]

The Parser takes the list of tokens and produces this:

APlus (
    APlus (
        AIntLit 4
        AVariable "i"
    )
    AIntLit 3
)

Parser Type

At first glance, it would seem that a parser is of the type

type Parser = String -> Tree

However, the parser does not consume the entire string of characters (if there's a synax error, for example)

type Parser = String -> (Tree, String)

To account for ambiguity and test additional parse decisions and if want to return something else besides a string...

type Parser a = String -> [(a, String)]

A parser is a function that takes an input and returns a list of results and what was left over after parsing)

We parse token streams instead of characters:

type TokenParser b a = [b] -> [(a, [b])]



Parsers are functions and can be applied directly, but it's sometimes useful to have a <tt>parse</tt> function to do this explicitly:
<syntaxhighlight lang="haskell">parse item "parse this"

Choices

What if we need to make choices with the grammar (backtracking; trying one way then trying another way)? ... define a composition of two parsers:

(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case parse p inp of
    []         -> parse q inp
    [(v, out)] -> [(v, out)]

Sequencing

To sequence parsers for <if-stmt> :: if ( <expr> ) then <stmt>

if and then would be their own parsers

(>>=) :: Parser a -> (a -> Parser b -> Parser b
p >>= f = \inp -> case parse p inp of
    []         -> []
    [(v, out)] -> parse (f v) out

"Bind" operator:

fails if p fails
applies f to the result of p
results in a new parser, which is applied to the remaining characters

Example:

>((failure +++ item) >>= (\_ -> item)) "abc"
[('b',"c")]

This kind of code can get messy quickly, but parsers are normally structured the following way:

p1 >>= \v1 ->
    p2 >>= \v2 ->
        ...
            pn >>= \v3 ->
                return (f v1 v2 ... vn)

or with syntactic sugar:

do v1 <- p1
   v2 <- p2
   ...
   vn <- pn
   return (f v1 v2 ... vn)

Examples

Basic parser

Most basic parser consumes no input and always succeeds:

return :: a -> Parser a
return v = \inp -> [(v, inp)]

counterpart is a failure method that alwais fails:

failure :: Parser a
failure = \inp -> []

Examples:

> return 7 "parse this"
[(7, "parse this")]

> failure "parse this"
[]

Item Parser

Extracts the first character and succeeds.

item :: Parser Char
item = \inp -> case inp of
    []     -> []
    (x:xs) -> [(x,xs)]

Example usage:

> item "blah"
[('b',"lah")]