Parser Combinators

The above demo expects expressions conforming to a simple calculator grammar (the numbers and symbols have their usual meaning):

num    ::= ('0'|..|'9'|'.')+
factor ::= ('+'|'-')* ( num | '(' expr ')' )
term   ::= factor ( ('*'|'/') factor )*
expr   ::= term ( ('+'|'-') term )*
line   ::= expr EOF

We’ll see how to parse and evaluate expressions with code that takes about the same space as the grammar. But first, we need a few imports so we can show off our interpreter on this webpage:

import Control.Monad
import Haste.DOM
import Haste.Events

The interpreter itself closely follows the grammar. The main difference is one extra line for unary operators:

import Text.ParserCombinators.Parsec

line :: Parser Double
line = expr <* eof where
  num  = read <$> many1 (digit <|> char '.')
  una  = product <$> many ((char '+' >> pure 1) <|> (char '-' >> pure (-1)))
  fac  = (*) <$> una <*> (num <|> between (char '(') (char ')') expr)
  term = fac  `chainl1` ((char '*' >> pure (*)) <|> (char '/' >> pure (/)))
  expr = term `chainl1` ((char '+' >> pure (+)) <|> (char '-' >> pure (-)))
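
To try the parser outside the browser, here is a self-contained version; it repeats the definitions so the snippet compiles on its own, with `fac` following the grammar in allowing unary signs before a parenthesized expression:

```haskell
import Text.ParserCombinators.Parsec

line :: Parser Double
line = expr <* eof where
  num  = read <$> many1 (digit <|> char '.')
  una  = product <$> many ((char '+' >> pure 1) <|> (char '-' >> pure (-1)))
  fac  = (*) <$> una <*> (num <|> between (char '(') (char ')') expr)
  term = fac  `chainl1` ((char '*' >> pure (*)) <|> (char '/' >> pure (/)))
  expr = term `chainl1` ((char '+' >> pure (+)) <|> (char '-' >> pure (-)))

main :: IO ()
main = mapM_ (print . parse line "") ["1+2*3", "-(1+2)/4", "(1"]
```

Running it prints Right 7.0, Right (-0.75), and a Left with a parse error for the unbalanced input.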

Lastly, we need code that pulls from the input text area when Enter is hit, writes the corresponding result to the output text area, and scrolls down. For convenience, we strip spaces from the input string.

main = withElems ["input", "output"] $ \[iEl, oEl] ->
  iEl `onEvent` KeyDown $ \d -> when (d == mkKeyData 13) $ do
    s <- filter (/= ' ') <$> getProp iEl "value"
    setProp iEl "value" ""
    hist <- (++ ("> " ++ s ++ "\n")) <$> getProp oEl "value"
    setProp oEl "value" $ (hist ++) $ case parse line "" s of
      Left e -> "Error: " ++ show e ++ "\n"
      Right d -> show d ++ "\n"
    getProp oEl "scrollHeight" >>= setProp oEl "scrollTop"

To build this demo yourself, install Haste and AsciiDoc, and then type:

$ haste-cabal install parsec
$ wget
$ hastec parse.lhs
$ sed 's/^\\.*{code}$/-----/' parse.lhs | asciidoc -o - - > parse.html

Open parse.html in a browser to see it in action.

Patching our partial parser

Our use of the partial function read should be viewed with suspicion, and indeed, we find inputs such as "1.2.3" cause exceptions. We can fix this with readMaybe from Text.Read:

num = do
  s <- many1 (digit <|> char '.')
  case readMaybe s of
    Just x -> pure x
    Nothing -> fail "bad float" <?> "number"

Using readFloat from Numeric instead is similar.
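
Unlike read, readMaybe returns Nothing instead of throwing on malformed input, as a standalone check confirms:

```haskell
import Text.Read (readMaybe)

main :: IO ()
main = do
  print (readMaybe "1.23"  :: Maybe Double)  -- Just 1.23
  print (readMaybe "1.2.3" :: Maybe Double)  -- Nothing
```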

Unfortunately, a Haste bug means these solutions fail. To work around the bug, we can roll our own floating-point reading routine. If we’re taking the trouble to do this, we may as well also modify the grammar to accept at most one decimal point, which produces friendlier error messages:

num = do
  s <- many1 digit
  t <- option "" $ char '.' >> many1 digit
  pure $ fromIntegral (read $ s ++ t) / fromIntegral (10 ^ length t)
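
The arithmetic is easy to sanity-check in isolation; toDouble below is a hypothetical helper performing the same computation as num:

```haskell
-- Hypothetical helper: the same computation num performs, as a pure function.
-- s holds the integer digits, t the fractional digits (without the dot).
toDouble :: String -> String -> Double
toDouble s t = fromIntegral (read (s ++ t) :: Integer)
             / fromIntegral (10 ^ length t :: Integer)

main :: IO ()
main = do
  print (toDouble "1" "23")  -- 1.23
  print (toDouble "42" "")   -- 42.0
```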

Alternatively, we could use the Parsec library’s routines:

import Text.ParserCombinators.Parsec.Language
import Text.ParserCombinators.Parsec.Token

num = do
  e <- naturalOrFloat $ makeTokenParser emptyDef
  pure $ case e of
    Left  n -> fromIntegral n
    Right d -> d

This also adds support for all numeric literals defined in the Haskell report, such as "1.23e-4" or "0xface".

It turns out Parsec’s Expr module is specifically designed for expression grammars like ours, and can build a parser from a supplied table of operators. However, it hides the interesting part of the library, namely the combinators. Also, using the module hardly saves any lines of code in our case.
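
For comparison, here is a sketch of our grammar via that module: buildExpressionParser takes a table of operators ordered from highest to lowest precedence. One caveat: its Prefix operators apply at most once, so runs of signs like "--5", which our una accepts, are rejected here.

```haskell
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr

expr :: Parser Double
expr = buildExpressionParser table fac where
  -- Operator table, highest precedence first.
  table = [ [Prefix (char '-' >> pure negate), Prefix (char '+' >> pure id)]
          , [Infix (char '*' >> pure (*)) AssocLeft, Infix (char '/' >> pure (/)) AssocLeft]
          , [Infix (char '+' >> pure (+)) AssocLeft, Infix (char '-' >> pure (-)) AssocLeft]
          ]
  fac = read <$> many1 digit <|> between (char '(') (char ')') expr

main :: IO ()
main = mapM_ (print . parse (expr <* eof) "") ["1+2*3", "-(1+2)/4"]
```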

The evolution of parsing

As a child, I was baffled by parsers because I only pursued iterative approaches. How does the computer handle arbitrarily nested parentheses?!

I was awestruck when I learned about recursive descent parsers. (From memory, it was Advanced Turbo C by Herbert Schildt.) Simple, elegant, yet so powerful. I felt I knew everything about parsers.

I was awestruck again on encountering the “red dragon book” at university. Forget those childish recursive descent parsers! Instead, the computer should do the hard work; the human should just write a handful of regular expressions and a BNF grammar. Not only is it simple and elegant, but there are strong guarantees: the parser generator produces efficient code and detects ambiguities. I felt I knew everything about parsers. This time for real.

I was awestruck yet again years later when I stumbled upon parser combinators. I had thought combinators were a mathematical curiosity; a fun alternative to Turing machines for studying computability. But parsing functions that return a parsing function in addition to performing their usual duties turn out to be ideal for succinct recursive descent parsers. The notation is so good that we can relax the somewhat arbitrary segregation of lexing and parsing. In some cases, we sprinkle semantics throughout the grammar and we’re done!
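
The idea is small enough to reinvent in a few lines. In the following minimal sketch (the names P, sat, orElse, and digits are made up for illustration), a parser is just a function from an input string to a list of (result, remaining input) pairs, and combinators build bigger such functions from smaller ones:

```haskell
import Data.Char (isDigit)

-- A parser is a function from input to possible (result, rest) pairs;
-- an empty list means failure.
newtype P a = P { runP :: String -> [(a, String)] }

instance Functor P where
  fmap f (P g) = P $ \s -> [(f a, r) | (a, r) <- g s]

instance Applicative P where
  pure x = P $ \s -> [(x, s)]
  P f <*> P g = P $ \s -> [(h a, r') | (h, r) <- f s, (a, r') <- g r]

-- Consume one character satisfying a predicate.
sat :: (Char -> Bool) -> P Char
sat p = P $ \s -> [(c, cs) | c:cs <- [s], p c]

-- Try the first parser; fall back to the second on failure.
orElse :: P a -> P a -> P a
P f `orElse` P g = P $ \s -> case f s of [] -> g s; r -> r

-- One or more digits, by recursion: a digit, then more digits or nothing.
digits :: P String
digits = (:) <$> sat isDigit <*> (digits `orElse` pure "")

main :: IO ()
main = print (runP digits "12a")  -- [("12","a")]
```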

I now feel I know hardly anything about parsers.

These days, I let expedience be my guide: if I need the nice features guaranteed by the theory, or if parser combinators are unavailable, then I’ll resort to a parser generator. Otherwise, I’ll choose parser combinators.

Ben Lynn