Parsing With Derivatives

Recall a context-free grammar is a set of production rules where each rule describes how a variable (also called a non-terminal) can be replaced with a sequence of variables and strings (also called terminals). For example, the following is a context-free grammar for binary arithmetic expressions:

expr   -> expr op expr
expr   -> "-" number
expr   -> number
expr   -> "(" expr ")"
op     -> "+"
op     -> "-"
op     -> "*"
op     -> "/"
number -> bit number
number -> bit ""
bit    -> "0"
bit    -> "1"

In practice, rules for the same variable are collected together and written as a single rule, with a vertical bar (|) separating the possible expansions. Denoting subexpressions with parentheses saves even more space. For example, our above grammar can be written:

expr   -> expr op expr | number | "-" number | "(" expr ")"
op     -> "+" | "-" | "*" | "/"
number -> bit (number | "")
bit    -> "0" | "1"

It is well-known that determining if a string can be produced by a given context-free grammars (the parsing problem) can be solved in cubic time with Earley’s algorithm or the GLR algorithm.

But it is less well-known that it can also be done via derivatives:

Yacc is dead: a brief explanation of parsing with derivatives.
Parsing with Derivatives functional pearl: improved version of original paper.
On the Complexity and Performance of Parsing with Derivatives: proves the algorithm has cubic time complexity.

In fact, derivatives are so obscure that some veteran computer scientists deny these facts. Russ Cox claims it’s "easy construct [sic] plausible examples that take exponential time to reject an input":

S -> T
T -> T + T | N
N -> 1

Supposedly, parsing with derivatives takes exponential time to reject inputs of the form "1+1+1+…1+1+1+1".

Is he right? Let’s find out with a little code.

Parsing parsing expressions; a grammar for grammars

We define a data type to represent the right-hand side of a production rule, which we call Pe for parsing expression. Non-empty strings are represented as concatenations of Char values. The empty string is represented by Empty.

data Pe = Var String | Ch Char | Empty | Pe :. Pe | Pe :| Pe |
  Nul | Eps Char | Del Pe deriving Show

The purpose of the mysterious data constructors Nul, Eps, Del will be become clear later. They are unneeded for representing a grammar.

We whip up a parser for grammars like our example with parser combinators. We store the grammar in a map, keyed by the variable on the left-hand side of a production rule.

jsEval "curl_module('../compiler/Charser.ob')"

jsEval "curl_module('../compiler/Map.ob')"

import Charser
import Map

cfg :: Charser (Map String Pe)
cfg = filler *> (fromList <$> some rule) <* eof where
  rule = (,) <$> sym <*> (want "=" *> expr <* want ";")
  expr = foldl1 (:|) <$> cat `sepBy1` want "|"
  cat = foldl1 (:.) <$> some atm
  atm = str <|> Var <$> sym <|> want "(" *> expr <* want ")"
  sym = some alphaNumChar <* filler
  str = char '"' *> (ngram <$> many (noneOf "\"")) <* want "\""
  want s = string s <* filler
  filler = many $ com <|> oneOf " \n" *> pure ()
  com = string "--" *> some (noneOf "\n") *> pure ()
  ngram = \case
    "" -> Empty
    s  -> foldr1 (:.) $ Ch <$> s

For example:

Right calc = parse cfg "" [r|
expr   = expr op expr | number | "-" number | "(" expr ")";
op     = "+" | "-" | "*" | "/";
number = bit (number | "");
bit    = "0" | "1";
|]

The least fixpoint

Now for a brief interlude: abstract mathematics that turns out to have concrete applications.

Suppose we have a bunch of simultaneous Boolean equations with a single variable on the left:

A = A ∨ (B ∧ C) ∨ 0
B = 1 ∨ D
C = C ∨ B
D = E
E = D

Suppose each variable can take one of three values: 0 (false), 1 (true), or ⊥ (undefined). Then we can find a solution as follows:

Set all variables to ⊥.
For each equation, evaluate the right-hand side, using rules such as 1 ∨ ⊥ = 1 and ⊥ ∧ 0 = 0, along with the latest values of the variables.
Did this change any values of the variables? If a 0 changed to 1 or vice versa, then report a contradiction and halt. If ⊥ became 0 or 1 then go back to step 2. Otherwise halt.

This works because the only transitions for each variable assignment that keep the algorithm going are from ⊥ to 0 or from ⊥ to 1. So if there are \(N\) variables, then we halt after at most \(N\) steps.

In mathematical terms, we can define a partial order:

\[\{0,1,\bot | \bot \le 0, \bot \le 1\}\]

which induces a partial order on the set of possible assignments \( \{0,1,\bot\}^N \); an assignment is less than or equal to another exactly when this is true coordinate-wise. We can think of a lesser assignment as being less defined.

In particular, the \(N\)-tuple where each coordinate is \(\bot\) is less than or equal to every other assignment; it is the bottom of our partially ordered set of assignments.

We represent the special case of contradiction with \(\top\) and define this element to be greater than all other elements.

Our algorithm can be viewed as repeatedly feeding the output of a function \(f\) to itself until there are no more valid changes. In other words, it finds a fixed point of \(f\).

This function \(f\) is monotonic, namely \(x \le y \implies f(x) \le f(y)\), and we start from the least element \(\bot\), so we wind up with a least fixpoint of \(f\).

The element ⊤ is always a fixpoint of \(f\). In general there may exist multiple nontrivial fixpoints. In our above example, the least fixpoint is \(\{1, 1, 1, \bot, \bot\}\), but there are two other nontrivial fixpoints \(\{1, 1, 1, 0, 0\}\) and \(\{1, 1, 1, 1, 1\}\).

See the Kleene Fixed-Point Theorem. Least fixpoints feature in denotational semantics, though we’ll only need them to solve Boolean equations.

Nothing matters

A parsing expression is nullable if it accepts the empty string.

The following converts a parsing expression to a Boolean formula that evaluates to true exactly when it is nullable. The nullability of expressions can depend on the nullability of the nonterminals it contains, thus the Boolean formulas may contain Boolean variables.

data BFor = BCon Bool | BVar String | BFor :\/ BFor | BFor :/\ BFor deriving Show

bfor = \case
  Nul    -> BCon False
  Eps _  -> BCon True
  Del _  -> BCon True

  Empty  -> BCon True
  Ch _   -> BCon False
  x :| y -> case bfor x of
    BCon True -> BCon True
    BCon False -> bfor y
    _ -> case bfor y of
      BCon True -> BCon True
      BCon False -> bfor x
      _ -> bfor x :\/ bfor y
  x :. y -> case bfor x of
    BCon True -> bfor y
    BCon False -> BCon False
    _ -> case bfor y of
      BCon True -> bfor x
      BCon False -> BCon False
      _ -> bfor x :/\ bfor y
  Var s   -> BVar s

Thanks to constant folding, an expression is either true, false, or a bunch of variables connected with logical AND and OR operations. From the rules, we see there is never a need for NOT.

example = [r|
A = "X";
B = "" | "" | "";
C = C | "" | A;
D = E E F;
E = F "X" F | F F;
F = D F | D "" E | E F;
|]

(bfor <$>) <$> parse cfg "" example

We solve the equations by finding the least fixpoint using the algorithm described earlier, but with some optimizations. For each variable, we record the formulas it appears in. When the truth value of a variable has been determined, we only re-evaluate the formulas in which it appears, rather than all formulas.

Re-evaluating a formula is simply constant folding and constant propagation:

reval solns = go where
  go = \case
    BCon b -> BCon b
    BVar v -> case mlookup v solns of
      Nothing -> BVar v
      Just b -> BCon b
    x :\/ y -> case go x of
      BCon True -> BCon True
      BCon False -> go y
      _ -> case go y of
        BCon True -> BCon True
        BCon False -> go x
        _ -> go x :\/ go y
    x :/\ y -> case go x of
      BCon True -> go y
      BCon False -> BCon False
      _ -> case go y of
        BCon True -> go x
        BCon False -> BCon False
        _ -> go x :/\ go y

For example:

reval (fromList [("A", True), ("B", False)]) $
  BVar "A" :/\ (BVar "B" :\/ BVar "A")

Our implementation is crude. The deps map associates a Boolean variable v with a plain list of other variables whose nullability formulas contain v somewhere, hence the uniqcons helper for preventing duplicates in a list.

mustLookup k m = maybe (error "mustLookup") id $ mlookup k m
uniqcons x xs
  | elem x xs = xs
  | otherwise = x:xs

leastfix solns bots deps = \case
  [] -> foldr (flip insert False) solns $ keys bots
  v:todo -> case mlookup v solns of
    Just _ -> leastfix solns bots deps todo
    Nothing -> case reval solns $ mustLookup v bots of
      BCon b -> leastfix (insert v b solns) (delete v bots) deps $
        foldr uniqcons todo $ maybe [] id $ mlookup v deps
      sol -> leastfix solns (insert v sol bots) deps todo

We bundle a few data structures together in the PWD data type:

The nonterminals whose nullabilities are known.
A map of nonterminals to the parsing expressions they expand to.
A list of nonterminals whose nullability we need to figure out.

data PWD = PWD
  { nulls :: Map String Bool
  , derivs :: Map String Pe
  , inbox :: [String]
  } deriving Show

The following computes the nullability of each variable in a given context-free grammar. After computing the least fixpoint, any remaining undefined variables are considered non-nullable, as they correspond to production rules that never terminate, apart from branches that lead to non-nullable terminals.

bvars = \case
  BCon _ -> []
  BVar s -> [s]
  x :\/ y -> bvars x `union` bvars y
  x :/\ y -> bvars x `union` bvars y

solve pwd = pwd
  { nulls = leastfix (nulls pwd) bots deps $ inbox pwd
  , inbox = []
  }
  where
  bots = foldr (\v -> insert v $ bfor $ mustLookup v $ derivs pwd) Tip $ inbox pwd
  deps = foldr addDeps Tip $ toAscList bots
  addDeps (v, t) m = foldr ($) m $ flip (insertWith uniqcat1) [v] <$> bvars t
  uniqcat1 [x] xs = uniqcons x xs

initNulls m = solve $ PWD
  { nulls = Tip
  , derivs = m
  , inbox = keys m
  }

In our next example, only D is nullable:

nulls . initNulls <$> parse cfg "" [r|
A = A | "x";
B = C;
C = B;
D = D | "";
|]

We can also completely solve our example from above:

nulls . initNulls <$> parse cfg "" example

Derivative Work

We write \(D_c f\) for the the derivative of a parsing expression \(f\) with respect to a character \(c.\)

We provisionally define \(D_c f\) to be a parsing expression \(f'\) that accepts a string \(s\) if and only if \(f\) accepts \(c : s\).

For example, since the empty string rejects any character, for all \(c\) we have \(D_c \varepsilon = \emptyset\), where \(\varepsilon\) denotes the empty string, and \(\emptyset\) denotes a parsing expression that accepts no strings, which is Nul in our code.

Naturally, we get:

\[ D_c (f | g) = D_c f | D_c g \]

What about \(D_c x\) for a character \(x\)? If \(c \ne x\) then obviously \(D_c x = \emptyset\).

But if \(c = x\), although \(D_c x = \varepsilon\) is a reasonable answer, in practice, solving the parsing problem is not enough; we want to return at least one parse tree, and ideally the entire parse forest.

Thus we tag each \(\varepsilon\) with the character that just got eaten, which will later allow us to construct parse trees:

\[ D_c c = \varepsilon_c \]

We represent this with the Eps constructor in our code.

Similarly, if \(f\) is nullable, we have:

\[D_c (f . g) = D_c f . g | \delta_f D_c g \]

where \(\delta_f\) is like \(\varepsilon\) except it is tagged with a parsing expression, that again later allows us to construct parse trees. We represent this with the Del constructor in our code.

If \(f\) is non-nullable then we simply have:

\[D_c (f . g) = D_c f . g \]

The rules for computing derivatives are conceptually simple, but consider a production rule like:

S = S | "x";

Then:

\[ D_c S = D_c S | D_c \text{x} \]

and we see the derivative \(D_c S\) depends on itself. In general, there could be circular dependencies among multiple derivatives.

We maintain a map of all derivatives we have computed. Initially, it holds a context-free grammar: the keys are variable names and the values are the parsing expressions they expand to. For a key s and a character c, we store the derivative of its parsing expression with respect to c at the key c:s, which is convenient but can cause collisions. We carefully write our demos to avoid them. For example, we might stick to capital letters for variables and never use them as constants on the right-hand side.

When computing a derivative of a variable f, if f is already in the map, then we simply return f. Otherwise, we insert a placeholder undefined value for f and add f to the inbox list before traversing the parsing expression corresponding to f. Eventually, this finds all dependencies of the derivatives of f without getting caught in an infinite loop.

Next we, determine the nullabilities of f and its dependencies, so that we can derive the derivatives we just computed.

We can probably simplify the code. Rather than repeated solve-derive cycles, perhaps we could mix it all up in code that resembles dynamic programming with memoization. But my gut feeling is it’s faster to solve nullability in waves.

derive :: Char -> String -> State PWD Pe
derive c s = do
  pwd <- get
  case mlookup (c:s) $ derivs pwd of
    Just _ -> pure ()
    Nothing -> do
      put $ pwd
        { derivs = insert (c:s) undefined $ derivs pwd
        , inbox = (c:s) : inbox pwd
        }
      d <- go $ mustLookup s $ derivs pwd
      pwd <- get
      put $ pwd { derivs = insert (c:s) d $ derivs pwd }
  pure $ Var $ c:s
  where
  nullable pe = do
    pwd <- get
    case reval (nulls pwd) $ bfor pe of
      BCon b -> pure b
      _ -> error "unreachable"
  go = \case
    Var t            -> derive c t
    Ch x | x == c    -> pure $ Eps x
    x :| y           -> (#|) <$> go x <*> go y
    Del x :. y       -> (Del x #.) <$> go y
    x :. y           -> do
      b <- nullable x
      dx <- go x
      if not b then pure $ dx #. y else do
        dy <- go y
        pure $ (dx #. y) #| (Del x #. dy)
    _                -> pure Nul
  (#.) = \cases
    Nul y -> Nul
    x Nul -> Nul
    x y -> x :. y
  (#|) = \cases
    Nul y -> y
    x Nul -> x
    x y -> x :| y

To determine if a grammar accepts a given string, we repeatedly derive the start symbol with respect to each successive character of the string. The resulting parsing expression is nullable if and only if the grammar accepts the string.

Let’s call its corresponding nonterminal the "final" symbol.

deriveString m start s = foldl go (start, initNulls m) s where
  go (key, pwd) c = (c:key, solve $ snd $ runState (derive c key) pwd)

For example, the following grammar accepts palindromic strings of bits of even length (a language beyond the capabilities of DPDAs).

Right epal = parse cfg "" [r|S = "0" S "0" | "1" S "1" | "";|]
deriveString epal "S" "1001"

It checks out. The nullability map takes "1001S" to True. Recall this string is how we represent \(D_1 D_0 D_0 D_1 S\), and since this parsing expression is nullable, we know the grammar accepts "1001". More generally, we see all palindromic strings are accepted and non-palindromic strings are rejected.

Its parse forest is encoded in the messy-looking parsing expressions in the second map.

Triming trees

We write a function that preserves only the nullable parts of the final parsing expression:

vars = \case
  Var v -> [v]
  x :. y -> vars x ++ vars y
  x :| y -> vars x ++ vars y
  _ -> []

filterNull pwd final = mgo final Tip where
  mgo k m = case mlookup k m of
    Nothing -> case go $ mustLookup k $ derivs pwd of
      Nothing -> m
      Just ast -> foldr mgo (insert k ast m) $ vars ast
    _ -> m
  go = \case
    Nul    -> Nothing
    Ch _   -> Nothing
    Empty  -> Just $ Empty
    Eps c  -> Just $ Eps c
    Del x  -> go x
    x :| y -> go x `mkOr` go y
    x :. y -> (:.) <$> go x <*> go y
    Var s   -> if mustLookup s $ nulls pwd then Just $ Var s else Nothing
  mkOr = \cases
    x Nothing -> x
    Nothing y -> y
    (Just x) (Just y) -> Just $ x :| y

We write a pretty printer for the filtered expressions:

pretty p = \case
  Eps c -> (c:)
  Empty -> ("\xce\xb5"++)
  x :. y -> pretty 1 x . pretty 1 y
  x :| y -> showParen (p == 1) $ pretty 0 x . (" | "++) . pretty 0 y
  Var s -> ('<':) . (s++) . ('>':)

prettyDemo g start s = putStr $ case toAscList $ filterNull pwd final of
  [] -> "parse failed\n"
  ds -> "<" ++ final ++ ">\nwhere:\n" ++ unlines (showDefn <$> ds)
  where
  showDefn (k, v) = "<" ++ k ++ "> = " ++ pretty 0 v ""
  (final, pwd) = deriveString g start s

For example:

prettyDemo epal "S" "11011011"

For ambiguous grammars, we print all possible values of all variables:

let Right g = parse cfg "" [r|S = S "+" S | "1";|] in
  prettyDemo g "S" "1+1+1+1"

We enumerate all members of a parse forest by going through all combinations. To distinguish between the parse trees, we parenthesize nontrivial variable expansions:

forest m k = go 0 $ mustLookup k m where
  go p = \case
    x :. y -> (.) <$> go 0 x <*> go 0 y
    x :| y -> go 0 x ++ go 0 y
    Var s -> showParen (p == 0) <$> go 1 (mustLookup s m)
    Eps c -> pure (c:)
    Empty -> pure ("\xce\xb5"++)

let
  Right g = parse cfg "" [r|S = S "+" S | "1";|]
  (final, pwd) = deriveString g "S" "1+1+1+1"
  in putStr $ unlines $ ($"") <$> forest (filterNull pwd final) final

We write a function that chooses a random parse tree. The distribution is not uniform: we just flip a coin each time we encounter an (:|) node while traversing the parsing expression.

parseFrac = uncurry (/) . foldl (\(a, b) d
  -> (a*10 + fromIntegral (ord d - ord '0'), b*10)) (0.0, 1)
jsrandom = parseFrac . drop 2 <$> jsEval "Math.random()"

sample g start s = putStrLn . ($"") =<< go 0 (mustLookup final m) where
  go p = \case
    Eps c -> pure (c:)
    Empty -> pure ("\xce\xb5"++)
    Var s -> showParen (p == 0) <$> go 1 (mustLookup s m)
    x :. y -> (.) <$> go 0 x <*> go 0 y
    x :| y -> do
      b <- (<1) . (2*) <$> jsrandom
      go 0 (if b then x else y)
  (final, pwd) = deriveString g start s
  m = filterNull pwd final

For example, here’s a sample parse tree from our first example grammar:

sample calc "expr" "11*(10+100)/1"

Our next example is a grammar that accepts matching pairs of parentheses (a language beyond the capabilities of DFAs). The sample should be unique in this case:

Right catalan = parse cfg "" [r|S = "[" S "]" S | "";|]
sample catalan "S" "[[[]][][]][]"

Believe it or not

The moment of truth has arrived. We start with the following grammar:

Right coxEx = parse cfg "" [r|
S = T;
T = T "+" T | N;
N = "1";
|]

Russ Cox alleges that parsing with derivatives takes exponential time to reject the following input:

prettyDemo coxEx "S" "1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1++1"

He also alleges it takes a long time to go far beyond the first few parse trees:

sample coxEx "S" "1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1"

Despite the questionable quality of our code (and our compiler), we still manage to reject the first input and print a random parse tree for the second.

Why Russ Cox Is Wrong

Consider the following grammar with start symbol A:

A = ("0" | "1") B
B = ("0" | "1") C
C = ("0" | "1")

Even though there are only 3 rules, this grammar accepts \(2^3 = 8\) different strings.

More generally, we can construct similar grammars with \(N\) rules that accept \(2^N\) different strings. In other words, a data structure whose size is polynomial in \(N\) can represent a parse forest whose size is exponential in \(N\).

Parsing with derivatives exhibits this phenomenon, which appears frequently in computer science under many different names: sharing, hash consing, lazy evaluation, automatic differentiation, …

We view the algorithm as adding rules to the grammar as we parse the input. The final parsing expression may represent a parse forest of exponential size, but the expression itself remains polynomial:

prettyDemo coxEx "S" "1+1+1+1+1"

If the input is rejected, our final symbol expands to Nul, that is, it accepts the empty language. For some reason, Cox falsely accuses the algorithm of backtracking here.

Returning a parse result is also efficient. We simply commit to one of the alternatives whenever a choice is offered while traversing the final parsing expression. Again, no backtracking.

Cox also fusses about ambiguity. This indeed is an important issue in language design, so it is strange that Cox condemns derivatives on these grounds. Firstly, parsing with derivatives trivially detects ambiguity: are there "|" operators in the final parsing expression? Secondly, common resolution strategies are also trivial. Want ambiguous infix operators to be left-associative? Then pick the first alternative each time.

At a higher level, Cox could have avoided hallucinating by tinkering with code like we just did, instead of penning a post riddled with errors. Could it be that Cox shied away from implementing parsing with derivatives because he was unfamiliar with languages in which this algorithm is easy to express. If so, it’s concerning that Cox was in charge of designing a programming language for a while. Does this mean entire classes of algorithms are difficult for many computer scientists to implement, or even comprehend?

Why I Was Wrong

An earlier version of this webpage claimed that we could avoid finding the least fixpoint when determining nullability. This happened to be true for the example grammars on this page, but my statement was false in general. A grammar with sufficient mutual recursion would cause my orginal implementation to fail. Realizing this compelled me to roll up my sleeves and write a routine to find a least fixpoint.

Of course, I have no idea if there are any remaining bugs!

Ben Lynn blynn@cs.stanford.edu 💡

Parsing With Derivatives

Parsing parsing expressions; a grammar for grammars

The least fixpoint

Nothing matters

Derivative Work

Triming trees

Believe it or not

Why Russ Cox Is Wrong

Why I Was Wrong

Contents