First-order logic




Negate, then:


▶ Sort demo

Boilerplate abounds in programs that manipulate syntax trees. Consider a function transforming a particular kind of leaf node. With a typical tree data type, we must add recursive calls for every recursive data constructor. If we later add a recursive data constructor we must update the function.

Or consider annotating a syntax tree. The most obvious way is to copy the syntax tree then add an extra annotation field for each data constructor. For example, compare the definitions of Expr and AnnExpr in GHC’s source code.

Paul Hai Liu showed me how to avoid code duplication. The trick is to use recursion schemes along with certain helper functions. With GHC’s pattern synonyms extension, our code resembles ordinary recursion.

We demonstrate by building classic theorem provers for first-order logic, by taking a whirlwind tour through chapters 2 and 3 of John Harrison, Handbook of Practical Logic and Automated Reasoning.

▶ Toggle extensions and imports

Recursion Schemes

We represent terms with an ordinary recursive data structure. Terms consist of variables, constants, and functions. Constants are functions that take zero arguments.

data Term = Var String | Fun String [Term] deriving (Eq, Ord)

We use a recursion scheme for the formulas of first-order predicate logic. These are like propositional logic formulas, except:

  • An atomic proposition is a predicate: a string constant accompanied by a list of terms.

  • Subformulas may be quantified by a universal (Forall) or existential (Exists) quantifier. A quantifier binds a variable in the same manner as a lambda.

data Quantifier = Forall | Exists deriving (Eq, Ord)
data Formula a = FTop | FBot | FAtom String [Term]
  | FNot a | FAnd a a | FOr a a | FImp a a | FIff a a
  | FQua Quantifier String a
  deriving (Eq, Ord, Functor, Foldable, Traversable)

A data type akin to the fixpoint combinator powers the recursion:

data FO = FO (Formula FO) deriving (Eq, Ord)

Setting up pattern synonyms is worth the boilerplate:

pattern Atom s ts = FO (FAtom s ts)
pattern Top = FO FTop
pattern Bot = FO FBot
pattern Not p = FO (FNot p)
pattern p :/\ q = FO (FAnd p q)
pattern p :\/ q = FO (FOr p q)
pattern p :==> q = FO (FImp p q)
pattern p :<=> q = FO (FIff p q)
pattern Qua q x p = FO (FQua q x p)

Next, functions to aid recursion. I chose the name unmap because its type seems to be the inverse of fmap.

bifix :: (a -> b) -> (b -> a) -> a
bifix f g = g $ f $ bifix f g
ffix :: Functor f => ((f a -> f b) -> a -> b) -> a -> b
ffix = bifix fmap
unmap :: (Formula FO -> Formula FO) -> FO -> FO
unmap h (FO t) = FO (h t)

Parsing and pretty-printing

Variables start with lowercase letters, while constants, functions, and predicates start with uppercase letters. We treat (⇐) as an infix binary predicate for the sake of some of our examples below.

▶ Toggle parser and pretty-printer

Free variables

To find the free variables of a Term, we must handle every data constructor and make explicit recursive functions calls. Fortunately, this data type only has two constructors: Var and Fun.

fvt :: Term -> [String]
fvt = \case
  Var x -> [x]
  Fun _ xs -> foldr union [] $ fvt <$> xs

Contrast this with the FO edition. Thanks to our recursion scheme, we write two special cases for Atom and Qua, then a terse catch-all expression does the obvious for all other cases.

This includes, for example, recursively descending into both arguments of an AND operator. Furthermore, if we add more operators to Formula, this code handles them automatically.

fv :: FO -> [String]
fv = ffix \h -> \case
  Atom _ ts -> foldr union [] $ fvt <$> ts
  Qua _ x p -> delete x $ fv p
  FO t -> foldr union [] $ h t


Thanks to pattern synonyms, recursion schemes are as easy as regular recursive data types.

Again, we write special cases for the formulas we care about, along with something perfunctory to deal with all other cases.

We unmap h before attempting rewrites because we desire bottom-up behaviour. For example, the inner subformula in \(\neg(x\wedge\bot)\) should first be rewritten to yield \(\neg\bot\) so that another rewrite rule can simplify this to \(\top\).

simplify :: FO -> FO
simplify = ffix \h fo -> case unmap h fo of
  Not (Bot) -> Top
  Not (Top) -> Bot
  Not (Not p) -> p
  Bot :/\ _ -> Bot
  _ :/\ Bot -> Bot
  Top :/\ p -> p
  p :/\ Top -> p
  Top :\/ _ -> Top
  _ :\/ Top -> Top
  Bot :\/ p -> p
  p :\/ Bot -> p
  _ :==> Top -> Top
  Bot :==> _ -> Top
  Top :==> p -> p
  p :==> Bot -> Not p
  p :<=> Top -> p
  Top :<=> p -> p
  Bot :<=> Bot -> Top
  p :<=> Bot -> Not p
  Bot :<=> p -> Not p
  Qua _ x p | x `notElem` fv p -> p
  t -> t

Negation normal form

A handful of rules transform a simplified formula to negation normal form (NNF), namely, the formula consists only of literals (atoms or negated atoms), conjunctions, disjunctions, and quantifiers.

This time, the recursion is top-down. We unmap h after the rewrite.

nnf :: FO -> FO
nnf = ffix \h -> unmap h . \case
  p :==> q -> Not p :\/ q
  p :<=> q -> (p :/\ q) :\/ (Not p :/\ Not q)
  Not (Not p) -> p
  Not (p :/\ q) -> Not p :\/ Not q
  Not (p :\/ q) -> Not p :/\ Not q
  Not (p :==> q) -> p :/\ Not q
  Not (p :<=> q) -> (p :/\ Not q) :\/ (Not p :/\ q)
  Not (Qua Forall x p) -> Qua Exists x (Not p)
  Not (Qua Exists x p) -> Qua Forall x (Not p)
  t -> t


Again we pit recursion schemes against plain old data structures. As before, the Term version must handle each case and its recursive calls are explicitly spelled out, while the FO version only handles the cases it cares about, provides a generic catch-all case, and relies on ffix and unmap to recurse. They are about the same size despite FO having many more data constructors.

This time, for variety, we unmap h in the catch-all case. We could also place it just inside or outside the case expression as above. It is irrelevant whether the recursion is top-down or bottom-up because only leaves are affected.

tsubst :: (String -> Maybe Term) -> Term -> Term
tsubst f t = case t of
  Var x -> maybe t id $ f x
  Fun s as -> Fun s $ tsubst f <$> as

subst :: (String -> Maybe Term) -> FO -> FO
subst f = ffix \h -> \case
  Atom s ts -> Atom s $ tsubst f <$> ts
  t -> unmap h t


Skolemization transforms an NNF formula to an equisatisfiable formula with no existential quantifiers, that is, the output is satisifiable if and only if the input is. Skolemization is "lossy" because validity might not be preserved.

We may need to mint new function names along the way. To avoid name clashes, the functions helper returns all functions present in a given formula. It also returns the arity of each function because we need this later to enumerate ground terms.

It is possible to Skolemize a non-NNF formula, but if negations can go anywhere, we may as well remove existential quantifiers by converting them to universal quantifiers via duality and preserve logical equivalence.

functions :: FO -> [(String, Int)]
functions = ffix \h -> \case
  Atom s ts -> foldr union [] $ funcs <$> ts
  FO t -> foldr union [] $ h t
  funcs = \case
    Var x -> []
    Fun f xs -> foldr union [(f, length xs)] $ funcs <$> xs

skolemize :: FO -> FO
skolemize t = evalState (skolem' $ nnf $ simplify t) (fst <$> functions t) where
  skolem' :: FO -> State [String] FO
  skolem' fo = case fo of
    Qua Exists x p -> do
      fns <- get
        xs = fv fo
        f = variant ((if null xs then "C_" else "F_") <> x) fns
        fx = Fun f $ Var <$> xs
      put $ f:fns
      skolem' $ subst (`lookup` [(x, fx)]) p
    FO t -> FO <$> mapM skolem' t

Prenex normal form

We can pull all the quantifiers of an NNF formula to the front by generating new variable names. This is known as prenex normal form (PNF).

variant :: String -> [String] -> String
variant s vs
  | s `elem` vs = variant (s <> "'") vs
  | otherwise   = s

prenex :: FO -> FO
prenex = ffix \h t -> let
  recursed = unmap h t
  f qua s g = Qua qua z $ prenex $ g \x -> subst (`lookup` [(x, Var z)])
    where z = variant s $ fv recursed
  in case recursed of
  Qua Forall x p :/\ Qua Forall y q -> f Forall x \r -> r x p :/\ r y q
  Qua Exists x p :\/ Qua Exists y q -> f Exists x \r -> r x p :\/ r y q
  Qua qua x p :/\ q -> f qua x \r -> r x p :/\ q
  p :/\ Qua qua y q -> f qua y \r -> p :/\ r y q
  Qua qua x p :\/ q -> f qua x \r -> r x p :\/ q
  p :\/ Qua qua y q -> f qua y \r -> p :\/ r y q
  t -> t

pnf :: FO -> FO
pnf = prenex . nnf . simplify

Quantifier-free formulas

A quantifier-free formula is one where every variable is free. Each variable is implicitly universally quantified, that is, for each variable x, we behave as if forall x. has been prepended to the formula.

We can remove all quantifiers from a skolemized NNF formula by pulling all the universal quantifiers to the front and then dropping them.

Our specialize helper appears exactly as it would if FO were an ordinary recursive data structure. Explicit recursion suits this function because we only want to transform the top of the tree.

deQuantify :: FO -> FO
deQuantify = specialize . pnf where
  specialize = \case
    Qua Forall x p -> specialize p
    t -> t

Ground terms

A ground term is a term containing no variables, that is, a term exclusively built from constants and functions.

We describe how to enumerate all possible terms given a set of constants and functions. For example, given X, Y, F(,), G(_), we want to generate something like:

X, Y, F(X,X), G(X), F(X,Y), F(Y,X), F(Y,Y), G(Y), F(G(X),G(Y)), ...

In general, there are infinite ground terms, but we can enumerate them in an order that guarantees any given term will appear: start with the terms with no functions, namely constant terms, then those that contain exactly one function call, then those that contain exactly two function calls, and so on.

groundTerms cons funs n
  | n == 0 = cons
  | otherwise = concatMap
    (\(f, m) -> Fun f <$> groundTuples cons funs m (n - 1)) funs

groundTuples cons funs m n
  | m == 0 = if n == 0 then [[]] else []
  | otherwise = [h:t | k <- [0..n], h <- groundTerms cons funs k,
                 t <- groundTuples cons funs (m - 1) (n - k)]

Herbrand universe

The Herbrand universe of a formula are the ground terms made from all the constants and functions that appear in the formula, with one special case: if no constants appear, then we invent one to avoid an empty universe.

For example, the Herbrand universe of:

\[(P(y) \implies Q(F(z))) \wedge (P(G(x)) \vee Q(x))\]


C, F(C), G(C), F(F(C)), F(G(C)), G(F(C)), G(G(C)), ...

We add the constant C because there were no constants to begin with. Since P and Q are predicates and not functions, they are not part of the Herbrand universe.

herbTuples :: Int -> FO -> [[Term]]
herbTuples m fo
  | null funs = groundTuples cons funs m 0
  | otherwise = concatMap (reverse . groundTuples cons funs m) [0..]
  (cs, funs) = partition ((0 ==) . snd) $ functions fo
  cons | null cs   = [Fun "C" []]
       | otherwise = flip Fun [] . fst <$> cs

We reverse the output of groundTuples because it happens to work better on a few test cases.

Automated Theorem Proving

It can be shown a quantifier-free formula is satisfiable if and only if it is satisfiable under a Herbrand interpretation. Loosely speaking, we treat terms like the abstract syntax trees that represent them; if a theorem holds under some interpretation, then it also holds for syntax trees.

Why? Intuitively, given a formula and an interpretation where it holds, we can define a syntax tree based on the constants and functions of the formula, and rig predicates on these trees to behave enough like their counterparts in the interpretation.

For example, the formula \(\forall x . x + 0 = x\) holds under many familiar interpretations. Here’s a Herbrand interpretation:

data Term = Plus Term Term | Zero
eq :: (Term, Term) -> Bool
eq _ = True

For our next example we take a formula that holds under interpretations such as integer arithmetic:

\[\forall x . Odd(x) \iff \neg Odd(Succ(x))\]

Here’s a Herbrand interpretation:

data Term = Succ Term | C
odd :: Term -> Bool
odd = \case
  C -> False
  Succ x -> not $ odd x

This important result suggests a strategy to prove any first-order formula f. As a preprocessing step, we prepend explicit universal quantifiers for each free variable:

generalize fo = foldr (Qua Forall) fo $ fv fo


  1. Negate \(f\) because validity and satisfiability are dual: the formula \(f\) is valid if and only if \(\neg f\) is unsatisfiable.

  2. Transform \(\neg f\) to an equisatisfiable quantifier-free formula \(t\). Let \(m\) be the number of variables in \(t\). Initialize \(h\) to \(\top\).

  3. Choose \(m\) elements from the Herbrand universe of \(t\).

  4. Let \(t'\) be the result of substituting the variables of \(t\) with these \(m\) elements. Compute \(h \leftarrow h \wedge t'\).

  5. If \(h\) is unsatisfiable, then \(t\) is unsatisfiable under any interpretation, hence \(f\) is valid. Otherwise, go to step 3.

We have moved from first-order logic to propositional logic; the formula \(h\) only contains ground terms which act as propositional variables when determining satisfiability. In other words, we have the classic SAT problem.

If the given formula is valid, then this algorithm eventually finds a proof provided the method we use to pick ground terms eventually selects any given possibility. This is the case for our groundTuples function.


It remains to detect unsatisfiability. One of the earliest approaches (Gilmore 1960) transforms a given formula to disjunctive normal form (DNF):

\[\bigvee_i \bigwedge_j x_{ij}\]

where the \(x_{ij}\) are literals. For example: \((\neg a\wedge b\wedge c) \vee (d \wedge \neg e) \vee (f)\).

We represent a DNF formula as a set of sets of literals. Given an NNF formula, the function pureDNF builds an equivalent DNF formula:

distrib s1 s2 = (uncurry S.union) $ S.cartesianProduct s1 s2

pureDNF = \case
  p :/\ q -> distrib (pureDNF p) (pureDNF q)
  p :\/ q -> S.union (pureDNF p) (pureDNF q)
  t -> S.singleton $ S.singleton t

Next, we eliminate conjunctions containing \(\bot\) or the positive and negative versions of the same literal, such as and ~P©. The formula is unsatisfiable if and only if nothing remains.

To reduce the formula size, we replace clauses containing \(\top\) with the empty clause (the empty conjunction is \(\top\)), and drop clauses that are supersets of other clauses.

nono = \case
  Not p -> p
  p -> Not p

isPositive = \case
  Not p -> False
  _ -> True

nontrivial lits = S.null $ S.intersection pos $ nono neg
  where (pos, neg) = S.partition isPositive lits

simpDNF = \case
  Bot -> S.empty
  Top -> S.singleton S.empty
  fo -> let djs = S.filter nontrivial $ pureDNF $ nnf fo in
    S.filter (\d -> not $ any (`S.isProperSubsetOf` d) djs) djs

Now we fill in the other steps. Our main loop takes in 3 functions so we can later try out different approaches to detecting unsatisfiable formulas.

We reverse the output of groundTuples because it happens to work better on a few test cases.

skno :: FO -> FO
skno = skolemize . nono . generalize

type Loggy = Writer ([String] -> [String])

output :: Show a => a -> IO ()
output s = do
  out <- getElem "out"
  appendValue out $ show s <> "\n"
runThen cont wr = do
  let (a, w) = runWriter wr
  cb <- makeHaskellCallback $ stream cont (a, w [])
  js_setTimeout cb 0
foreign import javascript "wrapper" makeHaskellCallback :: IO () -> IO JSFunction
foreign import javascript "wrapper" makeHaskellCallback1 :: (JSObject -> IO ()) -> IO JSFunction
output = print
runThen cont wr = do
  let (a, w) = runWriter wr
  mapM_ putStrLn $ w []
  cont a

herbrand conjSub refute uni fo = runThen output $ herbLoop (uni Top) [] herbiverse where
  qff = deQuantify . skno $ fo
  fvs = fv qff
  herbiverse = herbTuples (length fvs) qff
  t = uni qff
  herbLoop :: S.Set (S.Set FO) -> [[Term]] -> [[Term]] -> Loggy [[Term]]
  herbLoop h tried = \case
    [] -> error "invalid formula"
    (tup:tups) -> do
      tell (concat
        [ show $ length tried, " ground instances tried; "
        , show $ length h," items in list"
      let h' = conjSub t (subst (`M.lookup` (M.fromList $ zip fvs tup))) h
      if refute h' then pure $ tup:tried else herbLoop h' (tup:tried) tups

gilmore = herbrand conjDNF S.null simpDNF where
  conjDNF djs0 sub djs = S.filter nontrivial (distrib ( ( sub) djs0) djs)


The DPLL algorithm uses the conjunctive normal form (CNF), which is the dual of DNF:

\[\bigwedge_i \bigvee_j x_{ij}\]
pureCNF = ( nono) . pureDNF . nnf . nono

Constructing a CNF formula in this manner is potentially expensive, but at least we only pay the cost once. The main loop just piles on more conjunctions.

As with DNF, we simplify:

simpCNF = \case
  Bot -> S.singleton S.empty
  Top -> S.empty
  fo -> let cjs = S.filter nontrivial $ pureCNF fo in
    S.filter (\c -> not $ any (`S.isProperSubsetOf` c) cjs) cjs

We write DPLL functions and pass them to herbrand:

oneLiteral clauses = do
  u <- S.findMin <$> find ((1 ==) . S.size) (S.toList clauses)
  Just $ (S.delete (nono u)) $ S.filter (u `S.notMember`) clauses

affirmativeNegative clauses
  | S.null oneSided = Nothing
  | otherwise       = Just $ S.filter (S.disjoint oneSided) clauses
  (pos, neg') = S.partition isPositive $ S.unions clauses
  neg = nono neg'
  posOnly = pos S.\\ neg
  negOnly = neg S.\\ pos
  oneSided = posOnly `S.union` nono negOnly

dpll clauses
  | S.null clauses = True
  | S.empty `S.member` clauses = False
  | otherwise = rule1
  rule1 = maybe rule2 dpll $ oneLiteral clauses
  rule2 = maybe rule3 dpll $ affirmativeNegative clauses
  rule3 = dpll (S.insert (S.singleton p) clauses)
       || dpll (S.insert (S.singleton $ nono p) clauses)
  pvs = S.filter isPositive $ S.unions clauses
  p = maximumBy (comparing posnegCount) $ S.toList pvs
  posnegCount lit = S.size (S.filter (lit `elem`) clauses)
                  + S.size (S.filter (nono lit `elem`) clauses)

davisPutnam = herbrand conjCNF (not . dpll) simpCNF where
  conjCNF cjs0 sub cjs = S.union ( ( sub) cjs0) cjs

Definitional CNF

We can efficiently translate any formula to an equisatisfiable CNF formula with a definitional approach. Logical equivalence may not be preserved, but only satisfiability matters, and in any case Skolemization may not preserve equivalence.

We need a variant of NNF that preserves equivalences:

nenf :: FO -> FO
nenf = nenf' . simplify where
  nenf' = ffix \h -> unmap h . \case
    p :==> q -> Not p :\/ q
    Not (Not p) -> p
    Not (p :/\ q) -> Not p :\/ Not q
    Not (p :\/ q) -> Not p :/\ Not q
    Not (p :==> q) -> p :/\ Not q
    Not (p :<=> q) -> p :<=> Not q
    Not (Qua Forall x p) -> Qua Exists x (Not p)
    Not (Qua Exists x p) -> Qua Forall x (Not p)
    t -> t

Then, for each node with two children, we mint a 0-ary predicate that acts as its definition:

satCNF fo = S.unions $ simpCNF p
  : map (simpCNF . uncurry (:<=>)) (M.assocs ds)
  (p, (ds, _)) = runState (sat' $ nenf fo) (mempty, 0)
  sat' :: FO -> State (M.Map FO FO, Int) FO
  sat' = \case
    p :/\ q  -> def =<< (:/\)  <$> sat' p <*> sat' q
    p :\/ q  -> def =<< (:\/)  <$> sat' p <*> sat' q
    p :<=> q -> def =<< (:<=>) <$> sat' p <*> sat' q
    p        -> pure p
  def :: FO -> State (M.Map FO FO, Int) FO
  def t = do
    (ds, n) <- get
    case M.lookup t ds of
      Nothing -> do
        let v = Atom ("*" <> show n) []
        put (M.insert t v ds, n + 1)
        pure v
      Just v -> pure v

We define another DPLL prover using this definitional CNF algorithm:

davisPutnam2 = herbrand conjCNF (not . dpll) satCNF where
  conjCNF cjs0 sub cjs = S.union ( ( sub) cjs0) cjs


To refute \(P(F(x), G(A)) \wedge \neg P(F(B), y)\), the above algorithms would have to luck out and select, say, \( (x, y) = (B, G(A)) \).

Unification finds this assignment intelligently. This observation inspired a more efficient approach to theorem proving.

Harrison’s implementation of unification differs from that of Jones. Accounting for existing substitutions is deferred until variable binding, where we perform the occurs check, as well as a redundancy check. However, perhaps lazy evaluation means the two approaches are more similar than they appear.

istriv env x = \case
  Var y | y == x -> Right True
        | Just v <- M.lookup y env -> istriv env x v
        | otherwise -> Right False
  Fun _ args -> do
    b <- or <$> mapM (istriv env x) args
    if b then Left "cyclic"
         else Right False

unify env = \case
  [] -> Right env
  h:rest -> case h of
    (Fun f fargs, Fun g gargs)
      | f == g, length fargs == length gargs -> unify env $ zip fargs gargs <> rest
      | otherwise -> Left "impossible unification"
    (Var x, t)
      | Just v <- M.lookup x env -> unify env $ (v, t):rest
      | otherwise -> do
        b <- istriv env x t
        unify (if b then env else M.insert x t env) rest
    (t, Var x) -> unify env $ (Var x, t):rest

As well as terms, unification in first-order logic must also handle literals, that is, predicates and their negations.

literally nope f = \case
  (Atom p1 a1, Atom p2 a2) -> f [(Fun p1 a1, Fun p2 a2)]
  (Not p, Not q) -> literally nope f (p, q)
  _ -> nope

unifyLiterals = literally (Left "Can't unify literals") . unify


After Skolemizing, we recurse on the structure of the formula, gathering literals that must all play nice together, and branching when necessary. If one literal in our collection unifies with the negation of another, then the current branch is refuted. The theorem is proved once all branches are refuted.

When we encounter a universal quantifier, we instantiate a new variable then move the subformula to the back of the list in case we need it again. This creates tension. On the one hand, we want new variables so we can find unifications to refute branches. On the other hand, it may be better to move on and look for a literal that is easier to contradict.

Iterative deepening comes to our rescue. We bound the number of variables we may instantiate to avoid getting lost in the weeds. If the search fails, we bump up the bound and try again.

(We mean "branching" in a yak-shaving sense, that is, while trying to refute A, we find we must also refute B, so we add B to our to-do list. At a higher level, there is another sense of branching where we realize we made the wrong decision so we have to undo it and try again; we call this backtracking.)

deepen :: (Show t, Num t) => (t -> Either b c) -> t -> Loggy c
deepen f n = do
  tell (("Searching with depth limit " <> show n):)
  either (const $ deepen f (n + 1)) pure $ f n

tabRefute fos = deepen (\n -> go n fos [] Right (mempty, 0)) 0 where
  go n fos lits cont (env,k)
    | n < 0 = Left "no proof at this level"
    | otherwise = case fos of
      [] -> Left "tableau: no proof"
      h:rest -> case h of
        p :/\ q -> go n (p:q:rest) lits cont (env,k)
        p :\/ q -> go n (p:rest) lits (go n (q:rest) lits cont) (env,k)
        Qua Forall x p -> let
          y = Var $ '_':show k
          p' = subst (`lookup` [(x, y)]) p
          in go (n - 1) (p':rest <> [h]) lits cont (env,k+1)
        lit -> asum ((\l -> cont =<< (, k) <$>
            unifyLiterals env (lit, nono l)) <$> lits)
          <|> go n rest (lit:lits) cont (env,k)

tableau fo = runThen output $ case skno fo of
  Bot -> pure (mempty, 0)
  sfo -> tabRefute [sfo]

Some problems can be split up into the disjunction of independent subproblems, which we can solve individually:

splitTableau = map (tabRefute . S.toList) . S.toList . simpDNF . skno

Connection Tableaux

We can view tableaux as a lazy CNF-based algorithm. We convert to CNF as we go, stopping immediately after showing unsatisifiability. In particular, our search is driven by the order in which clauses appear in the input formula.

Perhaps it is wiser to select clauses less arbitrarily. How about requiring the next clause we examine to be somehow connected to the current clause? For example, maybe we should insist a certain literal in the current clause unifies with the negation of a literal in the next.

With this in mind, we arrive at Prolog-esque unification and backtracking, but with a couple of tweaks so that it works on any CNF formula rather than merely on a bunch of Horn clauses:

  1. Employ iterative deepening instead of a depth-first search.

  2. Look for conflicting subgoals.

Any unsatisfiable CNF formula must contain a clause containing only negative literals. We pick one to start the refutation.

selections bs = unfoldr (\(as, bs) -> case bs of
  [] -> Nothing
  b:bt -> Just ((b, as <> bt), (b:as, bt))) ([], bs)

instantiate :: [FO] -> Int -> ([FO], Int)
instantiate fos k = (subst (`M.lookup` (M.fromList $ zip vs names)) <$> fos, k + length vs)
  vs = foldr union [] $ fv <$> fos
  names = Var . ('_':) . show <$> [k..]

conTab clauses = deepen (\n -> go n clauses [] Right (mempty, 0)) 0 where
  go n cls lits cont (env, k)
    | n < 0 = Left "too deep"
    | otherwise = case lits of
      [] -> asum [branch ls (env, k) | ls <- cls, all (not . isPositive) ls]
      lit:litt -> let nlit = nono lit in asum (contra nlit <$> litt)
        <|> asum [branch ps =<< (, k') <$> unifyLiterals env (nlit, p)
        | cl <- cls, let (cl', k') = instantiate cl k, (p, ps) <- selections cl']
    branch ps = foldr (\l f -> go (n - length ps) cls (l:lits) f) cont ps
    contra p q = cont =<< (, k) <$> unifyLiterals env (p, q)

mesonBasic fo = runThen output $ conTab
  $ S.toList <$> S.toList (simpCNF $ deQuantify $ skno fo)

We translate a Prolog sorting program and query to CNF to illustrate the correspondence.

sortExample = intercalate " & "
  [ "(Sort(x0,y0) | !Perm(x0,y0) | !Sorted(y0))"
  , "Sorted(Nil)"
  , "Sorted(C(x1, Nil))"
  , "(Sorted(C(x2, C(y2, z2))) | !(x2 <= y2) | !Sorted(C(y2,z2)))"
  , "Perm(Nil,Nil)"
  , "(Perm(C(x3, y3), C(u3, v3)) | !Delete(u3,C(x3,y3),z3) | !Perm(z3,v3))"
  , "Delete(x4,C(x4,y4),y4)"
  , "(Delete(x5,C(y5,z5),C(y5,w5)) | !Delete(x5,z5,w5))"
  , "Z <= x6"
  , "(S(x7) <= S(y7) | !(x7 <= y7))"
  , "!Sort(C(S(S(S(S(Z)))), C(S(Z), C(Z,C(S(S(Z)), C(S(Z), Nil))))), x8)"

prologgy fo = conTab $ S.toList <$> S.toList (simpCNF fo)

tsubst' :: (String -> Maybe Term) -> Term -> Term
tsubst' f t = case t of
  Var x -> maybe t (tsubst' f) $ f x
  Fun s as -> Fun s $ tsubst' f <$> as

sortDemo = runThen prSub $ prologgy $ mustFO sortExample where
  prSub (m, _) = output $ tsubst' (`M.lookup` m) $ Var "x8"

We refine mesonBasic by aborting whenever the current subgoal is equal to an older subgoal under the substitutions found so far. In addition, as with splitTableau, we split the problem into smaller independent subproblems when possible.

equalUnder :: M.Map String Term -> [(Term, Term)] -> Bool
equalUnder env = \case
  [] -> True
  h:rest -> case h of
    (Fun f fargs, Fun g gargs)
      | f == g, length fargs == length gargs -> equalUnder env $ zip fargs gargs <> rest
      | otherwise -> False
    (Var x, t)
      | Just v <- M.lookup x env -> equalUnder env $ (v, t):rest
      | otherwise -> either (const False) id $ istriv env x t
    (t, Var x) -> equalUnder env $ (Var x, t):rest

noRep n cls lits cont (env, k)
  | n < 0 = Left "too deep"
  | otherwise = case lits of
    [] -> asum [branch ls (env, k) | ls <- cls, all (not . isPositive) ls]
      | any (curry (literally False $ equalUnder env) lit) litt -> Left "repetition"
      | otherwise -> let nlit = nono lit in asum (contra nlit <$> litt)
        <|> asum [branch ps =<< (, k') <$> unifyLiterals env (nlit, p)
        | cl <- cls, let (cl', k') = instantiate cl k, (p, ps) <- selections cl']
  branch ps = foldr (\l f -> noRep (n - length ps) cls (l:lits) f) cont ps
  contra p q = cont =<< (, k) <$> unifyLiterals env (p, q)

meson fos = mapM_ (runThen output) $ map (messy . listConj) $ S.toList <$> S.toList (simpDNF $ skno fos)
  messy fo = deepen (\n -> noRep n (toCNF fo) [] Right (mempty, 0)) 0
  toCNF = map S.toList . S.toList . simpCNF . deQuantify
  listConj = foldr1 (:/\)

Due to a misunderstanding, our code applies the depth limit differently to the book. Recall in our tableau function, the two branches of a disjunction receive the same quota for new variables. I had thought the same was true for the branches of meson, and that is what appears above.

I later learned the quota is meant to be shared among all subgoals. I wrote a version more faithful to the original meson (our demo calls it "MESON by the book"). It turns out to be slow.

▶ Toggle faithful rendition


Our gilmore and davisPutnam functions perform better than the book suggests they should. In particular, gilmore p20 finishes quickly.

I downloaded Harrison’s source code for a sanity check, and found the original gilmore implementation easily solves p20. It seems the book is mistaken; perhaps the code was buggy at the time.

The original source also contains several test cases:

▶ Toggle problems

Our code sometimes takes a better path through the Herbrand universe than the original. For example, our davisPutnam goes through 111 ground instances to solve p29 while the book version goes through 180.

Curiously, if we leave the output of our groundtuples unreversed, then gilmore p20 seems intractable.

Our davisPutnam2 function is unreasonably effective. Definitional CNF suits DPLL by producing fewer clauses and fewer literals per clause, so rules fire more frequently. The vaunted p38 and even the dreaded steamroller ("216 ground instances tried; 497 items in list") lie within its reach. The latter may be too exhausting for a browser and should be confirmed with GHC. Assuming Nix is installed:

$ nix-shell -p "haskell.packages.ghc881.ghcWithPackages (pkgs: [pkgs.megaparsec])"
$ wget
$ ghci fol.lhs

then type davisPutnam2 steamroller at the prompt.

Definitional CNF hurts our connection tableaux solvers. It introduces new literals which only appear a few times each. Our code fails to take advantage of this to quickly find unifiable literals.

Our connection tableaux functions mesonBasic and meson are mysteriously miraculous. Running mesonBasic steamroller succeeds at depth 21, and meson gilmore1 at depth 13, though our usage of the depth limit differs from that in the book.

They are so fast that I was certain there was a bug. After extensive tracing, I’ve concluded laziness is the root cause. Although our meson appears to be a reasonably direct translation of the OCaml version, Haskell’s lazy evaluation means we memoize expensive computations, and distributing the size bound among subgoals erases these gains.

The first time the cont continuation is reached, certain reductions remain on the heap so the next time we reach it, we can avoid repeating expensive computations. It is true that each cont invocation gets its own (env,k), but we can accomplish a lot without looking at them, such as determining that two literals cannot unify because the outermost function names differ.

We can push further and memoize more. Here’s an obvious way to filter out candidates that could never unify:

couldMatchTerms = \case
  [] -> True
  h:rest -> case h of
    (Fun f fargs, Fun g gargs)
      | f == g, length fargs == length gargs -> couldMatchTerms $ zip fargs gargs <> rest
      | otherwise -> False
    _ -> True
couldMatch x y = case (x, y) of
  (Atom p1 a1, Atom p2 a2) -> couldMatchTerms [(Fun p1 a1, Fun p2 a2)]
  (Not p, Not q) -> couldMatch p q
  _ -> False

noRep' n cls lits cont (env, k)
  | n < 0 = Left "too deep"
  | otherwise = case lits of
    [] -> asum [branch ls (env, k) | ls <- cls, all (not . isPositive) ls]
      | any (curry (literally False $ equalUnder env) lit) $ filter (couldMatch lit) litt -> Left "repetition"
      | otherwise -> let nlit = nono lit in asum (contra nlit <$> filter (couldMatch nlit) litt)
        <|> asum [branch ps =<< (, k') <$> unifyLiterals env (nlit, p)
        | cl <- cls, let (cl', k') = instantiate cl k, (p, ps) <- selections cl', couldMatch nlit p]
  branch ps = foldr (\l f -> noRep' (n - length ps) cls (l:lits) f) cont ps
  contra p q = cont =<< (, k) <$> unifyLiterals env (p, q)

meson' fos = mapM_ (runThen output) $ map (messy . listConj) $ S.toList <$> S.toList (simpDNF $ skno fos)
  messy fo = deepen (\n -> noRep' n (toCNF fo) [] Right (mempty, 0)) 0
  toCNF = map S.toList . S.toList . simpCNF . deQuantify
  listConj = foldr1 (:/\)

This is about 5% faster, despite wastefully traversing the same literals once for couldMatch and another time for unifyLiterals. It would be better if unifyLiterals could use what couldMatch has already learned.

Perhaps better still would be to divide the substitutions into those that are known when the continuation is created, and those that are not. Then couldMatch can take the first set into account while still being memoizable.


We compile to wasm with Asterius.

▶ Toggle front-end

To force the browser to render log updates, we use zero-duration timeouts. The change in control flow means that the web version of meson interleaves the refutations of independent subformulas.

We may be running into an Asterius bug involving callbacks and garbage collection. The callbacks created in the stream function are all one-shot, but if we declare them as "oneshot" then our code crashes on the steamroller problem.

Letting them build up on the heap means we can solve steamroller with "Lazy MESON", but only once. The second time we click the button, we run into a strange JavaScript error:

Uncaught (in promise) JSException "RuntimeError: function signature mismatch

Ben Lynn 💡