Equational Reasoning

Let’s take a well-known programming problem, and solve it with generous helpings of equational reasoning. This just means if we know x = y, then anywhere we see x we may replace it by y and vice versa. This sounds simple and natural, but fails in many languages due to side effects. In contrast, much of Haskell is pure, so we greatly benefit from equational reasoning.

Define a binary tree data structure that stores items of type a in its internal nodes, and stores nothing in its leaves:

{-# LANGUAGE LambdaCase #-}
data Tree a = N | B a (Tree a) (Tree a) deriving Show

An inorder traversal produces a list as follows:

inorder :: Tree a -> [a]
inorder = \case
  N -> []
  B x l r -> inorder l <> [x] <> inorder r

A preorder traversal produces a list as follows:

preorder :: Tree a -> [a]
preorder = \case
  N -> []
  B x l r -> [x] <> preorder l <> preorder r

Puzzle: Recover a binary tree of distinct elements from its inorder and preorder traversals.

In other words, define a function:

f :: Eq a => [a] -> [a] -> Tree a

that satisfies, for all t of type Tree a:

f (preorder t) (inorder t) == t

In the old days, to solve this, I might have racked my brain, trying to bring to the surface what I know about binary trees. I might go over other puzzles about binary trees. I might try small examples to see if they cast light on a general solution.

These are good strategies, but then I read Pearls of Functional Algorithm Design, where Richard Bird aims "to see to what extent algorithm design can be cast in a familiar mathematical tradition of calculating a result by using well-established mathematical principles, theorems and laws."

Of course! I calculate 123 + 456 + 789 by robotically following well-known recipes, reserving my intellect for greater challenges. Why not do the same when programming? Be smart by being stupid!

Thus we begin with structural induction on t, which by definition splits into two cases.

Case t == N:

By definition:

preorder N = []
inorder N = []

thus equational reasoning implies:

f [] [] = N

Case t == B x l r:

f (preorder (B x l r)) (inorder (B x l r)) == B x l r

We rewrite the left-hand side using the definitions of preorder and inorder:

f (x : preorder l <> preorder r) (inorder l <> [x] <> inorder r)

where we apply the identity [a] <> bs == a : bs.

By inductive assumption, the right-hand side is:

B x (f (preorder l) (inorder l)) (f (preorder r) (inorder r))

Let’s write:

plpr = preorder l <> preorder r
ilxir = inorder l <> [x] <> inorder r

to work towards a definition of f:

f (x : plpr) ilxir ==
  B x (f (preorder l) (inorder l)) (f (preorder r) (inorder r))

A decent start, but we must find some way to write various terms on the right-hand side in terms of x, plpr and ilxir. Sadly, at this point, we need to think.

Since the elements are distinct, we can show not (x `elem` inorder l), hence:

(inorder l, x:inorder r) == break (==x) ilxir

The splitAt function has the property:

(x, y) == splitAt (length x) (x <> y)

By induction, we can show for all x:

length (preorder x) == length (inorder x)

Altogether, we have calculated:

f :: Eq a => [a] -> [a] -> Tree a
f [] [] = N
f (x : plpr) ilxir = B x (f pl il) (f pr ir) where
  (il, _:ir) = break (== x) ilxir
  (pl, pr) = splitAt (length il) plpr

Solved! But we had to turn off autopilot for the last stretch.

Nevertheless, blindly attempting structural induction and expanding definitions gave us a good head start, and although we had to work for our meal in the end, by sticking to equational reasoning:

  • We proved our code is correct.

  • We proved a solution exists and must be unique.

  • We can pinpoint the step that needs the elements to be distinct.

My old self would likely written a solution in C that was "obviously correct" and have trouble with these tasks. Equational reasoning is possible in some parts of C such as arithmetic expressions, but it covers so little of the language that we must resort to other methods to show correctness.

This exercise naturally leads to:

Puzzle: Write a program that automates calculations like the above.

[Originally I had planned to hold off on this article until after I had solved this!]

The structural induction and expansion of definitions seems reasonable, and also the identity [a] <> bs == a : bs. The splitAt property we used could conceivably be built-in.

But how would a program figure out break and its connection with the condition that the elements are distinct? And how would it discover the lengths of the two different traversals must always match?

Why do natural numbers arise in a problem about lists and trees? Is it because of how I think, or is it fundamental to the problem? Perhaps Peano-encoding numbers as lists would make certain connections easier to see, that is, we represent the natural n with replicate n (), in which case length = map (const ()), and so on.

Ben Lynn blynn@cs.stanford.edu 💡