Combinatory Logic

Lambda term: \f.(\x.x x)(\x.f(x x))

Look ma,

Recall a lambda term is built from variables, applications, and abstractions:

data Term = Var String | App Term Term | Lam String Term

instance Show Term where
  show (Lam s t)  = "λ" ++ s ++ showB t where
    showB (Lam x y) = " " ++ x ++ showB y
    showB expr      = "." ++ show expr
  show (Var s)    = s
  show (App x y)  = showL x ++ showR y where
    showL (Lam _ _) = "(" ++ show x ++ ")"
    showL _         = show x
    showR (Var s)   = ' ':s
    showR _         = "(" ++ show y ++ ")"

Variables are the trickiest part of lambda calculus. And naming is the trickiest part of variables: the most complex code in our lambda evaluator is the part that renames variables to perform capture-avoiding substitutions.

Originally, variable names were part of the theory and renaming was dealt formally via alpha-conversion. Now, names are seen as artificial tedious tags whose sole purpose is to aid human comprehension. Renaming is an administrative detail that lies outside the theory, though one which must be handled with care when proving theorems.

Can we get rid of them instead? There ought to be a way to study computation without naming names.

(We will shortly mention many names, but they will be the names of researchers and not variables!)

De Bruijn

Large institutions like universities often anonymize by replacing names with numbers. We do the same. We replace each occurrence of a variable with a natural number indicating which lambda bound it, known as its De Bruijn index.

The innermost lambda is 0 (though some heretics prefer to start from 1), and we increment as we go outwards. Numbering the other way is possible, but then common subexpressions appear distinct.

Instead of a sequence of bound variables like \(\lambda x y z\), we now write a string of repeated lambdas like \(\lambda \lambda \lambda\). This means we no longer we need a period to separate the lambda bindings from the body of an expression.

For example,

\[\lambda f.(\lambda x.x x)(\lambda x.f(x x))\]

becomes:

\[\lambda(\lambda 0 0)(\lambda 1(0 0))\]

data DB = N Int | L DB | A DB DB

instance Show DB where
  showsPrec p = \case
    N n -> (if p > 0 then (' ':) else id) . shows n
    L t -> ("λ"++) . shows t
    A t u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

index x xs = lookup x $ zip xs [0..]

deBrujin = go [] where
  go binds = \case
    Var x -> maybe (error $ "free: " ++ x) N $ index x binds
    Lam x t -> L $ go (x:binds) t
    App t u -> A (go binds t) (go binds u)

We port our lambda term evaluator to De Bruijn indices. Gone is the complex renaming logic, and basic arithmetic suffices for telling apart free and bound variables.

whnfDB t = case t of
  A f x -> case whnfDB f of
    L body -> whnfDB $ beta 0 x body
    g -> A g x
  L _ -> t
beta lvl x = \case
  N n | n < lvl   -> N n
      | n == lvl  -> x
      | otherwise -> N $ n - 1
  A t u -> A (beta lvl x t) (beta lvl x u)
  L t -> L $ beta (lvl + 1) x t

Schönfinkel, Curry

Buoyed by our success, we aim higher. Can we remove variables altogether, whether they’re names or numbers?

We start with a simple demonstration. Define:

\[I = \lambda x . x = \lambda 0\]

and:

\[K = \lambda x y . x = \lambda \lambda 1\]

Then we find for any \(a, b\):

\[KIab = (\lambda \lambda x y.x) (\lambda x.x) a b = (\lambda x.x) b = b\]

Thus \(KI = \lambda a b . b = \lambda \lambda 0\). In other words, we have combined \(K\) and \(I\) to produce a closed lambda term distinct to either one. (Thanks to De Bruijn indices, we can see at a glance they differ.) Moreover, no variables were needed: we merely applied \(K\) to \(I\).

Can we choose a small set of closed lambda terms such that they can be applied to one another to produce any given closed lambda term?

Moses Schönfinkel asked and answered this question when he presented the ideas of On the building blocks of mathematical logic to the Göttingen Mathematical Society on 7 December 1920. It was written up for publication in March 1924 by Heinrich Behmann.

A few years later, Haskell Curry rediscovered similar ideas, and only became aware of Schönfinkel’s work in late 1927. His solution was slightly different.

We say combinator rather than "closed lambda term" in this context, which perhaps reminds us that we’re focused on combining specific closed lambda terms to see what we can build, rather than playing games with variables.

Church

The work of Schönfinkel and Curry predates lambda calculus. How is it possible that they were devising ways of removing variables from a calculus that hadn’t been invented yet?!

It turns out that viewing combinators as lambda terms without variables is a modern idea. The original motivations were different, and it is a beautiful coincidence that they all lead to an elegant model of computation.

Recall any boolean circuit can be constructed just from NAND gates. Schönfinkel sought to analogously reduce predicate logic into as few elements as possible, and it turns out the "for all" and "there exists" quantifiers of predicate logic behave like lambda abstractions.

Logic also motivated Curry, who wanted to better understand the rule of substitution. Russell and Whitehead’s Principia Mathematica was famed for an excruciatingly detailed proof of \(1 + 1 = 2\), but they overlooked an important detail: they lacked an explicit definition of substitution.

Around 1928 Alonzo Church devised lambda calculus so he could explicitly define and analyze substitution, so it’s not that Schönfinkel and Curry traveled in time, but that everyone was studying the same hot topic. Due to its origin story, the study of combinators is called combinatory logic.

Schönfinkel wrote definitions like \(Ix = x\) instead of \(I = \lambda x.x\), notation which Church would later invent. We will, too, to set the mood. (Decades later, Landin reverted to Schönfinkel’s style to avoid frightening readers with lambdas!)

Schönfinkel found that all combinators, that is, all closed lambda terms, can be built exclusively from the \(S\) and \(K\) combinators, which are given by:

\[\begin{aligned} & Sxyz = xz(yz) \\ & Kxy = x \end{aligned}\]

except he wrote "C" instead of "K" (which is strange because the paper calls it the Konstanzfunktion so you’d think "K" would be preferred; Curry likely changed the letters to what we write today). By the way, in Haskell \(S\) is the Reader instance of ap or (<*>) and \(K\) is const.

We define a data structure to hold expressions built from combinators. It’s little more than a binary tree, because we no longer have lambda abstractions. There’s one wrinkle: it turns out we need to temporarily store a variable in a leaf at times so we add a Tmp data constructor.

data CL = Com String | Tmp String | CL :@ CL

instance Show CL where
  showsPrec p = \case
    Com s -> (if p > 0 then (' ':) else id) . (s++)
    Tmp s -> (if p > 0 then (' ':) else id) . (s++)
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

A cleaner design may be to import the Data.Magma package and use BinaryTree String to hold combinator terms and BinaryTree (Either String String) to hold combinator terms that may contain variables. But that’s more code than I want for simple demo.

Schönfinkel went further and combined \(S\) and \(K\) into one combinator to rule them all. The Iota and Jot languages do something similar. We seem to gain no advantage from these contrived curiosities, other than being able to brag that one combinator suffices for any computation. See also Fokker, 1992, The systematic construction of a one-combinator basis for Lambda-Terms.

Rosenbloom

So how did Schönfinkel convert any closed lambda term to a tree of \(S\) and \(K\) combinators? We won’t say, firstly because Schönfinkel didn’t actually state an algorithm and only walked through an example, and secondly because we prefer to examine a more streamlined version published by Paul Rosenbloom in 1950 (page 117 of The Elements of Mathematical Logic).

The key is to solve a subproblem known as bracket abstraction. Suppose we have an expression \(e\) built from \(S\) and \(K\) combinators and variables, for example:

\[e = S K (S x) (y K z)\]

Pick a variable, say \(x\). Then find an expression \(f\) such that

\[f x = e\]

where \(f\) does not contain any occurrences of \(x\) (so \(f\) is built from \(S, K\) and variables other than \(x\)). Smullyan calls \(f\) an \(x\)-eliminate of \(e\) in his puzzle book To Mock a Mockingbird.

Some authors write:

\[f = [x] e\]

hence the term "bracket abstraction". These authors may also use brackets to denote substitution. For example, the notation \([P/x] x = P\) means "substituting \(P\) for \(x\) in the term \(x\) results in \(P\)".

[x] S K (S x)(y K z)

Like just about all problems involving trees, recursion is the solution.

If \(e = x\), then a solution is \(f = SKK\). We can check \(SKKx = x\).

If \(e = y\), then a solution is \(f = Ky\), as \(Kyx = y\). This works for any variable apart from \(x\), or \(S\) or \(K\).

Otherwise \(e = t u\) where \(t, u\) are expressions. Then we recursively solve the problem for \(t, u\) to get \(t', u'\), and a solution is \(f = S t' u'\). We have \(S t' u' x = t' x (u' x) = t u\).

rosenbloom x = \case
  Tmp y | x == y -> Com "S" :@ Com "K" :@ Com "K"
  t :@ u -> Com "S" :@ rosenbloom x t :@ rosenbloom x u
  t -> Com "K" :@ t

Antoni Diller’s page on classic bracket abstraction features online interactive demos of many variants of this algorithm.

Converting a given closed lambda term to \(S\) and \(K\) combinators is simply a matter of performing bracket abstraction for each lambda. We parameterize the bracket abstraction function so we can play with different versions.

nova elim = \case
  App t u -> rec t :@ rec u
  Lam x t -> elim x $ rec t
  Var x -> Tmp x
  where rec = nova elim

Naturally, we can run \(SK\) programs in our lambda evaluator by adding the definitions:

S = \x y z.x z(y z)
K = \x y.x

However, we wish to flaunt the simplicity of \(SK\) interpreters. We even throw in bonus standard combinators because it’s easy, and they make the web widget more useful.

Here are functions for:

head reduction: reduce a standard combinator at the beginning of the expression
weak head normalization: perform head reductions until no longer possible
normalization: reduce combinators anywhere in the expression until no longer possible, taking care to search for reducible expressions (redexes) starting from the left.

Like lambda calculus, preferring the leftmost redex at each step guarantees we’ll reach the normal form if it exists.

reduce (Com "S" :@ x :@ y :@ z) = Just $ x :@ z :@ (y :@ z)
reduce (Com "K" :@ x :@ y)      = Just x
-- Bonus combinators.
reduce (Com "B" :@ x :@ y :@ z) = Just $ x :@ (y :@ z)
reduce (Com "C" :@ x :@ y :@ z) = Just $ x :@ z :@ y
reduce (Com "R" :@ x :@ y :@ z) = Just $ y :@ z :@ x
reduce (Com "I" :@ x)           = Just x
reduce (Com "T" :@ x :@ y)      = Just $ y :@ x
reduce _ = Nothing

step (f :@ z) = maybe (step f :@ z) id $ reduce (f :@ z)
step t = t

whnf (f :@ z) = maybe t whnf $ reduce t where t = whnf f :@ z
whnf t = t

norm t = case whnf t of
  x :@ y -> norm x :@ norm y
  t -> t

Clearly, evaluating \(S\) and \(K\) combinators is far simpler than dealing with beta-reduction in lambda calculus even if we employ De Bruijn indices.

Combinatory logic term: S K K (K (S K K K))

Turner

Together, the \(S\) and \(K\) combinators possess as much power as lambda calculus, that is, they are Turing complete! Thus a good strategy for a compiler might be to translate a given lambda calculus program to combinators because it’s so easy to generate code that repeatedly reduces \(S\) and \(K\) combinators.

However, there’s a snag. This idea only works if the conversion is efficient. Rosenbloom’s bracket abstraction algorithm is exponential in the worst case. Consider, say:

\[\lambda x_1 ... x_n . S K\]

We start with one application, of \(S\) to \(K\). For each lambda, the rosenbloom function recursively calls itself on either side of each application, which at least doubles the number of applications, so the result will have at least \(2^n\) applications.

A few tweaks greatly improve the efficiency, and indeed, Schönfinkel’s paper suggests a faster algorithm, and Curry explicitly stated more efficient algorithms.

A particularly effective improvement known as the K-optimization is to detect an unused variable as far up the tree as possible and apply \(K\) here, rather than painstakingly send the variable down to each leaf of the tree, only to have it eventually ignored via \(K\) at the last minute.

occurs x = \case
  Tmp s | s == x -> True
  t :@ u -> occurs x t || occurs x u
  _ -> False

optimizeK x t
  | not $ occurs x t = Com "K" :@ t
  | otherwise = case t of
    Tmp y -> Com "S" :@ Com "K" :@ Com "K"
    t :@ u -> Com "S" :@ optimizeK x t :@ optimizeK x u

David Turner found enough optimizations to enable a practical compiler for his Miranda language, that is, we implement a small set of combinators, then rewrite the input program in terms of those combinators. Our lambda-calculus-to-wasm compiler demonstrates this approach.

Early Haskell compilers took the same approach, and later Haskell compilers extended this approach by generating custom combinators that are tailor-made for each input program. These are known as supercombinators in the literature. However, modern GHC shuns combinators in favour of the Spineless Tagless G-Machine.

The following glues our code to the widgets in this page:

jsEval "curl_module('../compiler/Charser.ob')"

import Charser
lcTerm = lam <|> app where
  lam = flip (foldr Lam) <$> (lam0 *> some var <* lam1) <*> lcTerm
  lam0 = str "\\" <|> str "λ"
  lam1 = str "->" <|> str "."
  app = foldl1 App <$> some (Var <$> var <|> str "(" *> lcTerm <* str ")")
  var = some alphaNumChar <* space

clTerm = app where
  app = foldl1 (:@) <$> some (com <|> var <|> str "(" *> clTerm <* str ")")
  com = Com . (:[]) <$> sat (`elem` "SKBCRIT") <* space
  var = Tmp <$> some alphaNumChar <* space

str = (<* space) . string

parseLC = parse (space *> lcTerm <* eof) ""
parseCL = parse (space *> clTerm <* eof) ""

demoLC f = do
  s <- jsEval "input.value;"
  jsEval $ "output.value = `" ++ either id (show . f) (parseLC s) ++ "`;"

demoNox f = do
  s <- jsEval "inputnox.value;"
  jsEval $ "outputnox.value = `" ++ either id (show . f) (parseCL s) ++ "`;"

demoReduce f = do
  s <- jsEval "iosk.value;"
  jsEval $ "iosk.value = `" ++ either id (show . f) (parseCL s) ++ "`;"

jsEval [r|
  addButton(repl, "no names!", "demoLC deBrujin", maButtons);
  addButton(repl, "no variables!", "demoLC (nova rosenbloom)", maButtons);
  addButton(repl, "no variables, K-optimized!", "demoLC (nova optimizeK)", maButtons);

  addButton(repl, "Rosenbloom", "demoNox (rosenbloom \"x\")", noxButtons);
  addButton(repl, "K-optimized", "demoNox (optimizeK \"x\")", noxButtons);

  addButton(repl, "Step", "demoReduce step", reduceButtons);
  addButton(repl, "Weak Head Normalize", "demoReduce whnf", reduceButtons);
  addButton(repl, "Normalize", "demoReduce norm", reduceButtons);
|]

Iverson

Combinators are also useful on the other side of the compiler, that is, in source programs thesmelves. Variables can be distracting at times, and code may be clearer if it describes what it is doing without mentioning variables. This is known as point-free style or tacit programming.

Ken Iverson’s APL language pioneered this technique, and developed it to such an extent that avoiding variables became idiomatic. John Backus was inspired, as can be observed in his renowned Turing award lecture, though oddly, he accuses the closely related lambda calculus of fomenting chaos; perhaps he was overly enamoured by APL’s informal ban on variables!

Kiselyov

In 2018, Kiselyov found another bracket abstraction algorithm which we describe elsewhere.

Mac Lane, Eilenberg

In mathematics, the study of functions without mentioning variables occurs in category theory, founded by Saunders Mac Lane and Samuel Eilenberg. The absence of variables not only simplifies the picture, but also enables generalization: we can make sweeping statements that apply to sets and functions as well as, say, posets.

Conal Elliott, Compiling to categories, explores the connection between point-free code and category theory.

My main sources for the historical details were:

Cardone and Hindley, History of Lambda-calculus and Combinatory Logic
van Heijenoort, From Frege to Gödel

Ben Lynn blynn@cs.stanford.edu 💡