{- For GHC, uncomment the following, and comment the next block.
{-# LANGUAGE LambdaCase, BlockArguments #-}
import Control.Applicative
import Control.Arrow
-}
module Main where
import Base
import System

foreign export ccall "debruijnMain" debruijnMain
debruijnMain = interact $ either id (show . deBrujin) . parseLC
foreign export ccall "rosenbloomMain" rosenbloomMain
rosenbloomMain = interact $ either id (show . nova rosenbloom) . parseLC
foreign export ccall "optimizeKMain" optimizeKMain
optimizeKMain = interact $ either id (show . nova optimizeK) . parseLC
foreign export ccall "rosenbloomnoxMain" rosenbloomnoxMain
rosenbloomnoxMain = interact $ either id (show . rosenbloom "x") . parseCL
foreign export ccall "optimizeknoxMain" optimizeknoxMain
optimizeknoxMain = interact $ either id (show . optimizeK "x") . parseCL
foreign export ccall "stepskMain" stepskMain
stepskMain = interact $ either id (show . step) . parseCL
foreign export ccall "whnfskMain" whnfskMain
whnfskMain = interact $ either id (show . whnf) . parseCL
foreign export ccall "normskMain" normskMain
normskMain = interact $ either id (show . norm) . parseCL

data Charser a = Charser { getCharser :: String -> Either String (a, String) }

instance Functor Charser where
  fmap f (Charser x) = Charser $ fmap (first f) . x

instance Applicative Charser where
  pure a = Charser \s -> Right (a, s)
  f <*> x = Charser \s -> do
    (fun, t) <- getCharser f s
    (arg, u) <- getCharser x t
    pure (fun arg, u)

instance Monad Charser where
  Charser f >>= g = Charser $ (good =<<) . f where
    good (r, t) = getCharser (g r) t
  return = pure

instance Alternative Charser where
  empty = Charser \_ -> Left ""
  (<|>) x y = Charser \s -> either (const $ getCharser y s) Right $ getCharser x s

sat f = Charser \case
  h:t | f h -> Right (h, t)
  _ -> Left "unsat"

eof = Charser \case
  [] -> Right ((), "")
  _ -> Left "want EOF"

char :: Char -> Charser Char
char = sat . (==)

string :: String -> Charser String
string s = mapM char s

isAlphaNum c
  | 'A' <= c && c <= 'Z' = True
  | 'a' <= c && c <= 'z' = True
  | '0' <= c && c <= '9' = True
  | otherwise = False

lcTerm = lam <|> app where
  lam = flip (foldr Lam) <$> (lam0 *> some var <* lam1) <*> lcTerm
  lam0 = str "\\" <|> str "\955"
  lam1 = str "->" <|> str "."
  app = foldl1 App <$> some (Var <$> var <|> str "(" *> lcTerm <* str ")")
  var = some (sat isAlphaNum) <* whitespace

clTerm = app where
  app = foldl1 (:@) <$> some (com <|> var <|> str "(" *> clTerm <* str ")")
  com = Com . (:[]) <$> sat (`elem` "SKBCRIT") <* whitespace
  var = Tmp <$> some (sat isAlphaNum) <* whitespace

str = (<* whitespace) . string
whitespace = many $ char ' '

parseLC = fmap fst . getCharser (whitespace *> lcTerm <* eof)
parseCL = fmap fst . getCharser (whitespace *> clTerm <* eof)

instance Show Term where
  show (Lam s t) = "\955" ++ s ++ showB t where
    showB (Lam x y) = " " ++ x ++ showB y
    showB expr = "." ++ show expr
  show (Var s) = s
  show (App x y) = showL x ++ showR y where
    showL (Lam _ _) = "(" ++ show x ++ ")"
    showL _ = show x
    showR (Var s) = ' ':s
    showR _ = "(" ++ show y ++ ")"
Combinatory Logic
data Term = Var String | App Term Term | Lam String Term
Variables are the trickiest part of lambda calculus. And naming is the trickiest part of variables: the most complex code in our lambda evaluator is the part that renames variables to perform capture-avoiding substitutions.
Also, the lambda term \(\lambda x \lambda y . y x\) is really the same as \(\lambda y \lambda z . z y\), but we must rename variables in order to see this.
Names are artificial tedious tags whose sole purpose is to aid human comprehension. Can we get rid of them? There ought to be a way to study computation without naming names.
(We will shortly mention many names, but they will be the names of researchers and not variables!)
De Bruijn
Large institutions like universities often anonymize by replacing names with numbers. We do the same. We replace each occurrence of a variable with a natural number indicating which lambda bound it, known as its De Bruijn index.
The innermost lambda is 0 (though some heretics prefer to start from 1), and we increment as we go outwards. Numbering the other way is possible, but then common subexpressions appear distinct.
Instead of a sequence of bound variables like \(\lambda x y z\), we now write a string of repeated lambdas like \(\lambda \lambda \lambda\). This means we no longer need a period to separate the lambda bindings from the body of an expression.
For example, \(\lambda x \lambda y . y x\) becomes: \(\lambda \lambda 0 1\).
data DB = N Int | L DB | A DB DB

instance Show DB where
  showsPrec p = \case
    N n -> (if p > 0 then (' ':) else id) . shows n
    L t -> ('\955':) . shows t
    A t u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

index x xs = lookup x $ zip xs [0..]

deBrujin = go [] where
  go binds = \case
    Var x -> maybe (error $ "free: " ++ x) N $ index x binds
    Lam x t -> L $ go (x:binds) t
    App t u -> A (go binds t) (go binds u)
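As a quick sanity check, here is a self-contained sketch (restating Term, DB, and deBrujin from above) that converts the term \(\lambda x \lambda y . y x\) from earlier:

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch: the Term and DB types and deBrujin from above,
-- applied to \x y.y x.
data Term = Var String | App Term Term | Lam String Term

data DB = N Int | L DB | A DB DB deriving Eq

instance Show DB where
  showsPrec p = \case
    N n -> (if p > 0 then (' ':) else id) . shows n
    L t -> ('\955':) . shows t
    A t u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

index x xs = lookup x $ zip xs [0..]

deBrujin = go [] where
  go binds = \case
    Var x -> maybe (error $ "free: " ++ x) N $ index x binds
    Lam x t -> L $ go (x:binds) t
    App t u -> A (go binds t) (go binds u)

main :: IO ()
main = print $ deBrujin $ Lam "x" $ Lam "y" $ App (Var "y") (Var "x")
  -- prints λλ0 1
```

The occurrence of y prints as 0 because it is bound by the innermost lambda, and x as 1.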
We port our lambda term evaluator to De Bruijn indices. Gone is the complex renaming logic, and basic arithmetic suffices for telling apart free and bound variables.
whnfDB t = case t of
  A f x -> case whnfDB f of
    L body -> whnfDB $ beta 0 x body
    g -> A g x
  L _ -> t

beta lvl x = \case
  N n | n < lvl -> N n
      | n == lvl -> x
      | otherwise -> N $ n - 1
  A t u -> A (beta lvl x t) (beta lvl x u)
  L t -> L $ beta (lvl + 1) x t
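A self-contained sketch (restating DB, whnfDB, and beta from above) that head-reduces \((\lambda x \lambda y . y x)\,I\,K\), which should yield \(K I = \lambda\lambda 0\):

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch restating DB, whnfDB, and beta from above,
-- then head-reducing (\x y. y x) I K, where I = \x.x and K = \x y.x.
data DB = N Int | L DB | A DB DB deriving Eq

instance Show DB where
  showsPrec p = \case
    N n -> (if p > 0 then (' ':) else id) . shows n
    L t -> ('\955':) . shows t
    A t u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

whnfDB t = case t of
  A f x -> case whnfDB f of
    L body -> whnfDB $ beta 0 x body
    g -> A g x
  L _ -> t

beta lvl x = \case
  N n | n < lvl -> N n
      | n == lvl -> x
      | otherwise -> N $ n - 1
  A t u -> A (beta lvl x t) (beta lvl x u)
  L t -> L $ beta (lvl + 1) x t

main :: IO ()
main = do
  let flp = L $ L $ A (N 0) (N 1)  -- \x y. y x
      i   = L $ N 0                -- identity
      k   = L $ L $ N 1            -- constant
  print $ whnfDB $ A (A flp i) k   -- prints λλ0, i.e. K I
```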
Schönfinkel, Curry
Buoyed by our success, we aim higher. Can we remove variables altogether, whether they’re names or numbers?
We start with a simple demonstration. Define \(I = \lambda x . x\) and \(K = \lambda x \lambda y . x\). Then we find for any \(a, b\): \(K I a b = I b = b\).
Thus \(KI = \lambda a b . b = \lambda \lambda 0\). In other words, we have combined \(K\) and \(I\) to produce a closed lambda term distinct from either one. (Thanks to De Bruijn indices, we can see at a glance they differ.) Moreover, no variables were needed: we merely applied \(K\) to \(I\).
Can we choose a small set of closed lambda terms such that they can be applied to one another to produce any given closed lambda term?
Moses Schönfinkel asked and answered this question when he presented the ideas of On the building blocks of mathematical logic to the Göttingen Mathematical Society on 7 December 1920. It was written up for publication in March 1924 by Heinrich Behmann.
A few years later, Haskell Curry rediscovered similar ideas, and only became aware of Schönfinkel’s work in late 1927. His solution was slightly different.
We say combinator rather than "closed lambda term" in this context, which perhaps reminds us that we’re focused on combining specific closed lambda terms to see what we can build, rather than playing games with variables.
Church
All this was years before lambda calculus was discovered! How is this possible? Recall any boolean circuit can be constructed just from NAND gates. Schönfinkel sought to analogously reduce predicate logic to as few building blocks as possible, and it turns out the "for all" and "there exists" quantifiers of predicate logic behave like lambda abstractions.
Logic also motivated Curry, who wanted to better understand the rule of substitution. Even Russell and Whitehead’s Principia Mathematica, famed for an excruciatingly detailed proof of \(1 + 1 = 2\), lacked an explicit definition of substitution.
In fact, around 1928 Alonzo Church devised lambda calculus so he could explicitly define and analyze substitution, so it’s not that Schönfinkel and Curry traveled in time, but that everyone was studying the same hot topic. Due to its origin story, the study of combinators is called combinatory logic.
Schönfinkel wrote definitions like \(Ix = x\) instead of \(I = \lambda x.x\), notation which Church would later invent. We will, too, to set the mood. (Decades later, Landin reverted to Schönfinkel’s style to avoid frightening readers with lambdas!)
Schönfinkel found that all combinators, that is, all closed lambda terms, can be built exclusively from the \(S\) and \(K\) combinators, which are given by \(S x y z = x z (y z)\) and \(K x y = x\),
except he wrote "C" instead of "K" (which is strange because the paper calls it the Konstanzfunktion, so you’d think "K" would be preferred; Curry likely changed the letters to what we write today). By the way, in Haskell, \(S\) is the Reader instance of ap or (<*>), and \(K\) is const.
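A small sketch of that claim, using hypothetical names s and k for the Prelude functions:

```haskell
-- Sketch: S is (<*>) at the function (Reader) instance, K is const.
-- The names s and k are local aliases introduced here for illustration.
s :: (a -> b -> c) -> (a -> b) -> a -> c
s = (<*>)  -- S x y z = x z (y z)

k :: a -> b -> a
k = const  -- K x y = x

main :: IO ()
main = do
  print $ s k k (42 :: Int)      -- S K K z = K z (K z) = z; prints 42
  print $ s (+) succ (3 :: Int)  -- (+) 3 (succ 3) = 7
```

The first line foreshadows a fact used below: \(SKK\) behaves as the identity.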
We define a data structure to hold expressions built from combinators. It’s little more than a binary tree, because we no longer have lambda abstractions. There’s one wrinkle: it turns out we need to temporarily store a variable in a leaf at times, so we add a Tmp data constructor.
data CL = Com String | Tmp String | CL :@ CL

instance Show CL where
  showsPrec p = \case
    Com s -> (if p > 0 then (' ':) else id) . (s++)
    Tmp s -> (if p > 0 then (' ':) else id) . (s++)
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u
A cleaner design may be to import the Data.Magma package and use BinaryTree String to hold combinator terms and BinaryTree (Either String String) to hold combinator terms that may contain variables. But that’s more code than I want for a simple demo.
Schönfinkel went further and combined \(S\) and \(K\) into one combinator to rule them all. The Iota and Jot languages do something similar. We seem to gain no advantage from these contrived curiosities, other than being able to brag that one combinator suffices for any computation. See also Fokker, 1992, The systematic construction of a one-combinator basis for Lambda-Terms.
Rosenbloom
So how did Schönfinkel convert any closed lambda term to a tree of \(S\) and \(K\) combinators? We won’t say, firstly because Schönfinkel didn’t actually state an algorithm and only walked through an example, and secondly because we prefer to examine a more streamlined version published by Paul Rosenbloom in 1950 (page 117 of The Elements of Mathematical Logic).
The key is to solve a subproblem known as bracket abstraction. Suppose we have an expression \(e\) built from \(S\) and \(K\) combinators and variables, for example:
Pick a variable, say \(x\). Then find an expression \(f\) such that \(f x = e\),
where \(f\) does not contain any occurrences of \(x\) (so \(f\) is built from \(S, K\), and variables other than \(x\)). Smullyan calls \(f\) an \(x\)-eliminate of \(e\) in his puzzle book To Mock a Mockingbird. Some authors write \([x] e\) to denote such an \(f\). These authors may also use brackets to denote substitution. For example, the notation \([P/x] x = P\) means "substituting \(P\) for \(x\) in the term \(x\) results in \(P\)".
Like just about all problems involving trees, recursion is the solution.
If \(e = x\), then a solution is \(f = SKK\). We can check \(SKKx = x\).
If \(e = y\) for any variable \(y\) other than \(x\), then a solution is \(f = Ky\), as \(Kyx = y\). The same works when \(e\) is \(S\) or \(K\).
Otherwise \(e = t u\) where \(t, u\) are expressions. Then we recursively solve the problem for \(t, u\) to get \(t', u'\), and a solution is \(f = S t' u'\). We have \(S t' u' x = t' x (u' x) = t u\).
rosenbloom x = \case
  Tmp y | x == y -> Com "S" :@ Com "K" :@ Com "K"
  t :@ u -> Com "S" :@ rosenbloom x t :@ rosenbloom x u
  t -> Com "K" :@ t
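A self-contained sketch (restating CL and rosenbloom from above) that exercises all three cases by eliminating \(x\) from \(x y\):

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch restating CL and rosenbloom from above, then
-- eliminating x from the expression x y.
data CL = Com String | Tmp String | CL :@ CL deriving Eq

instance Show CL where
  showsPrec p = \case
    Com s -> (if p > 0 then (' ':) else id) . (s++)
    Tmp s -> (if p > 0 then (' ':) else id) . (s++)
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

rosenbloom x = \case
  Tmp y | x == y -> Com "S" :@ Com "K" :@ Com "K"
  t :@ u -> Com "S" :@ rosenbloom x t :@ rosenbloom x u
  t -> Com "K" :@ t

main :: IO ()
main = print $ rosenbloom "x" $ Tmp "x" :@ Tmp "y"
  -- prints S(S K K)(K y)
```

Indeed \(S(SKK)(Ky)x = SKKx(Kyx) = x y\), as required.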
Antoni Diller’s page on classic bracket abstraction features online interactive demos of many variants of this algorithm.
Converting a given closed lambda term to \(S\) and \(K\) combinators is simply a matter of performing bracket abstraction for each lambda. We parameterize the bracket abstraction function so we can play with different versions.
nova elim = \case
  App t u -> rec t :@ rec u
  Lam x t -> elim x $ rec t
  Var x -> Tmp x
  where rec = nova elim
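Putting the pieces together, a self-contained sketch (restating Term, CL, rosenbloom, and nova from above) that compiles \(\lambda x \lambda y . x\), that is, \(K\) itself:

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch: compile \x y.x to S and K combinators via
-- nova rosenbloom, restated from above.
data Term = Var String | App Term Term | Lam String Term

data CL = Com String | Tmp String | CL :@ CL deriving Eq

instance Show CL where
  showsPrec p = \case
    Com s -> (if p > 0 then (' ':) else id) . (s++)
    Tmp s -> (if p > 0 then (' ':) else id) . (s++)
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

rosenbloom x = \case
  Tmp y | x == y -> Com "S" :@ Com "K" :@ Com "K"
  t :@ u -> Com "S" :@ rosenbloom x t :@ rosenbloom x u
  t -> Com "K" :@ t

nova elim = \case
  App t u -> rec t :@ rec u
  Lam x t -> elim x $ rec t
  Var x -> Tmp x
  where rec = nova elim

main :: IO ()
main = print $ nova rosenbloom $ Lam "x" $ Lam "y" $ Var "x"
  -- prints S(K K)(S K K)
```

The output \(S(KK)(SKK)\) behaves exactly like \(K\), but is bigger than the term we started with, foreshadowing the efficiency problem below.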
Naturally, we can run \(SK\) programs in our lambda evaluator by adding the definitions:
S = \x y z.x z(y z)
K = \x y.x
However, we wish to flaunt the simplicity of \(SK\) interpreters. We even throw in bonus standard combinators because it’s easy, and they make the web widget more useful.
Here are functions for:

- head reduction: reduce a standard combinator at the beginning of the expression

- weak head normalization: perform head reductions until no longer possible

- normalization: reduce combinators anywhere in the expression until no longer possible, taking care to search for reducible expressions (redexes) starting from the left.
As in lambda calculus, preferring the leftmost redex at each step guarantees we’ll reach the normal form if it exists.
reduce (Com "S" :@ x :@ y :@ z) = Just $ x :@ z :@ (y :@ z)
reduce (Com "K" :@ x :@ y) = Just x
-- Bonus combinators.
reduce (Com "B" :@ x :@ y :@ z) = Just $ x :@ (y :@ z)
reduce (Com "C" :@ x :@ y :@ z) = Just $ x :@ z :@ y
reduce (Com "R" :@ x :@ y :@ z) = Just $ y :@ z :@ x
reduce (Com "I" :@ x) = Just x
reduce (Com "T" :@ x :@ y) = Just $ y :@ x
reduce _ = Nothing

step (f :@ z) = maybe (step f :@ z) id $ reduce (f :@ z)
step t = t

whnf (f :@ z) = maybe t whnf $ reduce t where t = whnf f :@ z
whnf t = t

norm t = case whnf t of
  x :@ y -> norm x :@ norm y
  t -> t
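A self-contained sketch exercising the interpreter (restating CL, whnf, and norm, with a reduce trimmed to \(S\), \(K\), \(I\)): normalizing \(SKKx\) should leave just \(x\), confirming \(SKK\) acts as the identity.

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch restating CL, whnf, and norm from above, with a
-- trimmed reduce, then normalizing S K K x.
data CL = Com String | Tmp String | CL :@ CL deriving Eq

instance Show CL where
  showsPrec p = \case
    Com s -> (if p > 0 then (' ':) else id) . (s++)
    Tmp s -> (if p > 0 then (' ':) else id) . (s++)
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

reduce (Com "S" :@ x :@ y :@ z) = Just $ x :@ z :@ (y :@ z)
reduce (Com "K" :@ x :@ y) = Just x
reduce (Com "I" :@ x) = Just x
reduce _ = Nothing

whnf (f :@ z) = maybe t whnf $ reduce t where t = whnf f :@ z
whnf t = t

norm t = case whnf t of
  x :@ y -> norm x :@ norm y
  t -> t

main :: IO ()
main = print $ norm $ Com "S" :@ Com "K" :@ Com "K" :@ Tmp "x"  -- prints x
```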
Clearly, evaluating \(S\) and \(K\) combinators is far simpler than dealing with beta-reduction in lambda calculus, even if we employ De Bruijn indices.
Turner
Together, the \(S\) and \(K\) combinators possess as much power as lambda calculus, that is, they are Turing complete! Thus a good strategy for a compiler might be to translate a given lambda calculus program to combinators because it’s so easy to generate code that repeatedly reduces \(S\) and \(K\) combinators.
However, there’s a snag. This idea only works if the conversion is efficient. Rosenbloom’s bracket abstraction algorithm is exponential in the worst case. Consider, say: \(\lambda x_1 \ldots \lambda x_n . S K\).
We start with one application, of \(S\) to \(K\). For each lambda, the rosenbloom function recursively calls itself on either side of each application, which at least doubles the number of applications, so the result will have at least \(2^n\) applications.
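We can watch the blowup with a self-contained sketch; apps and lams are hypothetical helpers added here to count applications and to build \(\lambda x_1 \ldots \lambda x_n . S K\):

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch measuring the blowup. Restates Term, CL,
-- rosenbloom, and nova from above; apps and lams are helpers added
-- here, not part of the original code.
data Term = Var String | App Term Term | Lam String Term

data CL = Com String | Tmp String | CL :@ CL

rosenbloom x = \case
  Tmp y | x == y -> Com "S" :@ Com "K" :@ Com "K"
  t :@ u -> Com "S" :@ rosenbloom x t :@ rosenbloom x u
  t -> Com "K" :@ t

nova elim = \case
  App t u -> rec t :@ rec u
  Lam x t -> elim x $ rec t
  Var x -> Tmp x
  where rec = nova elim

-- Count application nodes in a combinator expression.
apps :: CL -> Int
apps (t :@ u) = 1 + apps t + apps u
apps _ = 0

-- \v1 ... \vn . S K, with every vi unused.
lams :: Int -> Term
lams n = foldr Lam (App (Var "S") (Var "K")) ["v" ++ show i | i <- [1..n]]

main :: IO ()
main = print [apps $ nova rosenbloom $ lams n | n <- [0..5]]
  -- prints [1,4,13,40,121,364]: more than doubling with each lambda
```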
A few tweaks greatly improve the efficiency; indeed, Schönfinkel’s paper suggests a faster algorithm, and Curry explicitly stated more efficient ones.
A particularly effective improvement, known as the K-optimization, is to detect an unused variable as far up the tree as possible and apply \(K\) there, rather than painstakingly send the variable down to each leaf of the tree, only to have it eventually ignored via \(K\) at the last minute.
occurs x = \case
  Tmp s | s == x -> True
  t :@ u -> occurs x t || occurs x u
  _ -> False

optimizeK x t
  | not $ occurs x t = Com "K" :@ t
  | otherwise = case t of
    Tmp y -> Com "S" :@ Com "K" :@ Com "K"
    t :@ u -> Com "S" :@ optimizeK x t :@ optimizeK x u
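A self-contained sketch (restating CL, rosenbloom, occurs, and optimizeK from above) contrasting the two algorithms on \(y z\), in which \(x\) does not occur:

```haskell
{-# LANGUAGE LambdaCase #-}
-- Self-contained sketch: both functions eliminate an unused x from
-- y z, but the K-optimization applies a single K at the root instead
-- of pushing x down to every leaf.
data CL = Com String | Tmp String | CL :@ CL

instance Show CL where
  showsPrec p = \case
    Com s -> (if p > 0 then (' ':) else id) . (s++)
    Tmp s -> (if p > 0 then (' ':) else id) . (s++)
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u

rosenbloom x = \case
  Tmp y | x == y -> Com "S" :@ Com "K" :@ Com "K"
  t :@ u -> Com "S" :@ rosenbloom x t :@ rosenbloom x u
  t -> Com "K" :@ t

occurs x = \case
  Tmp s | s == x -> True
  t :@ u -> occurs x t || occurs x u
  _ -> False

optimizeK x t
  | not $ occurs x t = Com "K" :@ t
  | otherwise = case t of
    Tmp y -> Com "S" :@ Com "K" :@ Com "K"
    t :@ u -> Com "S" :@ optimizeK x t :@ optimizeK x u

main :: IO ()
main = do
  let e = Tmp "y" :@ Tmp "z"
  print $ rosenbloom "x" e  -- prints S(K y)(K z)
  print $ optimizeK "x" e   -- prints K(y z)
```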
David Turner found enough optimizations to enable a practical compiler for his Miranda language, that is, we implement a small set of combinators, then rewrite the input program in terms of those combinators. Our lambda-calculus-to-wasm compiler demonstrates this approach.
Early Haskell compilers took the same approach, and later Haskell compilers extended this approach by generating custom combinators that are tailor-made for each input program. These are known as supercombinators in the literature. However, modern GHC shuns combinators in favour of the Spineless Tagless G-machine.
Iverson
Combinators are also useful on the other side of the compiler, that is, in source programs themselves. Variables can be distracting at times, and code may be clearer if it describes what it is doing without mentioning variables. This is known as point-free style or tacit programming.
Ken Iverson’s APL language pioneered this technique, and developed it to such an extent that avoiding variables became idiomatic. John Backus was inspired, as can be observed in his renowned Turing award lecture, though oddly, he accuses the closely related lambda calculus of fomenting chaos; perhaps he was overly enamoured by APL’s informal ban on variables!
Mac Lane, Eilenberg
In mathematics, the study of functions without mentioning variables occurs in category theory, founded by Saunders Mac Lane and Samuel Eilenberg. The absence of variables not only simplifies the picture, but also enables generalization: we can make sweeping statements that apply to sets and functions as well as, say, posets.
Conal Elliott, Compiling to categories, explores the connection between pointfree code and category theory.
My main sources for the historical details were:

- Cardone and Hindley, History of Lambda-calculus and Combinatory Logic

- van Heijenoort, From Frege to Gödel