Combinator Golf

Gregory Chaitin’s pioneering results on algorithmic information theory are exciting because he wrote real programs to make theory concrete. Sadly, Chaitin chose LISP due to a flawed understanding of lambda calculus, diminishing the beauty of his results.

He’s not alone. Alan Kay described LISP as the "Maxwell’s equations of software" because a "half page of code" described LISP itself. Surely the equally powerful lambda calculus, which can describe itself in far less space, is more deserving of the title. Paul Graham’s The Roots of Lisp dubs the LISP self-interpreter "The Surprise" because of its supposed brevity. What, then, should one call a one-line lambda calculus self-interpereter?

(Aside: on the other hand, the analogy with Maxwell’s equations is apt, because if Maxwell had used geometric algebra instead of vector algebra, he could have been less verbose and we would be celebrating Maxwell’s equation, that is, one equation instead of four:

\[(\partial_t + \nabla)(\mathbf{e} + \mathbf{B}) = 0\]

in a vacuum, and more generally:

\[(\partial_t + \nabla)(\mathbf{e} + \mathbf{B}) = \rho + \mathbf{j}\]

where bivectors represent the magnetic field \(\mathbf{B}\).)

John Tromp’s reworking of Chaitin’s ideas in lambda calculus and combinatory logic is a fascinating read. Much of the fun involves tiny self-interpreters that read binary encodings of themselves.

Tromp’s binary CL self-interpreter expects the input in the form of nested pairs such as:

(True, (False, (False, ...  (True, const) ... ))

where the booleans and pairs have their usual Church/Scott encodings.

If we could disable type-checking in Haskell, the interpreter is simply:

eval c = uncurry (bool (uncurry (c . bool const ap)) (eval (eval . (c .))))

and satisfies:

eval c (encode m n) == c m n

where:

data CL = App CL CL | S | K
encode m n = case m of
  x :@ y -> (True, (encode x (encode y n)))
  S -> (False, (False, n))
  K -> (False, (True , n))

Some of our type-challenged compilers will happily run this crazy code. Our "Fixity" compiler accepts the following:

data Bool = True | False;
id x = x;
const x y = x;
ap x y z = x z(y z);
bool x y b = case b of { True -> y ; False -> x };
uncurry f x = case x of { (,) a b -> f a b };
(.) f g x  = f (g ( x));
encode m n = case m of
  { S -> (False, (False, n))
  ; K -> (False, (True , n))
  ; App x y -> (True, (encode x (encode y n)))
  };
eval c = uncurry (bool (uncurry (c . bool ap const)) (eval (eval . (c .))));
go s = eval id (encode (App (App S K) K) s);

Since SKK is the identity, the above program just returns the input.

Kiselyov’s bracket abstraction algorithm leads us to wonder: why limit ourselves to S and K? After all, in lambda calculus, we could prohibit terms with a De Bruijn index greater than 2 and retain the same computing power, but nobody bothers. Why insist on two particular combinators?

Hole in one

With no restrictions, adding combinators to the definition of CL to obtain a smaller self-interpreter is too easy. In the extreme, we could take a "trusting trust" approach and define a combinator X which decodes a given binary string to a CL term and interprets it.

The set of all combinators is X, S, and K. Representing X with 1 bit trivially results in a 1-bit self-interpreter.

Rule of three

Perhaps it’s reasonable to say a combinator is eligible if it is equivalent to a closed lambda term with at most 3 lambda abstractions and at most 3 applications. That is, every combinator is at most as complex as the S combinator.

We choose the following 6 combinators:

Sxyz = xz(yz)
Bxyz = x (yz)
Cxyz = xz(y )
Kxy  = x
Txy  = yx
Vxyz = zxy

We encode them in binary according to the following table (where the backquote represents application):

` 1
B 01
V 0011
T 0010
S 0001
C 00001
K 00000

Then the following is a binary self-interpreter:

f c = T(V(T(V(T(T.V(V(T(c.V K C))(c S))(c.V T V)))(c B)))(f(f.(c.))))

Kiselyov’s algorithm yields:

`Y``B`BT``B`S``BV``BT``S``BV``BT``B`BT``S``BV``S``BV``BT``CB``VKC`TS``CB``VTV`TB``SB``C``BBBB

We define the Y combinator with:

Y = ``B``TT``CB``STT

so our self-interpreter takes 232 bits, beating Tromp’s Theorem 4 record of 263 bits.

The following demonstrates this self-interpreter in our "Fixity" compiler:

data Bool = True | False;
id x = x;
const x y = x;
flip x y z = x z y;
ap x y z = x z(y z);
bool x y b = case b of { True -> y ; False -> x };
uncurry f x = case x of { (,) a b -> f a b };
(.) f g x  = f(g(x));
data CL = K | C | S | T | V | B | App CL CL;
encode m n = case m of
  { K -> (False, (False, (False, (False, (False, n)))))
  ; C -> (False, (False, (False, (False, (True , n)))))
  ; S -> (False, (False, (False, (True , n))))
  ; T -> (False, (False, (True , (False, n))))
  ; V -> (False, (False, (True , (True , n))))
  ; B -> (False, (True , n))
  ; App x y -> (True, (encode x (encode y n)))
  };
t = uncurry;
v = bool;
eval c = t(v(t(v(t(t . v(v(t(c . v const flip))(c ap))(c . v t v)))(c(.)))) (eval(eval . (c .))));
demo _ = eval id (encode (App T (App K (App (App S K) K))) ("one", "two"));

The term T(K(SKK)) returns the second element of a Scott-encoded pair, so the above computes snd ("one", "two").

Is it sporting to consider the Y combinator primitive? If so, we could likely shrink the interpreter further.

Instant REPL play

Interactive REPLs such as GHCi shine when we want to ask the computer for help while reducing the number of dreaded edit-compile-run cycles.

We still write code in a file, but once done, we play around in the REPL to print codes and sizes.

import Data.List
import qualified Data.Map as M
import Data.Map (Map, (!))
import Data.Ord
import Data.Tree

coms = "```B``TT``CB``STT``B`BT``B`S``BV``BT``S``BV``BT``B`BT``S``BV``S``BV``BT``CB``VCK`TS``CB``VTV`TB``SB``C``BBBB"

histo = M.fromListWith (+) $ (\c -> ([c], 1)) <$> coms

huff :: Map String Int -> [(String, [Int])]
huff ps = huff' [] $ huffTree $ (\(k, v) -> Node (k, v) []) <$> M.assocs ps
  where
  huff' s (Node (k, _) []) = [(k, s)]
  huff' s (Node _ [x, y]) = huff' (0:s) x ++ huff' (1:s) y
  huffTree [p] = p
  huffTree ps = huffTree $ Node ("", v0 + v1) [p0, p1]:ps2 where
    p0@(Node (_, v0) _) = getMin ps
    ps1 = delete p0 ps
    p1@(Node (_, v1) _) = getMin ps1
    ps2 = delete p1 ps1
    getMin = minimumBy $ comparing $ \(Node (_, v) _) -> v

total = foldr (\(c, enc) n -> n + length enc * histo!c) 0 $ huff histo

Infinite regress

Is mainstream mathematics mistaken in its handling of the infinite? What does it mean, Edward Nelson asks, to treat the unfinished as finished? We might get a theory that is internally consistent, but so what? Good stories that have no bearing on reality are also internally consistent. Perhaps undecidability is meaningless, as Doron Zeilberger spiritedly opines.

If this turns out to be the case, then I’ll be annoyed because of the time I spent learning a lot of this stuff. But no matter what, it’ll always be fun to seek tinier self-interpreters!

Turing machine golf is also possible, though more painful. Yedidia and Aaronson find small Turing machines related to Busy Beaver numbers, the Goldbach conjecture, and the Riemann hypothesis, which were later shrunk further. Porting combinators to Turing machines may compare well with their approach.


Ben Lynn blynn@cs.stanford.edu 💡