```
{- For GHC, uncomment the following, and comment the next block.
{-# LANGUAGE LambdaCase, BlockArguments #-}
import Control.Applicative
import Control.Arrow
-}
module Main where
import Base
import System
foreign export ccall "plainMain" plainMain
plainMain = interact $ either id (show . snd . plain . deBrujin) . parseLC
foreign export ccall "optimizeKMain" optimizeKMain
optimizeKMain = interact $ either id (show . snd . optK . deBrujin) . parseLC
foreign export ccall "optimizeEtaMain" optimizeEtaMain
optimizeEtaMain = interact $ either id (show . snd . optEta . deBrujin) . parseLC
foreign export ccall "rawBulkMain" rawBulkMain
rawBulkMain = interact $ either id (show . snd . rawBulk . deBrujin) . parseLC
foreign export ccall "linBulkMain" linBulkMain
linBulkMain = interact $ either id (show . snd . linBulk . deBrujin) . parseLC
foreign export ccall "logBulkMain" logBulkMain
logBulkMain = interact $ either id (show . snd . logBulk . deBrujin) . parseLC
foreign export ccall "linearbbMain" linearbbMain
linearbbMain = interact $ either id (show . uncurry breakBulkLinear) . parseBulk
foreign export ccall "logbbMain" logbbMain
logbbMain = interact $ either id (show . uncurry breakBulkLog) . parseBulk
data Charser a = Charser { getCharser :: String -> Either String (a, String) }
instance Functor Charser where fmap f (Charser x) = Charser $ fmap (first f) . x
instance Applicative Charser where
pure a = Charser \s -> Right (a, s)
f <*> x = Charser \s -> do
(fun, t) <- getCharser f s
(arg, u) <- getCharser x t
pure (fun arg, u)
instance Monad Charser where
Charser f >>= g = Charser $ (good =<<) . f
where good (r, t) = getCharser (g r) t
return = pure
instance Alternative Charser where
empty = Charser \_ -> Left ""
(<|>) x y = Charser \s -> either (const $ getCharser y s) Right $ getCharser x s
sat f = Charser \case
h:t | f h -> Right (h, t)
_ -> Left "unsat"
eof = Charser \case
[] -> Right ((), "")
_ -> Left "want EOF"
char :: Char -> Charser Char
char = sat . (==)
string :: String -> Charser String
string s = mapM char s
isDigit c = '0' <= c && c <= '9'
isAlphaNum c
| 'A' <= c && c <= 'Z' = True
| 'a' <= c && c <= 'z' = True
| isDigit c = True
| otherwise = False
lcTerm = lam <|> app where
lam = flip (foldr Lam) <$> (lam0 *> some var <* lam1) <*> lcTerm
lam0 = str "\\" <|> str "\955"
lam1 = str "->" <|> str "."
app = foldl1 App <$> some (Var <$> var <|> str "(" *> lcTerm <* str ")")
var = some (sat isAlphaNum) <* whitespace
str = (<* whitespace) . string
whitespace = many $ sat $ (`elem` " \n")
parseLC = fmap fst . getCharser (whitespace *> lcTerm <* eof)
parseBulk = fmap fst . getCharser do
whitespace
c <- sat (`elem` "BCS")
whitespace
n <- foldl (\n c -> n*10 + fromEnum c - fromEnum '0') 0 <$> some (sat isDigit)
eof
pure ([c], n)
instance Show Term where
show (Lam s t) = "\955" ++ s ++ showB t where
showB (Lam x y) = " " ++ x ++ showB y
showB expr = "." ++ show expr
show (Var s) = s
show (App x y) = showL x ++ showR y where
showL (Lam _ _) = "(" ++ show x ++ ")"
showL _ = show x
showR (Var s) = ' ':s
showR _ = "(" ++ show y ++ ")"
```

# Kiselyov Combinator Translation

Lambda term:

Oleg Kiselyov’s algorithm for compiling lambda calculus to combinators seems unreasonably effective in practice.

## A Special Bracket Abstraction

The heart of the algorithm is a specialized bracket abstraction. Let \(d_1\) be a
combinator and *eta-expand* it any number of times, that is, nest it in a bunch
of lambda abstractions and apply to each of the new variables, starting from
the variable bound in the outermost lambda. For example: eta-expanding \(d_1\)
three times yields:

when we write it with De Bruijn indices, Let \(d_2\) be a combinator, and eta-expand it any number of times. For example, eta-expanding \(d_2\) twice yields:

Let \(d\) be the result of applying the innermost body of one to that of the other, while sharing the same variables:

How do you eliminate the added variables? That is, can you write \(d\) just using \(d_1, d_2\) and some fixed set of combinators?

We’ll have more to say about eta-expansion in our tour of Böhm’s Theorem, but we do want to mention a subtlety here. The eta-expansion of a lambda term \(F\) is defined to be \(\lambda F 0\). Eta-expanding again results in \(\lambda (\lambda F 0) 0\), which reduces to \(\lambda F 0\), getting us nowhere. On the other hand, eta-expanding just the body of the term, namely \(F 0\), results in the less trivial \(\lambda \lambda F 1 0\).

So when we say we eta-expand a term multiple times, we mean we expand the part after the lambdas (the body of a Böhm tree node), and not the entire term.

## Solution

We choose the combinators \(B, R, S\):

(Schönfinkel would have written \(Z, TT, S\); in Haskell, these are (.), flip flip, and the Reader instance of ap or (<*>).)

Then the following solves our example puzzle:

What’s the secret to deriving this? We could run a standard bracket abstraction algorithm a few times, but better to take advantage of the eta-expansions.

First, some notation. Let \(n\models d\) denote the term \(d\) eta-expanded \(n\) times. For example, \(3 \models d = \lambda \lambda \lambda d 2 1 0\).

Then we may restate our problem as follows. Given \(n_1 \models d_1\) and \(n_2 \models d_2\), our goal is find a combinator term equivalent to:

where there are \(\max(n_1, n_2)\) lambda abstractions. (This equation would be more compact if we started our De Bruijn indices from 1, but I refuse to budge from 0-indexing!)

The trick is to use induction. Define the operation \(\amalg\) by:

We can mechanically verify \(d = n_1 \models d_1 \amalg n_2 \models d_2\) is a solution.

## Combinators

Adding a little code turns the above bracket abstraction for variable-sharing eta-expansions into an algorithm for translating any lambda calculus term to combinators.

But first, let’s recall some standard combinators:

(Schönfinkel would have written \(I, C, T\); in Haskell, these are id, const, and flip.)

Now, given a lmabda term, we rewrite it in De Bruijn form, where each index is written using a unary representation known as the Peano numbers:

For example, the standard combinators we just mentioned are:

We rewrite our De Bruijn conversion code to use Peano numbers:

```
data Term = Var String | App Term Term | Lam String Term
data Peano = S Peano | Z
data DB = N Peano | L DB | A DB DB | Free String
index x xs = lookup x $ zip xs $ iterate S Z
deBrujin = go [] where
go binds = \case
Var x -> maybe (Free x) N $ index x binds
Lam x t -> L $ go (x:binds) t
App t u -> A (go binds t) (go binds u)
```

Define:

where we have recursively computed:

If \(e\) is a closed lambda term and \(n \models d = \ceil{e}\), then \(n = 0\) and the combinator \(d\) is the bracket abstraction of \(e\). More generally, for any (possibly open) lambda term \(e\), adding \(n\) abstractions to \(e\) will close it, and \(d\) is the bracket abstractions of the result.

We’ll wave our hands to explain the algorithm and hope informality helps us get away with less theory. See Kiselyov for details.

The first rule says \(Z\) translates to \(1\models I\), namely a thing that becomes the \(I\) combinator after adding one lambda. Indeed, if we apply the third rule, we find \(\ceil{\lambda Z} = 0\models I\), as expected.

For the second rule, consider the example:

As expected, we need 2 lambdas to close the term \(SZ\), and roughly speaking, the \(K\) combinator shifts the focus from the second variable to the first. This generalizes to higher De Bruijn indices.

The third rule has two cases. If the \(d\) is already a closed term, then the \(K\) combinator ignores the new variable and simply returns \(d\). Otherwise, we decrement the count indicating how many more lambdas we need to close the term.

The last rule uses the specialized bracket abstraction, which captures the idea of applying a possibly open term to another. We see why we want to share variables: when we eventually close the term with lambda abstractions, the indices in both sides of the application refer to the same set of variables.

In our code, the tuple (n, d) represents \(n \models d\).

```
data CL = Com String | CL :@ CL
instance Show CL where
showsPrec p = \case
Com s -> (if p > 0 then (' ':) else id) . (s++)
t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u
convert (#) = \case
N Z -> (1, Com "I")
N (S e) -> (n + 1, (0, Com "K") # t) where t@(n, _) = rec $ N e
L e -> case rec e of
(0, d) -> (0, Com "K" :@ d)
(n, d) -> (n - 1, d)
A e1 e2 -> (max n1 n2, t1 # t2) where
t1@(n1, _) = rec e1
t2@(n2, _) = rec e2
Free s -> (0, Com s)
where rec = convert (#)
plain = convert (#) where
(0 , d1) # (0 , d2) = d1 :@ d2
(0 , d1) # (n , d2) = (0, Com "B" :@ d1) # (n - 1, d2)
(n , d1) # (0 , d2) = (0, Com "R" :@ d2) # (n - 1, d1)
(n1, d1) # (n2, d2) = (n1 - 1, (0, Com "S") # (n1 - 1, d1)) # (n2 - 1, d2)
```

I have a poor intuitive grasp why Kiselyov’s algorithm seems to beat traditional methods, but it’s clear the algorithm sticks out from the rest of the pack. Roughly speaking, all of them start from the innermost lambdas and work their way outwards, but while traditional algorithms treat locally unbound variables as opaque constants, Kiselyov’s algorithm meticulously records information about each unbound variable: the left of a \(\models\) is the number of lambdas needed to close the variable on the right. Perhaps this industriousness leads to better performance.

## Lazy Weakening

Consider the term \(\lambda y x.y y\), or in De Bruijn notation:

The second variable is unused, so we could derive the following combinator:

However, recall:

which leads to the verbose:

A better strategy is to refrain from immediately converting terms like \(SZ\) to combinators. Instead, on encountering \(S\), we leave a sort of IOU that we eventually redeem with a combinator. If both sides of an application turn out to ignore the same variable, we merge the IOUs into one, resulting in savings.

To this end, we replace the number \(n\) in \(n \models d\) with a list of \(n\) booleans, which we often denote by \(\Gamma\). We use cons lists, and denote cons by \((:)\), and the empty list by \(\emptyset\).

The list item at index \(k\) is True when the variable with De Bruijn index \(k\) appears at least once in the term, in which case we have already generated combinators to get at it. Otherwise it is unused and marked False, and is only cashed in for a \(K\) combinator when eventually lambda-abstracted, provided it has never been upgraded to True because the other branch of an application uses the corresponding variable.

These 4 cases are the same as before. The remaining cases prepare the ground for lazy weakening.

They also complicate the connection with the specialized bracket abstraction:

where \(\vee\) means we OR together the corresponding booleans of each input list, and if one list is longer than the other, we append the leftovers.

```
convertBool (#) = \case
N Z -> (True:[], Com "I")
N (S e) -> (False:g, d) where (g, d) = rec $ N e
L e -> case rec e of
([], d) -> ([], Com "K" :@ d)
(False:g, d) -> (g, ([], Com "K") # (g, d))
(True:g, d) -> (g, d)
A e1 e2 -> (zipWithDefault False (||) g1 g2, t1 # t2) where
t1@(g1, _) = rec e1
t2@(g2, _) = rec e2
Free s -> ([], Com s)
where rec = convertBool (#)
optK = convertBool (#) where
([], d1) # ([], d2) = d1 :@ d2
([], d1) # (True:g2, d2) = ([], Com "B" :@ d1) # (g2, d2)
([], d1) # (False:g2, d2) = ([], d1) # (g2, d2)
(True:g1, d1) # ([], d2) = ([], Com "R" :@ d2) # (g1, d1)
(False:g1, d1) # ([], d2) = (g1, d1) # ([], d2)
(True:g1, d1) # (True:g2, d2) = (g1, ([], Com "S") # (g1, d1)) # (g2, d2)
(False:g1, d1) # (True:g2, d2) = (g1, ([], Com "B") # (g1, d1)) # (g2, d2)
(True:g1, d1) # (False:g2, d2) = (g1, ([], Com "C") # (g1, d1)) # (g2, d2)
(False:g1, d1) # (False:g2, d2) = (g1, d1) # (g2, d2)
zipWithDefault d _ [] [] = []
zipWithDefault d f (x:xt) (y:yt) = f x y : zipWithDefault d f xt yt
zipWithDefault d f (x:xt) [] = f x d : zipWithDefault d f xt []
zipWithDefault d f [] (y:yt) = f d y : zipWithDefault d f [] yt
```

This corresponds with Kiselyov’s OCaml implementation from Section 4, but we’re more lax with types, and we use a standard list of booleans rather than define a dedicated type. Kiselyov’s C can be thought of as the empty list, while N and W add True and False to an existing list. It’s like Peano arithmetic with two kinds of successors.

As Kiselyov notes, lazy weakening is precisely the famous K-optimization of textbook bracket abstraction. It fits snugly here, whereas the bracket abstraction algorithms awkwardly require extra passes to detect the absence of a variable in a subtree for each lambda.

## The Eta Optimization

The above postpones the conversion of \(S\) to a \(K\) combinator. Section 4.1 of Kiselyov describes how to postpone the conversion of \(Z\) to an \(I\) combinator. The V that appears in his code is an IOU that is either eventually redeemed for an \(I\) combinator or optimized away.

We take a blunter approach. During the computation of \(\amalg\), we simply match on \(I\) combinators that can be optimized. Loosely speaking, we examine the result of a bracket abstraction to determine our course of action, rather than pass around what we need to know on the left of a \(\models\).

This may be messy, but on the other hand, we have no need to extend the list of booleans to a more complex data type, and we simply omit the (N e, V) and (V, N e) cases where no optimizations apply.

Here, we prefer the combinator \(T\) given by \(Txy = yx\) to the \(CI\) in the paper.

```
optEta = convertBool (#) where
([], d1) # ([], d2) = d1 :@ d2
([], d1) # (True:[], Com "I") = d1
([], d1) # (True:g2, d2) = ([], Com "B" :@ d1) # (g2, d2)
([], d1) # (False:g2, d2) = ([], d1) # (g2, d2)
(True:[], Com "I") # ([], d2) = Com "T" :@ d2
(True:[], Com "I") # (False:g2, d2) = ([], Com "T") # (g2, d2)
(True:g1, d1) # ([], d2) = ([], Com "R" :@ d2) # (g1, d1)
(True:g1, d1) # (True:g2, d2) = (g1, ([], Com "S") # (g1, d1)) # (g2, d2)
(True:g1, d1) # (False:g2, d2) = (g1, ([], Com "C") # (g1, d1)) # (g2, d2)
(False:g1, d1) # ([], d2) = (g1, d1) # ([], d2)
(False:g1, d1) # (True:[], Com "I") = d1
(False:g1, d1) # (True:g2, d2) = (g1, ([], Com "B") # (g1, d1)) # (g2, d2)
(False:g1, d1) # (False:g2, d2) = (g1, d1) # (g2, d2)
```

## Bulk combinators

If we allow families of certain *bulk combinators*, we get a linear-time
specialized bracket abstraction, which in turn yields a linear-time translation
algorithm from lambda terms to combinators.

Define:

These perfectly suit shared-eta bracket abstraction:

```
convertBulk bulk = convert (#) where
(a, x) # (b, y) = case (a, b) of
(0, 0) -> x :@ y
(0, n) -> bulk "B" n :@ x :@ y
(n, 0) -> bulk "C" n :@ x :@ y
(n, m) | n == m -> bulk "S" n :@ x :@ y
| n < m -> bulk "B" (m - n) :@ (bulk "S" n :@ x) :@ y
| otherwise -> bulk "C" (n - m) :@ (bulk "B" (n - m) :@ bulk "S" m :@ x) :@ y
rawBulk = convertBulk bulk where
bulk c 1 = Com c
bulk c n = Com (c ++ show n)
```

## No bulk combinators

What if we disallow families of combinators?

If we allow sharing (memoization), then define the following consins of \(B, C, S\) (Smullyan might call them "once removed"):

The eta-optimized translation yields:

We have:

We can use these recurrences to generate each bulk combinator we need exactly once, again resulting in a linear-time translation:

```
breakBulkLinear "B" n = iterate (comB' :@) (Com "B") !! (n - 1)
breakBulkLinear "C" n = iterate (comC' :@) (Com "C") !! (n - 1)
breakBulkLinear "S" n = iterate (comS' :@) (Com "S") !! (n - 1)
comB' = Com "B":@ Com "B"
comC' = Com "B":@ (Com "B" :@ Com "C"):@ Com "B"
comS' = Com "B":@ (Com "B" :@ Com "S"):@ Com "B"
linBulk = convertBulk breakBulkLinear
```

Our interactive demo supports this translation, but shows the combinator term expanded in full, with no sharing.

## No sharing

What if we also diasllow sharing?

We generalize \(C'\) and \(S'\) from the paper:

One can check:

This suggests breaking bulk combinators into standard combinators in a manner akin to exponentiation by squaring. To compute \(S_{50}\) for example, define the following combinators:

In combinators: \(b_0 = SBI, b_1 = B(B(B(BS)B))(SBI)\).

Then chain together \(b_0\) or \(b_1\) according to the bits of 50 written in binary (110010), and apply to \(I\):

The \(B_n\) case is simpler because there’s no need for \(B'\).

```
bits n = r:if q == 0 then [] else bits q where (q, r) = divMod n 2
logBulk = convertBulk breakBulkLog
breakBulkLog c 1 = Com c
breakBulkLog "B" n = foldr (:@) (Com "B") $ map (bs!!) $ init $ bits n where
bs = [sbi, Com "B" :@ (Com "B" :@ Com "B") :@ sbi]
breakBulkLog c n = (foldr (:@) (prime c) $ map (bs!!) $ init $ bits n) :@ Com "I" where
bs = [sbi, Com "B" :@ (Com "B" :@ prime c) :@ sbi]
prime c = Com "B" :@ (Com "B" :@ Com c) :@ Com "B"
sbi = Com "S" :@ Com "B" :@ Com "I"
```

Thus we can break down bulk combinators into standard combinators without sharing if we pay a factor logarithmic in the input size.

As we’ve described it without any optimizations, the benefits are only noticeable for deeply nested lambda terms. The savings are more easily seen with the following demo, which breaks down a single bulk combinator.

Bulk combinator:

## Return of the Combinators?

Combinators were abandoned by compiler authors long ago: how do you optimize a combinator term? It seems we must simulate it in order to undertand it enough to tinker with it, but doesn’t this simulation convert it back to a lamda term?

Kiselyov suggests combinators may be worth considering again. Unlike existing approaches, his algorithm understands lambda terms, so it might be suitable for optimizations that are conventionally performed after generating combinations, thus avoiding simulations. At any rate, on real world examples, the eta-optimized variant seems to produce reasonable results.

Our toy compiler for various minimalist combinatory logic and lambda calculus languages features Kiselyov’s algorithm.

So does my award-winning Haskell compiler, which built the wasm binaries running the widgets on this page.

*blynn@cs.stanford.edu*💡