Matrix Combinators

Lambda term: \a b c d.d c b a

Kiselyov describes a linear-time algorithm to convert a lambda term to combinators, provided the input is written with De Bruijn indexes in unary, and provided we allow bulk combinators.

Can we do better with different bulk combinators? In particular, we would like to drop the unary representation proviso.

This is related to my thoughts on combinator golf, and again, a trivial answer exists. We define a complex combinator \(X\) which takes a representation of a lambda term and returns an equivalent combinator term. In other words, this single combinator implies an optimal algorithm for converting lambda terms to combinators!

We would like a more honest solution, though we are perhaps willing to tolerate more complexity than most.

Inspired by Kiselyov’s method of keeping track of the variables used on either side of an application ("directing whole environments rather than single variables"), we define bulk combinators that correspond to matrices of bits as follows.

Let \(X\) be a \(m \times n\) matrix of bits, and let \(b_{ij}\) be the bit in the \(i\)th row and \(j\) column.

An empty matrix represents the identity combinator \(I\). Otherwise, \(X\) represents the combinator:

\[\lambda f_1 ... f_m x_1 ... x_n . (f_1 (b_{11} ? x_1) ... (b_{1n} ? x_n)) ... (f_m (b_{m1} ? x_1) ... (b_{mn} ? x_n))\]

where \(b ? x\) means \(x\) if \(b = 1\), and is omitted if \(b = 0\). We mostly stick to the \(m\le 2\) cases.

Examples:

\([0] a b = a\): the \(K\) combinator.
\([00010101] a b c d e f g h i = a e g i\). We can immediately write down the matrix combinator that selects any given subset of arguments.
\([0, 1] a b c = a (b c)\): the \(B\) combinator.
\([1, 0] a b c = (a c) b\): the \(C\) combinator.
\([1, 1] a b c = (a c) (b c)\): the \(S\) combinator.
\([0, 0] a b c = a b\): the \(BK\) combinator.
\(([11,00]I) a b c = (I b c) a = b c a\): the \(R\) combinator.

These matrix combinators suggest a K-optimized bracket abstraction algorithm in the style of Kiselyov, but with more efficient projection: we combine consecutive lambdas as if we were building a Böhm node, and use a one-row matrix (or vector) to describe a combinator that selects out exactly the arguments that are needed. If all arguments are needed, then we skip this selector vector.

data Peano = S Peano | Z
data DB = N Peano | L DB | A DB DB | Free String

data MatrixCom = Com [[Bool]] | ComFree String | MatrixCom :@ MatrixCom
instance Show MatrixCom where
  showsPrec p = \case
    Com xs -> case xs of
      [] -> str "I"
      _ -> ('[':) . foldr (.) (']':) (intersperse (',':) $ foldr (.) id . map (shows . fromEnum) <$> xs)
    ComFree s -> str s
    t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u
    where str s = (if p > 0 then (' ':) else id) . (s++)

matrix = \case
  N Z -> ([True], Com [])
  N (S e) -> first (False:) $ matrix $ N e
  L e -> sledL 1 e
  A e1 e2 -> matrix e1 ## matrix e2
  Free s -> ([], ComFree s)
  where
  sledL n = \case
    L e -> sledL (n + 1) e
    e -> let
      (g, d) = matrix e
      present = reverse $ take n (g ++ repeat False)
      in (if and present then id else (([], Com [present]) ##)) (drop n g, d)

  ([], d1) ## ([], d2) = ([], d1 :@ d2)
  (g1, d1) ## (g2, d2) = (g, Com [p1, p2] :@ d1 :@ d2)
    where
    zs = zipWithDefault False (,) g1 g2
    g = uncurry (||) <$> zs
    (p1, p2) = unzip $ reverse $ filter (uncurry (||)) zs

zipWithDefault d f     []     ys = f d <$> ys
zipWithDefault d f     xs     [] = flip f d <$> xs
zipWithDefault d f (x:xt) (y:yt) = f x y : zipWithDefault d f xt yt

Instead of handling one application at a time, we could handle an entire Böhm node in one step, generating a matrix with \(m\) rows for a Böhm node with \(m\) children. This may result in savings. For example, the term \(\lambda a b c d . a (b d) (c d)\) could be written \([0, 1, 1]\) rather than \([1101,0011] [0,1]\).

A few more lines give us eta-optimization. We generalize \(BIx \rightarrow x\) by looking for \([0 … 0] I x\). A simple helper etaRight handles the \(BxI \rightarrow x\) case.

We also apply the optimizations:

\([1x] I \rightarrow [x]\)
\([10x, 01y] I I \rightarrow [x, y]\)
\([1 … 10, 0 … 01] I I \rightarrow I\)

matrixOpt = \case
  N Z -> ([True], Com [])
  N (S e) -> first (False:) $ matrixOpt $ N e
  L e -> sledL 1 e
  A e1 e2 -> matrixOpt e1 ## matrixOpt e2
  Free s -> ([], ComFree s)
  where
  sledL n = \case
    L e -> sledL (n + 1) e
    e -> let
      (g, d) = matrixOpt e
      present = reverse $ take n (g ++ repeat False)
      in (if and present then id else (([], Com [present]) ##)) (drop n g, d)

([], Com [True:rest]) ## ([], Com []) = ([], Com [rest])
([], d1) ## ([], d2) = ([], d1 :@ d2)
(g1, d1) ## (g2, d2)
  | Com [] <- d1, Com [] <- d2 = \cases
    | not $ or $ last p1 : init p2 -> go $ Com []
    | True:False:t1 <- p1, False:True:t2 <- p2 -> go $ Com [t1, t2]
    | otherwise -> common
  | Com [] <- d1, not $ or p1 = go d2
  | otherwise = common
  where
  common = go $ etaRight $ Com [p1, p2] :@ d1 :@ d2
  zs = zipWithDefault False (,) g1 g2
  go = (uncurry (||) <$> zs,)
  (p1, p2) = unzip $ reverse $ filter (uncurry (||)) zs
  etaRight (Com [False:t1, True:t2] :@ d :@ Com []) = case t1 of
    [] -> d
    _ -> Com [t1, t2] :@ d
  etaRight d = d

For the demo at the top of the page, we write code to parse lambda terms along with a function that converts them to De Bruijn form:

jsEval "curl_module('../compiler/Charser.ob')"

import Charser
lcTerm = lam <|> app where
  lam = flip (foldr Lam) <$> (lam0 *> some var <* lam1) <*> lcTerm
  lam0 = str "\\" <|> str "λ"
  lam1 = str "->" <|> str "."
  app = foldl1 App <$> some (Var <$> var <|> str "(" *> lcTerm <* str ")")
  var = some alphaNumChar <* space
  str = (<* space) . string

parseLC = parse (space *> lcTerm <* eof) ""

data Term = Var String | App Term Term | Lam String Term
instance Show Term where
  show (Lam s t)  = "λ" ++ s ++ showB t where
    showB (Lam x y) = " " ++ x ++ showB y
    showB expr      = "." ++ show expr
  show (Var s)    = s
  show (App x y)  = showL x ++ showR y where
    showL (Lam _ _) = "(" ++ show x ++ ")"
    showL _         = show x
    showR (Var s)   = ' ':s
    showR _         = "(" ++ show y ++ ")"

index x xs = lookup x $ zip xs $ iterate S Z

deBruijn = go [] where
  go binds = \case
    Var x -> maybe (Free x) N $ index x binds
    Lam x t -> L $ go (x:binds) t
    App t u -> A (go binds t) (go binds u)

demo f = do
  s <- jsEval "input.value;"
  jsEval $ "output.value = `"
    ++ either id (show . snd . f . deBruijn) (parseLC s)
    ++ "`;"

jsEval [r|
  matrix.addEventListener("click", (ev) => { repl.run("chat", ["Main"], "demo matrix"); });
  matrixOpt.addEventListener("click", (ev) => { repl.run("chat", ["Main"], "demo matrixOpt"); });
|]

Complexity

A lambda term with \(n\) distinct variables requires \(O(n\log n)\) bits to represent. Thus there are some terms that are more efficient with our scheme: in particular, any one- or two-row matrix combinator taking \(n\) arguments only needs \(O(n)\) bits.

My guess is the worst case is the classic \(\lambda x_1 … x_n.x_n … x_1\), which only requires \(O(n\log n)\) bits as a lambda term, but \(O(n^2)\) bits with matrix combinators.

When variables are written in unary, our algorithm is linear, and compares favourably with Kiselyov’s algorithm because we’ve chosen more complex bulk combinators.

Two Rows

It could be worth restricting our attention to \(2 \times n\) matrices. The empty \(2 \times 0\) matrix is the identity combinator. Any one-row matrix \([r]\) is equivalent to \([0 … 0,r] I\).

We could then design a virtual machine that only needs to handle two-row matrix combinators. It ought to be easy to implement as there is only kind of combinator. At the same time, we only have one (bulk) combinator per application or lambda streak.

We can impose a maximum number of columns per combinator by observing:

\[[r] = [0 ... 0,r_1] ([0 ... 0,r_2] I)\]

if \(r = r_1 \diamond r_2\), where \((\diamond)\) denotes concatenation; and:

\[[x,y] = [0,1] [x_1,y_1] ([0 ... 0,1 ... 1] [x_2,y_2])\]

where \(x = x_1 \diamond x_2, y = y_1 \diamond y_2\) and where each \(x_i, y_i\) have the same length.

Ben Lynn blynn@cs.stanford.edu 💡

Matrix Combinators

Complexity

Two Rows

Contents