data Adt a b c = Foo a | Bar | Baz b c
The Scott encoding
Encoding data as functions may seem strange at first, but after playing with combinators for a while, the Scott encoding suggests itself.
Take an algebraic data type:
Suppose we come across a value
x of type
Adt. What can we do with it?
The only nontrivial course of action is to scrutinize it with a case statement:
case x of Foo a -> f a Bar -> g Baz b c -> h b c
What’s the quickest way to turn this expression into a closed lambda calculus term?
One idea that springs to mind is to delete the keywords
replace each data constructor in a pattern with
\. Then we get:
x (\a -> f a) (\ -> g) (\b c -> h b c)
Here, a lambda abstraction with no variables has the obvious meaning, namely,
\ → g is the same as
Then the Scott encoding of the value
x of type
Adt is whatever it takes to
make the above work like the original case expression.
For example, the value
Foo 42 must be the function:
\f _ _ -> f 42
Bar must be:
\_ g _ -> g
and the value
Baz "qux" 9000 must be:
\_ _ h -> h "qux" 9000
Foo a = \f _ _ -> f a Bar = \_ g _ -> g Baz a b = \_ _ h -> h a b
Booleans, Numbers, Lists
We would like to define booleans as:
data Bool = False | True
False appears before
True, a compiler seeing this would naturally
index them with the numbers 0 and 1 respectively, which matches common
practice. It matches Haskell, too, as can be seen in the functions:
toEnum minBound maxBound succ pred.
Unfortunately, for the time being, we must define:
data Bool = True | False
so our compiler produces the Scott encodings:
True = \x _ -> x False = \_ y -> y
This is because long before computers were commonplace, Alonzo Church devised this particular encoding of booleans, which subsequently became standard. We bow down to this convention to avoid confusion.
Peano numbers are defined by the following algebraic data type:
data Peano = Zero | Succ Peano
The Scott encoding is:
Zero = \f _ -> f Succ n = \_ g -> g n
Unlike Church numerals, the predecessor function (where we define the predecessor of zero to be zero) is easy to write down:
predecessor n = case n of Zero -> Zero Succ n -> n
Using the Scott encoding:
predecessor n = n Zero (\n -> n)
Pure lambda calculus has an undeserved reputation for sloth. Perhaps one of the misconceptions is arithmetic must be performed in unary. Not so! We may define numbers in binary instead of unary:
data Binary = End | Nil Binary | One Binary
But we may as well use a list of booleans instead. A list is defined by:
data [a] =  | a : [a]
yielding the Scott encodings:
 = \f _ -> f (:) a as = \_ g -> g a as
Here’s a starter pack of functions for numbers encoded in binary:
type Binary = [Bool] suc :: Binary -> Binary suc bs = case bs of  -> [True] False : bt -> True : bt True : bt -> False : suc bt addC :: Binary -> Binary -> Bool -> Binary addC as bs c = case as of  -> if c then suc bs else bs (a:at) -> case bs of  -> if c then suc as else as (b:bt) -> ((a /= b) /= c) : addC at bt (a && b || c && (a || b)) add :: Binary -> Binary -> Binary add as bs = addC as bs False strip :: Binary -> Binary strip bs = case bs of  ->  True : bt -> True : strip bt False : bt -> case strip bt of  ->  bt' -> False : bt' pre' bs = case bs of  ->  True : bs' -> False : bs' False : bs' -> True : pre' bs' pre :: Binary -> Binary pre bs = strip (pre' bs) decode' n acc bs = case bs of  -> acc False : bt -> decode' (2*n) acc bt True : bt -> decode' (2*n) (acc + n) bt decode :: Binary -> Int decode = decode' 1 0 encode :: Int -> Binary encode n = if n == 0 then  else case divMod n 2 of (n', 0) -> False : encode n' (n', 1) -> True : encode n'
We have no need for these since we’ll extend CL to support the native integer types of the underlying machine. But we stress again pure lambda calculus numerals are not condemned to be unary!