The land of the point-free

What is an injective function?

The first time around, I was taught a function \(f : X \to Y\) is injective or one-to-one if for all \(a, b \in X\):

\[ f(a) = f(b) \implies a = b \]

Can we rewrite this definition without mentioning set elements? This might sound hopeless: how can we speak of injectivity if we’re forbidden to name individual elements?

Here’s a hint: consider function composition instead of function application, and consider pairs of functions instead of pairs of elements. From this, we might try defining a function \(f : X \to Y\) to be injective if all for functions \(g, h:1 \to X\): \[ f \cdot g = f \cdot h \implies g = h \] where \(1\) denotes a singleton set. The idea is to select elements of \(X\) with \(g\) and \(h\). We cannot name \(a\) and \(b\) directly, but we sure can name functions that map to them!

However, we’re cheating: to talk of singleton sets is to count elements of sets. We’ll soon find a way around this, but for now, we find we can adjust the definition to avoid mentioning singletons: \(f : X \to Y\) is injective if for all sets \(Z\) and all functions \(g, h:Z \to X\):

\[ f \cdot g = f \cdot h \implies g = h \]

This high-level element-free approach is fun to visualize with arrows and circles:

ZhgXfY

In contrast, the diagram would be less pleasant if we had to depict elements of sets as well. We’d probably have to enlarge the circles and add big dots to represent individual elements.

Less is more

Programmers of all stripes appreciate abstracting away details. In fact, even poets advocate this: Wang Zhihuan famously observed that to see a thousand miles, one must ascend one more level.

When code avoids directly referring to the input, we call it point-free style or tacit programming, and everything is a combinator.

To recap, the high-level version replaces function application and set elements with just function composition. Besides prettier diagrams, this beats the low-level version in at least 3 ways:

  • Suppose we wish to prove a theorem. Since the high-level view involves fewer concepts, proofs are easier.

  • We instantly gain a dual by reversing the arrows, that is, we swap the sources and destinations of functions, and flip compositions.

  • We can generalize to any setting where composition of function-like things makes sense. That is, there need not be any notion of set elements or function application. All we need is composition.

We illustrate the first point by showing the composition of two injective functions is injective. Suppose \(f, f'\) are injective. Then given \(f \cdot f' \cdot g = f \cdot f' \cdot h\), we wish to show \(g = h\). The only obvious first step is to use our definition and the injectivity of \(f\) to obtain \(f' \cdot g = f' \cdot h\). Then doing the same for the injectivity of \(f'\) yields \(g = h\). Done!

We see left cancellation is the fundamental idea behind the proof. We might fail to see the forest for the trees if our proof follows the fates of particular elements.

As for the second point, the dual of an injective function is a function \(f : Y \to X\) such that for all \(g, h: X \to Z\):

\[ g \cdot f = h \cdot f \implies g = h \]

This characterizes surjective or onto functions. Furthermore, mechanically taking the dual of our proof shows the composition of surjections is surjective. Could you get all this by tweaking the conventional low-level definition of injections?

ZhgXfY

The third point is surprisingly vital for computer science. Although we might think we are writing mathematical functions that map elements of one set to another, this analogy breaks down. Code is data and data is code, but the number of programs is countable, while the number of functions from a countable set to a countable set is uncountable. This discrepancy is essentially why the halting problem is undecidable. (Even if we stick to finite sets, the size mismatch remains because the number of functions is exponentially larger.)

To correctly model programs with mathematics we need function-like things without this runaway size problem. The construction is tricky, so instead we supply a much simpler example of a function-like thing.

Instead of sets, let \(X, Y, Z\) be elements of a poset, and let \(f:X \to Y\) indicate \(X \le Y\). In this setting, there is at most one "function" between any two "sets", and composition is well-behaved because the poset relation is transitive. Nothing corresponds to set elements or function application, yet our definitions still make sense.

Category Theory

To avoid scare quotes, we say arrow for a generalized function, and object for a generalized set. We also say monic and epic arrows for generalized injections and surjections. A rich field known as category theory develops from merely stipulating that:

  1. An identity arrow exists for each object:

    XidXfYidY

    \[ f \cdot id_X = f = id_Y \cdot f \]

  2. Composition of arrows is associative:

    WfXgYhZ

    \[ h \cdot (g \cdot f) = (h \cdot g) \cdot f \]

The first axiom often leads to reflection laws. For example, identity arrows are monic and epic.

The second axiom often leads to fusion laws. For example, the composition of two monic arrows is monic, and similarly for epic arrows.

[I’m not sure what the official definitions of reflection and fusion are, but this seems to fit what I’ve observed.]

Something from nothing

The empty set is the set with no elements. Can we also rewrite this definition without mentioning elements?

Yes! The empty set is the set \(X\) such that for every set \(Y\), there exists a unique function \(f:X \to Y\) (the empty function).

XfY

Sometimes this is written \(\exists ! f\). Instead, we indicate existence and uniqueness by drawing an arrow shaft with a dashed line.

How about the dual? That is, what is the set \(X\) where for each set \(Y\) there exists a unique function \(f:Y \to X\)? A little thought shows \(X\) must be a singleton set.

XfY

These definitions of the empty set and singleton sets generalize to what are called the initial object and the terminal object in category theory. That is, so long as we have objects and arrows, we can define objects akin to the empty set and singleton sets.

In a poset, the initial object, if it exists, is the minimum, and the terminal object, if it exists, is the maximum.

The axioms of category theory quickly lead to some laws. The arrow from the initial object to itself is the identity arrow. Composing an arrow from the initial object with another arrow gives a unique arrow from the initial object to the new destination. We have similar results for terminal objects.

As an aside, now that we can define singleton sets without mentioning elements, we could fix our first attempt at defining injectivity. We could require \(Z\) to be a terminal object. However, in the dual, \(Z\) becomes an initial object, and we find the only valid choice for \(f\) is the identity function on the empty set. Thus this version is over-engineered: by adding an unnecessary condition, we have made the dual boring.

Products

For our next trick, consider the Cartesian product of two sets \(A, B\), which is typically defined to be the set

\[ A \times B = \{ (a, b) | a \in A, b \in B \} \]

How can we rewrite this definition without mentioning set elements?

We start by thinking about the API. The projection functions are how programmers interact with the product without knowing its implementation details. In C++, the first and second members of std::pair hide the dirty work done by the compiler. In Haskell, the fst and snd functions play this role. Theoreticians prefer the names outl and outr.

Next, for inspiration, consider the product type (Int, String) in Haskell. For any type c, given functions f :: c → Int and g :: c → String, we can combine them into a function h :: c → (Int, String):

h :: (c -> Int) -> (c -> String) -> (c -> (Int, String))
h f g = \c -> (f c, g c)

-- Fancier definition using the Arrow monad: h = (&&&)

prop_product = h length reverse "foo" == (3, "oof")

Furthermore, we can recover the original functions with fst and snd:

f' = fst . h
g' = snd . h

This guides our approach. If \(X\) denotes the product of \(A\) and \(B\) then roughly speaking, two functions \(f:C \to A\) and \(g:C \to B\) should be equivalent to a single function \(h:C \to X\) because we can combine them. Also, this function \(h\) should be unique, because \(X\) should contain no superfluous information: \(X\) should contain all combinations of the elements of \(A\) and \(B\), and nothing more.

More formally, the product of sets \(A\) and \(B\) is a set \(X\) and two functions \(outl : X \to A\) and \(outr : X \to B\) such that for any functions \(f:C \to A\) and \(g:C \to B\), there exists a unique \(h : C \to X\) such that \(outl \cdot h = f\) and \(outr \cdot h = g\).

AoutlfXChoutrgB

We write \(\langle f,g \rangle\) for the function \(h\) above. We have the reflection law: \[ \langle outl, outr \rangle = id \] and the fusion law: \[ \langle f, g \rangle \cdot m = \langle f \cdot m, g \cdot m \rangle \] Their proofs are left as an exercise. Since we’re working at a high level, it’s a matter of applying a few definitions in the right order.

The set \(X\) is not unique, but we justify calling it the product because the definition implies there is a unique isomorphism between any two products, that is, the product is unique up to unique isomorphism.

Our original definition yields a unique product set, but why single out one particular implementation? A compiler can choose how best to lay out a pair of elements in memory. For efficiency, it may use different schemes for different cases, and convert between them automatically. At a high level, they’re all the same.

Co-products

We flip the arrows to reveal our bonus dual idea. The definition now reads: the co-product of sets \(A\) and \(B\) is a set \(X\) and two functions \(inl : A \to X\) and \(inr : B \to X\) such that for any functions \(f:A \to C\) and \(g:B \to C\), there exists a unique \(h : X \to C\) such that \(h \cdot inl = f\) and \(h \cdot inr = g\).

AinlfXChinrgB

We write \([f,g]\) for the function \(h\) above. We have the reflection law: \[ [inl, inr] = id \] and the fusion law: \[ m \cdot [f, g] = [m \cdot f, m \cdot g] \] which again are exercises in applying definitions. Alternatively, we can take the dual of the proofs of the corresponding laws for products.

A moment’s thought shows \(X\) is the direct sum of \(A\) and \(B\), which we write \(A + B\). This is clear if we map \(inl\) and \(inr\) to Left and Right, and \(X\) to Either A B in Haskell:

import Data.Char

h :: (a -> c) -> (b -> c) -> Either a b -> c
h f _ (Left a)  = f a
h _ g (Right b) = g b

-- In fact, the Prelude already defines this function: h = either

prop_sumL = h chr head (Left 65)     == 'A'
prop_sumR = h chr head (Right "foo") == 'f'

So from a high-level standpoint, it would be odd to talk about product types without mentioning sum types. They’re two sides of the same coin.

For posets, the product and co-product correspond to the meet and join, which we naturally think of as duals.

It seems we have stumbled across a practical use of category theory already: to argue in favour of sum types when designing programming languages. We might worry that sum types are too complex in practice, but this is not the case. Despite being relatively close to assembly language, C manages to support both products (struct) and sums (union), albeit unsafely. Also, ML and Haskell prove it is possible to design usable syntax for sum types.

Indeed, we experience friction when a language supports product types, but not sum types. For example, the Go language has excellent notation for product types (at least, from a C programmer’s point of view) but conspicuously lacks sum types. Programmers clumsily compensate by using product types in lieu of their dual. A function that returns an error or an integer instead returns an error and an integer, with the unenforceable understanding that only one of them should be taken seriously.

Actually, it’s even worse. Go perpetuates what Tony Hoare calls his billion-dollar mistake, which means that certain types are secretly sum types where we have no control over one of the summands.

Let me draw you a diagram

To draw the above diagrams, I wrote a small library inspired by the diagrams package and ran the following to produce a shell script, which in turn produces a bunch of SVG files.

lbl = translate (-0.4 :+ -0.25) . text
object s = lbl s <> (unitCircle |> named s)

catId = foldr1 (|||)
  [ object "X"
    <> (lbl "id" <> lbl "X" |> scale 0.5 |> translate (0.5 :+ -0.2)) |> translateY 1.8
  , translateY 0.5 (lbl "f") <> strutX 4
  , object "Y"
    <> ((lbl "id" <> (lbl "Y" |> scale 0.5 |> translate (0.5 :+ -0.2))) |> translateY 1.8)
  ]
  |> straightArrow "X" "Y"
  |> curvedArrow (-5/8 * tau) "X" "X" (5/16 * tau) (3/16 * tau)
  |> curvedArrow (-5/8 * tau) "Y" "Y" (5/16 * tau) (3/16 * tau)

zxy = hcat
  [ z
  , translateY (-2) (lbl "h") <> translateY 2 (lbl "g") <> strutX 4
  , x
  , translateY 0.5 (lbl "f") <> strutX 4
  , y
  ] where [x, y, z] = object <$> ["X", "Y", "Z"]

injection = zxy
  |> straightArrow "X" "Y"
  |> curvedArrow (-tau/6) "Z" "X" (tau/8) (tau*3/8)
  |> curvedArrow (tau/6) "Z" "X" (-tau/8) (-tau*3/8)

surjection = zxy
  |> straightArrow "Y" "X"
  |> curvedArrow (tau/6) "X" "Z" (tau*3/8) (tau/8)
  |> curvedArrow (-tau/6) "X" "Z" (-3*tau/8) (-tau/8)

xy = hcat
  [ x
  , translateY 0.5 (lbl "f") <> strutX 4
  , y
  ] where [x, y] = object <$> ["X", "Y"]

initial = xy |> existsArrow "X" "Y"

terminal = xy |> existsArrow "Y" "X"

catCompose = hcat
  [ object "W"
  , translateY 0.5 (lbl "f") <> strutX 4
  , object "X"
  , translateY 0.5 (lbl "g") <> strutX 4
  , object "Y"
  , translateY 0.5 (lbl "h") <> strutX 4
  , object "Z"
  ]
  |> straightArrow "W" "X"
  |> straightArrow "X" "Y"
  |> straightArrow "Y" "Z"

axbc l r = hcat
  [ a
  , translateY 0.5 (lbl l) <> translateY (-3) (lbl "f") <> strutX 4
  , x <> translateY (-4) c <> lbl "h" |> translateY (-2) |> translateX 0.5
  , translateY 0.5 (lbl r) <> translateY (-3) (lbl "g") <> strutX 4
  , b
  ] where [a, x, b, c] = object <$> ["A", "X", "B", "C"]

catProduct = axbc "outl" "outr"
  |> straightArrow "X" "A"
  |> straightArrow "X" "B"
  |> straightArrow "C" "A"
  |> straightArrow "C" "B"
  |> existsArrow "C" "X"

catSum = axbc "inl" "inr"
  |> straightArrow "A" "X"
  |> straightArrow "B" "X"
  |> straightArrow "A" "C"
  |> straightArrow "B" "C"
  |> existsArrow "X" "C"

gen file dia = do
  putStrLn $ "cat > " ++ file ++ ".svg << EOF"
  putStrLn $ svg 20 dia
  putStrLn "EOF"

main = do
  gen "id"         catId
  gen "compose"    catCompose
  gen "injection"  injection
  gen "surjection" surjection
  gen "initial"    initial
  gen "terminal"   terminal
  gen "product"    catProduct
  gen "coproduct"  catSum

Ben Lynn blynn@cs.stanford.edu 💡