Over the centuries, mathematicians have developed notation so eloquent that it is partly responsible for their progress. To quote George Boole: “That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted.”

Unfortunately, elegance is often lost in translation to a computer program. A short formula expands into several nested loops. Simple conditions become bulky if-then-else blocks. A few symbols that represent an infinite object may have no direct analogue.

However, if the target language is Haskell, we can retain much of the original mathematical flavour. It’s one of the reasons why Haskell is better for rapid prototyping than the usual suspects, such as Python.

Jeremy Gibbons shows off incredible Haskell snippets that compute digits of Pi (and also quotes a cute C program for the first 15000 digits). Though not the main aim, the paper gives a good overview of Haskell.

Despite its terse appearance, Haskell is a strongly typed compiled language. And despite its origins in theoretical research, Haskell can power high-performance web servers.

Gray code

In Volume 4 of The Art of Computer Programming, Knuth mathematically introduces Gray code as follows: “if Γn stands for the Gray binary sequence of n-bit strings, we can define Γn recursively by the two rules Γ0 = ϵ; Γn+1 = 0Γn , 1ΓnR”.

This translates eerily well to Haskell:

g 0 = [""]
g n = (map ('0':)) (g (n-1)) ++ (map ('1':)) (reverse (g (n-1)))

Why is it so easy to write Haskell code to resemble a mathematical explanation?

At a lower level, Haskell is well-endowed with tools for sequences (except they’re called lists): ++ is concatenation, reverse reverses a list, map applies a function to each element of a list, : prepends an element to a list, and so on.

At a higher level, pattern matching lets us write out the special cases and the general cases just as a mathematician would.

As a bonus, type inference means we can skip the ceremony and go straight to defining the sequence. However, in practice, types should sometimes be declared explicitly for safety and clarity. For example, in the above it would prevent attempts to feed fractions to g.

As one might expect, type declarations also resemble their mathematical counterparts:

  g :: Int -> [String]

Nested parentheses

Consider enumerating nested parentheses (say 5 pairs) lexicographically, where ) is considered smaller than (. Knuth presents a supernatural algorithm before revealing how it works by describing its mathematical underpinnings. Let’s cut to the chase:

Let Apq be the sequence of all strings α that contain p left parentheses and qp right parentheses, where (q-pα is properly nested, listed in lexicographic order…Apq obeys the recursive rules

Apq = )Ap(q-1), (A(p-1)q, if 0 ≤ pq ≠ 0; A00 = ϵ;

also Apq is empty if p < 0 or p > q.

Again, in Haskell, the resemblance is uncanny:

a :: Int -> Int -> [String]
a 0 0 = [""]
a p q | p < 0 || p > q = []
      | otherwise = (map (')': ) (a p (q-1))) ++ (map ('(': ) (a (p-1) q))

(So for example a 5 5 lists all ways of nesting 5 pairs of parentheses in lexicographic order.)

By the way, “otherwise” is simply syntactic sugar for “True”; it’s one of many little niceties.

Heron’s formula

Another little nicety is the “where” clause:

heron a b c = sqrt(s * (s-a) * (s-b) * (s-c)) where s = (a+b+c)/2

Compare this with the phrasing of the Wikipedia entry on Heron’s formula.

Other languages force us to at least declare s before using it. Of course, if preferred, we can respect this more conventional order in Haskell:

heron a b c = let s = (a+b+c)/2 in sqrt(s * (s-a) * (s-b) * (s-c))

Taming infinity

Squares: Consider the sequence of positive squares in ascending order. Mathematically we might write ⟨x2 | x = 1, 2, …⟩. Thanks to list comprehension, Haskell code looks similar:

sq = [ x*x | x <- [1..]]

Although this is an infinite sequence, it does not require infinite memory. In fact, thanks to lazy evaluation, it requires almost no memory until an element of the sequence is required for some calculation, in which case Haskell computes the minimum possible to find it.

For example, we can sum the first 10 squares:

sum (take 10 sq)

Haskell computes only 10 elements of the sequence (before summing them to get 385).

Continued fractions: The continued fraction expansion of e is [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, … ].

In Haskell, we can write:

e = 2 : f 2 where f x = 1 : x : 1 : f (x+2)

Again, although this is an infinite sequence, it takes almost no memory until an element of the sequence is required. Type e !! 23 and Haskell recurses just enough to return the 23rd element of the sequence (16).

The recursive nature of continued fractions plays to Haskell’s strengths:

f 1 (x:xs) = fromIntegral x
f n (x:xs) = fromIntegral x + 1 / (f (n-1) xs)

Then f 32 e returns the 32nd convergent.

Fibonacci sequence:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

This famous one-liner is Haskell’s answer to a top-down dynamic programming Fibonacci number generator of other languages.

Serious power

In Power series, power serious, Doug McIlroy constructs a simple yet powerful system for manipulating power series by utilizing Haskell’s operator overloading, lazy evaluation, and first-class functions. Among the many jaw-dropping highlights is the mutually recursive:

sinx = integral cosx
cosx = 1 - (integral sinx)

which amazingly generates the power series of both trigonometric functions.

McIlroy also identifies lazy evaluation with stream programming and coroutines; this is particularly poignant as he is the principal inventor of Unix pipes.

That Haskell has his approval should alone encourage others to take a close look, because McIlroy’s prowess with computers is the stuff of legend!

See also Enumerating the strings of regular languages, another McIlroy Haskell adventure.

Intellectual snobbery

A rich seam of theory lies beneath the surface of Haskell. One can ignore it and still appreciate Haskell. However, better to let curiosity get the better of you and read about the underlying computer science.

For starters, it’s simply fun to talk the talk. Lambda calculus. Combinators. Point-free programming. Hindley-Milner type inference, or Algorithm W (though Wikipedia says it should really be Damas-Milner). Functors. Monads. Each term reeks of erudition.

More importantly, you will gain knowledge beyond that of supposed experts, if I’m any guide. Despite holding an advanced degree in computer science, most of the material was new to me. In my defence, my specialty was cryptography, far removed from icky real-world matters such as programming languages, but I feel embarrassed nonetheless. So learn the theory, either to outsmart academics, or to prevent others outsmarting you!


I once thought purity was pointless because truly pure languages are impractical. At some point a computer must communicate with a human or another computer, yet I/O is impure.

In fact, initially I thought the situation was worse: without variables or anything else resembling a tape cell of a Turing machine, how could a pure function possibly be as powerful?

This is where the aforementioned theory comes in handy. It turns out you can cleverly encode the integers as strange-looking pure functions, and feed these into other pure functions to perform operations on them. A clever trick involving a fixed-point operator, such as the Y combinator, allows recursion, and this leads to a proof that pure functions and Turning machines can compute the same functions.

Thus pure functions can be equally powerful, so apart from little real-world inconveniences like I/O, a pure language ought to be practical. Good notation for pure functions may be worth the hassle: lack of side effects allows deeper analysis and optimization.

Of course, those inconveniences must be supported, hence we require some impurity. However, like dangerous chemicals, improperly handled impurities can contaminate a functional language. Haskell is particularly good at cordoning off the impure parts of a program.


There are times we must resort to C or lower. For example, the program below, also described by Knuth, prints all ways of nesting 5 pairs of parentheses in lexicographic order.

#include <stdio.h>
int main() {
  int n = 5;
  char s[2*n+2], *end = s + 2*n-1, *m = end;
  *s = end[1] = ')', end[2] = 0;
  for(char *c = s+1; c <= end; *c++ = '(', *c++ = ')');
  for(;; ) {
    *m = ')';
    if (')' == *--m && (*m = '(')) continue;
    char *j = m;
    for(char *k = end; '(' == *j; *j-- = ')', *k = '(', k-=2);
    if (s == j) break;
    *j = '(', m = end;
  return 0;

Contrast this to the Haskell version. The C version modifies a buffer in place on the stack. There are no function calls, apart from the I/O. No boundary checks, no garbage collection, no memory allocation, no runtime system. It is hard to imagine how a Haskell compiler could generate code this fast. Indeed, it is a nontrivial task for a human! We’ll never escape books like Knuth’s, nor languages like C.

Lazy evaluation

Lazy evaluation is a double-edged sword. In theory, it beats strict evaluation because, roughly speaking, a computation that can halt will halt. It is why we can play with infinite sets, so long as we only compute on finite subsets. In contrast, strict evaluation cannot get started because it tries to instantiate an infinite set before operating on it. Laziness also meshes well with purity.

However, lazy evaluation complicates profiling and optimization. Consider computing the mean of some numbers. In low-level languages, we can easily do this in constant space in a single linear loop. In contrast, the most natural Haskell code for this calculation uses at least linear space and squanders most of the CPU on garbage collection because of lazy evaluation. There are ways to force strict evaluations, but such optimization makes Haskell uglier and requires expertise.

In his 2003 talk, Wearing the hair shirt: a retrospective on Haskell, Simon Peyton Jones asserts that laziness does not matter; rather, it is purity that is important. On the other hand, John Hughes argues that lazy evaluation is the right choice.

UNIX pipes are more-or-less equivalent to lazy evaluation, as are the channels found in the Go language which are essentially statically typed pipes. Compare Rob Pike’s Squinting at Power Series, a channel-based approach to power series, to McIlroy’s Haskell version.


Though they caution against using macros throughout The Practice of Programming, in section 9.6, Kernighan and Pike give examples of code that become clearer with textual substitution.

Similarly, the goto statement is widely frowned upon, yet Knuth shows how goto makes some code clearer or faster.

Sometimes, the problem is that the code is repetitive, but the language lacks expressive power to factor out the common part. We might lack a “if the condition fails then skip the rest of the if blocks” statement, or a “start a timer, execute the given statement 10 times, then stop the timer” construct.

For these situations, Haskell has monads. Like macros and goto, monads can eliminate repetitive code, except they can do so cleanly. Monads are to noninvasive surgery as macros are to butchery.

Aspect-oriented programming appears related, though I have not investigated this.

Ben Lynn blynn@cs.stanford.edu