Warts and All

Even the relatively young field of computer science is haunted by entrenched traditions.

Take TAB characters in Makefiles. Why was this strange convention never changed? In the original author’s words: “I didn’t want to screw up my embedded base”.

Or consider the artificial divide between statements and expressions that still plagues many programming languages. This choice was natural for FORTRAN, partly because of the limitations of punch-cards, but in modern times, such segregation is counterproductive.

GHC takes the righteous path. The language atones for historical accidents, becoming more clear and logical over time, and although code will mostly be backwards compatible, breakages are possible. More accurately, it’s not the language itself but the core libraries that are changing.

AMP

The Functor-Applicative-Monad Proposal (AMP) was implemented in GHC version 7.10.

Roughly speaking, a parameterized type m is a functor if we can meaningfully define the function:

  (<$>) :: (a -> b) -> m a -> m b  -- Also known as: fmap, liftM, liftA.

In addition, m is an applicative if we can break down this function into two pieces:

  pure  :: a -> m a  -- Also known as: return.
  (<*>) :: m (a -> b) -> m a -> m b
  -- (<$>) = (<*>) . pure

In addition, m is a monad if we can work with functions with type a -> m b:

  (>>=) :: m a -> (a -> m b) -> m b

Despite being the middle child, applicatives were added to Haskell after monads. And because the family resemblance of functors, applicatives, and monads were once underappreciated, different names arose for the same functions.

Synonyms can be useful, but I find names like >> and *> equally illuminating. And chaining <*> beats functions like liftM5: why can’t the compiler figure out how many arguments should be lifted?

Moreover, much of the time, functors are applicatives, and applicatives are monads (exercise: think of counterexamples). If they’re usually the same, why not use the same notation?

Apart from sleeping more soundly knowing this wart has been removed, I mostly look forward to removing verbose Control.Applicative imports.

FTP

The Foldable/Traversable in Prelude proposal is sometimes called The Burning Bridges Proposal by its friends. It also found its way to GHC 7.10.

Haskell was designed to make lists easy to use. This is a good choice, as lists are simple to understand and implement, flexible, and efficient in certain circumstances.

However, we often need data structures such as Data.Sequence, or Data.Tree, which share some features with lists. Since Prelude and Data.List hogged all the good function names like "sum", "length", "foldl", "mapM", we’re forced to qualify our imports:

import qualified Data.Sequence as S
import qualified Data.Foldable as F

main = do
  print $ length [1..10]
  print $ S.length $ S.fromList [1..10]

  print $ sum [1..10]
  print $ F.sum $ S.fromList [1..10]

FTP fixes this by precisely nailing down the common properties of these data structures with type classes, and redefining the good function names so that they automatically work on all appropriate data structures:

import Data.Sequence (fromList)

main = do
  print $ length [1..10]
  print $ length $ fromList [1..10]

  print $ sum [1..10]
  print $ sum $ fromList [1..10]

One might say even C++11 beat Haskell to the punch in this regard, as the same range-based for loop (with auto for type inference) works on lists, vectors, sets, and so on.

At long last, GHC will end the tyranny of Data.List.

In addition, the OverloadedLists GHC extension liberates the square bracket notation:

{-# LANGUAGE OverloadedLists #-}
import Data.Set (Set)

s :: Set Char
s = ['0'..'9']

Workarounds

Sadly, even in Haskell, some wounds are likely too deep to heal. We can paper over the cracks, for a cost.

Working around a few of the problems is simply a matter of style, which naturally changes as one gains more Haskell experience. For example, nowdays I am less afflicted by Boolean blindness, because over time I grew more comfortable with pattern matching.

In contrast, fixes such as replacing the standard prelude requires at least two lines. If we cared enough about this stuff, we would have to add boilerplate to every program we write.

My code hardly leaves the prototype stage, so to avoid imports and language extensions, I tend to follow outdated practices. I use String to avoid imports and the OverloadedStrings extension. I use partial functions because their names are so succinct. I use a list of two elements instead of a tuple so I can the same function on both elements with map, which I find handier than importing Control.Arrow and calling join (***).

However, part of me detests these transgressions, especially when they cause crashes. I’m tempted to rig a preprocessor that inserts boilerplate for me so I can write in a more modern style without the clutter.

A Sinister Fold

One workaround is forced upon us even in prototype code due to performance issues.

The function foldl is lazy, which means:

foldl1 (+) [1..5]

expands to (((1 + 2) + 3) + 4) + 5 in memory before being evaluated. On longer lists, this is unbearably slow, and we risk stack overflow. The solution is to call the eager variant foldl1' instead:

import `Data.List`
foldl1' (+) [1..5]

So what’s the point of foldl in the first place? We might think a lazy foldl suits some problems because the lazy foldr can be quite dexterous:

> foldr1 (&&) $ repeat False  -- An infinite list.
False

However, by definition, foldl cannot short-circuit on infinite lists like foldr can: it must expand the input list entirely before getting started. We, too, are back where we started: why would we ever prefer the lazy foldl to its eager variant foldl'?

As far as I can tell, a lazy foldl is only applicable to cases such as:

let f x y = if y == 5 then 42 else x in foldl1 f [1..5]

In other words, even though the expression is expanded in memory, at least the evaluation can short-circuit on the first invocation of f: Haskell notices y == 5, so simply returns 42 and skips evaluating the rest of the expression. We can see this by unsafely inserting impure code:

> import Debug.Trace
> let f x y = if y == 5 then 42 else x
> foldl1 (\x y -> traceShow y $ f x y) [1..5]
5
42
> foldl1' (\x y -> traceShow y $ f x y) [1..5]
2
3
4
5
42

Even so:

  1. We may still be better off with eager evaluation. Needlessly evaluating a function may be cheaper than constructing a huge expression on the stack.

  2. In the above, we acheive a better short-circuit with foldr1 (flip f) [5,4..1]: not only do we stop evaluation after the first invocation of f, but we also stop expanding the expression.

We can guess why the default foldl is lazy. Its authors probably desired consistency: everything else (like foldr) is lazy, so surely foldl should be lazy too. However, I believe the foldl and foldl' functions should be swapped, that is, the left fold ought to be eager by default. Functions like maximum should also be eager by default.

This would spare Haskell novices from learning about the intricacies of lazy evaluation when all we want to do is write working code.

As for the few times a lazy left fold really is warranted: only Haskell experts would care, so they’re the ones who should be forced to import some module or other, and redecorate their code with extra punctuation.

If the two functions were swapped today, just about all existing Haskell programs would continue to work, except they’d be faster, and the rare breakage would be easy to fix.

For now, we have to put up with this wart of the Haskell Prelude. If we know a list is short, we can get away with foldl, maximum, and so on. Otherwise, we must import Data.List and append apostrophes to every foldl and foldl1, and also replace functions like maximum with foldl1' max.


Ben Lynn blynn@cs.stanford.edu