A Problem With I/O

“Functional programming languages have a … problem with I/O”, notes Rob Pike.

This was true years ago, when functional programming languages were fixated on pure functions. Useful programs must interact with the real world, for example, by printing a string, playing a sound, or reading network packets. How can we define pure functions for such tasks?

We cannot, by definition. Because of this, some languages became functional only in spirit; they encourage a certain kind of syntax, but like conventional languages, they assume side effects could occur almost anywhere.

Researchers since discovered a genuine solution: a simple yet powerful solution that accommodates impure functions without compromising the hard-won pure function core. Modern Haskell can fence off pure and impure functions from one another in a manner akin to the const keyword of C/C++.

Hello, World!

Consider a function call that prints a string:

puts("Hello, World!")

We know it doesn’t just take a string and return nothing. There are side effects, that is, something noticeable happens in the real world.

This suggests introducing a type that represents the real world. The puts function should take an input world and a string, and output a whole new world: a world where the given string has been printed. Similarly, a gets function that reads a string should take an input world, then output a string and a new world: a world where a string has been consumed from standard input.

In Go, it might go something like this:

func puts(s string, w world) world

func gets(w world) (string, world)

func main(w world) {
  w = puts("Enter a string:", w)
  var s string
  s, w = gets(w)
  w = puts("You typed: " + s, w)
}

We can teach the compiler to treat the world type carefully; in particular, to avoid optimizations that only work for pure functions. However, the syntax is clumsy. Must we clutter our source with world parameters?

World-changing Ideas

Before long, we notice any sensible function that outputs a world must also expect an input world, and vice versa. Moreover, there must be exactly one input world and one output world, the former representing the old world and the latter the new. To describe side effects is to describe the world before and after they transpire.

This inspires us to define a family of types to represent world-changing functions, and to redefine puts and gets as follows:

type IO       func(w world) (        world)
type IOString func(w world) (string, world)
type IOInt    func(w world) (   int, world)
...

func puts(s string) IO
func gets() IOString

We’ve added a layer of indirection. Before, the puts function took a world and a string then returned a changed world. Now, it takes a string and returns a world-changing function: side effects happen only when this returned function is executed:

func main(w world) {
  s, w1 := gets() (puts("Enter a string:") (w))
  puts("You typed: " + s) (w1)
}

Shut Out the World

Although we removed some world arguments, the Go code still looks odd. It’s better in Haskell. Firstly, thanks to parameterized types, we can just define IO (which is actually predefined by the Haskell Prelude), and all else will follow:

puts :: String -> IO ()
gets :: IO String

Secondly, Haskell’s syntax means we can pass the world parameter implicitly:

main :: IO ()
main = puts "Enter a string:" >> gets >>= (\s -> puts "You typed: " ++ s)

The presence of the IO type constructor means a function is impure, just as the absence of the const keyword in C/C++ means data might be modified.

Using Haskell’s do notation, we can mimic conventional languages:

main = do
  puts "Enter a string:"
  s <- gets
  puts ("You typed: " ++ s)

Roughly speaking, within a do block, the output of one line becomes the input of the next.

The above isn’t merely theoretical, by the way. We can turn our code into a working program by adding:

puts = putStrLn
gets = getLine

Haskell syntax is clean, yet powerful. We have mixed pure and impure code with ease, yet we can distinguish between the two at a glance.

One could argue IO functions are pure; it’s just that we inconveniently lack the power to create and destroy worlds, which forces the compiler to handle the unique instance of the world with great care. The real world is a global object, both literally and in computer science parlance.

Monads

We achieved simplicity making the world arguments implicit, and using do notation to replace nested function calls with a sequence of function calls.

We can generalize this technique to eliminate code duplication in other contexts. When we do, we call it a monad. I’ll refrain from explaining monads; to paraphrase a famous movie quote, some believe that unfortunately, no one can be told what a monad is; you have to write one for yourself.


Ben Lynn blynn@cs.stanford.edu