Hempel's Joke

Stop me if you’ve heard this one: 2 + 2 = 5 for sufficiently large values of 2. This is obviously a joke (though sometimes told so convincingly that the audience is unsure).

Hempel’s "Paradox" is a similar but less obvious joke that proceeds as follows. Consider the hypothesis: all ravens are black. This is logically equivalent to saying all non-black things are non-ravens. Therefore seeing a white shoe is evidence supporting the hypothesis.

The following Go program makes the attempted humour abundantly clear:

package main

import "fmt"

func main() {
  state := true
  for {
    var colour, thing string
    if _, e := fmt.Scan(&colour, &thing); e != nil {
      break
    }
    if thing == "raven" && colour != "black" {
      state = false
    }
    fmt.Println("  hypothesis:", state)
  }
}

A sample run:

black raven
  hypothesis: true
white shoe
  hypothesis: true
red raven
  hypothesis: false
black raven
  hypothesis: false
white shoe
  hypothesis: false

The state of the hypothesis is represented by a boolean variable. Initially the boolean is true, and it remains true until we encounter a non-black raven. This is the only way to change the state of the program: neither "black raven" nor "white shoe" has any effect.

Saying we have "evidence supporting the hypothesis" is saying there are truer values of true. It’s like saying there are larger values of 2.

The original joke exploits the mathematical concept ``sufficiently large'' which has applications, but is absurd when applied to constants.

Similarly, Hempel’s joke exploits the concept "supporting evidence", which has applications, but is absurd when applied to a lone hypothesis.

Off by one

If we want to talk about evidence supporting or undermining a hypothesis, we’ll need to advance beyond boolean logic. Conventionally we represent degrees of belief with numbers between 0 and 1. The higher the number, the stronger the belief. We call these probabilities.

Next, we propose some mutually exclusive hypotheses and assign probabilities between 0 and 1 to each one. The sum of the probabilities must be 1.

If we take a single proposition by itself, such as "all ravens are black", then we’re forced to give it a probability of 1. We’re reduced to the situation above, where the only interesting thing that can happen is that we see a non-black raven and we realize we must restart with a different hypothesis. (Look up "non-monotonic logic" for more on the mathematics of revising beliefs.)

We need at least two propositions with nonzero probabilties for the phrase "supporting evidence" to make sense. For example, we might have two propositions A and B, with probabilities of 0.2 and 0.8 respectively. If we find evidence supporting A, then its probability increases and the probability of B decreases accordingly, for their sum must always be 1. Naturally, as before, we may encounter evidence that implies all our propositions are wrong, in which case we must restart with a fresh set of hypotheses.

For example, we may take A: "all ravens are black", and B: "there exists a non-black raven", and assign each a nonzero probability. Now it makes sense to ask if a white shoe is supporting evidence. Does it support A at B’s expense? Or B at A’s expense? Or neither?

The propositions as stated are too vague to answer one way or another. We can make the propositions more specific, but there are infinitely many ways to do so, and the choices we make change the answer. See Chapter 5 of Jaynes.

A Card Trick

Instead of trying to flesh out hypotheses involving ravens, let us content ourselves with a simpler scenario. Suppose a manufacturer of playing cards has a faulty process that sometimes uses black ink instead of red ink to print the entire suit of hearts. We estimate one in ten packs of cards have black hearts instead of red hearts and is otherwise normal, while the other nine decks are perfectly fine.

We’re given a pack of cards from this manufacturer. Thus we believe the hypothesis A: "all hearts are red" with probability 0.9, and B: "there exists a non-red heart" with probability 0.1. We draw a card. It’s the four of clubs. What does this do to our beliefs?

Nothing. Neither hypothesis is affected by this irrelevant evidence. I believe this is at least intuitively clear to most people, and furthermore, had Hempel spoke of hearts and clubs instead of ravens and shoes, his joke would have been more obvious.

Hempel’s joke reminds us we must consider more than one hypothesis if we want to talk about supporting evidence. Assigning degrees of belief to a lone proposition is like awarding points in a competition with only one contestant.

This all seems obvious, but apparently it is not obvious enough. My probability and statistics textbook instructs us to consider only one hypothesis. Actually, it’s worse: at one point, it instructs us to devise an alternate hypothesis, but this second hypothesis is never mentioned again!


Ben Lynn blynn@cs.stanford.edu 💡