Efficient Binomial Coefficients

P. Goetgheluck introduces a fast algorithm for computing binomial coefficients with the following curiosity.

For a natural number n, consider the set:

\[ \{ (i \text{ OR } j, i \text{ AND } j) | i \in [0..n], j \in [0..n] \} \]

Or in Haskell:

[(i .|. j, i .&. j) | i <- [0..n], j <- [0..n]]

What does this describe? Let’s plot it:

import Data.Bits
import Data.Bool

f n = [(i .|. j, i .&. j) | i <- [0..n], j <- [0..n]]
plot n = unlines [[bool ' ' '*' $ (r, c) `elem` f n |
  c <- [0..n]] | r <- [0..n]]

main = putStr $ plot (20 :: Int)

The result is immediately recognizable:

*
**
* *
****
*   *
**  **
* * * *
********
*       *
**      **
* *     * *
****    ****
*   *   *   *
**  **  **  **
* * * * * * * *
****************
*               *
**              **
* *             * *
****            ****
*   *           *   *

At least, I immediately recognized the output because as a kid I came across a BASIC programming book that mentioned generating art from Pascal’s triangle. The above precisely matches the picture created by marking all odd numbers in Pascal’s triangle, which we can confirm with the following:

choose n 0 = 1
choose n k
  | n < k     = 0
  | otherwise = choose (n - 1) (k - 1) + choose (n - 1) k

plot' n = unlines [[bool ' ' '*' $ odd $ choose r c |
   c <- [0..n]] | r <- [0..n]]

[For memory, I once tried to plot one pixel per number. I was unaware of modular arithmetic at the time, so integer overflow quickly disfigured the output!]

This startling equivalence can be stated as follows: the number choose n k is even if and only if the subtraction n - k in binary requires at least one borrow. More generally, if we write

choose n k == 2^r * s

where s is odd, it turns out r equals the number of borrows in the subtraction n - k in binary.

Even more generally, for any prime p, the power of p in the factorization of choose n k is equal to the number of borrows in the subtraction n - k in base p.

We can exploit this fact to rapidly compute choose n k, but let’s first prove why it is true. Let f p n k be the power of p in choose n k. Firstly, the power of p in \(n!\) is:

sum [n `div` p^i | i <- [1..]]

Thus, from the well-known formula:

\[ {n \choose k} = \frac{n!}{k!(n - k)!} \]

we have:

f p n k = sum [n `div` p^i - k `div` p^i - (n - k) `div` p^i | i <- [1..]]

By the way, if we want to use these expressions in real programs, we could tell Haskell when enough is enough by replacing i <- [1..] with i <- takeWhile (\i -> p^i <= n) [1..].

Next we use:

(n `div` p) `div` p^i == n `div` p^(i+1)

to write an alternative definition of f:

f p n 0 = 0
f p n k | cn >= ck  =     f p n'       k'
        | bn /= bk  = 1 + f p n'       k'
        | bn /= 0   = 1 + f p (n' - 1) k'
        | otherwise = 1 + f p n'       (k' + 1)
        where
          (n', cn) = n  `divMod` p
          bn       = n' `mod` p
          (k', ck) = k  `divMod` p
          bk       = k' `mod` p

The number of borrows in base p satisfies the same recursive relations, proving the result.

We can handle a few special cases to speed up the the function:

f p n 0 = 0
f p n k | k > n - k = f p n (n - k)
        | p > n - k = 1
        | 2 * p > n = 0
        -- In this case, there are at most 2 non-zero digits:
        | p * p > n = fromEnum $ cn < ck
        | cn >= ck  =     f p n'       k'
        | bn /= bk  = 1 + f p n'       k'
        | bn /= 0   = 1 + f p (n' - 1) k'
        | otherwise = 1 + f p n'       (k' + 1)
        where (n', cn) = n  `divMod` p
              bn       = n' `mod` p
              (k', ck) = k  `divMod` p
              bk       = k' `mod` p

In Goetgheluck’s article, we have a loop instead of a recursion, and the special cases are only checked before entering the loop. Hopefully this difference is negligible.

On my laptop, the following naive algorithm beats the above for small examples like choose 1000 353 (which were large examples when the original article was published!)

choose n 0 = 1
choose n k = n * choose (n - 1) (k - 1) `div` k

We need bigger inputs to enjoy the fruits of our labour. On my laptop, the following program, which uses the primes package, took under 2 seconds to compute choose 1000000 353000, while the naive algorithm took over 15 seconds.

import Data.Numbers.Primes

f p n 0 = 0
f p n k | k > n - k = f p n (n - k)
        | p > n - k = 1
        | 2 * p > n = 0
        | p * p > n = fromEnum $ cn < ck
        | cn >= ck  =     f p n'       k'
        | bn /= bk  = 1 + f p n'       k'
        | bn /= 0   = 1 + f p (n' - 1) k'
        | otherwise = 1 + f p n'       (k' + 1)
        where (n', cn) = n  `divMod` p
              bn       = n' `mod` p
              (k', ck) = k  `divMod` p
              bk       = k' `mod` p

choose n k = product $ map (\p -> p^f p n k) $ takeWhile (<= n) primes

main = print $ choose 1000000 353000

The exact-combinatorics package contains a benchmarked, tested, and better-optimized implementation of this algorithm.

Ben Lynn blynn@cs.stanford.edu 💡

Efficient Binomial Coefficients

Contents