[(i .|. j, i .&. j) | i <- [0..n], j <- [0..n]]
Efficient Binomial Coefficients
P. Goetgheluck introduces a fast algorithm for computing binomial coefficients with the following curiosity.
For a natural number n
, consider the set:
\[ \{ (i \text{ OR } j, i \text{ AND } j) | i \in [0..n], j \in [0..n] \} \]
Or in Haskell:
What does this describe? Let’s plot it:
import Data.Bits import Data.Bool f n = [(i .|. j, i .&. j) | i <- [0..n], j <- [0..n]] plot n = unlines [[bool ' ' '*' $ (r, c) `elem` f n | c <- [0..n]] | r <- [0..n]] main = putStr $ plot (20 :: Int)
The result is immediately recognizable:
* ** * * **** * * ** ** * * * * ******** * * ** ** * * * * **** **** * * * * ** ** ** ** * * * * * * * * **************** * * ** ** * * * * **** **** * * * *
At least, I immediately recognized the output because as a kid I came across a BASIC programming book that mentioned generating art from Pascal’s triangle. The above precisely matches the picture created by marking all odd numbers in Pascal’s triangle, which we can confirm with the following:
choose n 0 = 1 choose n k | n < k = 0 | otherwise = choose (n - 1) (k - 1) + choose (n - 1) k plot' n = unlines [[bool ' ' '*' $ odd $ choose r c | c <- [0..n]] | r <- [0..n]]
[For memory, I once tried to plot one pixel per number. I was unaware of modular arithmetic at the time, so integer overflow quickly disfigured the output!]
This startling equivalence can be stated as follows: the number choose n k
is
even if and only if the subtraction n - k
in binary requires at least one
borrow. More generally, if we write
choose n k == 2^r * s
where s
is odd, it turns out r
equals the number of borrows in the
subtraction n - k
in binary.
Even more generally, for any prime p
, the power of p
in the factorization
of choose n k
is equal to the number of borrows in the subtraction n - k
in base p
.
We can exploit this fact to rapidly compute choose n k
, but let’s first prove
why it is true. Let f p n k
be the power of p
in choose n k
.
Firstly, the power of p
in \(n!\) is:
sum [n `div` p^i | i <- [1..]]
Thus, from the well-known formula:
\[ {n \choose k} = \frac{n!}{k!(n - k)!} \]
we have:
f p n k = sum [n `div` p^i - k `div` p^i - (n - k) `div` p^i | i <- [1..]]
By the way, if we want to use these expressions in real programs, we could tell
Haskell when enough is enough by replacing i <- [1..]
with
i <- takeWhile (\i -> p^i <= n) [1..]
.
Next we use:
(n `div` p) `div` p^i == n `div` p^(i+1)
to write an alternative definition of f
:
f p n 0 = 0 f p n k | cn >= ck = f p n' k' | bn /= bk = 1 + f p n' k' | bn /= 0 = 1 + f p (n' - 1) k' | otherwise = 1 + f p n' (k' + 1) where (n', cn) = n `divMod` p bn = n' `mod` p (k', ck) = k `divMod` p bk = k' `mod` p
The number of borrows in base p
satisfies the same recursive relations,
proving the result.
We can handle a few special cases to speed up the the function:
f p n 0 = 0 f p n k | k > n - k = f p n (n - k) | p > n - k = 1 | 2 * p > n = 0 -- In this case, there are at most 2 non-zero digits: | p * p > n = fromEnum $ cn < ck | cn >= ck = f p n' k' | bn /= bk = 1 + f p n' k' | bn /= 0 = 1 + f p (n' - 1) k' | otherwise = 1 + f p n' (k' + 1) where (n', cn) = n `divMod` p bn = n' `mod` p (k', ck) = k `divMod` p bk = k' `mod` p
In Goetgheluck’s article, we have a loop instead of a recursion, and the special cases are only checked before entering the loop. Hopefully this difference is negligible.
On my laptop, the following naive algorithm beats the above for small examples
like choose 1000 353
(which were large examples when the original article was
published!)
choose n 0 = 1 choose n k = n * choose (n - 1) (k - 1) `div` k
We need bigger inputs to enjoy the fruits of our labour. On my laptop, the
following program, which uses the
primes package, took under 2 seconds to compute choose 1000000 353000
, while
the naive algorithm took over 15 seconds.
import Data.Numbers.Primes f p n 0 = 0 f p n k | k > n - k = f p n (n - k) | p > n - k = 1 | 2 * p > n = 0 | p * p > n = fromEnum $ cn < ck | cn >= ck = f p n' k' | bn /= bk = 1 + f p n' k' | bn /= 0 = 1 + f p (n' - 1) k' | otherwise = 1 + f p n' (k' + 1) where (n', cn) = n `divMod` p bn = n' `mod` p (k', ck) = k `divMod` p bk = k' `mod` p choose n k = product $ map (\p -> p^f p n k) $ takeWhile (<= n) primes main = print $ choose 1000000 353000
The exact-combinatorics package contains a benchmarked, tested, and better-optimized implementation of this algorithm.