-- Based on https://sametwice.com/4_line_mandelbrot. prec :: Int prec = 16384 infixl 7 # x # y = x * y `div` prec sqAdd (x, y) (a, b) = (a#a - b#b + x, 2*(a#b) + y) norm (x, y) = x#x + y#y douady p = null . dropWhile (\z -> norm z < 4*prec) . take 30 $ iterate (sqAdd p) (0, 0) orig = [fromEnum $ douady (616*x - 2*prec, 1502*y - 18022) | y <- [0..23], x <- [0..71]] chunksOf i ls = map (take i) (go ls) where go [] = [] go l = l : go (drop i l) putPic = putStr . unlines . map (concatMap show) . chunksOf 72 putPic orig
Perfect Codes
An interactive tour of Chapter 1 of David Mackay, Information Theory.
A picture is worth 1728 bits
For this demonstration, we need a list of bits. Following Mackay, we choose one that represents an image. We pick a 72x24 image of what should perhaps really be called the Hubbard or Douady set:
Make some noise
Imagine the above image was taken by a deep space satellite and sent back to Earth. The signal suffers greatly from interference due to the vast distance, which we simulate with the help of a PCG pseudo-random number generator:
data PCG = PCG Word64 Word64 pcg a b = PCG (a + b') b' where b' = 2*b + 1 next :: PCG -> (Word, PCG) next (PCG x inc) = (r, PCG x' inc) where x' = 6364136223846793005*x + inc r = (fromIntegral (x `xor` (x `shiftR` 18) `shiftR` 27) :: Word) `rotateR` (fromIntegral $ x `shiftR` 59) fromPCG p = map (fromIntegral . fst) $ tail $ iterate (next . snd) (undefined, p) rnds = tail $ fromPCG $ pcg 42 54 take 10 rnds
We use the list of generated numbers to disrupt communication. Our model is a
binary symmetric channel: the following noise
function generates a list of
bits, each of which is one with probability f
. Zero means we leave the
corresponding bit in the signal alone, otherwise we flip it.
noise f = map (\x -> fromEnum $ fromIntegral (x `mod` 32768) <= f * 32768) take 20 $ noise 0.1 rnds
Thus if about one-tenth of the bits are flipped, we might see:
(#+) = zipWith $ (fromEnum .) . (/=) send msg f = msg #+ noise f rnds putPic $ send orig 0.1
Replace 0.1 with 0.4 and run the code again (press Ctrl + Enter or click the
play button), and the image is almost unrecognizable. Increasing f
to 0.5
produces pure noise, as any received bit has the same chance of representing 0
or 1. As f
increases further and approaches 1, the image becomes clearer,
except we now see the dual image where every bit is flipped.
This last observation reminds me of those logic puzzles where one person always lies. A reliably unreliable source is as useful as a truthteller: just pretend they have a habit of adding "…NOT!" at the end of every utterance.
Majority rules
How can we deal with noise? What’s the first solution that comes to mind?
Restaurants solve this problem by repeating your order to confirm. However, this brings to mind Segal’s Law: if one copy says 0, and the other says 1, which is correct? In a restaurant, the customer can immediately reply to disambiguate, but in general, round trip communication can be expensive or impossible. For us, error detection is insufficient: we want error correction.
The easiest solution is to repeat each bit 3 times, and if there’s disagreement, we believe the majority, a trick popularized by the science fiction story The Minority Report.
This is a repetition code that we denote by \(R_3\). Repetition codes are as simple as you’d expect: they generalize to any length; they correct errors in a word so long as under half the bits are corrupt; if exactly half the bits of a codeword are flipped then we can detect but not correct the error.
rep n = (replicate n =<<) unrep n = map majority . chunksOf n where majority = fromEnum . (> div n 2) . sum rep 3 [1,0,1,1] unrep 3 $ rep 3 [1,0,1,1]
Every binary repetition code of length \(2e + 1\) has two codewords: all 0s and all 1s. Every word differs from one of the codewords by at most \(e\) bits.
The following simulates using \(R_3\) to send our image across a binary symmetric channel that flips bits with probability 0.1:
sendR n xs f = unrep n $ send (rep n xs) f putPic $ sendR 3 orig 0.1
Off By One
Let’s look at a different kind of code, one known as the the (7, 4) Hamming code.
The first number is the output size in bits, while the second is the input size in bits, thus in general, an \((n, k)\) code represents a \(k\)-bit word with an \(n\)-bit word. The rate of a code is the output size over the input size, namely \(k/n\).
The (7, 4) Hamming code adds 3 bits to the 4 input bits to produce a 7-bit codeword. Each of these 3 bits is the parity of various subsets of the 4 bits. As the notation suggests, this code is but one member of a family, with different input and output sizes. However, only certain sizes work, and we’ll soon see how 7 and 4 are determined from the number of parity bits via \(7 = 2^3 - 1\) and \(4 = 2^3 - 1 - 3\).
We represent the encoding operation with a matrix \( \newcommand{\GT}{\mathbf{G}^\top} \newcommand{\H}{\mathbf{H}} \newcommand{\w}{\mathbf{w}} \GT\) over \(\mathbb{Z}_2\). Like Mackay, we prefer to work with the transpose of the generator matrix \(\mathbf{G}\), partly so we can get away with writing just one simple matrix-vector multiplication routine. Given a vector \(\mathbf{v}\) of 4 bits, its corresponding codeword is \(\GT \mathbf{v}\).
identityMatrix n = take n $ map (take n) $ iterate (0:) $ 1 : repeat 0 parityMatrix n = transpose $ filter ((> 1) . sum) $ replicateM n [0,1] gt n = identityMatrix (2^n - n - 1) ++ parityMatrix n mv m v = dot v <$> m where dot = (((`mod` 2) . sum) .) . zipWith (*) putMat = putStr . unlines . map (unwords . map show) putMat $ gt 3 hamming74 = mv (gt 3) hamming74 [1,0,1,1]
The parity checks seem strange at first: we compute the parity of 3 of the 3-subsets of bits. Why not the fourth one? And how do we choose which one to omit?
It turns out it doesn’t matter which one we leave out. All that matters is that the 4 column vectors in the bottom 3 rows of \(\GT\) correspond to the possible ways to to pick at least two objects of a set of size 3. This leads to the constants 7 and 4.
The parity-check matrix \(\H\) resembles \(\GT\):
h n = zipWith (++) (parityMatrix n) $ identityMatrix n pcheck74 = mv (h 3)
Exercise: prove \(\H \w = 0\) exactly when \(\w\) is a codeword.
pcheck74 $ hamming74 [1,0,1,1]
Furthermore, if \(\w\) contains exactly one 1, then \(\w\) is one of 7 possibilities, and the syndrome \(\H \w\) is one of the 7 nonzero 3-bit vectors. Setting distinct bits leads to distinct syndromes:
pcheck74 [1,0,0,0,0,0,0] pcheck74 [0,0,0,1,0,0,0]
We decode and a correct a 7-bit word \(\w\) as follows. If \(\H \w = 0\) then \(\w\) is a valid codeword. If not, by linearity, we generate a lookup table that maps syndromes to a vector that indicates which bit of \(\w\) to flip to produce \(\w'\) such that \(\H \w' = 0\), implying \(\w'\) is a codeword. Then we return the first 4 bits.
ham n = concatMap (mv $ gt n) . chunksOf (2^n - n - 1) unham n = concatMap (take (nn - n) . correct) . chunksOf nn where nn = 2^n - 1 syns = zip (mv (h n) <$> identityMatrix nn) $ identityMatrix nn correct w = maybe w (w #+) $ lookup (mv (h n) w) syns sendH74 xs f = unham 3 $ send (ham 3 xs) f putPic $ sendH74 orig 0.1
As hinted above, for any given number \(n\) of parity bits, we can construct a \((2^n - 1, 2^n - 1 - n)\) Hamming code, that is:
-
There are \(n\) parity bits.
-
Codewords are \(2^n - 1\) bits long.
-
A codeword represents \(2^n - 1 - n\) bits.
-
Any word is at most one bit-flip away from a codeword.
When \(n = 1\), we get the (1, 0) Hamming code, a degenerate case. Each 1-bit word always decodes as the codeword 0, thus we’re communicating in unary; the only information we really send is the length of a message.
When \(n = 2\), we get the (3, 1) Hamming code, which is a fancy way of saying \(R_3\).
Perfect Code
A binary perfect code of length \(n\) is a code where for some \(e\), every \(n\)-bit word differs from exactly one codeword by at most \(e\) bits. This implies different codewords differ by at least \(2e + 1\) bits.
Such a code is perfect in the sense that it is guaranteed to correct up to \(e\) flipped bits. With other codes, a corrupt word may be equally close to distinct codewords. There may be no clear winner.
A little thought shows a necessary condition for a perfect code with \(W\) codewords is:
\[ W \sum_{i=0}^e {n \choose i} = 2^n \]
Thus \(W = 2^k\) for some \(k\) and we may write:
\[ \sum_{i=0}^e {n \choose i} = 2^{n-k} \]
The first few rows of Pascal’s triangle are:
pascal = iterate (\xs -> zipWith (+) (0:xs) (xs++[0])) [1] putMat $ take 8 pascal
We only need half of each row, because \(2e + 1 \le n\). The relevant cumulative sums:
cumu = zipWith take ((`div`2) <$> [1..]) $ scanl1 (+) <$> pascal putMat $ take 8 cumu
We see some nontrivial powers of 2, and it turns out they correspond to perfect codes, but unfortunately we’ve seen them all before. Since Hamming codes correct up to 1 bit, the powers of 2 in the second position like 4 and 8 are the Hamming codes. And the powers of 2 in the last position like 16 and 64 (and also 4) correspond to repetition codes.
Thus we compute the cumulative sums that correspond to codes that can correct at least 2 errors but less than \((n - 1) / 2\) errors.
cumu' = map (drop 2) $ zipWith take ((`div`2) <$> [2..]) $ scanl1 (+) <$> drop 2 pascal putMat $ take 16 cumu'
No powers of 2 appear in the first 16 rows. Golay searched further, and found an interesting linear code. We retrace his footsteps. We label each sum with its row and column, and detect powers of 2 using a well-known bit-twiddling trick that works on numbers that fit into machine words:
searchMe = zipWith (map . first . (,)) [2..] $ zip [2..] <$> cumu' isPow2 n = w .&. (w - 1) == 0 where w = fromInteger n :: Word filter (isPow2 . snd) $ concat $ take 100 searchMe
Thus in roughly 100 rows, we have two candidates. One is:
\[ {90 \choose 0} + {90 \choose 1} + {90 \choose 2} = 2^{12} \]
Golay starts by supposing a perfect linear code exists with \(n = 90\) and \(e = 2\). Consider the 90-bit words containing exactly one 1, which represent the 90 possible 1-bit errors. For some \(r\), exactly \(r\) of these 90 words have a syndrome containing an odd number of 1s, and the other \(90 - r\) have syndromes with an even number of 1s. By linearity, if a word is the sum of one of the \(r\) words and one of the other \(90 - r\) words, then its syndrome must also contain an odd number of 1s. Moreover, this exactly characterizes the syndromes with an odd number of 1s that correspond to 2-bit errors.
Half of the \(2^{12}\) possible syndromes contain an odd number of 1s, hence we have \(r + r(90 - r) = 2^{11}\). This has no integer solution, thus there cannot exist a 90-bit perfect linear code that corrects up to 2 bits.
What about the other candidate?
\[ {23 \choose 0} + {23 \choose 1} + {23 \choose 2} + {23 \choose 3} = 2^{11} \]
This corresponds to a curious creature known as \(G_{23}\), the binary Golay code, and it is mysteriously connected to other areas of mathematics:
-
The icosahedron.
-
Quadratic residues modulo 23.
-
The (7,4) Hamming code plus an extra bit that records the parity of the other 7: the extended (8,4) Hamming code.
-
The Mathieu groups \(M_{23}\) and \(M_{24}\), which are themselves members of a zoo known as the sporadic groups.
-
The Mogul variant of a mathematical coin turning game due to Lenstra (called "Turning Turtles" by Berlekamp, Conway, and Guy, Winning Ways, Chapter 14): two players, a row of 24 coins, and on your turn you lose unless you change one heads to tails, and optionally flip up to 6 of the coins to its right.
It seems easiest to construct the generator matrix of the binary Golay code by taking advantage of its cyclic nature: we shift a certain codeword based on factoring \(x^{23} - 1\) over \(GF(2)\), adding when necessary to cancel out 1s in unwanted places.
gen = replicate 11 0 ++ [1,1,0,0,0,1,1,1,0,1,0,1] ro x = tail x ++ [head x] zero11 v = if v!!11 == 1 then gen #+ v else v g23GT = transpose $ reverse $ take 12 $ iterate (zero11 . ro) gen putMat g23GT
Coding Contest
Time for a showdown. We simulate a binary symmetric channel with a noise level of 5%, and compare the codes \(R_3 = H(3, 1)\), \(H(7, 4)\), and \(G_{23}\):
g23H = zipWith (++) (drop 12 g23GT) (identityMatrix 11) wt n 0 = [replicate n 0] wt 0 _ = [] wt n k = ((1:) <$> wt (n - 1) (k - 1)) ++ ((0:) <$> wt (n - 1) k) unbits = sum . zipWith (*) (iterate (2*) 1) tab23 = zip (unbits . mv g23H <$> errs) errs where errs = wt 23 =<< [0..3] golay = concatMap (mv g23GT) . chunksOf 12 unGolay = concatMap (take 12 . correct) . chunksOf 23 where correct w = maybe w (w #+) $ lookup (unbits $ mv g23H w) tab23 sendG xs f = unGolay $ send (golay xs) f putStrLn "repetition-3:" putPic $ sendR 3 orig 0.05 putStrLn "Hamming(7, 4):" putPic $ sendH74 orig 0.05 putStrLn "Golay:" putPic $ sendG orig 0.05
Eyeballing it, we see \(R_3\) and \(G_{23}\) seem equally effective, while the Hamming code is noticeably worse, which we expect since it can only correct up to one bad bit per 7 bits. However, the rates \(4/7\) and \(12/23\) differ by less than 0.05, and both substantially exceed \(1/3\), so the winner is Golay.
By adding one parity check bit to \(G_{23}\), we get the extended binary Golay code \(G_{24}\), which can also detect, but not correct, 4-bit errors. The \(G_{24}\) code is used in deep space and radio communication.
Perfect is the enemy of good
It turns out we have already described all perfect codes over \(GF(2)\) (binary codes). There are two families, repetition and Hamming, and one outlier hermit, the Golay code. All have extended variants.
What if we consider other fields? Over \(GF(q)\) our constraint generalizes to:
\[ \sum_{i=0}^e {n \choose i}(q - 1)^i = q^{n-k} \]
It so happens that:
\[ {11\choose 0} + {11\choose 1} \cdot 2 + {11\choose 2}\cdot 2^2 = 3^5 \]
and it so happens that there exists a corresponding perfect linear code. It’s called the ternary Golay code, though it was first published by Finnish football pool enthusiast Juhani Virtakallio. As the equation says, this code has a blocklength of 11 and encodes 6 trits (ternary digits) at a time by adding 5 parity checks. It has connections with the Mathieu group \(M_{11}\).
Other than the above, there are no perfect codes. Over any field. Linear or non-linear. See Aimo Tietäväinen, On the Nonexistence of Perfect Codes Over Finite Fields.
This is a pity, because as the blocklength increases, the rate of repetition codes approaches 0, and the rate of Hamming codes approaches 1. Shannon’s noisy-channel coding theorem means that every noisy channel has a capacity \(C\) such that for any given probabiilty \(\epsilon\) and \(R < C\), there exists a code of some blocklength \(N\) with rate at least \(R\) and with a probability of block error less than \(\epsilon\).
For example, a binary symmetric channel with noise level \(f\) has capacity:
\[ 1 + f \lg f + (1 - f) \lg (1 - f) \]
When \(f = 0.1\), we have \(C = 0.531…\) so in theory, when the noise level is 10%, there exists a code with a really large blocklength that doesn’t even double the size of the message, yet decodes correctly with a really high probability. But whatever this code is, it’s neither repetition nor Hamming, thus it cannot be perfect. In general, we must work much harder to find good codes.