= From Atbash to Enigma =
2500 years of cryptography in a few lines of https://www.haskell.org/[Haskell],
from ancient ciphers to the infamous
https://en.wikipedia.org/wiki/Enigma_machine[Enigma machine]:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Ringstellung:
Grundstellung:
Input:
Output:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Our code can be run by downloading link:enigma.lhs[the source of this page]:
------------------------------------------------------------------------------
$ wget https://crypto.stanford.edu/~blynn/haskell/enigma.lhs
$ ghci enigma.lhs
------------------------------------------------------------------------------
Preliminary administrivia:
\begin{code}
module Enigma where
import Data.Bool
import Data.Char
import Data.List
import Data.Maybe
\end{code}
== Writing Wrong ==
The
https://en.wikipedia.org/wiki/Substitution_cipher[_substitution cipher_]
is one of the oldest encyption schemes. The idea is so simple that I
inadvertently invented it myself long ago: my handwriting is so bad that
readers often confuse some of my letters for others! Substitution ciphers just
do this deliberately.
We work with the uppercase Latin alphabet. Our code leaves other characters
untouched.
\begin{code}
abc = ['A'..'Z']
\end{code}
A substitution cipher is defined by a permutation \(\pi\) of the alphabet.
The encryption of a letter \(x\) is \(\pi(x)\). We represent permutations with
_words_, that is, we write the permutation \(\pi\) as \(p_A p_B ... p_Z\) where
\(p_x = \pi(x)\).
Given a permutation `p`, we can write functions to encrypt and decrypt
with the corresponding substitution cipher as follows:
\begin{code}
sub p x = fromMaybe x $ lookup x $ zip abc p
unsub p x = fromMaybe x $ lookup x $ zip p abc
\end{code}
The _keyword cipher_ is a folk method for constructing a permutation.
We take a keyword and append the alphabet, removing any duplicate letters.
To guard against bad inputs we filter so that only uppercase letters are kept:
\begin{code}
instaperm k = nub $ filter isUpper k ++ abc
prop_permExample = instaperm "SWORDFISH" == "SWORDFIHABCEGJKLMNPQTUVXYZ"
prop_keywordExample =
(sub (instaperm "SWORDFISH") <$> "SCOLD HIRE") == "POKER HAND"
prop_unsubUndoesSub k =
(unsub (instaperm k) . sub (instaperm k) <$> abc) == abc
\end{code}
The https://en.wikipedia.org/wiki/Atbash[_Atbash cipher_] for the Latin
alphabet is the substitution cipher using the permutation whose word
representation is the reverse of the alphabet:
\begin{code}
atbash = sub $ reverse abc
prop_atbashExample = map atbash "SLIM GIRL" == "HORN TRIO"
\end{code}
We chose our examples carefully. Few English words encrypt to other English
words. [Exercise: write code to find them all from a given dictionary.]
A https://en.wikipedia.org/wiki/Caesar_cipher[_Caesar cipher_] or _Caesar
shift_ is a substitution cipher where each letter is replaced by the letter
that is a fixed number of positions ahead, wrapping around the alphabet if
necessary. For example, if A maps to D, then B maps to E, C to F, and so on.
We represent a Caesar shift by the letter to which A is mapped. This letter
is the secret key.
\begin{code}
shift k = sub $ dropWhile (/= k) $ cycle abc
unshift k = unsub $ dropWhile (/= k) $ cycle abc
prop_rot13Example = (shift 'N' <$> "ABJURER") == "NOWHERE"
\end{code}
Shifting by one letter comes in handy, so we give this special case a short
name:
\begin{code}
bump = shift 'B'
prop_unshiftAtbash = and
[unshift k c == bump (shift (atbash k) c) | k <- abc, c <- abc]
\end{code}
https://en.wikipedia.org/wiki/HAL_9000#Origin_of_name[A famous coincidence]:
\begin{code}
prop_2001 = map bump "HAL" == "IBM"
\end{code}
== Polyalphabetic Substitution ==
The
https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher[_Vigenère cipher_] is a
repeated sequence of Caesar shifts. We repeat a given key \(k_1...k_n\) until
it is as long as the message, and shift each plaintext letter by the
corresponding letter in the extended key:
\begin{code}
vigenere = zipWith shift . cycle
unvigenere = zipWith unshift . cycle
prop_vigenereExample = vigenere "LEMON" "ATTACKATDAWN" == "LXFOPVEFRNHR"
\end{code}
Our code is inconsistent. Earlier, our functions worked on one character at a
time, but now they expect entire strings. This might seem unavoidable because
of _polyalphabetic substitution_, that is, because the permutation used depends
on the position of the character in our plaintext.
However, we can still write an encryption function that operates on a single
character at a time as long as we hang on to some extra state, namely, which
letter of the keyword to use next. And `Data.List.mapAccumL` is tailor-made for
calling such a function in order to encrypt a string:
\begin{code}
vigenereChar (k:ks) x = (ks, shift k x)
vigenere' ks = snd . mapAccumL vigenereChar (cycle ks)
\end{code}
According to Wikipedia, this cipher is poorly named. Apparently, Giovan
Battista Bellaso originally described this cipher in 1553, and Blaise de
Vigenère in fact described the
https://en.wikipedia.org/wiki/Autokey_cipher[autokey cipher] in 1586:
\begin{code}
autokey ks xs = zipWith shift xs $ ks ++ xs
unautokey ks xs = m where m = zipWith unshift (ks ++ m) xs
prop_autokeyExample = autokey "QUEENLY" "ATTACKATDAWN" == "QNXEPVYTWTWP"
prop_unautokeyExample = unautokey "QUEENLY" "QNXEPVYTWTWP" == "ATTACKATDAWN"
\end{code}
== One-rotor Enigma ==
Enigma machines contain _rotors_, or _wheels_. Each rotor is a hard-wired
substitution cipher. Literally: there are wires fixed in position which
represent a particular permutation of the alphabet.
Let's take https://en.wikipedia.org/wiki/Enigma_rotor_details[a well-known
rotor]:
\begin{code}
pI = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"
\end{code}
On the first keypress, the rotor rotates by one letter, so that the wire from A
to E now goes from Z to D, the wire from B to K now goes from A to J, and so
on. Current flows from the letter that was typed to the encryption of that
letter through the wire that connects them.
This repeats for subsequent letters: the rotor rotates by one letter, then
electric current flows from the letter that was struck to determine its
encryption. Thus encryption for one wheel can be described as follows:
\begin{code}
oneRotorEnigma = unvigenere abc . map (sub pI) . vigenere abc
prop_oneRotorEnigmaExample = oneRotorEnigma "AAAAA" == "EJKCH"
\end{code}
We use letters to denote how far a wheel has rotated, from A for 0 rotations
to Z for 25 rotations.
Above, we really should have used `bump abc` instead of `abc` because the rotor
turns before the first letter is enciphered. On a real machine, if we started
with the wheel in the A position (no rotations), we would get "JKCHB"; we would
get "EJKCH" by starting in the Z position. However, this is a trivial
off-by-one issue that we'll fix later.
Assuming the adversary obtains a copy of the wheels, the secret key
is the choice of wheel and its initial position.
== Early Enigma ==
The earliest Enigma machines consisted of three rotors connected in series. The
first rotor rotated by one letter for each keypress. The second rotor rotated
when the first rotor completed a full revolution, that is, every 26 keypresses.
Similarly, the third rotor rotated when the second rotor completed a full
revolution, that is, every 26^2^ keypresses:
\begin{code}
pII = "AJDKSIRUXBLHWTMCQGZNPYFVOE"
pIII = "BDFHJLCPRTXVZNYEIWGAKMUSQO"
abcDup n = concatMap (replicate n) abc
rotor1 f = unvigenere abc . f . vigenere abc
rotor2 f = unvigenere (abcDup 26) . f . vigenere (abcDup 26)
rotor3 f = unvigenere (abcDup $ 26^2) . f . vigenere (abcDup $ 26^2)
almostEarlyEnigma = rotor3 f3 . rotor2 f2 . rotor1 f1
where [f1, f2, f3] = map . sub <$> [pI, pII, pIII]
\end{code}
Actually, the above is slightly incorrect for a couple of reasons we shall
explore later (excluding the off-by-one bug described above), but suffices
for a first approximation.
Assuming the adversary obtains a copy of the wheels, the secret key
is the choices of the wheels, their ordering, and their initial positions.
== Self-inverse Engima ==
From a user's perspective, a drawback of the original Enigma machine is that
decryption is different to encryption. If the encryption function were its own
inverse (an
https://en.wikipedia.org/wiki/Involution_(mathematics)[_involution_]), then the
machine would be simpler to use: whether encrypting or decrypting, just set up
the rotors and type away.
Thus a _reflector_ was introduced. This is a hard-wired substitution cipher
that is its own inverse. For example, if A maps to Y, then Y must map to A.
For reasons we will explain below, the reflector must map each letter to a
distinct letter. For example, despite being self-inverse, the identity
permutation is an invalid reflector.
In mathematical terms, the permutation is a product of thirteen 2-cycles.
\begin{code}
reflectorB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"
prop_reflectorNoFixed = and $ zipWith (/=) abc $ map (sub reflectorB) abc
prop_reflectorSelfInverse = iterate (map (sub reflectorB)) abc!!2 == abc
\end{code}
Then on each keypress, after the rotors have turned, the electrical current
is sent through the wheels in one direction to the reflector and then sent
back through the wheels in the opposite direction. This why the reflector
permutation must have no fixed points: current can only travel in one
direction on a wire.
The reflector makes the Enigma cipher its own inverse: the letter on one end
of the path taken by the current swaps with the letter on the other end.
The encryption of a letter must be a different letter, a weakness gleefully
exploited by cryptanalysts.
In mathematical terms, two permutations have the same cycle structure if and
only if they are conjugates, though here we only need one direction of this
fact.
Because of the physical layout of the machine, if we use the "default"
setting and place the wheels I, II, and III from left to right, then the
current goes through wheel III, then II, then I, then the
reflector, then back through I, then II, then III:
\begin{code}
almostEnigma = rotor1 b1 . rotor2 b2 . rotor3 b3 .
reflect . rotor3 f3 . rotor2 f2 . rotor1 f1
where
[f1, f2, f3] = map . sub <$> [pIII, pII, pI]
[b1, b2, b3] = map . unsub <$> [pIII, pII, pI]
reflect = map $ sub reflectorB
prop_almostEnigmaAAAAAA = almostEnigma "AAAAAA" == "UBDZGO"
prop_almostEnigmaSelfInverse s = (almostEnigma . almostEnigma) s == s
\end{code}
Our code almost simulates the Enigma I with its wheels set to AAZ (not AAA due
to the off-by-one bug).
== Dubstep ==
We've almost recreated an Enigma machine. Some differences stem from mechanical
engineering.
In a computer program, turning the second rotor for every 26 turns of the first
rotors might be implemented with a counter. With physical rotors, an elegant
solution is to carve a notch in the wheel that causes the next wheel to turn.
This notch is always in the same place, so we do not always reach it on the
26th keystroke. Instead, the first time we reach the notch depends on the
starting position of the rotor (and afterwards we reach the notch again every
26 turns).
A related problem caused by the notches is the _double stepping anomaly_.
We ignore http://www.intelligenia.org/downloads/rotors1.pdf[the mechanical
details], and just state its effects. If we reach the notch on the middle
wheel, then the middle wheel turns when the right wheel turns. Hence the name
"double stepping": after a keystroke causes the middle wheel to turn to its
notch, the next keystroke causes it to turn again, past its notch (which in
turn will cause the left wheel to turn).
It's as if we had a malfunctioning 3-digit counter:
...
0 8 7
0 8 8
0 8 9
0 9 0
1 0 1
1 0 2
...
Some Enigma variants feature rotors with multiple notches.
== Round and Round ==
One more subtlety. We have been using letters to indicate how far a wheel has
rotated. These are called the _indicator settings_ or the _Grundstellung_. On
Enigma machines, each rotor is labeled with the letters of the alphabet on an
index ring and the indicator settings appear in a row of little windows.
It turns out we can also rotate the wiring relative to the index ring, and we
also denote the extent of such a rotation with a letter. These are called the
_ring settings_ or the _Ringstellung_. Grundstellung and Ringstellung
rotations are measured in opposite directions.
We can account for the ring settings by modifying our shift offset. However,
the notch is a feature of the index ring, and not the wiring. The propagation
of rotation to the next wheel always occurs for the same letter in the
indicator window, but the wiring inside the current wheel may be in a different
position. Our code must handle this correctly.
== Enigma Variations ==
To make life harder for codebreakers, the German military augmented commercial
Enigma with an "aftermarket" part known as the _Steckerbrett_ or the
_plugboard_. This allowed users to plug in cables to swap up to 13 pairs of
letters just before a signal enters the rotors and just after it leaves.
In other words, it is a self-inverse permutation applied before and after
the standard Enigma cipher. Unlike the reflector, the Steckerbrett may have
fixed points, that is, it can be any self-inverse permutation.
All these extra details encourage us to introduce a data structure to hold
the state of an Enigma machine:
\begin{code}
data Enigma = Enigma
{ rotors :: [(String, String)]
, reflector :: String
, grundstellung :: String
, ringstellung :: String
, steckerbrett :: String
} deriving (Eq, Show)
\end{code}
The `rotors` list holds descriptions of the rotors from left to right on
a physical machine; the electrical signal from a keystroke enters the
rightmost wheel first. Each rotor is a pair of strings: the first string
is a permutation of the alphabet as a word, and the second describes all
notches on the rotor.
Here are http://www.cryptomuseum.com/crypto/enigma/wiring.htm[components
from real Enigma machines]:
\begin{code}
wI = ("EKMFLGDQVZNTOWYHXUSPAIBRCJ", "Q")
wII = ("AJDKSIRUXBLHWTMCQGZNPYFVOE", "E")
wIII = ("BDFHJLCPRTXVZNYEIWGAKMUSQO", "V")
wIV = ("ESOVPZJAYQUIRHXLNFTGKDCMWB", "J")
wV = ("VZBRGITYUPSDNHLXAWMJQOFECK", "Z")
ukwA = "EJMZALYXVBWFCRQUONTSPIKHGD"
ukwB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"
ukwC = "FVPJIAOYEDRZXWGCTKUQSBNMHL"
defaultEnigma = Enigma
{ rotors = [wI, wII, wIII]
, reflector = ukwB
, grundstellung = "AAA"
, ringstellung = "AAA"
, steckerbrett = abc
}
\end{code}
UKW stands for Umkehrwalze, the German term for the reflector. The secret key
consists of the ring settings, indicator settings, reflector choice, rotor
choices, rotor order, and plugboard cables.
One iteration of the turning of the wheels can be described as follows:
\begin{code}
turn m = m { grundstellung =
[ bool g1 (bump g1) $ g2 `elem` n2
, bool g2 (bump g2) $ g2 `elem` n2 || g3 `elem` n3
, bump g3
]} where
[g1, g2, g3] = grundstellung m
[n1, n2, n3] = snd <$> rotors m
\end{code}
The `zap` function follows the current through the wires when a key is
struck on an Enigma machine `m` to find its encryption:
\begin{code}
conjugateSub p k = unshift k . sub p . shift k
rotorSubs m = zipWith conjugateSub (fst <$> rotors m) $
zipWith unshift (ringstellung m) $ grundstellung m
zap m = st . unsub p . sub (reflector m) . sub p . st where
p = foldr1 (.) (rotorSubs m) <$> abc
st = sub $ steckerbrett m
\end{code}
It remains to write wrappers. For an uppercase letter, the `enigmaChar`
function advances the machine one iteration then finds the encryption of the
letter. Otherwise we just leave the machine alone and return the input
character unchanged. The `enigma` function passes this function to `mapAccumL`
to encrypt strings.
\begin{code}
enigmaChar m k = bool (m, k) (m', zap m' k) $ isUpper k where m' = turn m
enigma m = snd . mapAccumL enigmaChar m
prop_enigmaExample =
enigma (defaultEnigma { ringstellung = "BBB" }) "AAAAA" == "EWTYX"
\end{code}
The top of this webpage features a simulation of an Enigma machine with
rotors I, II, and III from left to right, the B reflector, and no plugboard.
The ring and indicator settings are initially both AAA, but these can be
adjusted by dragging the black rings or boxes on the wheels, or by typing
in the text areas.
I built it on top of the code throughout this article, also with Haskell, and
compiled it to JavaScript with https://www.haste-lang.org/[Haste]. Hopefully,
my simulation agrees with other online Enigma simulations:
* http://enigmaco.de/enigma/enigma.html
* http://enigma.louisedade.co.uk/enigma.html