type State = Int type Dfa = (State, State -> Char -> State, State -> Bool) runDfa :: Dfa -> [Char] -> Bool runDfa (start, step, accepts) s = go start s where go state s = case s of "" -> accepts state c:rest -> go (step state c) rest
Ancient Greek computer science
To understand X, Socrates began by asking: "What is X?". For example: "What is justice?", or "What is virtue?"
His interlocutor typically gave an impressive but imprecise definition, to which Socrates asked a series of follow-up questions. Sooner or later, he’d find a contradiction. Both parties gained a deeper understanding of the topic, but even after several iterations, the main conclusion most often seemed to be "X is hard to define". Awfully little to extract from such a long process.
I first encountered an optimization in a book by Douglas Adams. Reply mathematically! When asked: "What is the answer to the meaning of life, the universe, and everything?", we say "42" and pre-empt those meddlesome contradictions. For if mathematics is flawed, then so is logical reasoning, and hence the Socratic method itself. Fight logic with logic.
Now the problem is to find a mathematical definition that reflects what we have in mind. This can be difficult, but often in computer science, mathematical definitions work well, and even guide the implementation.
What is a compiler?
We define a compiler to be a translator from one programming language to another. More mathematically, a compiler is a function that maps a program in the source language to a program in the target language:
compile :: Language -> TargetLanguage
But not just any function will do. A compiler must preserve the meaning of the source program.
What is a programming language?
Running a program produces an output from an input, so we define a programming language by means of an interpreter, which takes a program to a function mapping input values to output values:
interpret :: Language -> (Value -> Value) interpretTarget :: TargetLanguage -> (Value -> Value)
By convention, the arrow (->) associates to the right, so we write:
interpret :: Language -> Value -> Value interpretTarget :: TargetLanguage -> Value -> Value
Our condition for a compiler correctness is that for any program p in Language, and for any input in Value:
(interpret p) input == (interpretTarget (compile p)) input
By convention, juxtaposition denotes function application and associates to the left, which plays nicely with our other convention, so we write:
interpret p input == interpretTarget (compile p) input
So far so good, but there ought to be some indication that these interpreters are running on computers.
What is a computer?
We could leave it as is:
compute :: Value -> Value
This works if we say the input includes the entire state of the computer: every register, every single bit of memory, and so on. However, we’ve bitten off far more than we can chew; how can we reason about a complicated gigantic function?
What if we shrink the domain and range?
compute :: Char -> Char
Here, the set Char is relatively small, say the set of all 32-bit words.
A list of Char can hold inputs and outputs of arbitrary length:
type Value = [Char]
Unfortunately, all our compute function can do with such a list is to perform the same operation to each element of the list. We have modeled something like a circuit, which performs the same task on a fixed number of bits at a time.
Deterministic Finite Automata
Humans and computers are more than mere circuits because we retain information from one step to use in future steps. Even though we can only read a few words at a time, we can absorb a complex story from a book. This is because after starting in one state of mind, processing a few bits can put us in another state of mind.
This leads us to consider deterministic finite automata, or finite state machines:
Briefly, the machine begins in some start state, then updates its state according to the current state and the next input Char, and repeats until the entire input has been processed. If it ends in a state designated as an accepting state, we say the machine accepts the input.
For example, the following DFA accepts strings consisting of an odd number of 'x’s:
oddXs :: Dfa oddXs = (0, step, \state -> state == 1) where step 0 'x' = 1 step 1 'x' = 0 step _ _ = 2 -- This should return True. demoOddXs :: Bool demoOddXs = runDfa oddXs "xxxxx"
How many states should we allow? We could say there is one state for every possible configuration of the computer save the input, but then we’d simply be pushing a problem elsewhere. How can we reason about a automaton with so many states? Thus the number of states should be relatively small.
We quickly find DFAs are incapable of elementary tasks such as checking if a bunch of parentheses are balanced, or if some number of As has been followed by an equal number of Bs.
Deterministic Push-Down Automata
DFAs are hobbled by lack of memory. Let’s fix this. Perhaps the simplest kind of memory is a stack, as evidenced by the popularity of stack-based virtual machines. We model such machines with deterministic push-down automata:
type Diff = (State, [Char]) data DpdaOp = Reject | BlindPush Diff | Push (Char -> Diff) type Dpda = (State, Char, State -> Char -> DpdaOp, State -> Bool) runDpda :: Dpda -> [Char] -> Bool runDpda (start, stackInit, step, accepts) s = go start [stackInit] s where go state stack s = case stack of  -> False top:rest -> case step state top of Reject -> False BlindPush next -> apply next s Push f -> case s of "" -> accepts state c:cs -> apply (f c) cs where apply (nextState, stackDiff) = go nextState (stackDiff ++ rest)
Informally, the machine starts in a given state with a given symbol on the stack. On each step, it examines the stack. If empty, then the machine rejects, otherwise it pops off the top of the stack. The current state and the popped-off symbol determine the action for this step, which is one of:
Reject: halt the machine and reject the input.
BlindPush: ignore the input, enter a given state and push a given sequence of symbols on the stack.
Push: if at the end of input, then halt, accepting if in a given set of states. Otherwise consume one character of the input, which may be used to determine the next state and the symbols to push.
For example, the following DPDA accepts strings consisting of balanced parentheses. As it processes characters from the input string, it records the current nesting depth in unary on the stack.
balanced :: Dpda balanced = (0, '-', step, \state -> state == 0) where step 0 '-' = Push (\c -> case c of '(' -> (1, "1-") _ -> (2, "") ) step 1 '-' = BlindPush (0, "-") step 1 '1' = Push (\c -> case c of '(' -> (1, "11") ')' -> (1, "") _ -> (2, "") ) step _ _ = Reject -- This should return True. demoBalanced :: Bool demoBalanced = runDpda balanced "(()())"
It turns out DPDAs are incapable of basic tasks such as checking if the input is a palindrome, or if some number of As has been followed by an equal number of Bs followed by an equal number of Cs.
DPDAs flounder because the values popped off the stack are lost. What if we could save them, say, on a second stack?
Also, the design of DPDAs smells. We have Char values on a stack, and another bunch of Char values in our input string. The former tells us whether to bother reading from the latter. There are two edge cases to worry about: the empty stack, and the end of input. More than one condition causes the machine to halt. Can we simplify?
Let’s try starting with two stacks that initially hold the input. We add a current state and a current Char value that determine our next move, which is either to halt, or to do all of the following:
push a Char to one of the stacks;
pop off a Char from the other, which becomes our next current value; and
enter the next state.
This seems promising, but we have limited memory because each push corresponds to a pop. We could permit pushing several values at once, but this makes our machine uglier.
Better to allow "bottomless" stacks. Rather than rejecting, we define the result of popping an empty stack to be a special Char value that we call a blank.
We have just described Turing machines. Usually, the two stacks and current value are explained in terms of a movable read/write head on a tape that is infinitely long in either direction, which is disconcerting since it’s impossible to cram an infinite tape into a real computer, or for that matter, into our universe. Nonetheless we adopt mainstream nomenclature in our code. For example, we call a stack a one-way tape.
data OneWayTape = Char :> OneWayTape type Tape = (OneWayTape, Char, OneWayTape) data Direction = L | R data TMOp = Halt | Step (State, (Char, Direction)) type TM = (State, State -> Char -> TMOp) runTM :: TM -> Tape -> Tape runTM (start, step) tape = go start tape where go state (l, c, r) = case step state c of Halt -> (l, c, r) Step (nextState, (c', dir)) -> go state (case dir of L -> case l of x :> xs -> (xs , x, c' :> r) R -> case r of x :> xs -> (c' :> l, x, xs ) )
We represent a blank with the space character. We encode the input string by pushing its characters on one of the stacks, or in conventional terms, we write the string immediately to the right of tape head. We expect the output to be in the same format.
blanks :: OneWayTape blanks = ' ' :> blanks toTape :: [Char] -> Tape toTape s = (blanks, ' ', go s) where go "" = blanks go (c:rest) = c :> go rest fromTape :: Tape -> [Char] fromTape (_, _, xs) = go xs where go (x :> xt) = if x == ' ' then "" else x:go xt evalTM :: TM -> String -> String evalTM p s = fromTape (runTM p (toTape s))
We ditch the notion of accepting states; we could trivially support them by tweaking any given TM to write a symbol representing acceptance before halting.
Turing machines are a decent mathematical answer to "What is a computer?" Thus we revise our definition of an interpreter, which in turn defines a programming language:
interpret :: Language -> TM
What is a compiler?
Let’s put all the pieces together. A language is defined by a function that maps a given program to a Turing machine. Given two language specifications:
interpret :: Language -> TM interpretTarget :: TargetLanguage -> TM
a compiler from the first language to the second is a function:
compile :: Language -> TargetLanguage
such that for all source programs p and for all strings s:
evalTM (interpret p) s == evalTM (interpretTarget (compile p)) s
What else is there?
Researchers have long used interpreters to define programming languages. McCarthy’s paper on LISP and Landin’s paper on ALGOL 60 are classics. This technique grew more mathematical over time, evolving into formal semantics. Knuth vividly recounts the history in a paper on attribute grammars. (Attribute grammars are a venerable branch of formal semantics; see Backhouse, A Functional Semantics of Attribute Grammars for a modern take.)
Incredibly, formal semantics exist for a substantial subset of C. Attempting to formalize all of C is a fool’s errand. For example:
Officially, it is illegal to optimize x=0; x=0; but this rule is flouted in practice.
Pointer semantics are ill-defined. Try for yourself online! We sympathize with: "…the ISO standard is a prose document, as is typical for industry standards. The lack of mathematical precision, while also typical for industry standards, has surely contributed to the accumulated confusion…"
There is even a certified C compiler, that is, a C compiler that a computer can prove is correct.
Conal Elliot gave an inspiring talk on denotational design. He advocates mathematical thinking beyond its traditional strongholds, and in particular shows how it applies to manipulating images. See also Functional Reactive Programming and Functional Reactive Programming, more elegantly specified.
Richard Montague believed that with enough research, we can show natural languages are no different to formal languages, so perhaps there exists a non-mathematical path to precision.
Decades ago, some argued against getting too formal with programming languages. Later, some cryptographers argued similarly against provable security (see also its sequel). This stance seems incompatible with Montague’s.