-- Module exports. module Ast where import Base import Map data Type = TC String | TV String | TAp Type Type deriving Eq arr a b = TAp (TAp (TC "->") a) b data Extra = Basic String | Const Int | ChrCon Char | StrCon String | Link String String Qual data Pat = PatLit Ast | PatVar String (Maybe Pat) | PatCon String [Pat] data Ast = E Extra | V String | A Ast Ast | L String Ast | Pa [([Pat], Ast)] | Proof Pred data Constr = Constr String [(String, Type)] data ModExport = ExportVar String | ExportCon String [String] data Pred = Pred String Type deriving Eq data Qual = Qual [Pred] Type instance Show Type where showsPrec _ = \case TC s -> (s++) TV s -> (s++) TAp (TAp (TC "->") a) b -> showParen True $ shows a . (" -> "++) . shows b TAp a b -> showParen True $ shows a . (' ':) . shows b instance Show Pred where showsPrec _ (Pred s t) = (s++) . (' ':) . shows t . (" => "++) instance Show Qual where showsPrec _ (Qual ps t) = foldr (.) id (map shows ps) . shows t instance Show Extra where showsPrec _ = \case Basic s -> (s++) Const i -> shows i ChrCon c -> shows c StrCon s -> shows s Link im s _ -> (im++) . ('.':) . (s++) instance Show Pat where showsPrec _ = \case PatLit e -> shows e PatVar s mp -> (s++) . maybe id ((('@':) .) . shows) mp PatCon s ps -> (s++) . foldr (.) id (((' ':) .) . shows <$> ps) showVar s@(h:_) = showParen (elem h ":!#$%&*+./<=>?@\\^|-~") (s++) instance Show Ast where showsPrec prec = \case E e -> shows e V s -> showVar s A x y -> showParen (1 <= prec) $ shows x . (' ':) . showsPrec 1 y L s t -> showParen True $ ('\\':) . (s++) . (" -> "++) . shows t Pa vsts -> ("\\cases{"++) . foldr (.) id (intersperse (';':) $ map (\(vs, t) -> foldr (.) id (intersperse (' ':) $ map (showParen True . shows) vs) . (" -> "++) . shows t) vsts) . ('}':) Proof p -> ("{Proof "++) . shows p . ("}"++) data Instance = Instance -- Type, e.g. Int for Eq Int. Type -- Dictionary name, e.g. "{Eq Int}" String -- Context. [Pred] -- Method definitions (Map String Ast) data Assoc = NAssoc | LAssoc | RAssoc deriving Eq data Neat = Neat { typeclasses :: Map String [String] , instances :: Map String [Instance] , topDefs :: [(String, Ast)] , topDecls :: Map String Qual -- | Typed ASTs, ready for compilation, including ADTs and methods, -- e.g. (==), (Eq a => a -> a -> Bool, select-==) , typedAsts :: Map String (Qual, Ast) , dataCons :: Map String [Constr] , type2Cons :: Map String [String] , ffiImports :: Map String Type , ffiExports :: Map String String , moduleImports :: [String] , moduleExports :: Maybe [String] , opFixity :: Map String (Int, Assoc) } neatEmpty = Neat Tip Tip [] Tip Tip Tip Tip Tip Tip [] Nothing Tip patVars = \case PatLit _ -> [] PatVar s m -> s : maybe [] patVars m PatCon _ args -> concat $ patVars <$> args typeVars = \case TC _ -> [] TV v -> [v] TAp x y -> typeVars x `union` typeVars y depthFirstSearch = (foldl .) \relation st@(visited, sequence) vertex -> if vertex `elem` visited then st else second (vertex:) $ depthFirstSearch relation (vertex:visited, sequence) (relation vertex) spanningSearch = (foldl .) \relation st@(visited, setSequence) vertex -> if vertex `elem` visited then st else second ((:setSequence) . (vertex:)) $ depthFirstSearch relation (vertex:visited, []) (relation vertex) scc ins outs = spanning . depthFirst where depthFirst = snd . depthFirstSearch outs ([], []) spanning = snd . spanningSearch ins ([], [])
Client-side compiler
Our next goal is a browser-based edition of our compiler.
Crossly
We first add a few features related to WebAssembly. Firstly, the wasm
command
compiles Haskell to C intended to be compiled to wasm. The resulting binary
assumes the host environment supplies a few functions such as env.getchar
.
Secondly, the warts
command prints the RTS generated by the compiler, also as
C code intended to be compiled to wasm. We later use this to build compilers
that can go directly from Haskell to wasm.
We do some housekeeping. Given a Neat
, type inference had produced a tuple of
a particular type that contained the data needed by the next phase. We change
it to produce a new Neat
with an updated typedAsts
field, so there’s one
fewer data type to occupy our thoughts and APIs. We no longer need to pick out
specific fields to pass to the next phase, as we simply pass everything.
We take a first stab at top-level type declarations. We treat them similarly to default typeclass methods, in that during type inference, we trust the symbol has its annotated type, and only afterwards that we verify the annotated type matches the inferred type. It’s more complex because we must process an entire strongly connected component at a time.
Adding modules has made a mess of our various functions for looking up data constructors, top-level variables, typeclasses, and so on. We reorganize them a little to standardize the logic for searching through the list of imports. This makes it easier to add support for lists of export symbols. We also replace the giant global operator precedence table with local tables.
We experiment with hash consing which reduces heap usage by maximizing sharing. However, it may cost too much, as our compiler has grown appreciably slower.
-- Separate fixity phase. -- Export lists. module Parser where import Base import Ast import Map -- Parser. data ParserState = ParserState { readme :: [(Char, (Int, Int))] , landin :: String , indents :: [Int] } data Parser a = Parser { getParser :: ParserState -> Either String (a, ParserState) } instance Functor Parser where fmap f x = pure f <*> x instance Applicative Parser where pure x = Parser \inp -> Right (x, inp) (Parser f) <*> (Parser x) = Parser \inp -> do (fun, t) <- f inp (arg, u) <- x t pure (fun arg, u) instance Monad Parser where return = pure (Parser x) >>= f = Parser \inp -> do (a, t) <- x inp getParser (f a) t instance Alternative Parser where empty = bad "" x <|> y = Parser \inp -> either (const $ getParser y inp) Right $ getParser x inp notFollowedBy p = do saved <- Parser \pasta -> Right (pasta, pasta) ret <- p *> pure (bad "") <|> pure (pure ()) Parser \_ -> Right ((), saved) ret parse f str = getParser f $ ParserState (rowcol str (1, 1)) [] [] where rowcol s rc = case s of [] -> [] h:t -> (h, rc) : rowcol t (advanceRC (ord h) rc) advanceRC n (r, c) | n `elem` [10, 11, 12, 13] = (r + 1, 1) | n == 9 = (r, (c + 8)`mod`8) | True = (r, c + 1) indentOf pasta = case readme pasta of [] -> 1 (_, (_, c)):_ -> c ins c pasta = pasta { landin = c:landin pasta } angle n pasta = case indents pasta of m:ms | m == n -> ins ';' pasta | n + 1 <= m -> ins '}' $ angle n pasta { indents = ms } _ -> pasta curly n pasta = case indents pasta of m:ms | m + 1 <= n -> ins '{' pasta { indents = n:m:ms } [] | 1 <= n -> ins '{' pasta { indents = [n] } _ -> ins '{' . ins '}' $ angle n pasta sat f = Parser \pasta -> case landin pasta of c:t -> if f c then Right (c, pasta { landin = t }) else Left "unsat" [] -> case readme pasta of [] -> case indents pasta of [] -> Left "EOF" m:ms | m /= 0 && f '}' -> Right ('}', pasta { indents = ms }) _ -> Left "unsat" (h, _):t | f h -> let p' = pasta { readme = t } in case h of '}' -> case indents pasta of 0:ms -> Right (h, p' { indents = ms }) _ -> Left "unsat" '{' -> Right (h, p' { indents = 0:indents p' }) _ -> Right (h, p') _ -> Left "unsat" char c = sat (c ==) rawSat f = Parser \pasta -> case readme pasta of [] -> Left "EOF" (h, _):t -> if f h then Right (h, pasta { readme = t }) else Left "unsat" eof = Parser \pasta -> case pasta of ParserState [] [] _ -> Right ((), pasta) _ -> badpos pasta "want eof" comment = rawSat ('-' ==) *> some (rawSat ('-' ==)) *> (rawSat isNewline <|> rawSat (not . isSymbol) *> many (rawSat $ not . isNewline) *> rawSat isNewline) *> pure True spaces = isNewline <$> rawSat isSpace whitespace = do offside <- or <$> many (spaces <|> comment) Parser \pasta -> Right ((), if offside then angle (indentOf pasta) pasta else pasta) hexValue d | d <= '9' = ord d - ord '0' | d <= 'F' = 10 + ord d - ord 'A' | d <= 'f' = 10 + ord d - ord 'a' isNewline c = ord c `elem` [10, 11, 12, 13] isSymbol = (`elem` "!#$%&*+./<=>?@\\^|-~:") isSmall c = c <= 'z' && 'a' <= c || c == '_' small = sat isSmall large = sat \x -> (x <= 'Z') && ('A' <= x) hexit = sat \x -> (x <= '9') && ('0' <= x) || (x <= 'F') && ('A' <= x) || (x <= 'f') && ('a' <= x) digit = sat \x -> (x <= '9') && ('0' <= x) decimal = foldl (\n d -> 10*n + ord d - ord '0') 0 <$> some digit hexadecimal = foldl (\n d -> 16*n + hexValue d) 0 <$> some hexit nameTailChar = small <|> large <|> digit <|> char '\'' nameTailed p = liftA2 (:) p $ many nameTailChar escape = char '\\' *> (sat (`elem` "'\"\\") <|> char 'n' *> pure '\n' <|> char '0' *> pure '\0' <|> char 'x' *> (chr <$> hexadecimal)) tokOne delim = escape <|> rawSat (delim /=) charSeq = mapM char tokChar = between (char '\'') (char '\'') (tokOne '\'') quoteStr = between (char '"') (char '"') $ many $ many (charSeq "\\&") *> tokOne '"' quasiquoteStr = charSeq "[r|" *> quasiquoteBody quasiquoteBody = charSeq "|]" *> pure [] <|> (:) <$> rawSat (const True) <*> quasiquoteBody tokStr = quoteStr <|> quasiquoteStr integer = char '0' *> (char 'x' <|> char 'X') *> hexadecimal <|> decimal literal = lexeme . fmap E $ Const <$> integer <|> ChrCon <$> tokChar <|> StrCon <$> tokStr varish = lexeme $ nameTailed small bad s = Parser \pasta -> badpos pasta s badpos pasta s = Left $ loc $ ": " ++ s where loc = case readme pasta of [] -> ("EOF"++) (_, (r, c)):_ -> ("row "++) . shows r . (" col "++) . shows c varId = do s <- varish if elem s ["export", "case", "class", "data", "default", "deriving", "do", "else", "foreign", "if", "import", "in", "infix", "infixl", "infixr", "instance", "let", "module", "newtype", "of", "then", "type", "where", "_"] then bad $ "reserved: " ++ s else pure s varSymish = lexeme $ (:) <$> sat (\c -> isSymbol c && c /= ':') <*> many (sat isSymbol) varSym = lexeme $ do s <- varSymish if elem s ["..", "=", "\\", "|", "<-", "->", "@", "~", "=>"] then bad $ "reserved: " ++ s else pure s conId = lexeme $ nameTailed large conSymish = lexeme $ liftA2 (:) (char ':') $ many $ sat isSymbol conSym = do s <- conSymish if elem s [":", "::"] then bad $ "reserved: " ++ s else pure s special c = lexeme $ sat (c ==) comma = special ',' semicolon = special ';' lParen = special '(' rParen = special ')' lBrace = special '{' rBrace = special '}' lSquare = special '[' rSquare = special ']' backquoted = between (char '`' *> whitespace) (char '`' *> whitespace) lexeme f = f <* whitespace lexemePrelude = whitespace *> Parser \pasta -> case getParser (res "module" <|> (:[]) <$> char '{') pasta of Left _ -> Right ((), curly (indentOf pasta) pasta) Right _ -> Right ((), pasta) curlyCheck f = do Parser \pasta -> Right ((), pasta { indents = 0:indents pasta }) r <- f Parser \pasta -> let pasta' = pasta { indents = tail $ indents pasta } in case readme pasta of [] -> Right ((), curly 0 pasta') ('{', _):_ -> Right ((), pasta') (_, (_, col)):_ -> Right ((), curly col pasta') pure r conOf (Constr s _) = s specialCase (h:_) = '|':conOf h mkCase t cs = insert (specialCase cs) ( Qual [] $ arr t $ foldr arr (TV "case") $ map (\(Constr _ sts) -> foldr arr (TV "case") $ snd <$> sts) cs , E $ Basic "I") mkStrs = snd . foldl (\(s, l) u -> ('@':s, s:l)) ("@", []) scottEncode _ ":" _ = E $ Basic "CONS" scottEncode vs s ts = foldr L (foldl (\a b -> A a (V b)) (V s) ts) (ts ++ vs) scottConstr t cs (Constr s sts) = foldr (.) (insertWith (error $ "constructor conflict: " ++ s) s (Qual [] $ foldr arr t ts , scottEncode (map conOf cs) s $ mkStrs ts) ) [insertWith (error $ "field conflict: " ++ field) field (Qual [] $ t `arr` ft, L s $ foldl A (V s) $ inj $ proj field) | (field, ft) <- sts, field /= ""] where ts = snd <$> sts proj fd = foldr L (V fd) $ fst <$> sts inj x = map (\(Constr s' _) -> if s' == s then x else V "undefined") cs mkAdtDefs t cs = foldr (.) (mkCase t cs) $ scottConstr t cs <$> cs mkFFIHelper n t acc = case t of TC s -> acc TAp (TC "IO") _ -> acc TAp (TAp (TC "->") x) y -> L (show n) $ mkFFIHelper (n + 1) y $ A (V $ show n) acc updateDcs cs dcs = foldr (\(Constr s _) m -> insert s cs m) dcs cs addAdt t cs ders neat = foldr derive neat' ders where neat' = neat { typedAsts = mkAdtDefs t cs $ typedAsts neat , dataCons = updateDcs cs $ dataCons neat , type2Cons = insert (typeName t) (concatMap cnames cs) $ type2Cons neat } typeName = \case TAp x _ -> typeName x TC c -> c cnames (Constr s sts) = s : concatMap (\(s, _) -> if s == "" then [] else [s]) sts derive "Eq" = addInstance "Eq" (mkPreds "Eq") t [("==", Pa $ map eqCase cs )] derive "Show" = addInstance "Show" (mkPreds "Show") t [("showsPrec", L "prec" $ Pa $ map showCase cs )] derive der = error $ "bad deriving: " ++ der prec0 = A (V "ord") (E $ ChrCon '\0') showCase (Constr con args) = let as = show <$> [1..length args] in ([PatCon con $ mkPatVar "" <$> as], case args of [] -> A (V "++") (E $ StrCon con) _ -> case con of ':':_ -> A (A (V "showParen") $ V "True") $ foldr1 (\f g -> A (A (V ".") f) g) [ A (A (V "showsPrec") prec0) (V "1") , A (V "++") (E $ StrCon $ ' ':con++" ") , A (A (V "showsPrec") prec0) (V "2") ] _ -> A (A (V "showParen") $ A (A (V "<=") prec0) $ V "prec") $ A (A (V ".") $ A (V "++") (E $ StrCon con)) $ foldr (\f g -> A (A (V ".") f) g) (L "x" $ V "x") $ map (\a -> A (A (V ".") (A (V ":") (E $ ChrCon ' '))) $ A (A (V "showsPrec") prec0) (V a)) as ) mkPreds classId = Pred classId . TV <$> typeVars t mkPatVar pre s = PatVar (pre ++ s) Nothing eqCase (Constr con args) = let as = show <$> [1..length args] in ([PatCon con $ mkPatVar "l" <$> as], Pa [ ([PatCon con $ mkPatVar "r" <$> as], foldr (\x y -> (A (A (V "&&") x) y)) (V "True") $ map (\n -> A (A (V "==") (V $ "l" ++ n)) (V $ "r" ++ n)) as) , ([PatVar "_" Nothing], V "False")]) addClass classId v (sigs, defs) neat = if not $ member classId $ typeclasses neat then neat { typeclasses = insert classId (keys sigs) $ typeclasses neat , typedAsts = selectors $ typedAsts neat , topDefs = defaults ++ topDefs neat } else error $ "duplicate class: " ++ classId where vars = take (size sigs) $ show <$> [0..] selectors = foldr (.) id $ zipWith (\var (s, Qual ps t) -> insertWith (error $ "method conflict: " ++ s) s (Qual (Pred classId v:ps) t, L "@" $ A (V "@") $ foldr L (V var) vars)) vars $ toAscList sigs defaults = map (\(s, t) -> if member s sigs then ("{default}" ++ s, t) else error $ "bad default method: " ++ s) $ toAscList defs addInstance classId ps ty ds neat = neat { instances = insertWith (++) classId [Instance ty name ps (fromList ds)] $ instances neat } where name = '{':classId ++ (' ':shows ty "}") addTopDecl (s, t) neat = neat { topDecls = insert s t $ topDecls neat } addForeignImport foreignname ourname t neat = neat { typedAsts = insertWith (error $ "import conflict: " ++ ourname) ourname (Qual [] t, mkFFIHelper 0 t $ A (E $ Basic "F") $ A (E $ Basic "NUM") $ E $ Link "{foreign}" foreignname $ Qual [] t) $ typedAsts neat , ffiImports = insertWith (error $ "duplicate import: " ++ foreignname) foreignname t $ ffiImports neat } addForeignExport e f neat = neat { ffiExports = insertWith (error $ "duplicate export: " ++ e) e f $ ffiExports neat } addDefs ds neat = neat { topDefs = ds ++ topDefs neat } addImport im neat = neat { moduleImports = im:moduleImports neat } addFixities os prec neat = neat { opFixity = foldr (\o tab -> insert o prec tab) (opFixity neat) os } parseErrorRule = Parser \pasta -> case indents pasta of m:ms | m /= 0 -> Right ('}', pasta { indents = ms }) _ -> badpos pasta "missing }" res w@(h:_) = reservedSeq *> pure w <|> bad ("want \"" ++ w ++ "\"") where reservedSeq = if elem w ["let", "where", "do", "of"] then curlyCheck $ lexeme $ charSeq w *> notFollowedBy nameTailChar else lexeme $ charSeq w *> notFollowedBy (if isSmall h then nameTailChar else sat isSymbol) paren = between lParen rParen braceSep f = between lBrace (rBrace <|> parseErrorRule) $ foldr ($) [] <$> sepBy ((:) <$> f <|> pure id) semicolon joinIsFail t = A (L "join#" t) (V "fail#") addLets ls x = L "let" $ foldr L (L "in" bodies) $ fst <$> ls where bodies = foldr A x $ joinIsFail . snd <$> ls qconop = conSym <|> res ":" <|> backquoted conId qconsym = conSym <|> res ":" op = qconsym <|> varSym <|> backquoted (conId <|> varId) con = conId <|> paren qconsym var = varId <|> paren varSym tycon = do s <- conId pure $ if s == "String" then TAp (TC "[]") (TC "Char") else TC s aType = lParen *> ( rParen *> pure (TC "()") <|> (foldr1 (TAp . TAp (TC ",")) <$> sepBy1 _type comma) <* rParen) <|> tycon <|> TV <$> varId <|> (lSquare *> (rSquare *> pure (TC "[]") <|> TAp (TC "[]") <$> (_type <* rSquare))) bType = foldl1 TAp <$> some aType _type = foldr1 arr <$> sepBy bType (res "->") fixityDecl w a = do res w n <- lexeme integer os <- sepBy op comma pure $ addFixities os (n, a) fixity = fixityDecl "infix" NAssoc <|> fixityDecl "infixl" LAssoc <|> fixityDecl "infixr" RAssoc cDecls = first fromList . second fromList . foldr ($) ([], []) <$> braceSep cDecl cDecl = first . (:) <$> genDecl <|> second . (++) <$> defSemi genDecl = (,) <$> var <* res "::" <*> (Qual <$> (scontext <* res "=>" <|> pure []) <*> _type) classDecl = res "class" *> (addClass <$> conId <*> (TV <$> varId) <*> (res "where" *> cDecls)) simpleClass = Pred <$> conId <*> _type scontext = (:[]) <$> simpleClass <|> paren (sepBy simpleClass comma) instDecl = res "instance" *> ((\ps cl ty defs -> addInstance cl ps ty defs) <$> (scontext <* res "=>" <|> pure []) <*> conId <*> _type <*> (res "where" *> braceDef)) letin = addLets <$> between (res "let") (res "in") braceDef <*> expr ifthenelse = (\a b c -> A (A (A (V "if") a) b) c) <$> (res "if" *> expr) <*> (res "then" *> expr) <*> (res "else" *> expr) listify = foldr (\h t -> A (A (V ":") h) t) (V "[]") alts = joinIsFail . Pa <$> braceSep ((\x y -> ([x], y)) <$> pat <*> gdSep "->") cas = flip A <$> between (res "case") (res "of") expr <*> alts lamCase = curlyCheck (res "case") *> alts nalts = joinIsFail . Pa <$> braceSep ((,) <$> many apat <*> gdSep "->") lamCases = curlyCheck (res "cases") *> nalts lam = res "\\" *> (lamCase <|> lamCases <|> liftA2 onePat (some apat) (res "->" *> expr)) flipPairize y x = A (A (V ",") x) y moreCommas = foldr1 (A . A (V ",")) <$> sepBy1 expr comma thenComma = comma *> ((flipPairize <$> moreCommas) <|> pure (A (V ","))) parenExpr = (&) <$> expr <*> (((\v a -> A (V v) a) <$> op) <|> thenComma <|> pure id) rightSect = ((\v a -> L "@" $ A (A (V v) $ V "@") a) <$> (op <|> (:"") <$> comma)) <*> expr section = lParen *> (parenExpr <* rParen <|> rightSect <* rParen <|> rParen *> pure (V "()")) maybePureUnit = maybe (V "pure" `A` V "()") id stmt = (\p x -> Just . A (V ">>=" `A` x) . onePat [p] . maybePureUnit) <$> pat <*> (res "<-" *> expr) <|> (\x -> Just . maybe x (\y -> (V ">>=" `A` x) `A` (L "_" y))) <$> expr <|> (\ds -> Just . addLets ds . maybePureUnit) <$> (res "let" *> braceDef) doblock = res "do" *> (maybePureUnit . foldr ($) Nothing <$> braceSep stmt) compQual = (\p xs e -> A (A (V "concatMap") $ onePat [p] e) xs) <$> pat <*> (res "<-" *> expr) <|> (\b e -> A (A (A (V "if") b) e) $ V "[]") <$> expr <|> addLets <$> (res "let" *> braceDef) sqExpr = between lSquare rSquare $ ((&) <$> expr <*> ( res ".." *> ( (\hi lo -> (A (A (V "enumFromTo") lo) hi)) <$> expr <|> pure (A (V "enumFrom")) ) <|> res "|" *> ((. A (V "pure")) . foldr (.) id <$> sepBy1 compQual comma) <|> (\t h -> listify (h:t)) <$> many (comma *> expr) ) ) <|> pure (V "[]") fbind = A <$> (V <$> var) <*> (res "=" *> expr) fBinds v = (do fbs <- between lBrace rBrace $ sepBy1 fbind comma pure $ A (E $ Basic "{=") $ foldr A (E $ Basic "=}") $ v:fbs ) <|> pure v atom = ifthenelse <|> doblock <|> letin <|> sqExpr <|> section <|> cas <|> lam <|> (paren comma *> pure (V ",")) <|> V <$> (con <|> var) <|> literal >>= fBinds aexp = foldl1 A <$> some atom chain a = \case [] -> a A f b:rest -> case rest of [] -> A (A f a) b _ -> L "(" $ A (A (A f a) b) $ foldr A (V ")") rest _ -> error "unreachable" expr = chain <$> aexp <*> many (A <$> (V <$> op) <*> aexp) gcon = conId <|> paren (qconsym <|> (:"") <$> comma) <|> lSquare *> rSquare *> pure "[]" apat = PatVar <$> var <*> (res "@" *> (Just <$> apat) <|> pure Nothing) <|> flip PatVar Nothing <$> (res "_" *> pure "_") <|> flip PatCon [] <$> gcon <|> PatLit <$> literal <|> foldr (\h t -> PatCon ":" [h, t]) (PatCon "[]" []) <$> between lSquare rSquare (sepBy pat comma) <|> paren (foldr1 pairPat <$> sepBy1 pat comma <|> pure (PatCon "()" [])) where pairPat x y = PatCon "," [x, y] patChain a = \case [] -> a PatCon f [b]:rest -> case rest of [] -> PatCon f [a, b] _ -> PatCon "{+" $ PatCon f [a, b] : rest _ -> error "unreachable" patAtom = PatCon <$> gcon <*> many apat <|> apat pat = patChain <$> patAtom <*> many (PatCon <$> qconop <*> ((:[]) <$> patAtom)) maybeWhere p = (&) <$> p <*> (res "where" *> (addLets <$> braceDef) <|> pure id) gdSep s = maybeWhere $ res s *> expr <|> foldr ($) (V "join#") <$> some (between (res "|") (res s) guards <*> expr) guards = foldr1 (\f g -> \yes no -> f (g yes no) no) <$> sepBy1 guard comma guard = guardPat <$> pat <*> (res "<-" *> expr) <|> guardExpr <$> expr <|> guardLets <$> (res "let" *> braceDef) guardExpr x yes no = case x of V "True" -> yes _ -> A (A (A (V "if") x) yes) no guardPat p x yes no = A (Pa [([p], yes), ([PatVar "_" Nothing], no)]) x guardLets defs yes no = addLets defs yes onePat vs x = joinIsFail $ Pa [(vs, x)] defOnePat vs x = Pa [(vs, x)] opDef x f y rhs = [(f, defOnePat [x, y] rhs)] leftyPat p expr = case pvars of [] -> [] (h:t) -> let gen = '@':h in (gen, expr):map (\v -> (v, A (Pa [([p], V v)]) $ V gen)) pvars where pvars = filter (/= "_") $ patVars p def = liftA2 (\l r -> [(l, r)]) var (liftA2 defOnePat (many apat) $ gdSep "=") <|> (pat >>= \x -> opDef x <$> varSym <*> pat <*> gdSep "=" <|> leftyPat x <$> gdSep "=") coalesce = \case [] -> [] h@(s, x):t -> case t of [] -> [h] (s', x'):t' -> let f (Pa vsts) (Pa vsts') = Pa $ vsts ++ vsts' f _ _ = error "bad multidef" in if s == s' then coalesce $ (s, f x x'):t' else h:coalesce t defSemi = coalesce . concat <$> sepBy1 def (some semicolon) braceDef = concat <$> braceSep defSemi simpleType c vs = foldl TAp (TC c) (map TV vs) conop = conSym <|> backquoted conId fieldDecl = (\vs t -> map (, t) vs) <$> sepBy1 var comma <*> (res "::" *> _type) constr = (\x c y -> Constr c [("", x), ("", y)]) <$> bType <*> conop <*> bType <|> Constr <$> conId <*> ( concat <$> between lBrace rBrace (fieldDecl `sepBy` comma) <|> map ("",) <$> many aType) dclass = conId _deriving = (res "deriving" *> ((:[]) <$> dclass <|> paren (dclass `sepBy` comma))) <|> pure [] adt = addAdt <$> between (res "data") (res "=") (simpleType <$> conId <*> many varId) <*> sepBy constr (res "|") <*> _deriving impDecl = addImport <$> (res "import" *> conId) tops = braceSep $ adt <|> classDecl <|> addTopDecl <$> genDecl <|> instDecl <|> res "foreign" *> ( res "import" *> var *> (addForeignImport <$> lexeme tokStr <*> var <*> (res "::" *> _type)) <|> res "export" *> var *> (addForeignExport <$> lexeme tokStr <*> var) ) <|> addDefs <$> defSemi <|> fixity <|> impDecl export_ = ExportVar <$> varId <|> ExportCon <$> conId <*> ( paren ((:[]) <$> res ".." <|> sepBy (var <|> con) comma) <|> pure [] ) exports = Just <$> paren (export_ `sepBy` comma) <|> pure Nothing haskell = between lexemePrelude eof $ some $ mayModule tops mayModule p = res "module" *> ((,) <$> conId <*> ((,) <$> exports <* res "where" <*> p)) <|> ("Main",) . (Nothing,) <$> p parseProgram s = fmap fst $ parse haskell s
-- Export lists. -- Detect missing instances. -- Top-level type annotations. module Typer where import Base import Map import Ast import Parser import Unify -- Pattern compiler. rewritePats searcher = \case [] -> pure $ V "join#" vsxs@((as0, _):_) -> case as0 of [] -> pure $ foldr1 (A . L "join#") $ snd <$> vsxs _ -> do let k = length as0 n <- get put $ n + k let vs = take k $ (`shows` "#") <$> [n..] cs <- flip mapM vsxs \(a:at, x) -> (a,) <$> foldM (\b (p, v) -> rewriteCase searcher v Tip [(p, b)]) x (zip at $ tail vs) flip (foldr L) vs <$> rewriteCase searcher (head vs) Tip cs patEq lit b x y = A (L "join#" $ A (A (A (V "if") (A (A (V "==") lit) b)) x) $ V "join#") y rewriteCase searcher caseVar tab = \case [] -> flush $ V "join#" ((v, x):rest) -> go v x rest where rec = rewriteCase searcher caseVar go v x rest = case v of PatLit lit -> flush =<< patEq lit (V caseVar) x <$> rec Tip rest PatVar s m -> let x' = fill s (V caseVar) x in case m of Nothing -> flush =<< A (L "join#" x') <$> rec Tip rest Just v' -> go v' x' rest PatCon con args -> rec (insertWith (flip (.)) con ((args, x):) tab) rest flush onFail = case toAscList tab of [] -> pure onFail -- TODO: Check rest of `tab` lies in cs. (firstC, _):_ -> do let cs = either error id $ findCon searcher firstC jumpTable <- mapM (\(Constr s ts) -> case mlookup s tab of Nothing -> pure $ foldr L (V "join#") $ const "_" <$> ts Just f -> rewritePats searcher $ f [] ) cs pure $ A (L "join#" $ foldl A (A (V $ specialCase cs) $ V caseVar) jumpTable) onFail resolveFieldBinds searcher t = go t where go t = case t of E _ -> t V _ -> t A (E (Basic "{=")) (A rawExpr fbsAst) -> let expr = go rawExpr fromAst t = case t of A (A (V f) body) rest -> (f, go body):fromAst rest E (Basic "=}") -> [] fbs@((firstField, _):_) = fromAst fbsAst (con, fields) = findField searcher firstField cs = either error id $ findCon searcher con newValue = foldl A (V con) [maybe (V $ "[old]"++f) id $ lookup f fbs | (f, _) <- fields] initValue = foldl A expr [maybe (V "undefined") id $ lookup f fbs | (f, _) <- fields] updater = foldr L newValue $ ("[old]"++) . fst <$> fields inj x = map (\(Constr con' _) -> if con' == con then x else V "undefined") cs allPresent = all (`elem` (fst <$> fields)) $ fst <$> fbs isCon = case expr of V (h:_) -> 'A' <= h && h <= 'Z' _ -> False in if allPresent then if isCon then initValue else foldl A (A (V $ specialCase cs) expr) $ inj updater else error "bad fields in update" A x y -> A (go x) (go y) L s x -> L s $ go x fixFixity findPrec t = case t of E _ -> pure t V _ -> pure t A x y -> A <$> go x <*> go y L s b | s == "(" -> infixer findPrec =<< go b | True -> L s <$> go b Pa vsxs -> Pa <$> mapM (\(ps, a) -> (,) <$> mapM pgo ps <*> go a) vsxs where go = fixFixity findPrec pgo = patFixFixity findPrec data OpTree = OpLeaf Ast | OpNode String Ast OpTree infixer findPrec (A (A (A (V s) x) y) t) = go (OpNode s x (OpLeaf y)) t where go acc = \case A (A (V s) z) rest -> do acc' <- ins s z acc go acc' rest V ")" -> pure $ decode acc _ -> error "unreachable" ins s z t = case t of OpNode s' x y -> isStronger findPrec s s' >>= \case True -> OpNode s' x <$> ins s z y False -> pure $ OpNode s (decode t) (OpLeaf z) OpLeaf x -> pure $ OpNode s x (OpLeaf z) decode = \case OpNode f x y -> A (A (V f) x) (decode y) OpLeaf x -> x isStronger findPrec s s' = if prec <= prec' then if prec == prec' then if assoc == assoc' then case assoc of LAssoc -> pure False RAssoc -> pure True NAssoc -> Left $ "adjacent NAssoc: " ++ s ++ " vs " ++ s' else Left $ "assoc mismatch: " ++ s ++ " vs " ++ s' else pure False else pure True where (prec, assoc) = findPrec s (prec', assoc') = findPrec s' patFixFixity findPrec p = case p of PatLit _ -> pure p PatVar s m -> PatVar s <$> maybe (pure Nothing) (fmap Just . go) m PatCon "{+" args -> patFixer findPrec args PatCon con args -> PatCon con <$> mapM go args where go = patFixFixity findPrec data PopTree = PopLeaf Pat | PopNode String Pat PopTree patFixer findPrec (PatCon f [a, b]:rest) = go seed rest where seed = PopNode f a (PopLeaf b) go acc = \case [] -> pure $ decode acc PatCon s [z]:rest -> do acc' <- ins s z acc go acc' rest ins s z t = case t of PopNode s' x y -> isStronger findPrec s s' >>= \case True -> PopNode s' x <$> ins s z y False -> pure $ PopNode s (decode t) (PopLeaf z) PopLeaf x -> pure $ PopNode s x (PopLeaf z) decode = \case PopNode f x y -> PatCon f [x, decode y] PopLeaf x -> x secondM f (a, b) = (a,) <$> f b patternCompile searcher t = astLink searcher $ resolveFieldBinds searcher $ evalState (go $ either error id $ fixFixity (findPrec searcher) t) 0 where go t = case t of E _ -> pure t V _ -> pure t A x y -> liftA2 A (go x) (go y) L s x -> L s <$> go x Pa vsxs -> mapM (secondM go) vsxs >>= rewritePats searcher -- Type inference. instantiate' t n tab = case t of TC s -> ((t, n), tab) TV s -> case lookup s tab of Nothing -> let va = TV $ show n in ((va, n + 1), (s, va):tab) Just v -> ((v, n), tab) TAp x y -> let ((t1, n1), tab1) = instantiate' x n tab ((t2, n2), tab2) = instantiate' y n1 tab1 in ((TAp t1 t2, n2), tab2) instantiatePred (Pred s t) ((out, n), tab) = first (first ((:out) . Pred s)) (instantiate' t n tab) instantiate (Qual ps t) n = first (Qual ps1) $ fst $ instantiate' t n1 tab where ((ps1, n1), tab) = foldr instantiatePred (([], n), []) ps proofApply sub a = case a of Proof (Pred cl ty) -> Proof (Pred cl $ apply sub ty) A x y -> A (proofApply sub x) (proofApply sub y) L s t -> L s $ proofApply sub t _ -> a typeAstSub sub (t, a) = (apply sub t, proofApply sub a) maybeFix s x = if go x then A (E $ Link "#" "fix" $ Qual [] $ arr (arr (TV "a") (TV "a")) (TV "a")) (L s x) else x where go = \case V v -> s == v A x y -> go x || go y L v x -> s /= v && go x _ -> False nonemptyTails [] = [] nonemptyTails xs@(x:xt) = xs : nonemptyTails xt fv f bound = \case V s | not (elem s bound) && f s -> [s] A x y -> fv f bound x `union` fv f bound y L s t -> fv f (s:bound) t _ -> [] fill s a t = case t of E _ -> t V v -> if s == v then a else t A x y -> A (fill s a x) (fill s a y) L v u -> if s == v then t else L v $ fill s a u simulFill tab = go [] where go bnd t = case t of V s | not $ elem s bnd -> maybe t id $ lookup s tab A x y -> A (go bnd x) (go bnd y) L s t' -> L s $ go (s:bnd) t' _ -> t triangulate vs defs x = foldr triangle x components where tab = zip vs defs ios = foldr (\(s, t) (ins, outs) -> let dsts = fv (`elem` vs) [] t in (foldr (\dst -> insertWith union dst [s]) ins dsts, insertWith union s dsts outs)) (Tip, Tip) tab components = scc (\k -> maybe [] id $ mlookup k $ fst ios) (\k -> maybe [] id $ mlookup k $ snd ios) vs triangle names expr = let tnames = nonemptyTails names appem vs = foldl1 A $ V <$> vs suball x = simulFill (zip (init names) $ appem <$> init tnames) x redef tns x = foldr L (suball x) tns in foldr (\(x:xt) t -> A (L x t) $ maybeFix x $ redef xt $ maybe (error $ "oops: " ++ x) id $ lookup x tab) (suball expr) tnames decodeLets x = decodeVars id x where decodeVars f = \case L "in" t -> decodeBodies id vs t L v t -> decodeVars (f . (v:)) t where vs = f [] decodeBodies g [] x = ((vs, g []), x) decodeBodies g (_:t) (A x y) = decodeBodies (g . (x:)) t y infer msg typed loc ast csn@(cs, n) = case ast of E x -> Right $ case x of Basic bug -> error bug Const n -> ((TC "Int", ast), csn) ChrCon _ -> ((TC "Char", ast), csn) StrCon _ -> ((TAp (TC "[]") (TC "Char"), ast), csn) Link im s q -> insta q V s -> maybe (Left $ "undefined: " ++ s) Right $ either (\t -> ((t, ast), csn)) insta <$> lookup s loc <|> insta . fst <$> mlookup s typed A x y -> rec loc x (cs, n + 1) >>= \((tx, ax), csn1) -> rec loc y csn1 >>= \((ty, ay), (cs2, n2)) -> unifyMsg msg tx (arr ty va) cs2 >>= \cs -> Right ((va, A ax ay), (cs, n2)) L "let" lets -> do let ((vars, defs), x) = decodeLets lets rec loc (triangulate vars defs x) csn L s x -> first (\(t, a) -> (arr va t, L s a)) <$> rec ((s, Left va):loc) x (cs, n + 1) where rec = infer msg typed va = TV $ show n insta ty = ((ty1, foldl A ast (map Proof preds)), (cs, n1)) where (Qual preds ty1, n1) = instantiate ty n findInstance searcher qn@(q, n) p@(Pred cl ty) insts = case insts of [] -> case ty of TV _ -> let v = '*':show n in Right (((p, v):q, n + 1), V v) _ -> Left $ "no instance: " ++ show p (modName, Instance h name ps _):rest -> case match h ty of Nothing -> findInstance searcher qn p rest Just subs -> foldM (\(qn1, t) (Pred cl1 ty1) -> second (A t) <$> findProof searcher (Pred cl1 $ apply subs ty1) qn1) (qn, if modName == "" then V name else E $ Link modName name undefined) ps findProof searcher pred@(Pred classId t) psn@(ps, n) = case lookup pred ps of Nothing -> findInstance searcher psn pred $ findInstances searcher classId Just s -> Right (psn, V s) prove searcher psn a = case a of Proof pred -> findProof searcher pred psn A x y -> prove searcher psn x >>= \(psn1, x1) -> second (A x1) <$> prove searcher psn1 y L s t -> second (L s) <$> prove searcher psn t _ -> Right (psn, a) data Dep a = Dep ([String] -> Either String ([String], a)) instance Functor Dep where fmap f = \(Dep mf) -> Dep \g -> do (g', x) <- mf g pure (g', f x) instance Applicative Dep where pure x = Dep \g -> Right (g, x) (Dep mf) <*> (Dep mx) = Dep \g -> do (g', f) <- mf g (g'', x) <- mx g' pure (g'', f x) addDep s = Dep \deps -> Right (if s `elem` deps then deps else s : deps, ()) badDep s = Dep $ const $ Left s runDep (Dep f) = f [] unifyMsg s a b c = either (Left . (s++) . (": "++)) Right $ unify a b c app01 s x y = maybe (A (L s x) y) snd $ go x where go expr = case expr of V v -> Just $ if s == v then (True, y) else (False, expr) A l r -> do (a, l') <- go l (b, r') <- go r if a && b then Nothing else pure (a || b, A l' r') L v t -> if v == s then Just (False, expr) else second (L v) <$> go t _ -> Just (False, expr) optiApp t = case t of A x y -> let x' = optiApp x y' = optiApp y in case x' of L s v -> app01 s v y' _ -> A x' y' L s x -> L s (optiApp x) _ -> t inferno searcher decls typed defmap syms = let anno s = maybe (Left $ TV $ ' ':s) Right $ mlookup s decls loc = zip syms $ anno <$> syms principal ((acc, preds), (subs, n)) s = do expr <- maybe (Left $ "missing: " ++ s) Right (mlookup s defmap) ((t, a), (ms, n)) <- infer s typed loc expr (subs, n) case mlookup s decls of Nothing -> do soln <- unifyMsg s (TV (' ':s)) t ms Right (((s, (t, a)):acc, preds), (soln, n)) Just qAnno -> do let (Qual pAnno tAnno, n1) = instantiate qAnno n soln <- maybe (Left $ s ++ ": match failed: " ++ show qAnno ++ " vs " ++ show (apply ms t)) Right $ match (apply ms t) tAnno Right (((s, (t, a)):acc, pAnno ++ preds), (soln @@ ms, n1)) gatherPreds (acc, psn) (s, (t, a)) = do (psn, a) <- prove searcher psn $ optiApp a pure ((s, (t, a)):acc, psn) in do ((stas, preds), (soln, _)) <- foldM principal (([], []), (Tip, 0)) syms let ps = zip preds $ ("anno*"++) . show <$> [0..] (stas, (ps, _)) <- foldM gatherPreds ([], (ps, 0)) $ second (typeAstSub soln) <$> stas let preds = fst <$> ps dicts = snd <$> ps tab = map (\s -> (s, foldl A (V s) $ V <$> dicts)) syms applyDicts (s, (t, a)) = (s, (Qual preds t, foldr L (simulFill tab a) dicts)) pure $ map applyDicts stas inferDefs searcher defs decls typed = do let insertUnique m (s, (_, t)) = case mlookup s m of Nothing -> Right $ insert s t m _ -> Left $ "duplicate: " ++ s addEdges (sym, (deps, _)) (ins, outs) = (foldr (\dep es -> insertWith union dep [sym] es) ins deps, insertWith union sym deps outs) graph = foldr addEdges (Tip, Tip) defs defmap <- foldM insertUnique Tip defs let ins k = maybe [] id $ mlookup k $ fst graph outs k = maybe [] id $ mlookup k $ snd graph inferComponent typed syms = foldr (uncurry insert) typed <$> inferno searcher decls typed defmap syms foldM inferComponent typed $ scc ins outs $ keys defmap dictVars ps n = (zip ps $ map (('*':) . show) [n..], n + length ps) inferTypeclasses searcher iMap typed = foldM inferInstance typed [(classId, inst) | (classId, insts) <- toAscList iMap, inst <- insts] where inferInstance typed (classId, Instance ty name ps idefs) = let dvs = map snd $ fst $ dictVars ps 0 perMethod s = do let rawExpr = maybe (V $ "{default}" ++ s) id $ mlookup s idefs expr <- snd <$> patternCompile searcher rawExpr (ta, (sub, n)) <- either (Left . (name++) . (" "++) . (s++) . (": "++)) Right $ infer s typed [] expr (Tip, 0) qc <- typeOfMethod searcher s let (tx, ax) = typeAstSub sub ta -- e.g. qc = Eq a => a -> a -> Bool -- We instantiate: Eq a1 => a1 -> a1 -> Bool. (Qual [Pred _ headT] tc, n1) = instantiate qc n -- Mix the predicates `ps` with the type of `headT`, applying a -- substitution such as (a1, [a]) so the variable names match. -- e.g. Eq a => [a] -> [a] -> Bool Just subc = match headT ty (Qual ps2 t2, n2) = instantiate (Qual ps $ apply subc tc) n1 case match tx t2 of Nothing -> Left "class/instance type conflict" Just subx -> do ((ps3, _), tr) <- prove searcher (dictVars ps2 0) $ optiApp $ proofApply subx ax if length ps2 /= length ps3 then Left $ ("want context: "++) . (foldr (.) id $ shows . fst <$> ps3) $ name else pure tr in do ms <- mapM perMethod $ findSigs searcher classId pure $ insert name (Qual [] $ TC "DICTIONARY", flip (foldr L) dvs $ L "@" $ foldl A (V "@") ms) typed primAdts = [ (TC "()", [Constr "()" []]) , (TC "Bool", [Constr "True" [], Constr "False" []]) , (TAp (TC "[]") (TV "a"), [Constr "[]" [], Constr ":" $ map ("",) [TV "a", TAp (TC "[]") (TV "a")]]) , (TAp (TAp (TC ",") (TV "a")) (TV "b"), [Constr "," $ map ("",) [TV "a", TV "b"]]) ] prims = let ro = E . Basic dyad s = TC s `arr` (TC s `arr` TC s) wordy = foldr arr (TAp (TAp (TC ",") (TC "Word")) (TC "Word")) [TC "Word", TC "Word", TC "Word", TC "Word"] bin s = A (ro "Q") (ro s) in map (second (first $ Qual [])) $ [ ("doubleFromInt", (arr (TC "Int") (TC "Double"), A (ro "T") (ro "FLO"))) , ("intFromDouble", (arr (TC "Double") (TC "Int"), A (ro "T") (ro "OLF"))) , ("doubleFromWord", (arr (TC "Word") (TC "Double"), A (ro "T") (ro "FLW"))) , ("doubleAdd", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FADD"))) , ("doubleSub", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FSUB"))) , ("doubleMul", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FMUL"))) , ("doubleDiv", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FDIV"))) , ("doubleEq", (arr (TC "Double") (arr (TC "Double") (TC "Bool")), bin "FEQ")) , ("doubleLE", (arr (TC "Double") (arr (TC "Double") (TC "Bool")), bin "FLE")) , ("rawDouble", (arr (TC "Double") $ arr (arr (TC "Word") $ arr (TC "Word") $ TV "a") $ TV "a", A (ro "T") (ro "PAIR64"))) , ("intEq", (arr (TC "Int") (arr (TC "Int") (TC "Bool")), bin "EQ")) , ("intLE", (arr (TC "Int") (arr (TC "Int") (TC "Bool")), bin "LE")) , ("wordLE", (arr (TC "Word") (arr (TC "Word") (TC "Bool")), bin "U_LE")) , ("wordEq", (arr (TC "Word") (arr (TC "Word") (TC "Bool")), bin "EQ")) , ("charEq", (arr (TC "Char") (arr (TC "Char") (TC "Bool")), bin "EQ")) , ("charLE", (arr (TC "Char") (arr (TC "Char") (TC "Bool")), bin "LE")) , ("fix", (arr (arr (TV "a") (TV "a")) (TV "a"), ro "Y")) , ("if", (arr (TC "Bool") $ arr (TV "a") $ arr (TV "a") (TV "a"), ro "I")) , ("intFromWord", (arr (TC "Word") (TC "Int"), ro "I")) , ("wordFromInt", (arr (TC "Int") (TC "Word"), ro "I")) , ("chr", (arr (TC "Int") (TC "Char"), ro "I")) , ("ord", (arr (TC "Char") (TC "Int"), ro "I")) , ("ioBind", (arr (TAp (TC "IO") (TV "a")) (arr (arr (TV "a") (TAp (TC "IO") (TV "b"))) (TAp (TC "IO") (TV "b"))), ro "C")) , ("ioPure", (arr (TV "a") (TAp (TC "IO") (TV "a")), ro "V")) , ("primitiveError", (arr (TAp (TC "[]") (TC "Char")) (TV "a"), ro "ERR")) , ("newIORef", (arr (TV "a") (TAp (TC "IO") (TAp (TC "IORef") (TV "a"))), ro "NEWREF")) , ("readIORef", (arr (TAp (TC "IORef") (TV "a")) (TAp (TC "IO") (TV "a")), A (ro "T") (ro "READREF"))) , ("writeIORef", (arr (TAp (TC "IORef") (TV "a")) (arr (TV "a") (TAp (TC "IO") (TC "()"))), A (A (ro "R") (ro "WRITEREF")) (ro "B"))) , ("exitSuccess", (TAp (TC "IO") (TV "a"), ro "END")) , ("unsafePerformIO", (arr (TAp (TC "IO") (TV "a")) (TV "a"), A (A (ro "C") (A (ro "T") (ro "END"))) (ro "K"))) , ("fail#", (TV "a", A (V "unsafePerformIO") (V "exitSuccess"))) , ("join#", (TV "a", A (V "unsafePerformIO") (V "exitSuccess"))) , ("word64Add", (wordy, A (ro "QQ") (ro "DADD"))) , ("word64Sub", (wordy, A (ro "QQ") (ro "DSUB"))) , ("word64Mul", (wordy, A (ro "QQ") (ro "DMUL"))) , ("word64Div", (wordy, A (ro "QQ") (ro "DDIV"))) , ("word64Mod", (wordy, A (ro "QQ") (ro "DMOD"))) ] ++ map (\(s, v) -> (s, (dyad "Int", bin v))) [ ("intAdd", "ADD") , ("intSub", "SUB") , ("intMul", "MUL") , ("intDiv", "DIV") , ("intMod", "MOD") , ("intQuot", "QUOT") , ("intRem", "REM") , ("intXor", "XOR") , ("intAnd", "AND") , ("intOr", "OR") ] ++ map (\(s, v) -> (s, (dyad "Word", bin v))) [ ("wordAdd", "ADD") , ("wordSub", "SUB") , ("wordMul", "MUL") , ("wordDiv", "U_DIV") , ("wordMod", "U_MOD") , ("wordQuot", "U_DIV") , ("wordRem", "U_MOD") ] tabulateModules mods = foldM ins Tip mods where ins tab (k, (mexs, prog)) = case mlookup k tab of Nothing -> do v <- maybe pure processExports mexs (foldr ($) neatEmpty{moduleImports = ["#"]} prog) pure $ insert k v tab Just _ -> Left $ "duplicate module: " ++ k processExports exs neat = do mes <- Just . concat <$> mapM (processExport neat) exs pure neat { moduleExports = mes } processExport neat = \case ExportVar v -> case lookup v $ topDefs neat of Nothing -> Left $ "bad export " ++ v Just _ -> Right [v] ExportCon c ns -> case mlookup c $ type2Cons neat of Just cnames | ns == [".."] -> Right cnames | null delta -> Right ns | True -> Left $ "bad exports: " ++ show delta where delta = [n | n <- ns, not $ elem n cnames] Nothing -> case mlookup c $ typeclasses neat of Nothing -> Left $ "bad export " ++ c Just methodNames | ns == [".."] -> Right methodNames | null delta -> Right ns | True -> Left $ "bad exports: " ++ show delta where delta = [n | n <- ns, not $ elem n methodNames] data Searcher = Searcher { astLink :: Ast -> Either String ([String], Ast) , findPrec :: String -> (Int, Assoc) , findCon :: String -> Either String [Constr] , findField :: String -> (String, [(String, Type)]) , typeOfMethod :: String -> Either String Qual , findSigs :: String -> [String] , findInstances :: String -> [(String, Instance)] } isExportOf s neat = case moduleExports neat of Nothing -> True Just es -> elem s es findAmong fun viz s = case concat $ maybe [] (:[]) . mlookup s . fun <$> viz s of [] -> Left $ "missing: " ++ s [unique] -> Right unique _ -> Left $ "ambiguous: " ++ s slowUnionWith f x y = foldr go x $ toAscList y where go (k, v) m = insertWith f k v m searcherNew tab neat = Searcher { astLink = astLink' , findPrec = \s -> if s == ":" then (5, RAssoc) else either (const defPrec) id $ findAmong opFixity visible s , findCon = findAmong dataCons visible , findField = findField' , typeOfMethod = fmap fst . findAmong typedAsts visible , findSigs = \s -> case mlookup s mergedSigs of Nothing -> error $ "missing class: " ++ s Just [sigs] -> sigs _ -> error $ "ambiguous class: " ++ s , findInstances = maybe [] id . (`mlookup` mergedInstances) } where mergedSigs = foldr (slowUnionWith (++)) Tip $ map (fmap (:[]) . typeclasses) $ neat : map (tab !) imps mergedInstances = foldr (slowUnionWith (++)) Tip [fmap (map (im,)) $ instances x | (im, x) <- ("", neat) : map (\im -> (im, tab ! im)) imps] defPrec = (9, LAssoc) findImportSym s = concat [maybe [] (\(t, _) -> [(im, t)]) $ mlookup s $ typedAsts n | (im, n) <- importedNeats s] importedNeats s@(h:_) = [(im, n) | im <- imps, let n = tab ! im, h == '{' || isExportOf s n] visible s = neat : (snd <$> importedNeats s) classes im = typeclasses $ if im == "" then neat else tab ! im findField' f = case [(con, fields) | dc <- dataCons <$> visible f, (_, cons) <- toAscList dc, Constr con fields <- cons, (f', _) <- fields, f == f'] of [] -> error $ "no such field: " ++ f h:_ -> h imps = moduleImports neat defs = fromList $ topDefs neat astLink' ast = runDep $ go [] ast where go bound ast = case ast of V s | elem s bound -> pure ast | member s defs -> unlessAmbiguous s $ addDep s *> pure ast | member s $ typedAsts neat -> unlessAmbiguous s $ pure ast | True -> case findImportSym s of [] -> badDep $ "missing: " ++ s [(im, t)] -> pure $ E $ Link im s t _ -> badDep $ "ambiguous: " ++ s A x y -> A <$> go bound x <*> go bound y L s t -> L s <$> go (s:bound) t _ -> pure ast unlessAmbiguous s f = case findImportSym s of [] -> f _ -> badDep $ "ambiguous: " ++ s inferModule tab acc name = case mlookup name acc of Nothing -> do let neat = tab ! name imps = moduleImports neat typed = typedAsts neat acc' <- foldM (inferModule tab) acc imps let searcher = searcherNew acc' neat depdefs <- mapM (\(s, t) -> (s,) <$> patternCompile searcher t) $ topDefs neat typed <- inferDefs searcher depdefs (topDecls neat) typed typed <- inferTypeclasses searcher (instances neat) typed Right $ insert name neat { typedAsts = typed } acc' Just _ -> Right acc untangle s = do tab <- insert "#" neatPrim <$> (parseProgram s >>= tabulateModules) foldM (inferModule tab) Tip $ keys tab neatPrim = foldr (\(a, b) -> addAdt a b []) neatEmpty { typedAsts = fromList prims } primAdts
-- Export lists. module Main where import Base import Map import Ast import RTS import Typer import Kiselyov import System hide_prelude_here' = hide_prelude_here' dumpWith dumper s = case untangle s of Left err -> err Right tab -> foldr ($) [] $ map (\(name, neat) -> ("module "++) . (name++) . ('\n':) . (foldr (.) id $ dumper neat)) $ toAscList tab dumpLambs neat = map (\(s, t) -> (s++) . (" = "++) . shows t . ('\n':)) $ second snd <$> toAscList (typedAsts neat) dumpTypes neat = map (\(s, q) -> (s++) . (" :: "++) . shows q . ('\n':)) $ second fst <$> toAscList (typedAsts neat) dumpRawCombs neat = map go combs where rawCombs = optim . nolam . snd <$> typedAsts neat combs = toAscList $ rawCombs go (s, t) = (s++) . (" = "++) . shows t . (";\n"++) dumpCombs neat = map go combs where rawCombs = optim . nolam . snd <$> typedAsts neat combs = toAscList $ rewriteCombs rawCombs <$> rawCombs go (s, t) = (s++) . (" = "++) . shows t . (";\n"++) main = getArgs >>= \case "comb":_ -> interact $ dumpWith dumpCombs "rawcomb":_ -> interact $ dumpWith dumpRawCombs "lamb":_ -> interact $ dumpWith dumpLambs "type":_ -> interact $ dumpWith dumpTypes "wasm":opts -> interact \s -> either id id $ untangle s >>= compileWith "1<<22" libcWasm ("no-main":opts) "warts":opts -> interact $ either id (warts opts) . untangle _ -> interact \s -> either id id $ untangle s >>= compile
Precisely
Before proceeding, we crudely implement arbitrary-precision integers to enable some cool demos. It also exercises our code that handles typeclasses.
Unlike standard Haskell, we add a Ring
typeclass rather than Num
. The (+)
and (*)
operators should be reserved for rings, and there are uses for rings
that are not also instances of the Num
typeclass. For example, Gaussian
integers are great for indexing a 2D rectangular board, especially when we want
to rotate by a right angle (multiplication by i) or talk about the cardinal
directions (which correspond the units).
The integers are the initial
ring, so we treat the integer constant n
as fromInteger n
; if this results
in ambiguity, then we drop fromInteger
.
Thus the laws that we know to be true in our bones, such as a*(b + c) = a*b
, will never lead us astray. We must explicitly write
a*cfromIntegral
to,
say, map a Word32
to a Word64
. Other languages convert silently, and wind
up defying our algebraic intuition.
To represent an integer, we use a list of Word32
numbers, plus a boolean to
represent its sign. For GHC compatibility we call the function
integerSignList
instead of pattern matching on Integer
values.
We implement schoolbook algorithms for basic arithmetic, which is straightforward except for division. I realized that when doing long division by hand, to find the next digit of the divisor, I pick something that seems reasonable via a method that seems partly subconscious! How can we possibly code this?
Luckily, there is a simple algorithm that makes good guesses. See Knuth, The Art of Computer Programming.
We rename div
and mod
to quot
and rem
, then introduce wrappers for
div
and mod
. Now our divisions behave correctly, though it is sad that
div
and mod
need more instructions. (FORTRAN set an unfortunate precedent
of truncating division to zero, ultimately
forcing languages like C and
WebAssembly and even hardware to conform.)
Our treatment of integer literals causes a bootstrapping issue. Suppose a
literal "0" is to be converted to an Int
. Then our compiler applies the Int
edition of fromInteger
to the Integer
0, which involves a call to mpView
,
whose implementation needs the Int
0. If we simply code this as 0
, then we
wind up with a circular definition, because our compiler would insert another
fromInteger
call. We work around this with a definition that bypasses
overloading by returning ord '\0'
.
-- Assumes overloaded integer literals, lexical negation. -- Bits. module Base where infixr 9 . infixr 8 ^ infixl 7 * , /, `div` , `mod` , `quot`, `rem` infixr 6 <> infixl 6 + , - infixr 5 ++ infixl 4 <*> , <$> , <* , *> infix 4 == , /= , <= , < , >= , > infixl 3 && , <|> infixl 2 || infixl 1 >> , >>= infixr 1 =<< infixr 0 $ class Semigroup a where (<>) :: a -> a -> a class Monoid a where mempty :: a mconcat :: [a] -> a mconcat = foldr (<>) mempty instance Monoid [a] where mempty = [] instance Semigroup [a] where (<>) = (++) class Functor f where fmap :: (a -> b) -> f a -> f b class Applicative f where pure :: a -> f a (<*>) :: f (a -> b) -> f a -> f b class Monad m where return :: a -> m a (>>=) :: m a -> (a -> m b) -> m b (<$>) = fmap liftA2 f x y = f <$> x <*> y (>>) f g = f >>= \_ -> g (=<<) = flip (>>=) class Eq a where (==) :: a -> a -> Bool instance Eq () where () == () = True instance Eq Bool where True == True = True False == False = True _ == _ = False instance (Eq a, Eq b) => Eq (a, b) where (a1, b1) == (a2, b2) = a1 == a2 && b1 == b2 instance Eq a => Eq [a] where xs == ys = case xs of [] -> case ys of [] -> True _ -> False x:xt -> case ys of [] -> False y:yt -> x == y && xt == yt instance Eq Int where (==) = intEq instance Eq Char where (==) = charEq ($) f x = f x id x = x const x y = x flip f x y = f y x (&) x f = f x class Ord a where (<=) :: a -> a -> Bool x <= y = case compare x y of LT -> True EQ -> True GT -> False compare :: a -> a -> Ordering compare x y = if x <= y then if y <= x then EQ else LT else GT instance Ord Int where (<=) = intLE instance Ord Char where (<=) = charLE data Ordering = LT | GT | EQ deriving (Eq, Show) instance Ord a => Ord [a] where xs <= ys = case xs of [] -> True x:xt -> case ys of [] -> False y:yt -> if x <= y then if y <= x then xt <= yt else True else False compare xs ys = case xs of [] -> case ys of [] -> EQ _ -> LT x:xt -> case ys of [] -> GT y:yt -> if x <= y then if y <= x then compare xt yt else LT else GT data Maybe a = Nothing | Just a deriving (Eq, Show) data Either a b = Left a | Right b deriving (Eq, Show) fst (x, y) = x snd (x, y) = y uncurry f (x, y) = f x y first f (x, y) = (f x, y) second f (x, y) = (x, f y) bool a b c = if c then b else a not a = if a then False else True x /= y = not $ x == y (.) f g x = f (g x) (||) f g = if f then True else g (&&) f g = if f then g else False take :: Int -> [a] -> [a] take 0 xs = [] take _ [] = [] take n (h:t) = h : take (n - 1) t drop :: Int -> [a] -> [a] drop n xs | n <= 0 = xs drop _ [] = [] drop n (_:xs) = drop (n-1) xs splitAt n xs = (take n xs, drop n xs) maybe n j m = case m of Nothing -> n; Just x -> j x instance Functor Maybe where fmap f = maybe Nothing (Just . f) instance Applicative Maybe where pure = Just ; mf <*> mx = maybe Nothing (\f -> maybe Nothing (Just . f) mx) mf instance Monad Maybe where return = Just ; mf >>= mg = maybe Nothing mg mf instance Alternative Maybe where empty = Nothing ; x <|> y = maybe y Just x foldr c n = \case [] -> n; h:t -> c h $ foldr c n t length :: [a] -> Int length = foldr (\_ n -> n + 1) 0 mapM f = foldr (\a rest -> liftA2 (:) (f a) rest) (pure []) mapM_ f = foldr ((>>) . f) (pure ()) forM = flip mapM sequence = mapM id replicateM = (sequence .) . replicate foldM f z0 xs = foldr (\x k z -> f z x >>= k) pure xs z0 when x y = if x then y else pure () unless x y = if x then pure () else y error = primitiveError undefined = error "undefined" foldr1 c l@(h:t) = maybe undefined id $ foldr (\x m -> Just $ maybe x (c x) m) Nothing l foldl f a bs = foldr (\b g x -> g (f x b)) (\x -> x) bs a foldl1 f (h:t) = foldl f h t scanl f q ls = q : (case ls of [] -> [] x:xs -> scanl f (f q x) xs) scanl1 f (x:xs) = scanl f x xs scanl1 _ [] = [] elem k xs = foldr (\x t -> x == k || t) False xs find f xs = foldr (\x t -> if f x then Just x else t) Nothing xs (++) = flip (foldr (:)) concat = foldr (++) [] map = flip (foldr . ((:) .)) [] head (h:_) = h tail (_:t) = t (!!) :: [a] -> Int -> a xs!!0 = head xs xs!!n = tail xs!!(n - 1) replicate :: Int -> a -> [a] replicate 0 _ = [] replicate n x = x : replicate (n - 1) x repeat x = x : repeat x cycle = concat . repeat null [] = True null _ = False reverse = foldl (flip (:)) [] dropWhile _ [] = [] dropWhile p xs@(x:xt) | p x = dropWhile p xt | True = xs span _ [] = ([], []) span p xs@(x:xt) | p x = first (x:) $ span p xt | True = ([],xs) break p = span (not . p) isSpace c = elem (ord c) [32, 9, 10, 11, 12, 13, 160] words s = case dropWhile isSpace s of "" -> [] s' -> w : words s'' where (w, s'') = break isSpace s' lines "" = [] lines s | (l, s') <- break (== '\n') s = l : case s' of [] -> [] _:s'' -> lines s'' instance Functor [] where fmap = map instance Applicative [] where pure = (:[]); f <*> x = concatMap (<$> x) f instance Monad [] where return = (:[]); (>>=) = flip concatMap instance Alternative [] where empty = [] ; (<|>) = (++) concatMap = (concat .) . map lookup s = foldr (\(k, v) t -> if s == k then Just v else t) Nothing filter p = foldr (\x -> bool id (x:) $ p x) [] filterM p = foldr (\x -> liftA2 (bool id (x:)) $ p x) $ pure [] union xs ys = foldr (\y acc -> (if elem y acc then id else (y:)) acc) xs ys intersect xs ys = filter (\x -> maybe False (\_ -> True) $ find (x ==) ys) xs xs \\ ys = filter (not . (`elem` ys)) xs last (x:xt) = go x xt where go x xt = case xt of [] -> x; y:yt -> go y yt init (x:xt) = case xt of [] -> []; _ -> x : init xt intercalate sep = \case [] -> []; x:xt -> x ++ concatMap (sep ++) xt intersperse sep = \case [] -> []; x:xt -> x : foldr ($) [] (((sep:) .) . (:) <$> xt) all f = and . map f any f = or . map f and = foldr (&&) True or = foldr (||) False zipWith f xs ys = case xs of [] -> []; x:xt -> case ys of [] -> []; y:yt -> f x y : zipWith f xt yt zip = zipWith (,) unzip [] = ([], []) unzip ((a, b):rest) = (a:at, b:bt) where (at, bt) = unzip rest transpose [] = [] transpose ([] : xss) = transpose xss transpose ((x:xs) : xss) = (x : [h | (h:_) <- xss]) : transpose (xs : [ t | (_:t) <- xss]) data State s a = State (s -> (a, s)) runState (State f) = f instance Functor (State s) where fmap f = \(State h) -> State (first f . h) instance Applicative (State s) where pure a = State (a,) (State f) <*> (State x) = State \s -> let (g, s') = f s in first g $ x s' instance Monad (State s) where return a = State (a,) (State h) >>= f = State $ uncurry (runState . f) . h evalState m s = fst $ runState m s get = State \s -> (s, s) put n = State \s -> ((), n) either l r e = case e of Left x -> l x; Right x -> r x instance Functor (Either a) where fmap f e = either Left (Right . f) e instance Applicative (Either a) where pure = Right ef <*> ex = case ef of Left s -> Left s Right f -> either Left (Right . f) ex instance Monad (Either a) where return = Right ex >>= f = either Left f ex class Alternative f where empty :: f a (<|>) :: f a -> f a -> f a asum = foldr (<|>) empty (*>) = liftA2 \x y -> y (<*) = liftA2 \x y -> x many p = liftA2 (:) p (many p) <|> pure [] some p = liftA2 (:) p (many p) sepBy1 p sep = liftA2 (:) p (many (sep *> p)) sepBy p sep = sepBy1 p sep <|> pure [] between x y p = x *> (p <* y) showParen b f = if b then ('(':) . f . (')':) else f iterate f x = x : iterate f (f x) takeWhile _ [] = [] takeWhile p xs@(x:xt) | p x = x : takeWhile p xt | True = [] a ^ b = case b of 0 -> 1 1 -> a _ -> case r of 0 -> h2 1 -> h2*a where (q, r) = divMod b 2 h = a^q h2 = h*h class Enum a where succ :: a -> a pred :: a -> a toEnum :: Int -> a fromEnum :: a -> Int enumFrom :: a -> [a] enumFromTo :: a -> a -> [a] enumFromThen :: a -> a -> [a] enumFromThenTo :: a -> a -> a -> [a] succ = toEnum . (+ 1) . fromEnum pred = toEnum . (- 1) . fromEnum enumFrom x = map toEnum [fromEnum x ..] enumFromThen x y = map toEnum [fromEnum x, fromEnum y ..] enumFromTo x y = map toEnum [fromEnum x .. fromEnum y] enumFromThenTo x1 x2 y = map toEnum [fromEnum x1, fromEnum x2 .. fromEnum y] instance Enum Int where succ = (+ 1) pred = (- 1) toEnum = id fromEnum = id enumFrom = iterate succ enumFromThen x y = iterate (+(y - x)) x enumFromTo lo hi = takeWhile (<= hi) $ enumFrom lo enumFromThenTo x y lim = takeWhile ((if y < x then (<=) else (>=)) lim) $ enumFromThen x y instance Enum Bool where toEnum 0 = False toEnum 1 = True fromEnum False = 0 fromEnum True = 1 instance Enum Char where toEnum = chr fromEnum = ord instance Enum Word where toEnum = wordFromInt fromEnum = intFromWord instance Enum Rational where toEnum = (% 1) . toEnum fromEnum (a :% b) = fromEnum $ a `div` b succ = (+ 1) pred = (- 1) instance Enum Double where toEnum = doubleFromInt fromEnum = intFromDouble succ = (+ 1) pred = (- 1) fromIntegral = fromInteger . toInteger class Ring a where (+) :: a -> a -> a (-) :: a -> a -> a (*) :: a -> a -> a fromInteger :: Integer -> a negate :: a -> a negate = (0 -) class Field a where recip :: a -> a recip = (1 /) (/) :: a -> a -> a a / b = a * recip b fromRational :: Rational -> a fromRational (a:%b) = fromInteger a / fromInteger b class Integral a where div :: a -> a -> a mod :: a -> a -> a quot :: a -> a -> a rem :: a -> a -> a toInteger :: a -> Integer divMod :: a -> a -> (a, a) divMod a b = (q, a - b*q) where q = div a b quotRem :: a -> a -> (a, a) quotRem a b = (q, a - b*q) where q = quot a b instance Ring Int where (+) = intAdd (-) = intSub (*) = intMul fromInteger = intFromWord . fromInteger instance Integral Int where div = intDiv mod = intMod quot = intQuot rem = intRem toInteger x | 0 <= x = Integer True $ if x == 0 then [] else [wordFromInt x] | True = Integer False [wordFromInt -x] zeroWord = wordFromInt $ ord '\0' instance Ring Word where (+) = wordAdd (-) = wordSub (*) = wordMul fromInteger (Integer xsgn xs) = (if xsgn then id else wordSub zeroWord) case xs of [] -> zeroWord (x:_) -> x instance Integral Word where div = wordDiv mod = wordMod quot = wordQuot rem = wordRem toInteger x = Integer True $ if x == 0 then [] else [x] instance Eq Word where (==) = wordEq instance Ord Word where (<=) = wordLE data Word64 = Word64 Word Word deriving Eq instance Ring Word64 where Word64 a b + Word64 c d = uncurry Word64 $ word64Add a b c d Word64 a b - Word64 c d = uncurry Word64 $ word64Sub a b c d Word64 a b * Word64 c d = uncurry Word64 $ word64Mul a b c d fromInteger (Integer xsgn xs) = if xsgn then Word64 x y else uncurry Word64 $ word64Sub 0 0 x y where (x, xt) = mpView xs (y, _) = mpView xt instance Ord Word64 where Word64 a b <= Word64 c d | b == d = a <= c | True = b <= d instance Integral Word64 where div (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Div a b c d mod (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Mod a b c d quot (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Div a b c d rem (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Mod a b c d toInteger (Word64 a b) = Integer True [a, b] -- Multiprecision arithmetic. data Integer = Integer Bool [Word] deriving Eq instance Ring Integer where Integer xsgn xs + Integer ysgn ys | xsgn == ysgn = Integer xsgn $ mpAdd xs ys | True = case mpCompare xs ys of LT -> mpCanon ysgn $ mpSub ys xs EQ -> Integer True [] _ -> mpCanon xsgn $ mpSub xs ys Integer xsgn xs - Integer ysgn ys | xsgn /= ysgn = Integer xsgn $ mpAdd xs ys | True = case mpCompare xs ys of LT -> mpCanon (not ysgn) $ mpSub ys xs EQ -> Integer True [] _ -> mpCanon xsgn $ mpSub xs ys Integer xsgn xs * Integer ysgn ys | null xs || null ys = Integer True [] | True = Integer (xsgn == ysgn) $ mpMul xs ys fromInteger = id instance Integral Integer where div (Integer xsgn xs) (Integer ysgn ys) = if xsgn == ysgn then Integer True qs else case rs of [] -> mpCanon0 False qs _ -> mpCanon0 False $ mpAdd qs [1] where (qs, rs) = mpDivMod xs ys mod (Integer xsgn xs) (Integer ysgn ys) = if xsgn == ysgn then mpCanon0 xsgn rs else mpCanon ysgn $ mpSub ys rs where rs = snd $ mpDivMod xs ys quot (Integer xsgn xs) (Integer ysgn ys) = mpCanon0 (xsgn == ysgn) $ fst $ mpDivMod xs ys rem (Integer xsgn xs) (Integer ysgn ys) = mpCanon0 xsgn $ snd $ mpDivMod xs ys toInteger = id instance Ord Integer where compare (Integer xsgn xs) (Integer ysgn ys) | xsgn = if ysgn then mpCompare xs ys else GT | True = if ysgn then LT else mpCompare ys xs instance Enum Integer where succ = (+ Integer True [1]) pred = (+ Integer False [1]) toEnum = toInteger fromEnum = fromInteger enumFrom = iterate succ enumFromTo lo hi = takeWhile (<= hi) $ enumFrom lo mpView [] = (0, []) mpView (x:xt) = (x, xt) mpCanon sgn xs = mpCanon0 sgn $ reverse $ dropWhile (0 ==) $ reverse xs mpCanon0 sgn xs = case xs of [] -> Integer True [] _ -> Integer sgn xs mpCompare [] [] = EQ mpCompare [] _ = LT mpCompare _ [] = GT mpCompare (x:xt) (y:yt) = case mpCompare xt yt of EQ -> compare x y o -> o mpAdc [] [] c = ([], c) mpAdc xs ys c = first (lo:) $ mpAdc xt yt hi where (x, xt) = mpView xs (y, yt) = mpView ys (lo,hi) = uncurry (word64Add c 0) $ word64Add x 0 y 0 mpAdd xs ys | c == 0 = zs | True = zs ++ [c] where (zs, c) = mpAdc xs ys 0 mpSub xs ys = fst $ mpSbb xs ys 0 mpSbb xs ys b = go xs ys b where go [] [] b = ([], b) go xs ys b = first (lo:) $ go xt yt $ 1 - hi where (x, xt) = mpView xs (y, yt) = mpView ys (lo,hi) = uncurry word64Sub (word64Sub x 1 y 0) b 0 mpMulWord _ [] c = if c == 0 then [] else [c] mpMulWord x (y:yt) c = lo:mpMulWord x yt hi where (lo, hi) = uncurry (word64Add c 0) $ word64Mul x 0 y 0 mpMul [] _ = [] mpMul (x:xt) ys = case mpMulWord x ys 0 of [] -> [] z:zs -> z:mpAdd zs (mpMul xt ys) mpDivModWord xs y = first (reverse . dropWhile (0 ==)) $ go 0 $ reverse xs where go r [] = ([], r) go n (x:xt) = first (q:) $ go r xt where q = fst $ word64Div x n y 0 r = x - q*y -- Only lower bits matter. mpDivMod xs ys = first (reverse . dropWhile (== 0)) $ go us where s = mpDivScale $ last ys us = mpMulWord s (xs ++ [0]) 0 vs = mpMulWord s ys 0 (v1:vt) = reverse vs vlen = length vs go us | ulen <= vlen = ([], fst $ mpDivModWord us s) | True = first (q:) $ go $ lsbs ++ init ds where ulen = length us (u0:u1:ut) = reverse us (lsbs, msbs) = splitAt (ulen - vlen - 1) us (ql, qh) = word64Div u1 u0 v1 0 q0 = if 1 <= qh then -1 else ql (q, ds) = foldr const undefined [(q, ds) | q <- iterate (- 1) q0, let (ds, bor) = mpSbb msbs (mpMulWord q vs 0) 0, bor == 0] mpDivScale n | n1 == 0 = 1 | otherwise = fst $ word64Div 0 1 n1 0 where n1 = succ n mpBase _ [] = ('0':) mpBase b xs = go xs where go [] = id go xs = go q . shows r where (q, r) = mpDivModWord xs b instance Show Integer where showsPrec _ (Integer xsgn xs) = (if xsgn then id else ('-':)) . mpBase 10 xs instance (Ord a, Ord b) => Ord (a, b) where (a1, b1) <= (a2, b2) = a1 <= a2 && (not (a2 <= a1) || b1 <= b2) a < b = a <= b && a /= b a > b = b <= a && a /= b (>=) = flip(<=) instance Applicative IO where pure = ioPure ; (<*>) f x = ioBind f \g -> ioBind x \y -> ioPure (g y) instance Monad IO where return = ioPure ; (>>=) = ioBind instance Functor IO where fmap f x = ioPure f <*> x class Show a where showsPrec :: Int -> a -> String -> String showsPrec _ x = (show x++) show :: a -> String show x = shows x "" showList :: [a] -> String -> String showList = showList__ shows shows = showsPrec 0 showList__ _ [] s = "[]" ++ s showList__ showx (x:xs) s = '[' : showx x (showl xs) where showl [] = ']' : s showl (y:ys) = ',' : showx y (showl ys) showInt__ n | 0 == n = id | True = showInt__ (n`div`10) . (chr (48+n`mod`10):) instance Show () where show () = "()" instance Show Bool where show True = "True" show False = "False" instance Show a => Show [a] where showsPrec _ = showList instance Show Int where showsPrec _ n | 0 == n = ('0':) | 1 <= n = showInt__ n | 2 * n == 0 = ("-2147483648"++) | True = ('-':) . showInt__ (0 - n) showWord_ n | 0 == n = id | True = showWord_ (n`div`10) . (chr (48+(intFromWord $ n`mod`10)):) instance Show Word where showsPrec _ n | 0 == n = ('0':) | True = showWord_ n instance Show Word64 where showsPrec p (Word64 x y) = showsPrec p $ Integer True [x, y] showLitChar__ '\n' = ("\\n"++) showLitChar__ '\\' = ("\\\\"++) showLitChar__ c | n < 32 || n > 127 = ('\\':) . protectDecEsc (shows n) | otherwise = (c:) where n = ord c protectDecEsc f s | (c:_) <- s, '0' <= c, c <= '9' = f $ ("\\&"++) s | otherwise = f s instance Show Char where showsPrec _ '\'' = ("'\\''"++) showsPrec _ c = ('\'':) . showLitChar__ c . ('\'':) showList s = ('"':) . foldr (.) id (map go s) . ('"':) where go '"' = ("\\\""++) go c = showLitChar__ c instance (Show a, Show b) => Show (a, b) where showsPrec _ (a, b) = showParen True $ shows a . (',':) . shows b integerSignList (Integer xsgn xs) f = f xsgn xs unwords [] = "" unwords ws = foldr1 (\w s -> w ++ ' ':s) ws unlines = concatMap (++"\n") abs x = if 0 <= x then x else -x signum x | 0 == x = 0 | 0 <= x = 1 | otherwise = -1 otherwise = True sum = foldr (+) 0 product = foldr (*) 1 max a b = if a <= b then b else a min a b = if a <= b then a else b gcd x y = gcd' (abs x) (abs y) where gcd' a 0 = a gcd' a b = gcd' b (a `rem` b) lcm = \cases _ 0 -> 0 0 _ -> 0 x y -> abs ((x `quot` (gcd x y)) * y) readNatural = foldl (\n d -> toInteger 10*n + toInteger (ord d - ord '0')) (toInteger 0) readInteger ('-':t) = -(readNatural t) readInteger s = readNatural s infixl 7 % data Rational = Integer :% Integer deriving Eq numerator (p :% _) = p denominator (_ :% q) = q x % y = reduce_ (x * signum y) (abs y) reduce_ x y = (x `quot` d) :% (y `quot` d) where d = gcd x y instance Ord Rational where (a :% b) <= (c :% d) = a*d <= b*c instance Ring Rational where (a :% b) + (c :% d) = reduce_ (a*d + b*c) (b*d) (a :% b) - (c :% d) = reduce_ (a*d - b*c) (b*d) (a :% b) * (c :% d) = reduce_ (a*c) (b*d) fromInteger n = n :% 1 instance Show Rational where showsPrec _ (a :% b) = shows a . (" % "++) . shows b instance Field Rational where recip (x :% y) = (y * signum x) :% abs x instance Ring Double where (+) = doubleAdd (-) = doubleSub (*) = doubleMul fromInteger n = integerSignList n \sgn ws -> case reverse ws of [] -> doubleFromInt 0 x:xt -> (if sgn then id else negate) let dx = doubleFromWord x sh = foldr1 (*) $ replicate 32 $ doubleFromInt 2 in case xt of [] -> dx y:yt -> foldr (const (sh*)) (dx*sh + doubleFromWord y) yt instance Eq Double where (==) = doubleEq instance Ord Double where (<=) = doubleLE instance Show Double where showsPrec _ d = case compare d 0 of EQ -> ('0':) LT -> ('-':) . shows -d GT -> go where tens = iterate (10*) 1 tenths = iterate (0.1*) 1 (as, bs) = if d >= 1 then span (<= d) tens else span (>= d) tenths norm = if d >= 1 then d / last as else d / head bs dig = intFromDouble norm go = shows dig . ('.':) . (tail (show $ 1000000 + intFromDouble (1000000 * (norm - doubleFromInt dig)))++) . showsE (length as) showsE e = if e == 1 && d >= 1 then id else ('e':) . shows (if d >= 1 then e - 1 else 0 - e) instance Field Double where (/) = doubleDiv floor = intFromDouble . doubleFloor ceiling x = 0 - floor (0 - x) class Bits a where xor :: a -> a -> a (.&.) :: a -> a -> a (.|.) :: a -> a -> a shiftR :: a -> Int -> a rotateR :: a -> Int -> a rotateL :: a -> Int -> a complement :: a -> a complement x = -1 - x instance Bits Int where xor = intXor (.&.) = intAnd (.|.) = intOr shiftR n i = intShr n $ fromIntegral i rotateR n i = intOr (intShr n u) (intShl n $ 32 - u) where u = fromIntegral i rotateL n i = intOr (intShr n $ 32 - u) (intShl n u) where u = fromIntegral i instance Bits Word where xor = wordXor (.&.) = wordAnd (.|.) = wordOr shiftR n i = wordShr n $ fromIntegral i rotateR n i = wordOr (wordShr n u) (wordShl n $ 32 - u) where u = fromIntegral i rotateL n i = wordOr (wordShr n $ 32 - u) (wordShl n u) where u = fromIntegral i instance Bits Word64 where xor (Word64 a b) (Word64 c d) = Word64 (wordXor a c) (wordXor b d) (Word64 a b) .&. (Word64 c d) = Word64 (wordAnd a c) (wordAnd b d) (Word64 a b) .|. (Word64 c d) = Word64 (wordOr a c) (wordOr b d) shiftR (Word64 a b) i | u >= 32 = Word64 (wordShr b $ u - 32) 0 | otherwise = Word64 (wordShr a u + wordShl b (32 - u)) (wordShr b u) where u = fromIntegral i rotateR (Word64 a b) i | u >= 32 = small b a $ u - 32 | otherwise = small a b u where u = fromIntegral i small a b u = Word64 (wordShr a u + wordShl b (32 - u)) (wordShr b u + wordShl a (32 - u)) rotateL (Word64 a b) i = let n = wordFromInt i in uncurry Word64 (word64Shl a b n 0) .|. uncurry Word64 (word64Shr a b (64 - n) 0)
Wasm to Haskell?
It’d be nice to write wasm ourselves, but for expedience, we rely on Clang to compile our runtime system to a wasm binary which we manipulate. WebAssembly turns out to be pleasantly malleable. A binary breaks up into independent sections, and we can add, delete, or modify sections before stitching them together again.
We write a tool that just does enough wasm parsing to print the sections of a
wasm file we want in the form of a Haskell module, and run this on the output
of crossly warts
to create WartsBytes.hs
.
module Main where import Base import System data Charser a = Charser { getCharser :: String -> Either String (a, String) } instance Functor Charser where fmap f (Charser x) = Charser $ fmap (first f) . x instance Applicative Charser where pure a = Charser $ \s -> Right (a, s) f <*> x = Charser \inp -> do (fun, t) <- getCharser f inp (arg, u) <- getCharser x t pure (fun arg, u) instance Monad Charser where Charser f >>= g = Charser $ (good =<<) . f where good (r, t) = getCharser (g r) t return = pure bad :: String -> Charser a bad = Charser . const . Left headerAndVersion :: String headerAndVersion = "\0asm\x1\0\0\0" eof :: Charser Bool eof = Charser \s -> Right (null s, s) next :: Charser Int next = Charser \case [] -> Left "unexpected EOF" h:t -> Right (ord h, t) sat f = Charser \case h:t | f h -> Right (h, t) _ -> Left "unsat" remainder :: Charser String remainder = Charser \s -> Right (s, "") varuint7 = next varuint32 = varuint varuint :: Charser Int varuint = unleb 1 0 -- varuint = fromIntegral <$> unleb 1 0 -- unleb :: Integer -> Integer -> Charser Integer unleb m acc = do -- d <- fromIntegral <$> next d <- next if d > 127 then unleb (m * 128) $ (d - 128) * m + acc else pure $ d*m + acc sections = eof >>= \b -> if b then pure [] else do n <- varuint7 s <- vec (chr <$> next) ((n, s):) <$> sections wasm = do s <- replicateM 8 (chr <$> next) if s /= headerAndVersion then bad "bad header or version" else sections hexDigit n | n < 10 = chr $ n + ord '0' | True = chr $ n - 10 + ord 'a' xxd = \case "" -> "" h:t -> let n = ord h in hexDigit (div n 16) : hexDigit (mod n 16) : xxd t replicateM = (mapM id .) . replicate vec f = varuint >>= (`replicateM` f) search00type xs = do fts <- maybe (Left "missing section 1") Right $ lookup 1 xs ios <- fst <$> getCharser go fts maybe (Left "missing (0, 0) functype") Right $ lookup (0, 0) $ zip ios [0..] where go = vec $ do sat (== '\x60') inCount <- varuint replicateM inCount next outCount <- varuint replicateM outCount next pure (inCount, outCount) searchExport needle xs = do exs <- maybe (Left "missing section 7") Right $ lookup 7 xs maybe (Left "not found") Right =<< asum . fst <$> getCharser go exs where go = vec $ do s <- vec $ chr <$> next next n <- varuint pure $ if s == needle then Just n else Nothing allFunCount xs = do impCount <- maybe (Right 0) countImps $ lookup 2 xs funCount <- maybe (Right 0) countFuns $ lookup 3 xs pure $ impCount + funCount where countImps imps = length . fst <$> getCharser goImps imps goImps = vec $ do vec next vec next sat (== '\0') varuint pure () countFuns funs = fst <$> getCharser varuint funs main = do s <- getContents case getCharser wasm s of Left e -> putStrLn $ "parse error: " ++ e Right (xs, []) -> do putStr "module WartsBytes where\nimport Base\nwartsBytes = " print $ second xxd <$> filter (not . (`elem` [0, 6]) . fst) xs putStrLn $ either error (("allFunCount = "++) . show) $ allFunCount xs putStrLn $ either error (("funType00Idx = "++) . show) $ search00type xs putStrLn $ either error (("reduceFunIdx = "++) . show) $ searchExport "reduce" xs _ -> error "unreachable"
The RTS includes generate code that wraps foreign imports. It can only be used with programs that declare the same foreign imports.
Webby
We build a compiler that goes directly from Haskell to wasm. The code does little more than dumping the runtime system wasm binary along with bytes describing the initial contents of the heap.
For now we look for the main
function in the Main
module and export it as
the go
function; export declarations are ignored.
-- In-browser compiler. module Main where import Base import Ast import Map import Typer import RTS import System import WartsBytes data StrLen = StrLen { _str :: String -> String, _len :: Int } instance Monoid StrLen where mempty = StrLen id 0 instance Semigroup StrLen where (StrLen s1 n1) <> (StrLen s2 n2) = StrLen (s1 . s2) (n1 + n2) hexValue d | d <= '9' = ord d - ord '0' | d <= 'F' = 10 + ord d - ord 'A' | d <= 'f' = 10 + ord d - ord 'a' unxxd s = StrLen (go s) (length s `div` 2) where go s = case s of [] -> id (d1:d0:rest) -> ((chr $ hexValue d1 * 16 + hexValue d0):) . go rest main = do putStr "DEBUG\n" interact $ either id id . toWasm toWasm s = do objList <- toAscList <$> objectify s let ffis = foldr (\(k, v) m -> insertWith (error $ "duplicate import: " ++ k) k v m) Tip $ concatMap (toAscList . ffiImports . _neat . snd) objList ffiMap = fromList $ zip (keys ffis) [0..] offobjs = snd $ foldr (\(name, obj) (n, m) -> (n + length (_mem obj), insert name (n, obj) m)) (0, Tip) objList (ffeMap, memFun) = foldr (layout offobjs ffiMap) (Tip, id) objList ffes = toAscList $ fst <$> ffeMap mem = memFun [] go (n, x) -- Function section: for each export, declare a function of type () -> ().. | n == 3 = leb n <> extendSection x (replicate (length ffes) $ unxxd "01") -- Export section: add each export. | n == 7 = leb n <> extendSection x (zipWith encodeExport (fst <$> ffes) [allFunCount..]) -- Code section: for nth export, define a function calling rts_reduce([512 + 4*n]) | n == 10 = leb n <> extendSection x (callRoot <$> [0..length ffes - 1]) | True = leb n <> leb (_len s) <> s where s = unxxd x roots = encodeData rootBase $ littleEndian $ (snd <$> ffes) ++ [0] prog = encodeData heapBase $ littleEndian $ length mem : mem pure $ _str (unxxd "0061736d01000000" <> mconcat (map go wartsBytes) -- Data section: -- 512 : null-terminated roots array -- 1048576 - 4: hp -- 1048576: initial heap contents <> leb 11 <> extendSection "00" [roots, prog]) "" extendSection x xs = encodeSection (k + length xs) $ unxxd s <> mconcat xs where (k, s) = splitLeb x splitLeb x = go x 1 0 where go (d1:d0:t) m acc | n < 128 = (acc + n*m, t) | True = go t (m*128) $ acc + (n - 128)*m where n = hexValue d1 * 16 + hexValue d0 encodeExport s n = encodeString s <> unxxd "00" <> leb n encodeString s = let n = length s in leb n <> StrLen (s++) n littleEndian ns = StrLen (foldr (.) id $ go 4 <$> ns) $ 4 * length ns where go k n | k == 0 = id | True = (chr (n `mod` 256):) . go (k - 1) (n `div` 256) rootBase = unxxd "004180040b" -- sleb 512 = 8004 heapBase = unxxd "0041fcff3f0b" -- sleb (1048576 - 4) = fcff3f -- 0 locals. -- i32.const 0; i32.load 512 + 4*n; call $REDUCE; end; callRoot n = leb (_len s) <> s where s = unxxd "0041002802" <> slebPos (512 + 4*n) <> unxxd "10" <> leb reduceFunIdx <> unxxd "0b" encodeData addr s = addr <> leb (_len s) <> s encodeSection k s = let n = leb k in leb (_len n + _len s) <> n <> s leb n | n <= 127 = StrLen (chr n:) 1 | True = StrLen (chr (128 + n `mod` 128):) 1 <> leb (n `div` 128) slebPos n | n <= 63 = StrLen (chr n:) 1 | True = StrLen (chr (128 + n `mod` 128):) 1 <> slebPos (n `div` 128)
webby.wasm
To build a browser-based compiler, we essentially run the previous compiler on itself, thus producing a wasm binary that can translate Haskell directly into wasm.
One change is needed: we swap System.hs
for SystemWasm.hs
. The
Linux version declares foreign imports for those in our runtime system for
Linux. The wasm version declares foreign imports for functions provided by
the host environment.
-- For wasm environments providing Haskell-like getchar, putchar, eof. module System where import Base foreign import ccall "putchar" putChar :: Char -> IO () foreign import ccall "getchar" getChar :: IO Char foreign import ccall "eof" isEOFInt :: IO Int isEOF = (0 /=) <$> isEOFInt putStr = mapM_ putChar putStrLn = (>> putChar '\n') . putStr print = putStrLn . show getContents = isEOF >>= \b -> if b then pure [] else getChar >>= \c -> (c:) <$> getContents interact f = getContents >>= putStr . f
Messy
By now, we’re painfully aware of the effects of rash decisions made while writing our earlier compilers. Compilation time has grown rapidly. Parts of the code are difficult to extend.
I’m cleaning up the precisely
compiler first, and intend to backport the
changes later, so the journey will look smoother. Until then, the evolution
of our compiler will appear jumpy.
Examples:
-
Rewriting let expressions as lambda terms should happen after type-checking, so that type annotations are easier to handle.
-
Some library-like functions like
overFree
should be scrapped because they’re only used once. -
I renamed
beta
tofill
as it really fills holes in a context, that is, it substitutes without caring about variable capture. We get away with this because we unshadow variables during type-checking except for join point variables.
Introducing features in a different order also smooths the journey. For
example, I originally tried hash consing just before the virtually
compiler,
which slowed compilation and whose code changed markedly as syntax features
were added. I pushed it to a far later compiler for faster bootstrapping and
more stable code. Another example is the Show
typeclass, which I originally
added in a surprisingly recent compiler.
Built-in primitives started in their own #
module, then were replaced by code
that pre-defined them for every module. But I think I was right the first time,
and switched it back. Hopefully there are no traces of my misadventure left.
At the same time, I’m introducing more chaos as I forge ahead with new experiments, especially those involving the web. The above only chronicles the start of adventures in the browser. A fuller account:
-
The
crossly wasm
command modifies the output C so Clang can compile it to wasm: it adds a minimal bumpmalloc
implementation, and uses attributes to declare exports. -
The
crossly warts
command also modifies the output C so Clang can compile to wasm, though it avoidsmalloc
for some reason. It only generates code for FFIs and the RTS, and assumes a tool likewebby
will supply the initial heap contents. -
The
Imp
compiler is likewebby
but with an extra hack to request the source of dependent import modules. -
The
reply
family of REPLs introduced more complications. To avoid compilingBase
every time, we define a CBOR-based object file format that contains the bytecode for each definition, along with symbols, types, typeclasses, fixities, type aliases, and so on. -
For the REPL, we add a
scratchpad
buffer where the interpreter writes new bytecode. For definitions, we write the number of new symbols and callvmgcroot()
to add roots so they survive GC. There’s also a tagging mechanism, where the least significant bit indicates the rest of the word is the index of a previous definition. We do all this because GC could occur while compiling new definitions; thescratchpad
is untouched by GC. -
Related are
introspect.c
, which lets us examine the heap from the REPL, and a reduction function that is aware of "gas": a feature that allows us to run a program for a limited number of steps, and resume execution later if it has yet to finish.
Designing the object file format leads us to take a closer look at the many phases of compilation:
-
During parsing, we compile
data
declarations, generating ASTs for their Scott-encoding and lookup tables for field names. We also genreate instances forderiving
clauses. -
During parsing, we generate selector code for each typeclass declaration.
-
During parsing, we compile each FFI import to an
F
combinator, adding a lambda abstraction for each argument. -
Immediately after parsing, we find all exported symbols. The sole purpose of the
type2Cons
field is to find all exported constructors in a declaration likeFoo(..)
. -
We apply type aliases to canonicalize all type names.
-
We topologically sort the modules, giving us an order which we can compile them. Recall our compilers expect the input to be the concatenation of all source modules, except for
imp.wasm
, which has a way of requesting more modules. -
We rewrite expressions according to fixity declarations, and rewrite pattern matches as lambda terms.
-
At last, we compile top-level definitions to combinators, reconciling them with any type annotations. Life was simpler with our early compilers, where this was the only task we performed.
-
It is only now we can compile typeclass instance definitions, as they may depend on top-level definitions. (In the previous step, there are dependencies going the other way, but we know the purported types of all instances, which is the only information needed.)
Our data structures freely mix information collected during parsing with information generated by compilation.