WebAssembly

Click below to compile an expression to WebAssembly and run it.




wasm:

Available operators:

abs neg ceil floor trunc nearest sqrt + - * / min max copysign

The wasm binary format

We walk through a small standalone wasm binary.

We open with the 4-byte magic string "\0asm", then the 4-byte version, which is 1 in little-endian:

00 61 73 6d 01 00 00 00

Apart from this version number, WebAssembly encodes integers with LEB128, using the signed variant when appropriate. We shall refer to the encoded numbers as varuints and varints.

The numbers in our example are small enough that a varint or a varuint can be thought of as a byte (holding a number between 0 and 127, or -64 and 63).

A number of sections follow the first 8 bytes. Each section begins with a varuint for the section ID, followed by a varuint for the length of the section.

Section 1 declares the types of function signatures.

01  ; Section 1
08  ; ...is 8-bytes long

We’ll export a function e for JavaScript to call, and import a function f from the import object i through which it returns a 32-bit integer.

02  ; Declare 2 function types.

The type signature of the imported function is:

60  ; Function.
01  ; One input.
7f  ; 32-bit integer (i32).
00  ; No outputs.

and the type signature of the exported function is:

60  ; Function.
00  ; No inputs.
00  ; No outputs.

Thus the entire type section is:

01 08  ; Type section follows. It's 8 bytes long.
02     ; Two type signatures follow.
60 01 7f 00 60 00 00

Section 2 declares imports. We declare an import object i containing one function f mentioned above. A string is encoded with a varint holding its length followed by the string contents.

02 07  ; 7-byte import section.
01     ; One import.
01 69  ; The string "i".
01 66  ; The string "f".
00     ; Function.
00     ; Index of signature in section 1.

Section 3 declares signatures of the functions defined in section 10.

03 02  ; 2-byte function section.
01     ; One signature.
01     ; Index of signature in section 1.

Section 7 declares exports:

07 05  ; 5-byte export section.
01     ; One signature.
01 65  ; The string "e".
00     ; Function.
01     ; Function index.

Functions are numbered from 0, so function index 1 refers to the second function. The first function mentioned was the function in section 2, and the second function mentioned is the function in section 3. In general,

if there are \(m\) functions in section 2 and \(n\) functions in section 3, then the function index \(k\) will lie in \([m..n - 1]\) and refer to the \((k - m)\)th function of section 3.

Section 10 holds the code. Here, we define the body of our exported function, which calls the imported function with the constant 42 (0x2a).

0a 08  ; 8-byte code section.
01     ; One function body.
06     ; Length of function body.
00     ; No local variables.
41 2a  ; Encoding of "i32.const 42".
10 00  ; Call function 0.
0b     ; End of code.

We can put this altogether in an HTML snippet that uses WebAssembly to show "42" in an alert:

<script type="text/javascript">
WebAssembly.instantiate(new Uint8Array([
  0,97,115,109,1,0,0,0,1,8,2,96,1,127,0,96,0,0,2,7,1,1,105,1,
  102,0,0,3,2,1,1,7,5,1,1,101,0,1,10,8,1,6,0,65,42,16,0,11]),
  {i:{f:x => alert(x)}}).then(x => x.instance.exports.e());
</script>

Parser

We modify a grammar from another demo so it accepts Haskell-style expressions: application associates to the left and functions are curried. For example, min 4 3, the minimum of 4 and 3, is interpreted as (min 4) 3, and 1 + 2 is translated internally to ((+) 1) 2.

var    ::= ('a'|..|'z'|'A'|..|'Z')+
num    ::= ('0'|..|'9'|'.')+
factor ::= ['-'] ( var | num | '(' expr ')' )+
term   ::= factor ( ('*'|'/') factor )*
expr   ::= term ( ('+'|'-') term )*
line   ::= expr EOF

We parse an expression into simplified version of the data structure we used to hold lambda calculus terms. A leaf node either holds a double constant, or a string representing a primitive function. All internal nodes represent function application. Lambda abstraction is absent.

{-# LANGUAGE CPP #-}
{-# LANGUAGE OverloadedStrings #-}
#ifdef __HASTE__
import Haste.DOM
import Haste.Events
import Haste.Foreign
import Data.List.Split
import Numeric
import System.IO.Unsafe
#else
import Data.ReinterpretCast
#endif
import Data.Char
import Text.Parsec

data Expr = D Double | V String | App Expr Expr
type Parser = Parsec String ()

line :: Parser Expr
line = spaces >> expr <* eof where
  eat :: Parser a -> Parser a
  eat p = p <* spaces
  var  = eat $ V <$> many1 letter
  num  = eat $ D . read <$> many1 (digit <|> char '.')
  tok  = eat . string
  una  = option id $ const (App (V "neg")) <$> tok "-"
  fac  = una <*> (foldl1 App <$> many1
    (var <|> num <|> between (tok "(") (tok ")") expr))
  term = fac  `chainl1` (bin "*" <|> bin "/")
  expr = term `chainl1` (bin "+" <|> bin "-")
  bin s = const (\a b -> App (App (V s) a) b) <$> tok s

Compiler

Our compiler generates the above example with two changes:

  1. Our import function expects a double-precision floating point input (f64) instead of a 32-bit integer.

  2. Our function body consists of a translation of the given expression instead of the constant 42.

Rather than simply dump a wall of bytes, we write some helper routines to output them so the structure of a wasm binary is apparent.

toAsm e = concat [
  [0, 0x61, 0x73, 0x6d, 1, 0, 0, 0],  -- Magic string, version.
  -- Type section.
  sect 1 [encSig ["f64"] [], encSig [] []],
  -- Import section.
  -- [0, 0] = external_kind Function, index 0.
  sect 2 [encStr "i" ++ encStr "f" ++ [0, 0]],
  -- Function section.
  -- [1] = Type index.
  sect 3 [[1]],
  -- Export section.
  -- [0, 1] = external_kind Function, index 1.
  sect 7 [encStr "e" ++ [0, 1]],
  -- Code section.
  -- 0 = local variable count.
  -- [0x10, 0, 0xb] = call function 0, end of code.
  sect 10 [lenc $ 0 : compile [] e ++ [0x10, 0, 0xb]]]

lenc xs = length xs : xs

sect t xs = t : lenc (length xs : concat xs)

encStr s = lenc $ ord <$> s

encType "i32" = 0x7f
encType "f64" = 0x7c

encSig ins outs = 0x60  -- Function type.
  : lenc (encType <$> ins) ++ lenc (encType <$> outs)

No type checking is performed, though this is easy to add. Thus the only non-trivial task is compiling the expression to assembly.

Wasm is stack-based, so for each operator:

  1. We recursively compile the subtrees for each of its operands (from left to right), so when executed, the operands will be on top of the stack.

  2. We output the opcode corresponding to the operator.

We place the opcodes for all the f64 operators in an associative list, which suffices for our simple demo.

In wasm, the 8 bytes of a double are encoded in little-endian. Normally, we can take care of this in Haskell with, say, the Data.ReinterpretCast package but Haste lacks support for the relevant Haskell primitives. For the Haste version, we use JavaScript to get at the bytes representing a double:

<script type="text/javascript">
function fromDouble(d) {return new Uint8Array(new Float64Array([d]).buffer);}
</script>

We wrap it in unsafePerformIO so we can use it as a pure function.

ops = zip (words $ "abs neg ceil floor trunc nearest sqrt "
  ++ "+ - * / min max copysign") [0x99..]

compile m e = case e of
  App a b -> compile (b:m) a
#ifdef __HASTE__
  D d -> 0x44 : map fromIntegral (doubleToBytes d) where
    doubleToBytes :: Double -> [Int]
    doubleToBytes = unsafePerformIO . ffi "fromDouble"
#else
  D d -> 0x44 : map fromIntegral
    [doubleToWord d `div` (256^i) `mod` 256 | i <- [0..7]]
#endif
  V s -> case lookup s ops of
    Nothing -> error $ "bad op: " ++ s
    Just b -> concatMap (compile []) m ++ [b]

User Interface

For the web version, we compile a given expression, dump the assembly in a textarea element, and run the assembly which calls a function that shows the result on the page:

<script type="text/javascript">
function runWasmInts(a) {
  WebAssembly.instantiate(new Uint8Array(a),
    {i:{f:x => Haste.setResult(x)}}).then(x => x.instance.exports.e());
}
</script>

For the command-line version of the program, we compile a predefined program.

#ifdef __HASTE__
dump s = unlines $ unwords . map xxShow <$> chunksOf 8 s where
  xxShow c = reverse $ take 2 $ reverse $ '0' : showHex c ""

main = withElems ["input", "output", "asm", "evalB"] $
    \[iEl, oEl, aEl, evalB] -> do

  let
    setResult :: Double -> IO ()
    setResult d = setProp oEl "innerHTML" $ " = " ++ show d

  export "setResult" setResult
  evalB `onEvent` Click $ const $ do
    setProp oEl "innerHTML" ""
    ee <- parse line "" <$> getProp iEl "value"
    case ee of
      Left m -> do
        setProp aEl "value" ""
        setProp oEl "value" $ "parse error: " ++ show m
      Right e -> do
        let asm = toAsm e
        setProp aEl "value" $ dump asm
        ffi "runWasmInts" asm :: IO ()
#else
main = print $ toAsm e where Right e = parse line "" "min (sqrt 8) 2"
#endif

Ben Lynn blynn@cs.stanford.edu 💡