Get a Life

Zoom: Steps: 2^

Presets:

When I was a kid, I wrote a BASIC program to explore Conway’s Game of Life on my 8088. It was painfully slow, so I began an assembly rewrite. But I failed at Life: I abandoned the project in frustration because I kept freezing up my system.

It was just as well. I believe the only optimizations I had in mind were bit twiddling and focusing on or around live cells. So I was astounded when I later learned of Bill Gosper’s Hashlife algorithm, which miraculously simulates an exponential number of generations in linear time. All the bit twiddling in the world couldn’t come close.

I eagerly studied any explanations I could find. Unfortunately, although Hashlife’s ingredients are elementary in isolation (quadtrees; memoization; interning), some aspects of the mixture confused me. I hoped one day to figure it all out and run Life in the fast lane.

I have at last fulfilled this lifelong ambition. Perhaps I acted out of anger, as now and then I’ll see a post about coding the Game of Life, only to discover the naive approach lurking behind it. Seriously?! Who cares about the elegant source or whatever if the algorithm is mediocre? [I exaggerate; in truth, I enjoy seeing the Game of Life in one line of APL, with comonads, in the Game of Life itself, and so on.]

At some point it’s easier to quit complaining and fight back. Here, then, is yet another write-up on programming the Game of Life, but one of the minority featuring Hashlife. Batteries included: we walk through the entire source for the demo above.

{-# LANGUAGE CPP #-}
#ifdef __HASTE__
{-# LANGUAGE PackageImports #-}
#endif
{-# LANGUAGE LambdaCase, TupleSections, RecordWildCards #-}
{-# LANGUAGE FlexibleContexts #-}
#ifdef __HASTE__
import "mtl" Control.Monad.State.Strict
import Haste.DOM
import Haste.Events
import Haste.Graphics.Canvas
import Data.IORef
import Text.Read (readMaybe)
#else
import Control.Monad.State.Strict
#endif
import Data.List (find)
import Data.Map (Map, (!))
import qualified Data.Map as M

A matter of life and death

Life takes places on an infinite grid of cells, each of which is dead or alive. A cell’s neighbours are the eight cells surrounding it (Moore neighbourhood). Then:

A dead cell with exactly 3 live neighbours comes to life.
A live cell with exactly 2 or 3 live neighbours remains alive.
Otherwise the cell will be dead.

We represent a dead cell with 0 and a live cell with 1. The following function summarizes the rules:

nextLife :: Int -> Int -> Int
nextLife 0 3 = 1
nextLife 1 2 = 1
nextLife 1 3 = 1
nextLife _ _ = 0

Curiously, this is the same as saying a cell is alive exactly when the sum or product of these two numbers is 3.

Quad workout

Quadtrees are fundamental to Hashlife. In our rendition, at the lowest level, we represent a single cell with a bit.

At the level above, we organize the cells in 2x2 blocks. We represent the 4 bits of a block with an Int. The following functions convert between an Int and the 4 cells in a 2x2 block:

data ZNode = ZNode Int Int Int Int deriving (Show, Eq, Ord)

pack4 :: Int -> Int -> Int -> Int -> Int
pack4 a b c d = a + 2*b + 4*c + 8*d

unpack4 :: Int -> ZNode
unpack4 n = ZNode (f 1) (f 2) (f 4) (f 8) where f k = div n k `mod` 2

We record the 4 cells of a 2x2 block in Z-order. If we imagine a compass, then the order is NW, NE, SW, SE, the same order a pen travels when we write the letter Z. We use screen coordinates.

zorder :: [(Int, Int)]
zorder = [(0,0), (1,0), (0,1), (1,1)]

At the next level above, we organize the 2x2 blocks themselves in 2x2 superblocks. These are represented as a ZNode; each number represents a 2x2 block of cells in Z-order.

Consider a 4x4 block of labelled cells, where each cell’s state is represented with a bit:

a0 a1 b0 b1
a2 a3 b2 b3
c0 c1 d0 d1
c2 c3 d2 d3

We represent the above with ZNode a b c d where a is the Int with binary representation a3 a2 a1 a0, and similarly for the other fields.

Given a 4x4 block of cells, the following returns the 2x2 central block of cells of the next generation. The rules imply no cell outside the 4x4 grid may affect this calculation. They also imply this 2x2 central block is the only part of the next generation we may reliably compute.

base :: Int -> Int -> Int -> Int -> Int
base a b
     c d = pack4 nw ne
                 sw se
  where
  ZNode a0 a1
        a2 a3 = unpack4 a
  ZNode b0 b1
        b2 b3 = unpack4 b
  ZNode c0 c1
        c2 c3 = unpack4 c
  ZNode d0 d1
        d2 d3 = unpack4 d
  nw = nextLife a3 $ sum
    [ a0, a1, b0
    , a2,     b2
    , c0, c1, d0
    ]
  ne = nextLife b2 $ sum
        [ a1, b0, b1
        , a3,     b3
        , c1, d0, d1
    ]
  sw = nextLife c1 $ sum
    [ a2, a3, b2
    , c0,     d0
    , c2, c3, d2
    ]
  se = nextLife d0 $ sum
        [ a3, b2, b3
        , c1,     d1
        , c3, d2, d3
        ]

The building blocks of Life

We label each a ZNode with an Int. Then we can represent an 8x8 block of cells as a ZNode, whose fields are now labels of 4x4 cells. We can go further: by induction, for any n, we can represent any 2ⁿ x 2ⁿ block of cells with a ZNode, which we interpret as a 2x2 block of 2^n-1 x 2^n-1 cells.

We sometimes call this n the level. It is not stored in the ZNode itself, so functions taking ZNodes often need another argument that contains the level.

We only store one copy of a ZNode (think hash consing; string interning; maximal sharing; the flyweight pattern). Life patterns are often repetitive, so a handful of ZNodes sometimes suffice for representing many cells.

This is how Hashlife got its name, though we shall use an ordered tree instead of hashing to map ZNodes to Ints.

We maintain Data.Map of ZNodes to Ints. On encountering a ZNode, we first check this map: if already present, then we reuse its corresponding Int. Otherwise we assign it an unused Int and add it to the map. The Ints start from 16 to avoid colliding with numbers that represent 2x2 blocks of cells.

We also maintain an inverse map from an Int to the ZNode it represents.

In addition, each ZNode is paired with an Int that represents its future center; see below.

Empty cells are a special case. The Int 0 represents empty squares of any level. This comes in handy when we want to embed a given pattern in a large empty space.

data Mem = Mem (Map Int (ZNode, Int)) (Map ZNode Int) deriving Show

initMem :: Mem
initMem = Mem mempty mempty

intern :: ZNode -> Int -> State Mem Int
intern z future = do
  Mem m idxs <- get
  let next = M.size idxs + 16
  put $ Mem (M.insert next (z, future) m) $ M.insert z next idxs
  pure next

recall :: Int -> State Mem (ZNode, Int)
recall 0 = pure (ZNode 0 0 0 0, 0)
recall k = (\(Mem m _) -> m!k) <$> get

Double Life

Suppose we are given 8x8 cells. As this happens to be the size of a chessboard, let’s label the cells the same way a chess player would:

Our base function takes 4x4 cells and returns the central 2x2 cells one generation into the future. On the top-left quadrant:

it computes the following 2x2 cells, where the prime symbol indicates one generation into the future:

Similarly, on the 4x4 cells:

the base function returns:

Thus pasting together the result of 9 calls to base yields:

Similarly, pasting together the result of 4 more calls to base yields:

where a double prime indicates 2 generations into the future.

To recap, given 8x8 cells, repeated calls to base reveals the central 4x4 cells 2 generations into the future.

We can scale this up for higher powers of 2. For example, given 256x256 cells, a bunch of recursions gives us the central 128x128 cells exactly 64 generations into the future.

We have the perfect conditions for memoization. Thus we record the future central cells the first time we compute them, and look them up every time thereafter. This is why we associate each ZNode with an Int; the latter references the future central cells.

We split off a function reduce3x3 that applies a function f to each 2x2 box in a 3x3 grid, then applies g to their results.

gosper :: Int -> Int -> Int -> Int -> Int -> State Mem Int
gosper 0 a b c d = pure $ base a b c d
gosper n a b
         c d = do
  (ZNode _  a1
         a2 a3,       x0) <- recall a
  (ZNode       b0 _
               b2 b3, x2) <- recall b
  (ZNode c0 c1
         _  c3,       x6) <- recall c
  (ZNode       d0 d1
               d2 _,  x8) <- recall d
  x1 <- rec    a1 b0
               a3 b2
  x3 <- rec a2 a3
            c0 c1
  x4 <- rec    a3 b2
               c1 d0
  x5 <- rec       b2 b3
                  d0 d1
  x7 <- rec    c1 d0
               c3 d2
  reduce3x3 rec (memo n) x0 x1 x2
                         x3 x4 x5
                         x6 x7 x8
  where
  rec a b c d = memo n a b c d >>= fmap snd . recall

reduce3x3 f g
  x0 x1 x2
  x3 x4 x5
  x6 x7 x8 = do
  y0 <- f x0 x1
          x3 x4
  y1 <- f    x1 x2
             x4 x5
  y2 <- f x3 x4
          x6 x7
  y3 <- f    x4 x5
             x7 x8
  g y0 y1
    y2 y3

memo :: Int -> Int -> Int -> Int -> Int -> State Mem Int
memo _ 0 0 0 0 = pure 0
memo 0 a b c d = pure $ pack4 a b c d
memo n a b c d = seek >>= maybe create pure
  where
  z = ZNode a b c d
  seek = (\(Mem _ idxs) -> M.lookup z idxs) <$> get
  create = intern z =<< gosper (n - 1) a b c d

Such is Life

We’re done! We’ve presented a complete implementation of Hashlife. However, we need a bit more code to show it off.

We define a bookkeeping data type to hold the current pattern’s size (it measures 2^lifeSize+1 x 2^lifeSize+1 cells), the coordinates of the top-left corner, and its ZNode index.

data Life = Life
  { lifeSize :: Int
  , lifeOrigin :: (Int, Int)
  , lifeIndex :: Int
  , lifeMemory :: Mem
  } deriving Show

A straightforward recursion turns a list of points into a quadtree:

enliven :: [(Int, Int)] -> Life
enliven [] = Life 0 (0, 0) 0 initMem
enliven ps = uncurry (Life sz (xmin, ymin))
  $ runState (enc sz (xmin, ymin)) initMem where
  (xs, ys) = unzip ps
  xmin = minimum xs
  ymin = minimum ys
  xmax = maximum xs
  ymax = maximum ys
  loggish n = max 0 $ head (filter (\k -> 2^k >= n) [0..]) - 1
  sz = loggish $ max (ymax - ymin) (xmax - xmin) + 1
  enc _ (ox, oy) | ox > xmax || oy > ymax = pure 0
  enc n (ox, oy) = mapM go zorder >>= (\[a,b,c,d] -> memo n a b c d) where
    go (dx, dy)
      | n == 0    = pure $ fromEnum $ elem (ox + dx, oy + dy) ps
      | otherwise = enc (n - 1) (ox + dx*p, oy + dy*p)
    p = 2^n

We’d like an inverse to this function, but we first deal with an annoyance. For quadtree nodes at the lowest level, it’s irritating that we break a number into 4 bits rather than look it up in a map. We add a function to hide this:

visit :: Int -> State Mem ZNode
visit k | k < 16    = pure $ unpack4 k
        | otherwise = fst <$> recall k

Listing the coordinates of the live cells of a given pattern is now another straightforward recursion:

points :: Life -> [(Int, Int)]
points Life{..} = evalState (coords lifeSize lifeOrigin lifeIndex) lifeMemory
  where
  coords :: Int -> (Int, Int) -> Int -> State Mem [(Int, Int)]
  coords _ _ 0 = pure []
  coords n (x, y) k = do
    ZNode a b c d <- visit k
    concat <$> zipWithM go [a,b,c,d] zorder
    where
    go p (dx, dy)
      | n == 0 = pure $ if p == 0 then [] else [(x+dx, y+dy)]
      | otherwise = coords (n - 1) (x + e*dx, y + e*dy) p
    e = 2^n

We perform a quick sanity check with GHCi. We define a pattern:

rPentomino :: [(Int, Int)]
rPentomino = [(1,0), (2,0), (0,1), (1,1), (1,2)]

Then confirm:

points $ enliven rPentomino

returns the same points.

Let’s predict the future! The following function takes a 2ⁿ⁺¹ x 2ⁿ⁺¹ pattern and returns its 2ⁿ x 2ⁿ center, 2^n-1 generations into the future. We just pluck out the memoized future center index and repackage it:

scry :: Life -> Life
scry Life{..} = Life n (ox + p, oy + p) x lifeMemory where
  (ox, oy) = lifeOrigin
  n = lifeSize - 1
  p = 2^n
  x = snd $ evalState (recall lifeIndex) lifeMemory

We find:

points $ scry $ enliven rPentomino

returns the single point [(1,2)]. The r-pentomino fits in a 4x4 square, so scry returns the middle 2x2 square 1 generation into the future. The result is correct, but underwhelming.

To travel further through time and space, we repeatedly pad our pattern with empty squares before calling scry. Here, we find it handy that 0 represents an empty square of any size. For the sake of code reuse, to double the dimensions of a pattern, we surround it with empty squares to make a 3x3 grid, then reduce with middle, a function which takes a 4x4 grid and discards all but the central 2x2 grid.

pad :: Life -> Life
pad Life{..} = Life
  { lifeSize = n
  , lifeOrigin = (ox - p, oy - p)
  , lifeIndex = i'
  , lifeMemory = st
  } where
  (ox, oy) = lifeOrigin
  p = 2^lifeSize
  n = lifeSize + 1
  i = lifeIndex
  (i', st) = runState (reduce3x3 (middle n) (memo n)
    0 0 0
    0 i 0
    0 0 0) lifeMemory

middle :: Int -> Int -> Int -> Int -> Int -> State Mem Int
middle n a b c d = do
  ZNode _ _ _ a3 <- visit a
  ZNode _ _ b2 _ <- visit b
  ZNode _ c1 _ _ <- visit c
  ZNode d0 _ _ _ <- visit d
  memo (n - 1) a3 b2 c1 d0

Rather than return a list of points, let’s draw the central 64x64 cells:

#ifndef __HASTE__
main :: IO ()
main = putStr $ unlines $
  [[if elem (c, r) ps then '#' else ' ' | c <- [-32..32]] | r <- [-32..32]]
  where
  ps = points $ scry $ iterate pad (enliven rPentomino) !! 10
#endif

What should we see? Well, we started with a 4x4 pattern, then padded it 10 times, yielding a 2¹² x 2¹² pattern. Hashlife should return the central 2¹¹ x 2¹¹ cells 2¹⁰ = 1024 generations into the future, and we display the central 64x64 cells:

                                                         ##
                                                        #  #
                                                         # #
                                                          #
                           #
                          # #   ##
                           ##   ##


                                                                #
                                                                #
                                                                #

                                                          #
                                                          #
                                                         ###


                                                         ###
                                                          #
                         #                                #
                        # #
                        # #
                         #      #                            #
           ##                  # #                          ###
           ##                  # #                         #  ##
                                #                          ## #
                                                            ##
   ##                                                       #
   ##
                                                         ####
                                  #                     # ## #
                                  #                         #  #
                                  #                         #  #
                                                      #  # #   #
                                                     ## #   ##
                                                      #     ##
                                                      ## # #
                                                      #    #

                                                       ###








                                         ##

This took under a quarter of a second on my laptop. Evidently we take less than 250 microseconds per generation.

If we bump up the 10 to 20, we find the running time hardly changes, so we’re somehow beating 250 nanoseconds a generation. Bump it up some more, and it seems we compute each generation in less than a CPU clock cycle! Of course, in reality, the pattern has stabilized and we’re just recycling memoized answers.

Baby steps

Hashlife is great for exploring the distant future, but what if we want to take it slow and watch the pattern change one generation at a time?

No problem. Instead of traveling some power of 2 generations into the future in each reduction step, we simply travel 0 generations, except for the base case which remains the same as before, namely, it computes one generation ahead.

More generally, to step forward 2^k generations at a time for any k, we travel 0 generations as we reduce until we reach the desired level, then switch to the original Hashlife algorithm.

We start with a more general version of gosper that is like the big brother of reduce3x3: a function that slides a 2x2 window over a 4x4 grid and applies a function f to each, followed by applying g to the 3x3 results.

We could rewrite gosper to use this function, but that would result in more lookups, and we prefer to have the code above to stand on its own.

reduce4x4 f g a b
              c d = do
  ZNode a0 a1 a2 a3 <- visit a
  ZNode b0 b1 b2 b3 <- visit b
  ZNode c0 c1 c2 c3 <- visit c
  ZNode d0 d1 d2 d3 <- visit d
  x0 <- f a0 a1
          a2 a3
  x1 <- f    a1 b0
             a3 b2
  x2 <- f       b0 b1
                b2 b3
  x3 <- f a2 a3
          c0 c1
  x4 <- f    a3 b2
             c1 d0
  x5 <- f       b2 b3
                d0 d1
  x6 <- f c0 c1
          c2 c3
  x7 <- f    c1 d0
             c3 d2
  x8 <- f       d0 d1
                d2 d3
  g x0 x1 x2
    x3 x4 x5
    x6 x7 x8

We begin by recursively reducing with middle, as this function extracts the center cells with no time travel. Upon reaching the level k we look up the future center.

baby :: Int -> Life -> Life
baby k Life{..} = Life
  { lifeSize = sz
  , lifeOrigin = (ox + p, oy + p)
  , lifeIndex = i'
  , lifeMemory = st
  } where
  (ox, oy) = lifeOrigin
  sz = lifeSize - 1
  p = 2^sz
  go _ 0 0 0 0 = pure 0
  go n a b c d
    | n <= k = memo (n + 1) a b c d >>= fmap snd . recall
    | otherwise = reduce4x4 (middle n)
      (reduce3x3 (go (n - 1)) (memo n)) a b c d
  (i', st) = runState (visit lifeIndex
    >>= \(ZNode a b c d) -> go sz a b c d) lifeMemory

There may be live cells on the edge of a pattern, so before we step, we pad it with enough empty cells to guarantee no live cells are lost after reduction. At the least, this requires quadrupling both dimensions.

Afterwards, the extra room may prove unnecessary. In an interactive setting, we repeatedly display the pattern after advancing some number of generations, so to avoid runaway padding, we shrink the pattern when possible. We look for a 2x2 box in a 4x4 grid such that all cells outside the box are dead. If one exists, then we discard all but the box and recurse.

shrink :: Life -> Life
shrink Life{..} = uncurry ($) $
  runState (go lifeSize lifeOrigin lifeIndex) lifeMemory
  where
  f a b c d = pure $ ZNode a b c d
  zsum (ZNode a b c d) = a + b + c + d
  go 0 d k = pure $ Life 0 d k
  go n (dx, dy) k = do
    ZNode a b c d <- visit k
    reduce4x4 f g a b c d
    where
    g x0 x1 x2 x3 x4 x5 x6 x7 x8 = let
      tot = sum $ zsum <$> [x0, x2, x6, x8]
      xs = [x0,x1,x2,x3,x4,x5,x6,x7,x8]
      xds = zip xs [0..]
      in case find ((tot ==) . zsum . fst) xds of
        Just (ZNode a b c d, i) -> let
          (y, x) = divMod i 3
          in go (n-1) (dx + x*2^(n-1), dy + y*2^(n-1))
            =<< memo (n-1) a b c d
        Nothing -> pure $ Life n (dx, dy) k

run :: Int -> Life -> Life
run k lf@Life{..} = shrink $ baby k $ iterate pad lf !! n where
  n = max 2 $ k + 1 - lifeSize

Our display is limited to a viewport, so we refine our points function to prune cells outside a given rectangle. We also use difference lists to avoid costly concatenation.

-- | Assumes x0 y0 even, x1 y1 odd, x0 < x1, y0 < y1.
crop :: (Int, Int) -> (Int, Int) -> Life -> [(Int, Int)]
crop (x0, y0) (x1, y1) Life{..} = evalState (go lifeSize lifeOrigin lifeIndex) lifeMemory []
  where
  go _ _ 0 = pure id
  go n (x, y) k
    | x > x1 || y > y1 || x + 2*e <= x0 || y + 2*e <= y0 = pure id
    | otherwise = do
      ZNode a b c d <- visit k
      foldr (.) id <$> zipWithM f [a,b,c,d] zorder
    where
    f p (dx, dy)
      | n == 0 = pure $ if p == 0 then id else ((x+dx, y+dy):)
      | otherwise = go (n - 1) (x + e*dx, y + e*dy) p
    e = 2^n

Lastly, we throw together a makeshift UI for the interactive demo at the top of this page:

#ifdef __HASTE__
main :: IO ()
main = withElems
  ["canvas", "goB", "level", "customB", "customT", "zoomDown", "zoomUp"]
  $ \[canvasE, goB, lvl, customB, customT, zoomUp, zoomDown] -> do
  Just canvas <- fromElem canvasE
  lf <- newIORef $ enliven rPentomino
  zoomRef <- newIORef 4
  let
    snapshot = do
      zoom <- readIORef zoomRef
      let
        w = div 320 (zoom * 2)
        h = div 240 (zoom * 2)
        cell (x', y') = renderOnTop canvas $ fill
          $ rect (x, y) (x + z, y + z)
          where
          x = fromIntegral $ zoom*(x' + w)
          y = fromIntegral $ zoom*(y' + h)
          z = fromIntegral zoom
      render canvas $ setFillColor (RGB 0 0 0)
      mapM_ cell . crop (-w,-h) (w-1,h-1) =<< readIORef lf
    setup but pat = do
      Just b <- elemById but
      void $ b `onEvent` Click $ const $ do
        writeIORef lf $ enliven pat
        snapshot
  snapshot
  setup "acorn" [(1,0),(3,1),(0,2),(1,2),(4,2),(5,2),(6,2)]
  setup "rpent" rPentomino
  setup "pihept" [(0,0),(1,0),(2,0),(0,1),(0,2),(2,1),(2,2)]
  setup "rabbits" [(0,0),(4,0),(5,0),(6,0),(0,1),(1,1),(2,1),(5,1),(1,2)]
  void $ customB `onEvent` Click $ const $ do
    ps <- maybe [] id . readMaybe <$> getProp customT "value"
    writeIORef lf $ enliven ps
    snapshot
  void $ goB `onEvent` Click $ const $ do
    n <- maybe 0 id . readMaybe <$> getProp lvl "value"
    setProp lvl "value" $ show n
    modifyIORef lf $ run n
    snapshot
  void $ zoomUp `onEvent` Click $ const $ do
    modifyIORef zoomRef $ max 1 . (`div` 2)
    snapshot
  void $ zoomDown `onEvent` Click $ const $ do
    modifyIORef zoomRef $ min 16 . (*2)
    snapshot
#endif

A better Life

We could replace our persistent tree-based maps with mutable hash maps in the ST monad. I avoided this to cause less trouble when compiling with Haste, which I use to build the JavaScript version.

It may be cleaner to split off the cached future center into another map. And for faster baby steps, perhaps we could memoize multiple center blocks at different points in the future and only compute futures lazily.

It turns out we can remove the pack4 and unpack4 mess (and use our generic ZNode scheme to hold 4 bits) at the cost of some performance; about 10% on the r-pentomino. The code would be marginally cleaner, but the base case is still special, and our write-up would need to mention interning earlier.

On the other hand, to speed up our program, we could double down on bit twiddling and make 8x8 the lowest level. We could place entries of a 4x4 or 3x3 matrix in an array instead of making them function arguments and writing out everything explicitly. See Gil Dogon’s annotated C implementation of Hashlife.

Ben Lynn blynn@cs.stanford.edu 💡