Lovely as a Tree

⚂

a . (b . c . d) . (e . f)

Of the many ways to draw binary trees, computer scientists often choose level-based layout, that is, nodes have the same y-coordinate if and only if they belong to the same level. By convention, deeper levels appear below shallower levels, and levels are evenly spaced.

The Reingold-Tilford algorithm efficiently produces some of the prettiest pictures of this style:

Left nodes appear to the left of right nodes.
Parent nodes are centered above their children.
No edges cross.
Drawing the reverse of a tree results in its mirror image.
A subtree appears the same no matter where it occurs on the tree.
The width of the drawing is minimal.

These slides show the Reingold-Tilford algorithm in action, which helped me understand it.

I found a summary of level-based algorithms for drawing trees, which is nice apart from a few errors of varying severity. It seems "Figure 7" should be "Figure 8" in some places; "right" should be "left" in one sentence; "n/3" should be "n/4"; "Principle 6" is misleading because the algorithm of Buchheim et al. merely opportunistically spreads out the siblings between two others at a certain point in the algorithm without moving any other siblings. (Insisting all siblings be evenly spaced results in wider drawings.)

See also the chapter on drawing general trees from the Handbook of Graph Drawing and Visualization.

Preliminaries

Let’s walk through an implementation of the Reingold-Tilford algorithm.

We swipe some definitions from GHC:

data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
instance Functor Tree where
  fmap f (Node a xs) = Node (f a) (fmap f <$> xs)
levels :: Tree a -> [[a]]
levels t = map (map rootLabel) $ takeWhile (not . null) $
  iterate (concatMap subForest) [t]
minimum = foldr1 min
maximum = foldr1 max

We fetch our Charser parser combinator libary:

jsEval "curl_module('../compiler/Charser.ob')"

and write a parser for a simple language describing trees. The syntax mimics notation for composing functions. The right-associative operator "." connects subtrees, and any alphanumeric string is a leaf.

import Charser
data Expr = Fun String | Com Expr Expr deriving Show
expr :: Charser Expr
expr = space *> com <* eof where
  com = atm `chainr1` (const Com <$> ch '.')
  atm = fun <|> ch '(' *> com <* ch ')'
  fun = Fun <$> some alphaNumChar <* space
  ch c = char c <* space

chainr1 :: Charser a -> Charser (a -> a -> a) -> Charser a
chainr1 p infixOp = (&) <$> p <*>
  (flip <$> infixOp <*> chainr1 p infixOp <|> pure id)

parse expr "" "a . b . c . d"
parse expr "" "a . (b . c) . d"

The naive approach

The idea behind the algorithm is simple and elegant. We build the drawing bottom-up. The y-coordinate of a node is implicit, as it is determined by the level, so we only need to pick good x-coordinates.

Base case: place a leaf node at x = 0.

Inductive step: given two well-drawn subtrees, we trace their contours. The right contour of a tree is the sequence of the x-coordinates of the rightmost node in each level, and the left contour is the same for the leftmost nodes.

We compare the right contour of the left layout against the left contour of the right layout, and indent one of them so the minimum distance between the contours is 1 unit. To ensure the drawing’s leftmost point has x = 0, we indent the right subtree when they are too close, and the left subtree when they are too far apart.

Lastly, the parent is given the x-coordinate halfway between its children.

simpleDraw :: Expr -> Tree (Double, String)
simpleDraw = \case
  Fun s   -> Node (0, s) []
  Com l r -> Node (m, "") xs where
    [ll, rr]       = simpleDraw <$> [l, r]
    [lCont, rCont] = [fst . head <$> levels rr, fst . last <$> levels ll]
    d              = 1 - minimum (zipWith (-) lCont rCont)
    xs | d >= 0    = [                  ll, first (+ d) <$> rr]
       | otherwise = [first (+(-d)) <$> ll,                 rr]
    m              = (sum $ fst . rootLabel <$> xs) / 2

putStr $ unlines $ map show $
  levels $ either undefined simpleDraw $ parse expr "" "a . (b . c) . d"

Though concise, the running time of the above code is quadratic in the number of nodes because:

Many nodes might be indented many times.
Tracing a contour requires traversing the whole tree.

Thanks to lazy evaluation and zipWith, we only completely traverse the shorter of the two subtrees. This behaviour becomes important when we improve our algorithm.

From quadratic to linear

To fix the first issue, we record indent values in subtree roots. A final top-down pass recursively applies the indents cumulatively to compute the x-coordinates of each node. In essence, we are using:

fmap (x +) . fmap (y +) = fmap ((x + y) +)

(GHC has fusion optimizations that do this automatically for simpler cases.)

To fix the second issue, we maintain auxiliary edges on the tree so we can follow contours without traversing its entirety.

Instead of Data.Tree we define a custom RT tree data structure. The shift field records the indent amount for the first optimization, and the link field may hold an auxiliary edge for the second optimization. Because of our first optimization, we need to record an indent modifier with each link. We parameterize so the node data structure can hold data of any given type.

data RT a = RT { xpos :: Int
               , shift :: Int
               , hold :: a
               , link :: Maybe (Int, RT a)
               , kids :: [RT a]
               }

The following applies all indents of an RT tree to produce a Data.Tree representing the final drawing:

addx :: Int -> RT a -> Tree (Int, a)
addx i rt = Node (xpos rt + shift rt + i, hold rt) $
  addx (i + shift rt) <$> kids rt

As for the link field, observe we can mostly figure out the right contour by starting at the root and recursively following the last child. Trouble arises when we reach a leaf but there are still more levels to go.

To solve this problem, for rightmost leaf nodes except those in the last level, we set link to point at the rightmost node of the next level. When first placing a node, it either lies on the deepest level or is guaranteed to be an internal node, so link is initially Nothing.

When combining 2 subtrees, the weave function traverses the rightmost children and links of both subtrees until it bottoms out on at least one of them. Then if needed, it follows at most one more edge or link on the other subtree to create a new link. It also stores the difference between the total indent of the link destination and that of the link source so we can update the indent value accordingly when following links.

Left contours are similarly handled. In fact, to avoid code duplication, our contour function takes an f argument that should be head for left contours and last for right contours, and our weave function calls a helper function with id or reverse depending on whether it’s acting on the left or the right side of the subtrees.

We could optimize further. For example, we know only one link on the shallower subtree needs adjustment. However, the asymptotic time complexity is unaffected.

Typical implementations modify data in place, but as our code is pure, we create new RT nodes instead.

contour :: ([RT a] -> RT a) -> (Int, RT a) -> [Int]
contour f (acc, rt) = h : case kids rt of
  [] -> maybe [] (contour f . first (+ acc')) (link rt)
  ks -> contour f (acc', f ks)
  where acc' = acc + shift rt
        h    = acc'+ xpos rt

weave :: RT a -> RT a -> [RT a]
weave l r = [weave' id (0, l) (0, r), weave' reverse (0, r) (0, l)]

weave' :: ([RT a] -> [RT a]) -> (Int, RT a) -> (Int, RT a) -> RT a
weave' f (accL, l) (accR, r)
  | Nothing      <- follow = l
  | Just (n, x)  <- link l = l { link = Just (n, weave' f (n + accL', x) y) }
  | (k:ks)   <- f $ kids l = l { kids = f $ weave' f (accL', k) y : ks }
  | otherwise              = l { link = first (+(-accL')) <$> follow }
  where
    accL' = accL + shift l
    accR' = accR + shift r
    follow | (k:_) <- f $ kids r = Just (accR', k)
           | otherwise           = first (accR' +) <$> link r
    Just y = follow

This time, we want a configurable minimum gap between siblings, as well as integral x-coordinates. So if necessary, we bump up the indent value so the average of the x values of the sibling roots is a whole number.

We also change our API to take any Tree a instead of our Expr data structure.

padding :: Int  -- Minimum horizontal gap between nodes.
padding = 50

placeRT :: Tree a -> RT a
placeRT (Node a [])     = RT 0 0 a Nothing []
placeRT (Node a [l, r]) = RT m 0 a Nothing xs where
  [ll, rr] = placeRT <$> [l, r]
  g = padding - minimum (zipWith (-)
    (contour head (0, rr)) (contour last (0, ll)))
  s = xpos ll + xpos rr
  gap = abs g + mod (abs g + s) 2  -- Adjust so midpoint is whole number.
  m = div (s + gap) 2
  xs = if g >= 0 then weave ll                 rr { shift = gap }
                 else weave ll { shift = gap } rr
placeRT _ = error "full binary trees only please"

drawRT :: Tree a -> Tree (Int, a)
drawRT = addx 0 . placeRT

drawExpr :: Expr -> Tree (Int, String)
drawExpr = drawRT . fromExpr where
  fromExpr (Fun s)   = Node s []
  fromExpr (Com l r) = Node "" $ fromExpr <$> [l, r]

Web demo

It’s straightforward though a little tedious to turn coordinates into an SVG drawing:

render :: Int -> Tree (Int, String) -> String
render depth (Node (x, s) ks) = concat $
  ((\(Node (x2, _) _) -> concat
    [ "<line x1='",  show $ x + x0, "' y1='", show $ depth*40 + y0
    , "' x2='", show $ x2 + x0, "' y2='", show $ (depth + 1)*40 + y0
    , "' stroke='black'/>"
    ]) <$> ks) ++
  (if null ks then
    [ "<rect x='", show $ x - 12 + x0, "' y='", show $ depth*40 - 12 + y0
    , "' width='24' height='24' stroke='black' fill='white'/>\n"
    ]
  else
    [ "<circle r='3' cx='", show $ x + x0, "' cy='", show $ depth*40 + y0
    , "' stroke='black'/>\n"
    ]) ++
  (if null s then [] else
    [ "<text text-anchor='middle' alignment-baseline='central'"
    , " x='", show $ x + x0, "' y='", show $ depth*40 + y0
    , "'>", s, "</text>\n"
    ]) ++
  (render (depth + 1) <$> ks)
  where (x0, y0) = (15, 15)

We hook up the "Draw!" button to a function that parses and draws the tree in the input box.

While working on this, I ran into a problem. For some reason that is probably important to some committee, JavaScript’s createElement creates elements that can be added to the SVG, but on my browser they are never rendered even though they appear in the DOM. Instead, we must call createElementNS to create SVG elements.

In the end, it didn’t matter as I sidestepped the issue entirely by setting innerHTML directly.

We compute the viewBox with the help of a function that determines the maximum x- and y-coordinates of a tree drawing.

maxXY :: Tree (Int, a) -> (Int, Int)
maxXY t = (maximum xs, length xs) where xs = fst . last <$> levels t

drawdemo = parse expr "" <$> jsEval "input.value;" >>= \case
  Left err -> jsEval $ "msg.innerText= 'Parse error: " ++ show err ++ "';"
  Right t -> do
    jsEval "msg.innerHTML = '';"
    let
      drawing = drawExpr t
      (x, y) = maxXY drawing
    jsEval $ "soil.setAttribute('viewBox', '-15 -15 " ++ show (x + 30) ++ " " ++ show (40*y + 30) ++ "');"
    jsEval $ "soil.innerHTML = " ++ shows (render 0 drawing) ";"

drawdemo

jsEval [r|drawB.addEventListener("click", (ev) => { repl.run("chat", ["Main"], "drawdemo"); });|]

We write a helper that generates a random expression with a given number of leaf nodes, and call it with a positive integer drawn from a distribution whose expected value is 12. In particular, we add one to the number of rolls of an 11-sided die it takes to see a particular face.

rndmod n = fromIntegral . readInteger <$> jsEval
  ("Math.floor(Math.random()*" ++ show n ++ ");")

randomExpr 1 = (:[]) . (['a'..'z'] !!) <$> rndmod 26
randomExpr n = do
  m <- succ <$> rndmod (n - 1)
  a <- randomExpr m
  b <- randomExpr $ n - m
  pure $ concat ["(", a, ".", b, ")"]

rndsize = rndmod 11 >>= \case
  0 -> pure 1
  _ -> succ <$> rndsize

rando = do
  x <- randomExpr =<< rndsize
  jsEval $ "input.value = " ++ shows x ";"
  drawdemo

jsEval [r|randB.addEventListener("click", (ev) => { repl.run("chat", ["Main"], "rando"); });|]

Ben Lynn blynn@cs.stanford.edu 💡

Lovely as a Tree

Preliminaries

The naive approach

From quadratic to linear

Web demo

Contents