data Tree a = Node { rootLabel :: a, subForest :: [Tree a] } instance Functor Tree where fmap f (Node a xs) = Node (f a) (fmap f <$> xs) levels :: Tree a -> [[a]] levels t = map (map rootLabel) $ takeWhile (not . null) $ iterate (concatMap subForest) [t] minimum = foldr1 min maximum = foldr1 max
Lovely as a Tree
⚂
Of the many ways to draw binary trees, computer scientists often choose level-based layout, that is, nodes have the same y-coordinate if and only if they belong to the same level. By convention, deeper levels appear below shallower levels, and levels are evenly spaced.
The Reingold-Tilford algorithm efficiently produces some of the prettiest pictures of this style:
-
Left nodes appear to the left of right nodes.
-
Parent nodes are centered above their children.
-
No edges cross.
-
Drawing the reverse of a tree results in its mirror image.
-
A subtree appears the same no matter where it occurs on the tree.
-
The width of the drawing is minimal.
These slides show the Reingold-Tilford algorithm in action, which helped me understand it.
I found a summary of level-based algorithms for drawing trees, which is nice apart from a few errors of varying severity. It seems "Figure 7" should be "Figure 8" in some places; "right" should be "left" in one sentence; "n/3" should be "n/4"; "Principle 6" is misleading because the algorithm of Buchheim et al. merely opportunistically spreads out the siblings between two others at a certain point in the algorithm without moving any other siblings. (Insisting all siblings be evenly spaced results in wider drawings.)
See also the chapter on drawing general trees from the Handbook of Graph Drawing and Visualization.
Preliminaries
Let’s walk through an implementation of the Reingold-Tilford algorithm.
We swipe some definitions from GHC:
We fetch our Charser
parser combinator libary:
jsEval "curl_module('../compiler/Charser.ob')"
and write a parser for a simple language describing trees. The syntax mimics notation for composing functions. The right-associative operator "." connects subtrees, and any alphanumeric string is a leaf.
import Charser data Expr = Fun String | Com Expr Expr deriving Show expr :: Charser Expr expr = space *> com <* eof where com = atm `chainr1` (const Com <$> ch '.') atm = fun <|> ch '(' *> com <* ch ')' fun = Fun <$> some alphaNumChar <* space ch c = char c <* space chainr1 :: Charser a -> Charser (a -> a -> a) -> Charser a chainr1 p infixOp = (&) <$> p <*> (flip <$> infixOp <*> chainr1 p infixOp <|> pure id) parse expr "" "a . b . c . d" parse expr "" "a . (b . c) . d"
The naive approach
The idea behind the algorithm is simple and elegant. We build the drawing bottom-up. The y-coordinate of a node is implicit, as it is determined by the level, so we only need to pick good x-coordinates.
Base case: place a leaf node at x = 0.
Inductive step: given two well-drawn subtrees, we trace their contours. The right contour of a tree is the sequence of the x-coordinates of the rightmost node in each level, and the left contour is the same for the leftmost nodes.
We compare the right contour of the left layout against the left contour of the right layout, and indent one of them so the minimum distance between the contours is 1 unit. To ensure the drawing’s leftmost point has x = 0, we indent the right subtree when they are too close, and the left subtree when they are too far apart.
Lastly, the parent is given the x-coordinate halfway between its children.
simpleDraw :: Expr -> Tree (Double, String) simpleDraw = \case Fun s -> Node (0, s) [] Com l r -> Node (m, "") xs where [ll, rr] = simpleDraw <$> [l, r] [lCont, rCont] = [fst . head <$> levels rr, fst . last <$> levels ll] d = 1 - minimum (zipWith (-) lCont rCont) xs | d >= 0 = [ ll, first (+ d) <$> rr] | otherwise = [first (+(-d)) <$> ll, rr] m = (sum $ fst . rootLabel <$> xs) / 2 putStr $ unlines $ map show $ levels $ either undefined simpleDraw $ parse expr "" "a . (b . c) . d"
Though concise, the running time of the above code is quadratic in the number of nodes because:
-
Many nodes might be indented many times.
-
Tracing a contour requires traversing the whole tree.
Thanks to lazy evaluation and zipWith
, we only completely traverse the
shorter of the two subtrees. This behaviour becomes important when we improve
our algorithm.
From quadratic to linear
To fix the first issue, we record indent values in subtree roots. A final top-down pass recursively applies the indents cumulatively to compute the x-coordinates of each node. In essence, we are using:
fmap (x +) . fmap (y +) = fmap ((x + y) +)
(GHC has fusion optimizations that do this automatically for simpler cases.)
To fix the second issue, we maintain auxiliary edges on the tree so we can follow contours without traversing its entirety.
Instead of Data.Tree
we define a custom RT
tree data structure.
The shift
field records the indent amount for the first optimization, and
the link
field may hold an auxiliary edge for the second optimization.
Because of our first optimization, we need to record an indent modifier with
each link.
We parameterize so the node data structure can hold data of any given type.
data RT a = RT { xpos :: Int , shift :: Int , hold :: a , link :: Maybe (Int, RT a) , kids :: [RT a] }
The following applies all indents of an RT
tree to produce a Data.Tree
representing the final drawing:
addx :: Int -> RT a -> Tree (Int, a) addx i rt = Node (xpos rt + shift rt + i, hold rt) $ addx (i + shift rt) <$> kids rt
As for the link
field, observe we can mostly figure out the right contour by
starting at the root and recursively following the last child. Trouble arises
when we reach a leaf but there are still more levels to go.
To solve this problem, for rightmost leaf nodes except those in the last level,
we set link
to point at the rightmost node of the next level. When first
placing a node, it either lies on the deepest level or is guaranteed to be
an internal node, so link
is initially Nothing
.
When combining 2 subtrees, the weave
function traverses the rightmost
children and links of both subtrees until it bottoms out on at least one of
them. Then if needed, it follows at most one more edge or link on the other
subtree to create a new link. It also stores the difference between the total
indent of the link destination and that of the link source so we can update the
indent value accordingly when following links.
Left contours are similarly handled. In fact, to avoid code duplication, our
contour
function takes an f
argument that should be head
for left
contours and last
for right contours, and our weave
function calls a
helper function with id
or reverse
depending on whether it’s acting on
the left or the right side of the subtrees.
We could optimize further. For example, we know only one link on the shallower subtree needs adjustment. However, the asymptotic time complexity is unaffected.
Typical implementations modify data in place, but as our code is pure, we
create new RT
nodes instead.
contour :: ([RT a] -> RT a) -> (Int, RT a) -> [Int] contour f (acc, rt) = h : case kids rt of [] -> maybe [] (contour f . first (+ acc')) (link rt) ks -> contour f (acc', f ks) where acc' = acc + shift rt h = acc'+ xpos rt weave :: RT a -> RT a -> [RT a] weave l r = [weave' id (0, l) (0, r), weave' reverse (0, r) (0, l)] weave' :: ([RT a] -> [RT a]) -> (Int, RT a) -> (Int, RT a) -> RT a weave' f (accL, l) (accR, r) | Nothing <- follow = l | Just (n, x) <- link l = l { link = Just (n, weave' f (n + accL', x) y) } | (k:ks) <- f $ kids l = l { kids = f $ weave' f (accL', k) y : ks } | otherwise = l { link = first (+(-accL')) <$> follow } where accL' = accL + shift l accR' = accR + shift r follow | (k:_) <- f $ kids r = Just (accR', k) | otherwise = first (accR' +) <$> link r Just y = follow
This time, we want a configurable minimum gap between siblings, as well as integral x-coordinates. So if necessary, we bump up the indent value so the average of the x values of the sibling roots is a whole number.
We also change our API to take any Tree a
instead of our Expr
data
structure.
padding :: Int -- Minimum horizontal gap between nodes. padding = 50 placeRT :: Tree a -> RT a placeRT (Node a []) = RT 0 0 a Nothing [] placeRT (Node a [l, r]) = RT m 0 a Nothing xs where [ll, rr] = placeRT <$> [l, r] g = padding - minimum (zipWith (-) (contour head (0, rr)) (contour last (0, ll))) s = xpos ll + xpos rr gap = abs g + mod (abs g + s) 2 -- Adjust so midpoint is whole number. m = div (s + gap) 2 xs = if g >= 0 then weave ll rr { shift = gap } else weave ll { shift = gap } rr placeRT _ = error "full binary trees only please" drawRT :: Tree a -> Tree (Int, a) drawRT = addx 0 . placeRT drawExpr :: Expr -> Tree (Int, String) drawExpr = drawRT . fromExpr where fromExpr (Fun s) = Node s [] fromExpr (Com l r) = Node "" $ fromExpr <$> [l, r]
Web demo
It’s straightforward though a little tedious to turn coordinates into an SVG drawing:
render :: Int -> Tree (Int, String) -> String render depth (Node (x, s) ks) = concat $ ((\(Node (x2, _) _) -> concat [ "<line x1='", show $ x + x0, "' y1='", show $ depth*40 + y0 , "' x2='", show $ x2 + x0, "' y2='", show $ (depth + 1)*40 + y0 , "' stroke='black'/>" ]) <$> ks) ++ (if null ks then [ "<rect x='", show $ x - 12 + x0, "' y='", show $ depth*40 - 12 + y0 , "' width='24' height='24' stroke='black' fill='white'/>\n" ] else [ "<circle r='3' cx='", show $ x + x0, "' cy='", show $ depth*40 + y0 , "' stroke='black'/>\n" ]) ++ (if null s then [] else [ "<text text-anchor='middle' alignment-baseline='central'" , " x='", show $ x + x0, "' y='", show $ depth*40 + y0 , "'>", s, "</text>\n" ]) ++ (render (depth + 1) <$> ks) where (x0, y0) = (15, 15)
We hook up the "Draw!" button to a function that parses and draws the tree in the input box.
While working on this, I ran into a problem. For some reason that is probably
important to some committee, JavaScript’s createElement
creates elements that
can be added to the SVG, but on my browser they are never rendered even though
they appear in the DOM. Instead, we must call createElementNS
to create SVG
elements.
In the end, it didn’t matter as I sidestepped the issue entirely by setting
innerHTML
directly.
We compute the viewBox
with the help of a function that determines the
maximum x- and y-coordinates of a tree drawing.
maxXY :: Tree (Int, a) -> (Int, Int) maxXY t = (maximum xs, length xs) where xs = fst . last <$> levels t drawdemo = parse expr "" <$> jsEval "input.value;" >>= \case Left err -> jsEval $ "msg.innerText= 'Parse error: " ++ show err ++ "';" Right t -> do jsEval "msg.innerHTML = '';" let drawing = drawExpr t (x, y) = maxXY drawing jsEval $ "soil.setAttribute('viewBox', '-15 -15 " ++ show (x + 30) ++ " " ++ show (40*y + 30) ++ "');" jsEval $ "soil.innerHTML = " ++ shows (render 0 drawing) ";" drawdemo jsEval [r|drawB.addEventListener("click", (ev) => { repl.run("chat", ["Main"], "drawdemo"); });|]
For the random tree feature, we lean on JavaScript to generate pseudorandom numbers:
function rndmod(n) { return Math.floor(Math.random() * n); }
We write a helper that generates a random expression with a given number of leaf nodes, and call it with a positive integer drawn from a distribution whose expected value is 12. In particular, we add one to the number of rolls of an 11-sided die it takes to see a particular face.
rndmod n = fromIntegral . readInteger <$> jsEval ("rndmod(" ++ show n ++ ");") randomExpr 1 = (:[]) . (['a'..'z'] !!) <$> rndmod 26 randomExpr n = do m <- succ <$> rndmod (n - 1) a <- randomExpr m b <- randomExpr $ n - m pure $ concat ["(", a, ".", b, ")"] rndsize = rndmod 11 >>= \case 0 -> pure 1 _ -> succ <$> rndsize rando = do x <- randomExpr =<< rndsize jsEval $ "input.value = " ++ shows x ";" drawdemo jsEval [r|randB.addEventListener("click", (ev) => { repl.run("chat", ["Main"], "rando"); });|]