Generating Functions, Generating Fun

We previously explored Doug McIlroy’s beautiful code that represents power series with lazy lists. However, there’s a drawback to this direct approach: it could be unclear whether a given list represents a power series or not.

McIlroy suggests a dedicated data type for power series, at the cost of a few more characters:

infixr 5 :+:
data Ps a = Pz | a :+: Ps a deriving (Eq, Show)

where Pz is the power series zero.

We must tolerate some mess due to limitations of my compiler:

-- My Base class lacks Foldable.
foldrPs f z = \case
  Pz -> z
  a:+:at -> f a $ foldrPs f z at

-- My compiler lacks DeriveFunctor.
instance Functor Ps where fmap f = foldrPs ((:+:) . f) Pz

-- I don't have IsList either.
fromList = foldr (:+:) Pz
toList = foldrPs (:) []

zipPs f = \cases
  Pz _ -> Pz
  _ Pz -> Pz
  (a:+:at) (b:+:bt) -> f a b:+:zipPs f at bt

Then we rewrite the original definitions. Since we’re paying extra, we may as well extend our routines to support polynomials.

instance Ring a => Ring (Ps a) where
  (+) = zipZero (+)
  (-) = zipZero (-)
  (f:+:ft) * gs@(g:+:gt) = f*g:+:fmap (f*) gt + ft*gs
  _ * _ = Pz
  fromInteger = series . fromInteger

series = (:+:Pz)

zipZero (##) = \cases
  Pz ys -> (0##) <$> ys
  xs Pz -> (##0) <$> xs
  (x:+:xt) (y:+:yt) -> x##y :+: zipZero (##) xt yt

instance (Eq a, Ring a, Field a) => Field (Ps a) where
  (0:+:ft) / (0:+:gt) = ft/gt
  (f:+:ft) / (g:+:gt) = qs where qs = f/g:+:fmap ((1/g)*) (ft - qs*gt)

diff (_:+:ft) = zipPs (*) ft (fromList [1..])
int fs = 0:+:zipPs (/) fs (fromList [1..])

(f:+:ft) # gs@(0:+:gt) = f :+: gt*(ft#gs)
revert (0:+:ft) = rs where rs = 0 :+: 1/(ft#rs)
sqrt1 = \case
  0:+:0:+:ft -> 0:+:sqrt1 ft
  fs@(1:+:ft) -> qs where qs = 1 + int((diff fs)/(2*qs))

This definition makes some code marginally prettier:

x = (0:+:)

And we’re done! We can perform magic with power series once more:

exps = 1 + int exps
tans = revert $ int $ 1/(1 + x(x 1))
peep = take 12 . toList
peep exps
peep tans

This time, we can also show off polynomials:

(1 + x(2 + x 3))^3

We take a moment to redefine peep so it also pretty-prints a power series with MathJax.

texTerm a n = let p :% q = a/ 1 in (. texPower n) case q of
  1 | p == 1, n > 0 -> id
    | otherwise -> shows p
  _ -> ("\\frac{"++) . shows p . ("}{"++) . shows q . ('}':)
texPower = \case
  0 -> id
  1 -> ('x':)
  n -> ("x^{"++) . shows n . ('}':)
tex = go 0 True where
  go n isFirst = \case
    Pz | isFirst -> ('0':)
       | otherwise -> id
    a :+: at
      | n == 12 -> ("..."++)
      | a == 0 -> go (n + 1) isFirst at
      | otherwise -> sepPlus . texTerm a n . go (n + 1) False at
      where sepPlus = if a > 0 && not isFirst then ('+':) else id
peep ps = do
  print $ take 12 $ toList ps
  jsEval $ [r|
const e = document.createElement("div"); e.style["font-size"]="85%";
e.textContent = String.raw`\[|] ++ tex ps [r|\]`;
runme_out.appendChild(e); MathJax.typeset([e]);
|]

peep tans
peep $ (1 + x(2 + x 3))^3

Formal Dress

By now, we’re comfortable with representing a power series with a list. But this idea cuts both ways: we can represent a list with a power series.

McIlroy demonstrates this by representing a list of polynomials with a power series, that is, he describes a power series whose coefficients are themselves power series. Consider the identity:

\[ \frac{1}{1-z} = 1 + z + z^2 + …​ \]

If we choose \(z\) to be the polynomial \(1 + x\), then this produces all the powers of \(1 + x\), whose coefficients form Pascal’s triangle.

pascal = 1/(1 - x(series (1 + x 1)))
putStr $ unlines $ map (unwords . map (show . numerator)) $ toList <$>
  take 10 (toList pascal)

Although iterate ((1 + x 1)*) 1 results in a list that behaves identically, we shall see that writing lists as power series has benefits.

Special ops

Our new way of thinking means our initial concern also cuts both ways: it could be unclear whether a given power series represents a list or not.

Thus we say formal power series when a power series as a list, and we’re interested in tasks such as computing members of a sequence rather than actually summing up a sum and worrying about convergence. This occurs often in combinatorics, where formal power series are called generating functions.

We don’t abandon analysis completely. Occasionally we do care about convergence after all, as it can shed insight on the asymptotic behaviour of a sequence. Also, all power series converge when \(x = 0\), and we write \(f(0)\) for the constant term of a power series \(f(x)\). And even without convergence, differentiation and integration remain sensible if we view them as formal operations.

It turns out there is more than one way to map lists to formal series. Herbert Wilf, Generatingfunctionology, explains how to disambiguate. We call:

\[f(x) = a_0 + a_1 x + a_2 x^2 + …​ \]

the ordinary power series generating function for the sequence of coefficients:

\[ \left\{a_n\right\}_0^\infty = a_0, a_1, a_2, …​ \]

and we write:

\[ \newcommand{\ops}{\overset{ops}\longleftrightarrow} f \ops \left\{a_n\right\}_0^\infty \]

We may write \(f\) for \(f(x)\) when the variable is clear from context.

Suppose we drop the first \(h\) members of the sequence so \(a_h, a_{h+1}, …​\) remains. Then we have:

\[ \frac{f - a_0 - ... - a_{h-1} x^{h-1}}{x^h} \ops \left\{a_{n+h}\right\}_0^\infty \]

Haskell’s drop h is more concise, but again, we shall see that writing computations as arithmetic operations has benefits.

Because power series are sums, the correspondence is linear, that is, for all \(g \ops \left\{b_n\right\}_0^\infty\):

\[f + g \ops \left\{a_n + b_n\right\}_0^\infty\]

and for any constant \(c\):

\[cf \ops \left\{ca_n\right\}_0^\infty\]

Why algebra matters

Combining the above ideas, if \(f\) is the ordinary power series for the Fibonacci numbers 0, 1, 1, 2, 3, 5, …​

\[ f = 0 + x + x^2 + 2x^3 + 3x^4 + 5x^5 + …​ \]

then the recursion \(a_{n+2} = a_{n+1} + a_n\) and initial conditions \(a_0 = 0, a_1 = 1\) become:

\[ f = x + x^2 \left(f + \frac{f}{x}\right) \]

fibs = x(1 + x(fibs + fibs/x 1))
peep fibs

which perhaps impresses everyone except seasoned Haskell programmers, who may say we’ve merely disguised the classic:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

If so, fine, but it’s only because Haskell is such an amazing language to begin with! At any rate, even if the classic edition is more elementary, we stress again that expressing list operations with formal power series has benefits.

And this time, we reveal the benefits: formal power series allow us to draw on a vast wealth of mathematical knowledge. Humankind has toiled in rings and fields for thousands of years. We’ve picked up a few tricks along the way, such as the invention of symbolic algebra several centuries ago, and today we think nothing of mechanically solving for \(f\):

\[ f = \frac{x}{1 - x - x^2} \]

Indeed, a computer algebra system could automate this. We persevere without one, though we can check our work:

peep $ x 1/(1 - x 1 - x(x 1))

\[ f = \frac{x}{1 - x - x^2} = \frac{1}{\sqrt{5}} \left( \frac{1}{1 - \frac{1}{2}(1 + \sqrt{5})x} - \frac{1}{1 - \frac{1}{2}(1 - \sqrt{5})x} \right) \]

Then exploiting the above \(1 / (1 - z)\) identity leads to Binet’s formula for the \(n\)th Fibonacci number:

\[ \frac{1}{\sqrt{5}} \left(\frac{1 + \sqrt{5}}{2}\right)^n - \frac{1}{\sqrt{5}} \left(\frac{1 - \sqrt{5}}{2}\right)^n \]

Can you deduce this by reasoning with drop and iterate and zipWith? Even if you could, why bother when ordinary algebra hands you the answer on a silver platter?

For other sequences, although we might never find nice closed formulas like this, generating functions can still help us compute the members of the sequence or reveal some of their properties.

Derivative work

We have \(x D x^n = x (n x^{n-1}) = n x^n \).

xD = x . diff

peep $ iterate x 1 !! 5
peep $ xD $ iterate x 1 !! 5

Hence for any polynomial \(p(x)\):

\[p(xD) f \ops \left\{p(n) a_n\right\}_0^\infty\]

For example, the generating function for the squares is:

\[ (xD)^2\frac{1}{1-x} \ops \left\{n^2 \right\}_0^\infty\]

peep $ xD (xD (1 / (1 - x 1)))

When \(a_0 = 0\), this generalizes to a Laurent polynomial \(p\) by defining \(D^{-1}\) to mean integration from 0 to \(x\) and:

\[(xD)^{-1} = D^{-1} x^{-1}\]

Thus the generating function for the inverse of the nonzero squares is:

unxD = int . (/x 1)

peep $ unxD (unxD (1 / (1 - x 1) - 1))

This is the dilogarithm \(\mathrm{Li}_2(z)\), the order 2 polylogarithm. The Basel problem asks for the value of \(\mathrm{Li}_2(1)\). (Spoiler: it’s \(\pi^2 / 6\).)

Convolution

Multiplying ordinary power series results in convolution:

\[f g \ops \left\{\sum_{k=0}^{n} a_k b_{n - k}\right\}_0^\infty\]

A useful specialization is \(g = 1/(1 - x) \ops \left\{1\right\}_0^\infty \). Then we get the partial sums of the \(a_n\):

\[f \left(\frac{1}{1-x}\right) \ops \left\{\sum_{k=0}^{n} a_k \right\}_0^\infty\]

For example, the ordinary power series for the sum of the first \(n\) squares is:

peep $ xD (xD (1/(1 - x 1))) * (1/(1 - x 1))

McIlroy demonstrates convolution by counting binary trees. Let \(t\) be the ordinary power series for the number of binary trees of \(n\) nodes. Observe:

  1. There is a unique empty binary tree.

  2. A non-empty binary tree is a parent node with two child nodes that are themselves binary trees.

Or as we would say in Haskell:

data BinaryTree = Empty | Node BinaryTree BinaryTree

We have the generating function:

\[ t = 1 + x t^2 \]

The 1 counts the empty tree, the \(x\) accounts for the parent node, and \(t^2\) is a convolution that counts the cases where the left child has \(k\) nodes and the right has \(n - k\) nodes for each \(k\).

ts = 1 + x (ts^2)
peep ts

These are the Catalan numbers. Algebra shows:

\[ t = \frac{1 - \sqrt{1 - 4x}}{2x} \]

peep $ (1 - sqrt1 (1 - x 4))/(x 2)

Considering the binomial expansion of the square root leads to a formula for the \(n\)th Catalan number:

\[ \frac{1}{n+1} {2n \choose n} \]

We omit the details here; we are content just celebrating another victory for algebra.

McIlroy generalizes by allowing each node to have any number of children, that is, ordered trees. We start with perhaps the simplest kind of tree: unary trees, otherwise known as lists. Each node has zero or one children:

data List = EmptyList | ListNode List

so the generating function that counts them is:

list = 1 + x list
peep list

Although it’s obvious there is exactly one list of each size, it’s gratifying to see our machinery work as intended.

A nonempty tree is a parent node with a list of child nonempty trees, also known as a forest. Empty trees complicate matters because an empty child tree would be the same as no child, so we stick to counting nonempty trees; we can always account for the empty case afterwards.

These Haskell declarations cut a long story short:

data NonEmptyTree = Node Forest
data Forest = Forest (List NonEmptyTree)

and translate to the ordinary power series:

tree = x forest
forest = list#tree

In doing so, we score another win for algebra: Haskell’s algebraic data types map neatly to generating functions, sparing us from thinking hard. See Joel Burget’s post for more, including how differentiation produces zippers.

peep tree

The Catalan numbers strike again! A sprinkling of algebra confirms the forest function satisfies the same equation as ts.

What about unordered unlabeled trees? We save these for another episode, where we show how generating functions can tame all sorts of trees.

Exponential generating functions

Let’s take a whirlwind tour of other ways to represent lists with formal series.

The exponential generating function for \( \left\{a_n\right\}_0^\infty = a_0, a_1, a_2, …​ \) is:

\[f(x) = a_0 + \frac{a_1}{1!} x + \frac{a_2}{2!} x^2 + …​ \]

and we write:

\[ \newcommand{\egf}{\overset{egf}\longleftrightarrow} f \egf \left\{a_n\right\}_0^\infty \]

egfPeep = peep . zipPs (flip (/)) exps

This correspondence is also linear, and the \(xD\) trick for polynomial remains the same. However, shifting a sequence is achieved with differentiation:

\[ D^h f \egf \left\{a_{n+h}\right\}_0^\infty \]

and multiplication involves an extra factor which suits some problems:

\[ f g \egf \left\{ \sum_{k=0}^n {n \choose k} a_k b_{n-k} \right\}_0^\infty \]

where \(g \egf \left\{b_{n}\right\}_0^\infty\).

For example, a derangement is a permutation with no fixed points. If \(a_n\) is the number of derangements of \(n\) objects, then:

\[ n! = \sum_{k=0}^n {n \choose k} a_k \]

because every permutation can be viewed as fixing \(k\) objects and deranging the others.

Since \(e^x \egf \left\{1\right\}_0^\infty\), taking the egf on both sides gives:

\[ \frac{1}{1 - x} = f(x) e^x \]

Then the following code counts derangements:

egfPeep $ 1 / (exps * (1 - x 1))

A little algebra gives a formula for the number of derangements of \(n\) objects:

\[ 1 - 1 - \frac{1}{1!} + \frac{1}{2!} - …​ + (-1)^n \frac{1}{n!} \]

Precious moments

For a random variable \(X\), the moment generating function is:

\[ M_X \egf \left\{ E[X^n] \right\}_0^\infty \]

Linearity of expectation and the Taylor expansion of the exponential function implies \(M_x(t) = E[e^{tX}]\).

Exponentiation lets us replace sums with products. We often use logarithms to convert products to sums for easier calculation, but going the other way also comes in handy.

For example, a Bernoulli random variable \(X\) is 1 with probability \(p\) and 0 otherwise, so its moment generating function is:

\[ E[e^{tX}] = (1 - p)e^0 + p e^{t} = 1 + (e^t - 1)p \]

A binomial random variable is the sum of \(n\) independent Bernoulli random variables, hence its moment generating function is:

\[ E[(e^{tX})^n] = (1 + (e^t - 1)p)^n \]

Now, a Poisson random variable \(Z\) with the expection of \(\lambda\) events in a given interval is defined by:

\[ Pr[Z = k] = e^{-\lambda} \frac{\lambda^k}{k!} \]

for all naturals \(k\) (which is almost an exponential generating function). Thus its moment generating function is:

\[ E[e^{tZ}] = \sum_{k\ge 0} e^{-\lambda} \frac{\lambda^k}{k!} e^{kt} = e^{-\lambda} e^{\lambda e^t} \]

This is the same as:

\[ \lim_{n \rightarrow \infty} (1 + (e^t - 1)(\lambda / n))^n = e^{(e^t - 1)\lambda} \]

The cumulant generating function of a random variable \(X\) is:

\[ K_X = \log M_X \]

We can view \(K_X\) as an exponential generating function thanks to the Taylor expansion \(\log (1 + x) = x - x^2 / 2 + …​\)

logs = int $ 1/(1 + x 1)
take 3 $ toList logs

Sometimes we would rather work with sums rather than products. Define the cumulants \(K_n[X]\) by:

\[ K_X \egf \left\{K_n[X]\right\}_0^\infty \]

Some algebra shows:

\[ K_0[X] = 0, K_1[X] = E[X], K_2[X] = \operatorname{Var}(X) \]

For independent random variables \(X_1, …​, X_n\) with zero mean:

\[ K_m\left[ \frac{X_1 + ... + X_n}{\sqrt{n}} \right] = \frac{K_m[X_1] + ... + K_m[X_n]}{n^{m/2}} \]

which leads to a proof of the Central Limit Theorem, where it turns out the convergence of the moment generating function becomes important. We can restrict the proof to distributions with bounded higher moments, or we can multiply the exponent by \(i\) to get \(E[e^{itX}]\), guaranteeing convergence thanks to the magic of Fourier series.

Dirchlet series generating functions

The Dirchlet series generating function for \( \left\{a_n\right\}_1^\infty = a_1, a_2, …​ \) is:

\[f(s) = a_1 + \frac{a_2}{2^s} + \frac{a_3}{3^s} + …​ \]

and we write:

\[ \newcommand{\Dir}{\overset{Dir}\longleftrightarrow} f \Dir \left\{a_n\right\}_1^\infty \]

These are too far from power series for our code, so we just state some facts here. Multiplying Dirchlet series gives:

\[ fg = \left\{ \sum_{d|n} a_n b_{n/d} \right\}_1^\infty \]

where \( g \Dir \left\{b_n\right\}_1^\infty \).

The Dirchlet series for the sequence of all ones is the Riemann zeta function:

\[ \zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + …​ \Dir \left\{1\right\}_1^\infty \]

Thus \( \zeta^2 \Dir \left\{d(n)\right\}_1^\infty \) where \(d(n)\) is the number of divisors of \(n\).

Let \(\mathbb{P}\) be the prime numbers. For any multiplicative function \(f\), we have:

\[ \sum_{n=1}^\infty \frac{f(n)}{n^{-s}} = \prod_{p\in\mathbb{P}} 1 + f(p)p^{-s} + f(p^2)p^{-2s} + ... \]

In particular, since \(f(n) = 1\) is multiplicative:

\[ \zeta(s) = \prod_{p\in\mathbb{P}} 1 + p^{-s} + p^{-2s} + ... = \prod_{p\in\mathbb{P}} \frac{1}{1 - p^{-s}} \]

Let \(\mu(n)\) be the multiplicative function satisfying \(\mu(1) = 1, \mu(p) = -1, \mu(p^k) = 0\) for all primes \(p\) and \(k > 1\). Then:

\[ \sum_{n=1}^\infty \frac{\mu(n)}{n^{-s}} = \prod_{p\in\mathbb{P}} 1 - p^{-s} \]

so \( \zeta^{-1} \Dir \left\{ \mu(n) \right\}_1^\infty \).

We call \(\mu(n)\) the Möbius function, which we use to perform Möbius inversion: suppose \( f \Dir \left\{a_n\right\}_1^\infty \) and \( g \Dir \left\{b_n\right\}_1^\infty \) are sequences satisfying:

\[ a_n = \sum_{d|n} b_d \]

Then \(f = g \zeta \), hence \(g = f \zeta^{-1} \), implying:

\[ b_n = \sum_{d|n} \mu(n/d) a_d \]

For example, the cyclotomic polynomials satisfy \(\prod_{d|n} \Phi_d (x) = x^n - 1 \). We take logarithms to convert a product to a sum:

\[\sum_{d|n} \log \Phi_d (x) = \log (x^n - 1) \]

Möbius inversion followed by exponentiation yields a formula for cyclotomic polynomials:

\[ \Phi_n(x) = \prod_{d|n} (x^d - 1)^{\mu(n/d)} \]


Ben Lynn blynn@cs.stanford.edu 💡