The Chernoff Bound

The Chernoff bound is like a genericized trademark: it refers not to a particular inequality, but rather to a technique for obtaining exponentially decreasing bounds on tail probabilities.

Much of this material comes from my CS 365 textbook, Randomized Algorithms by Motwani and Raghavan.

For \(i = 1, \ldots, n\), let \(X_i\) be independent random variables that take the value \(1\) with probability \(p_i\) and \(0\) otherwise. Suppose at least one of the \(p_i\) is nonzero. Let \(X = \sum_{i=1}^n X_i\), and let \(\mu = E[X] = \sum_{i=1}^n p_i\).
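As a quick sanity check of this setup, here is a small simulation sketch (the particular \(p_i\) values and sample count are arbitrary choices for illustration): it draws many samples of \(X\) and compares the empirical mean against \(\mu\).

    import random

    # Arbitrary illustrative success probabilities p_i.
    p = [0.1, 0.5, 0.3, 0.7, 0.2]
    mu = sum(p)  # mu = E[X] = sum of the p_i

    def sample_X():
        # X is the sum of independent 0/1 indicators X_i with Pr[X_i = 1] = p_i.
        return sum(1 for pi in p if random.random() < pi)

    samples = [sample_X() for _ in range(100_000)]
    print("mu =", mu)
    print("empirical mean =", sum(samples) / len(samples))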

Multiplicative Chernoff Bound

We first focus on bounding \(\Pr[X > (1+\delta)\mu]\) for \(\delta > 0\).

We have \(\Pr[X > (1+\delta)\mu] = \Pr[e^{tX} > e^{t(1+\delta)\mu}]\) for all \(t > 0\). We’ll later select an optimal value for \(t\). By Markov’s inequality, we have:

\[ \Pr[e^{tX} > e^{t(1+\delta)\mu}] \le E[e^{tX}] / e^{t(1+\delta)\mu} \]

My textbook states this inequality is in fact strict if we assume none of the \(p_i\) are \(0\) or \(1\), but I’m not sure that assumption is needed, since a strict inequality appears later on anyway.

Then since the \(X_i\) are independent:

\[ E[e^{tX}] = E[e^{t(X_1 + \cdots + X_n)}] = E\left[\prod_{i=1}^n e^{tX_i}\right] = \prod_{i=1}^n E[e^{tX_i}] \]

We can compute \(E[e^{tX_i}]\) explicitly: the random variable \(e^{tX_i}\) equals \(e^t\) with probability \(p_i\) and \(1\) with probability \(1 - p_i\), so \(E[e^{tX_i}] = p_i e^t + (1 - p_i) = 1 + p_i(e^t - 1)\). Thus:

\[ \prod_{i=1}^n E[e^{tX_i}] = \prod_{i=1}^n (1 + p_i(e^t - 1)) \]

We have \(1 + x < e^x\) for all \(x > 0\), and \(1 + x \le e^x\) for all \(x\). Each factor here has \(x = p_i(e^t - 1) \ge 0\), and as long as at least one \(p_i > 0\), at least one of these inequalities is strict, hence:

\[\prod_{i=1}^n (1 + p_i(e^t - 1)) < \prod_{i=1}^n e^{p_i(e^t - 1)} = e^{(p_1 + \cdots + p_n)(e^t - 1)} = e^{(e^t - 1)\mu}\]

Whence:

\[\Pr[X > (1+\delta)\mu] < e^{(e^t - 1)\mu} / e^{t(1+\delta)\mu}\]

It is time to choose \(t\). Differentiating the right-hand side shows we attain the minimum at \(t = \ln(1+\delta)\), which is positive when \(\delta\) is. This value of \(t\) yields the Chernoff bound:

\[\Pr[X > (1+\delta)\mu] < \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu\]
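Spelling out the substitution: with \(t = \ln(1+\delta)\) we have \(e^t - 1 = \delta\) and \(e^{t(1+\delta)\mu} = (1+\delta)^{(1+\delta)\mu}\), so

\[\frac{e^{(e^t - 1)\mu}}{e^{t(1+\delta)\mu}} = \frac{e^{\delta\mu}}{(1+\delta)^{(1+\delta)\mu}} = \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu\]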

Below Expectations

We use the same technique to bound \(\Pr[X < (1-\delta)\mu]\) for \(\delta > 0\). We have:

\[\Pr[X < (1-\delta)\mu] = \Pr[-X > -(1-\delta)\mu] = \Pr[e^{-tX} > e^{-t(1-\delta)\mu}]\]

for any \(t > 0\). If we proceed as before, that is, apply Markov’s inequality, use the approximation \(1+x < e^x\), then pick \(t\) to minimize the bound, we have:

\[\Pr[X < (1-\delta)\mu] < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^\mu\]
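In more detail, Markov's inequality and \(1 + x \le e^x\) (applied to \(x = p_i(e^{-t} - 1)\), which is negative whenever \(p_i > 0\)) give

\[\Pr[e^{-tX} > e^{-t(1-\delta)\mu}] \le \frac{E[e^{-tX}]}{e^{-t(1-\delta)\mu}} < \frac{e^{(e^{-t} - 1)\mu}}{e^{-t(1-\delta)\mu}}\]

and the right-hand side is minimized at \(t = -\ln(1-\delta)\), which is positive for \(0 < \delta < 1\) and yields the bound above.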

Bounding the bounds

Unfortunately, the above bounds are difficult to use, so in practice we use cruder but friendlier approximations.

Recall \(\ln(1-x) = -x - x^2/2 - x^3/3 - \cdots\) for \(|x| < 1\). Multiplying this series by \(1-\delta\) and collecting terms, every term beyond \(-\delta + \delta^2/2\) has a positive coefficient, so for \(0 < \delta < 1\):

\[(1-\delta)\ln(1-\delta) > -\delta + \delta^2/2\]

Exponentiating both sides yields:

\[(1-\delta)^{1-\delta} > e^{-\delta + \delta^2/2}\]
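Combining this with the bound from the previous section:

\[\frac{e^{-\delta}}{(1-\delta)^{1-\delta}} < \frac{e^{-\delta}}{e^{-\delta + \delta^2/2}} = e^{-\delta^2/2}\]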

Thus for \(0 < \delta < 1\):

\[\Pr[X < (1-\delta)\mu] < e^{-\delta^2\mu/2}, 0 < \delta < 1\]

As for the other Chernoff bound, Wikipedia states:

\[\Pr[X > (1+\delta)\mu] < e^{-\delta^2\mu/3}, 0 < \delta < 1\]

but my textbook quotes a bound with a slightly weaker exponent that holds over a wider range of \(\delta\):

\[\Pr[X > (1+\delta)\mu] < e^{-\delta^2\mu/4}, 0 < \delta < 2e - 1\]
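As a rough numerical sanity check of these simplified bounds (a sketch only; the parameters below are arbitrary), the following simulation estimates both tail probabilities for a binomial \(X\) and compares them against \(e^{-\delta^2\mu/3}\) and \(e^{-\delta^2\mu/2}\).

    import math
    import random

    # Arbitrary illustrative parameters: X is binomial(n, p), so mu = n * p.
    n, p, delta, trials = 500, 0.1, 0.3, 20_000
    mu = n * p

    upper = lower = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if random.random() < p)  # one sample of X
        upper += x > (1 + delta) * mu
        lower += x < (1 - delta) * mu

    print("Pr[X > (1+d)mu] ~", upper / trials, " bound:", math.exp(-delta**2 * mu / 3))
    print("Pr[X < (1-d)mu] ~", lower / trials, " bound:", math.exp(-delta**2 * mu / 2))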

An Additive Chernoff Bound

Due to Hoeffding, this Chernoff bound appears as Problem 4.6 in Motwani and Raghavan. It was also mentioned in a cryptography class I took long ago.

For \(i = 1, \ldots, n\), let \(X_i\) be independent random variables that take the value \(1\) with probability \(p\) and \(0\) otherwise. Let \(X = \sum_{i=1}^n X_i\). Then:

\[\Pr[|X - E[X]| \ge \sqrt{n}\delta ] \le 2 e^{-2 \delta^2}\]
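Equivalently, setting \(t = \sqrt{n}\,\delta\), this is the familiar additive form of Hoeffding's inequality for \(0/1\)-valued variables:

\[\Pr[|X - E[X]| \ge t] \le 2 e^{-2t^2/n}\]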

Ben Lynn blynn@cs.stanford.edu