# The Chernoff Bound

The *Chernoff bound* is like a genericized trademark: it refers not to a
particular inequality, but rather a technique for obtaining exponentially
decreasing bounds on tail probabilities.

Much of this material comes from my
CS 365 textbook,
*Randomized Algorithms* by
Motwani and Raghavan.

For \(i = 1,...,n\), let \(X_i\) be independent random variables that take the value \(1\) with probability \(p_i\) and \(0\) otherwise. Suppose at least one of the \(p_i\) is nonzero. Let \(X = \sum_{i=1}^N x_i\), and let \(\mu = E[X] = \sum_{i=1}^N p_i\).

## Multiplicative Chernoff Bound

We first focus on bounding \(\Pr[X > (1+\delta)\mu]\) for \(\delta > 0\).

We have \(\Pr[X > (1+\delta)\mu] = \Pr[e^{tX} > e^{t(1+\delta)\mu}]\) for all \(t > 0\). We’ll later select an optimal value for \(t\). By Markov’s inequality, we have:

My textbook stated this inequality is in fact strict if we assume none of the \(p_i\) are 0 or 1, but I’m not sure this is required, due to a strict inequality later on.

Then since the \(X_i\) are independent:

We can compute \(E[e^{tX_i}]\) explicitly: this random variable is \(e^t\) with probability \(p_i\), and \(1\) otherwise, that is, with probability \(1 - p_i\), thus this is equal to:

We have \(1 + x < e^x\) for all \(x > 0\). As long as at least one \(p_i > 0\), we have:

Whence:

It is time to choose \(t\). Differentiating the right-hand side shows we attain the minimum at \(t = ln(1+\delta)\), which is positive when \(\delta\) is. This value of \(t\) yields the Chernoff bound:

## Below Expectations

We use the same technique to bound \(\Pr[X < (1-\delta)\mu]\) for \(\delta > 0\). We have:

for any \(t > 0\). If we proceed as before, that is, apply Markov’s inequality, use the approximation \(1+x < e^x\), then pick \(t\) to minimize the bound, we have:

## Bounding the bounds

Unfortunately, the above bounds are difficult to use, so in practice we use cruder but friendlier approximations.

Recall \(ln(1-x) = -x - x^2 / 2 - x^3 / 3 - ...\). Thus if \(\delta \le 1\), we have:

Exponentiating both sides, raising to the power of \(1-\delta\) and dropping the highest order term yields:

Thus for \(0 < \delta < 1\):

As for the other Chernoff bound, Wikipedia states:

but my textbook quotes a better bound:

## An Additive Chernoff Bound

Due to Hoeffding, this Chernoff bound appears as Problem 4.6 in Motwani and Raghavan. It was also mentioned in a cryptography class I took long ago.

For \(i = 1, ..., n\), let \(X_i\) be a random variable that takes \(1\) with probability \(p\) and \(0\) otherwise, and suppose they are independent. Let \(X = \sum_{i=1}^n X_i\). Then:

*blynn@cs.stanford.edu*💡