The Chernoff Bound
The Chernoff bound is like a genericized trademark: it refers not to a particular inequality, but rather to a technique for obtaining exponentially decreasing bounds on tail probabilities.
Much of this material comes from my CS 365 textbook, Randomized Algorithms by Motwani and Raghavan.
For \(i = 1,\ldots,n\), let \(X_i\) be independent random variables that take the value \(1\) with probability \(p_i\) and \(0\) otherwise. Suppose at least one of the \(p_i\) is nonzero. Let \(X = \sum_{i=1}^n X_i\), and let \(\mu = E[X] = \sum_{i=1}^n p_i\).
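To make the setup concrete, here is a small Python sketch (my own illustration; the particular \(p_i\) values are arbitrary) that simulates \(X\) and checks that its empirical mean is close to \(\mu\).

```python
import random

# Arbitrary illustrative success probabilities p_i.
p = [0.1, 0.5, 0.3, 0.8, 0.25]
mu = sum(p)  # mu = E[X] = p_1 + ... + p_n

def sample_X():
    """Draw one realization of X = X_1 + ... + X_n."""
    return sum(1 for p_i in p if random.random() < p_i)

trials = 100_000
mean = sum(sample_X() for _ in range(trials)) / trials
print(f"mu = {mu}, empirical mean of X = {mean:.3f}")
```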
Multiplicative Chernoff Bound
We first focus on bounding \(\Pr[X > (1+\delta)\mu]\) for \(\delta > 0\).
We have \(\Pr[X > (1+\delta)\mu] = \Pr[e^{tX} > e^{t(1+\delta)\mu}]\) for all \(t > 0\). We’ll later select an optimal value for \(t\). By Markov’s inequality, we have:
\[ \Pr[e^{tX} > e^{t(1+\delta)\mu}] \le E[e^{tX}] / e^{t(1+\delta)\mu} \]
My textbook stated this inequality is in fact strict if we assume none of the \(p_i\) is \(0\) or \(1\), but I’m not sure this assumption is needed, since a strict inequality appears later on anyway.
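For reference, the step above is just Markov’s inequality applied to the nonnegative random variable \(e^{tX}\):

\[ \Pr[Y \ge a] \le \frac{E[Y]}{a} \quad \text{for } Y \ge 0,\ a > 0, \qquad \text{here with } Y = e^{tX},\ a = e^{t(1+\delta)\mu}. \]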
Then since the \(X_i\) are independent:
\[ E[e^{tX}] = E\left[e^{t(X_1 + \cdots + X_n)}\right] = E\left[\prod_{i=1}^n e^{tX_i}\right] = \prod_{i=1}^n E[e^{tX_i}] \]
We can compute \(E[e^{tX_i}]\) explicitly: this random variable equals \(e^t\) with probability \(p_i\), and \(1\) otherwise, that is, with probability \(1 - p_i\). Thus:
\[ \prod_{i=1}^n E[e^{tX_i}] = \prod_{i=1}^n \left(1 + p_i(e^t - 1)\right) \]
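Spelling out a single factor (a small step left implicit above):

\[ E[e^{tX_i}] = p_i e^{t \cdot 1} + (1 - p_i) e^{t \cdot 0} = 1 + p_i(e^t - 1). \]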
We have \(1 + x < e^x\) for all \(x > 0\). As long as at least one \(p_i > 0\):

\[ \prod_{i=1}^n \left(1 + p_i(e^t - 1)\right) < \prod_{i=1}^n e^{p_i(e^t - 1)} = e^{\sum_{i=1}^n p_i(e^t - 1)} = e^{(e^t - 1)\mu} \]
Whence:

\[ \Pr[X > (1+\delta)\mu] < \frac{e^{(e^t - 1)\mu}}{e^{t(1+\delta)\mu}} = e^{\left((e^t - 1) - t(1+\delta)\right)\mu} \]
It is time to choose \(t\). Differentiating the right-hand side shows we attain the minimum at \(t = \ln(1+\delta)\), which is positive when \(\delta\) is. This value of \(t\) yields the Chernoff bound:

\[ \Pr[X > (1+\delta)\mu] < \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu} \]
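To fill in the differentiation: write the exponent of the previous bound as \(f(t) = \left((e^t - 1) - t(1+\delta)\right)\mu\); then

\[ f'(t) = \left(e^t - (1+\delta)\right)\mu = 0 \iff t = \ln(1+\delta), \qquad f''(t) = e^t \mu > 0, \]

so this critical point is indeed the minimum. Substituting \(e^t = 1+\delta\) into \(e^{f(t)}\) gives the bound above.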
Below Expectations
We use the same technique to bound \(\Pr[X < (1-\delta)\mu]\) for \(\delta > 0\). We have:

\[ \Pr[X < (1-\delta)\mu] = \Pr[-X > -(1-\delta)\mu] = \Pr\left[e^{-tX} > e^{-t(1-\delta)\mu}\right] \]
for any \(t > 0\). If we proceed as before, that is, apply Markov’s inequality, use the approximation \(1+x < e^x\), then pick \(t\) to minimize the bound, we have:

\[ \Pr[X < (1-\delta)\mu] < \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^{\mu} \]
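For the record, here is the parallel optimization worked out: Markov’s inequality and \(1 + x < e^x\) (which holds for all \(x \ne 0\)) give \(\Pr[X < (1-\delta)\mu] < e^{\left((e^{-t} - 1) + t(1-\delta)\right)\mu}\), and minimizing the exponent over \(t\) gives

\[ \frac{d}{dt}\left[(e^{-t} - 1) + t(1-\delta)\right] = -e^{-t} + (1-\delta) = 0 \quad\Longrightarrow\quad t = -\ln(1-\delta), \]

which is positive for \(0 < \delta < 1\); substituting it back yields the bound above. (For \(\delta \ge 1\) the probability is zero anyway, since \(X \ge 0\).)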
Bounding the bounds
Unfortunately, the above bounds are difficult to use, so in practice we use cruder but friendlier approximations.
Recall \(\ln(1-x) = -x - x^2/2 - x^3/3 - \cdots\) for \(|x| < 1\). Thus if \(0 < \delta < 1\), we have:

\[ (1-\delta)\ln(1-\delta) = -\delta + \frac{\delta^2}{2} + \frac{\delta^3}{6} + \cdots \]

Exponentiating both sides and dropping the higher-order terms, which are all positive, yields:

\[ (1-\delta)^{1-\delta} > e^{-\delta + \delta^2/2} \]
Thus for \(0 < \delta < 1\):

\[ \Pr[X < (1-\delta)\mu] < \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^{\mu} < \left( \frac{e^{-\delta}}{e^{-\delta + \delta^2/2}} \right)^{\mu} = e^{-\mu\delta^2/2} \]
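As a numerical sanity check (my own illustration, not from the textbook), the following Python snippet compares the exact lower-tail bound with the cruder \(e^{-\mu\delta^2/2}\) for a few values of \(\delta\); the crude bound is always the larger (weaker) of the two.

```python
import math

def exact_lower_bound(mu, delta):
    """(e^{-delta} / (1-delta)^(1-delta))^mu -- the bound before simplification."""
    return (math.exp(-delta) / (1 - delta) ** (1 - delta)) ** mu

def crude_lower_bound(mu, delta):
    """The friendlier bound e^{-mu * delta^2 / 2}."""
    return math.exp(-mu * delta ** 2 / 2)

mu = 20.0  # arbitrary choice of E[X] for illustration
for delta in (0.1, 0.3, 0.5, 0.9):
    print(f"delta={delta}: exact={exact_lower_bound(mu, delta):.3e}, "
          f"crude={crude_lower_bound(mu, delta):.3e}")
```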
As for the other Chernoff bound, Wikipedia states:
but my textbook quotes a better bound:
An Additive Chernoff Bound
Due to Hoeffding, this Chernoff bound appears as Problem 4.6 in Motwani and Raghavan. It was also mentioned in a cryptography class I took long ago.
For \(i = 1, \ldots, n\), let \(X_i\) be independent random variables that take the value \(1\) with probability \(p\) and \(0\) otherwise. Let \(X = \sum_{i=1}^n X_i\). Then for any \(\delta > 0\):

\[ \Pr[X > (p + \delta)n] \le e^{-2n\delta^2} \]
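Assuming the one-sided form stated above, here is a quick empirical check in Python (my own sketch; the parameters \(n\), \(p\), and \(\delta\) are arbitrary): the observed tail frequency should sit well below the Hoeffding bound \(e^{-2n\delta^2}\).

```python
import math
import random

n, p, delta = 200, 0.3, 0.1
trials = 100_000

# Count how often the sum X of n Bernoulli(p) variables exceeds (p + delta) * n.
exceed = sum(
    1
    for _ in range(trials)
    if sum(1 for _ in range(n) if random.random() < p) > (p + delta) * n
)

print("empirical tail probability:", exceed / trials)
print("Hoeffding bound exp(-2 n delta^2):", math.exp(-2 * n * delta ** 2))
```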