# The Chernoff Bound

The Chernoff bound is like a genericized trademark: it refers not to a particular inequality, but rather to a technique for obtaining exponentially decreasing bounds on tail probabilities.

Much of this material comes from my CS 365 textbook, *Randomized Algorithms* by Motwani and Raghavan.

For $$i = 1,...,n$$, let $$X_i$$ be independent random variables that take the value $$1$$ with probability $$p_i$$ and $$0$$ otherwise. Suppose at least one of the $$p_i$$ is nonzero. Let $$X = \sum_{i=1}^n X_i$$, and let $$\mu = E[X] = \sum_{i=1}^n p_i$$.
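As a quick numerical sketch of this setup (the particular $$p_i$$ values below are arbitrary), we can check that the empirical mean of $$X$$ matches $$\mu$$:

```python
import random

# Arbitrary success probabilities for the independent indicators X_i.
p = [0.1, 0.5, 0.3, 0.9, 0.25]
mu = sum(p)  # E[X] = sum of the p_i, by linearity of expectation

random.seed(0)
trials = 100_000
total = 0
for _ in range(trials):
    # Draw one sample of X = X_1 + ... + X_n.
    total += sum(1 for pi in p if random.random() < pi)

# The empirical mean should be close to mu = 2.05.
assert abs(total / trials - mu) < 0.02
```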

## Multiplicative Chernoff Bound

We first focus on bounding $$\Pr[X > (1+\delta)\mu]$$ for $$\delta > 0$$.

We have $$\Pr[X > (1+\delta)\mu] = \Pr[e^{tX} > e^{t(1+\delta)\mu}]$$ for all $$t > 0$$. We’ll later select an optimal value for $$t$$. By Markov’s inequality, we have:

$\Pr[e^{tX} > e^{t(1+\delta)\mu}] \le E[e^{tX}] / e^{t(1+\delta)\mu}$

My textbook states this inequality is in fact strict if we assume none of the $$p_i$$ are $$0$$ or $$1$$, but I’m not sure this assumption is needed, since a strict inequality appears later in the derivation anyway.

Then since the $$X_i$$ are independent:

$E[e^{tX}] = E[e^{t(X_1 + ... + X_n)}] = E[\prod_{i=1}^n e^{tX_i}] = \prod_{i=1}^n E[e^{tX_i}]$

We can compute $$E[e^{tX_i}]$$ explicitly: the random variable $$e^{tX_i}$$ equals $$e^t$$ with probability $$p_i$$ and $$1$$ with probability $$1 - p_i$$, so:

$\prod_{i=1}^n E[e^{tX_i}] = \prod_{i=1}^n (1 + p_i(e^t - 1))$

We have $$1 + x < e^x$$ for all $$x > 0$$. Since $$t > 0$$ and at least one $$p_i > 0$$, we have:

$\prod_{i=1}^n (1 + p_i(e^t - 1)) < \prod_{i=1}^n e^{p_i(e^t - 1)} = e^{(p_1 + ... + p_n)(e^t - 1)} = e^{(e^t - 1)\mu}$
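Before continuing, here is a quick numerical sanity check of this product bound (a sketch; the $$p_i$$ and $$t$$ below are arbitrary):

```python
import math

p = [0.1, 0.5, 0.3, 0.9, 0.25]  # arbitrary p_i, at least one nonzero
mu = sum(p)
t = 0.7  # any t > 0 works here

mgf = math.prod(1 + pi * (math.exp(t) - 1) for pi in p)  # E[e^{tX}]
bound = math.exp((math.exp(t) - 1) * mu)                 # e^{(e^t - 1) mu}
assert mgf < bound
```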

Whence:

$\Pr[X > (1+\delta)\mu] < e^{(e^t - 1)\mu} / e^{t(1+\delta)\mu}$

It is time to choose $$t$$. Differentiating the right-hand side shows the minimum is attained at $$t = \ln(1+\delta)$$, which is positive when $$\delta$$ is. Substituting this value gives $$e^{(e^t - 1)\mu} = e^{\delta\mu}$$ and $$e^{t(1+\delta)\mu} = (1+\delta)^{(1+\delta)\mu}$$, which yields the Chernoff bound:

$\Pr[X > (1+\delta)\mu] < \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu$
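As a sketch, we can compare this bound with the exact upper tail in the special case where all $$p_i$$ are equal, so that $$X$$ is binomial (the values of $$n$$, $$p$$, and $$\delta$$ below are arbitrary):

```python
import math

n, p, delta = 100, 0.2, 0.5  # arbitrary: X ~ Binomial(n, p), mu = np
mu = n * p

# Exact Pr[X > (1 + delta) * mu], summing the binomial pmf.
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(n + 1) if k > (1 + delta) * mu)

# The multiplicative Chernoff bound.
chernoff = (math.exp(delta) / (1 + delta)**(1 + delta))**mu
assert tail < chernoff < 1
```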

## Below Expectations

We use the same technique to bound $$\Pr[X < (1-\delta)\mu]$$ for $$\delta > 0$$. We have:

$\Pr[X < (1-\delta)\mu] = \Pr[-X > -(1-\delta)\mu] = \Pr[e^{-tX} > e^{-t(1-\delta)\mu}]$

for any $$t > 0$$. If we proceed as before, that is, apply Markov’s inequality, use the approximation $$1 + x < e^x$$, then pick $$t = \ln\frac{1}{1-\delta}$$ (positive for $$0 < \delta < 1$$) to minimize the bound, we have:

$\Pr[X < (1-\delta)\mu] < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^\mu$
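The same kind of check works for the lower tail (again a sketch, with $$X$$ binomial and arbitrary parameters):

```python
import math

n, p, delta = 100, 0.2, 0.5  # arbitrary: X ~ Binomial(n, p), mu = np
mu = n * p

# Exact Pr[X < (1 - delta) * mu], summing the binomial pmf.
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(n + 1) if k < (1 - delta) * mu)

# The lower-tail Chernoff bound.
chernoff = (math.exp(-delta) / (1 - delta)**(1 - delta))**mu
assert tail < chernoff < 1
```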

## Bounding the bounds

Unfortunately, the above bounds are difficult to use, so in practice we use cruder but friendlier approximations.

Recall $$\ln(1-x) = -x - x^2 / 2 - x^3 / 3 - ...$$ for $$|x| < 1$$. Multiplying by $$1 - \delta$$ and collecting terms shows that for $$0 < \delta < 1$$:

$(1-\delta)\ln(1-\delta) = -\delta + \frac{\delta^2}{2} + \frac{\delta^3}{6} + ... > -\delta + \frac{\delta^2}{2}$

since every term after the second is positive. Exponentiating both sides yields:

$(1-\delta)^{1-\delta} > e^{-\delta + \delta^2/2}$

Thus for $$0 < \delta < 1$$:

$\Pr[X < (1-\delta)\mu] < e^{-\delta^2\mu/2}$
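A quick grid check that the friendlier bound really does dominate the exact one on $$0 < \delta < 1$$ (the value of $$\mu$$ and the grid resolution below are arbitrary):

```python
import math

mu = 10.0  # arbitrary mean
for i in range(1, 100):
    delta = i / 100
    exact = (math.exp(-delta) / (1 - delta)**(1 - delta))**mu
    cruder = math.exp(-delta**2 * mu / 2)
    # The cruder bound is always at least as large, so it is still a valid bound.
    assert exact < cruder
```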

As for the other Chernoff bound, Wikipedia states:

$\Pr[X > (1+\delta)\mu] < e^{-\delta^2\mu/3}, \quad 0 < \delta < 1$

but my textbook quotes a bound that is slightly weaker on $$0 < \delta < 1$$ but holds over a wider range:

$\Pr[X > (1+\delta)\mu] < e^{-\delta^2\mu/4}, \quad 0 < \delta < 2e - 1$
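Both simplified bounds follow from inequalities on the exponent, which we can grid-check on $$0 < \delta < 1$$, where both are claimed to hold (the grid resolution is arbitrary):

```python
import math

# Taking logarithms of the bounds and dividing by mu, it suffices to check:
#   delta - (1 + delta) * ln(1 + delta)  <=  -delta^2 / 3  <=  -delta^2 / 4
for i in range(1, 100):
    delta = i / 100
    f = delta - (1 + delta) * math.log(1 + delta)  # log of the exact bound, per unit mu
    assert f <= -delta**2 / 3 <= -delta**2 / 4
```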

Finally, here is a two-sided additive version of the bound. For $$i = 1, ..., n$$, let $$X_i$$ be independent random variables that take the value $$1$$ with probability $$p$$ and $$0$$ otherwise. Let $$X = \sum_{i=1}^n X_i$$. Then for any $$\delta > 0$$:

$\Pr[|X - E[X]| \ge \sqrt{n}\delta ] \le 2 e^{-2 \delta^2}$
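A final numerical sketch of this bound, again using the exact binomial distribution (the parameters below are arbitrary):

```python
import math

n, p, delta = 50, 0.3, 1.0  # arbitrary: X ~ Binomial(n, p)
mean = n * p                # E[X]
radius = math.sqrt(n) * delta

# Exact Pr[|X - E[X]| >= sqrt(n) * delta], summing the binomial pmf.
prob = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(n + 1) if abs(k - mean) >= radius)

assert prob <= 2 * math.exp(-2 * delta**2)
```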