So, the probability goes to zero at a rate of at least $1/n$.
However, it turns out that this is an extremely loose bound. According to the Central Limit Theorem,
$$\sqrt{n}\left(\hat{R}_n(f) - R(f)\right) \to \mathcal{N}(0, \sigma^2)$$
in distribution, where $\sigma^2$ is the variance of the loss. This suggests that for large values of $n$,
$$P\left(\hat{R}_n(f) - R(f) \ge \epsilon\right) \approx P\left(\mathcal{N}(0, \sigma^2/n) \ge \epsilon\right) \le \frac{1}{2}\, e^{-n\epsilon^2/(2\sigma^2)},$$
using the standard Gaussian tail bound. That is, the Gaussian tail probability is tending to zero
exponentially fast in $n$.
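To see how loose the $1/n$ rate is compared with the exponential rate the CLT suggests, here is a small numeric comparison. This is a sketch under an extra assumption: losses bounded in $[0,1]$, so $\sigma^2 \le 1/4$, with the worst-case variance plugged in.

```python
import math

# Tail bounds for P(|R_hat_n(f) - R(f)| >= eps) when losses lie in [0, 1]
# (worst-case variance 1/4 -- an assumption for this illustration).
def chebyshev_bound(n, eps):
    # Chebyshev: sigma^2 / (n eps^2) <= 1 / (4 n eps^2)
    return 1.0 / (4 * n * eps**2)

def gaussian_heuristic(n, eps):
    # Two-sided CLT heuristic with sigma^2 = 1/4: 2 exp(-2 n eps^2)
    return 2 * math.exp(-2 * n * eps**2)

for n in (100, 1000, 10000):
    print(n, chebyshev_bound(n, 0.1), gaussian_heuristic(n, 0.1))
```

Already at $n = 1000$ the exponential expression is many orders of magnitude smaller than the $1/n$ bound.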
Chernoff's bound
Note that for any nonnegative random variable $X$ and $t > 0$,
$$P(X \ge t) = P\left(e^{sX} \ge e^{st}\right) \le \frac{E\left[e^{sX}\right]}{e^{st}}, \quad s > 0,$$
by Markov's inequality. Chernoff's bound is based on finding the value of $s$ that
minimizes the upper bound. The bound is especially convenient when $X$ is a sum of independent random
variables. For example, say $X = \sum_{i=1}^{n} X_i$;
then the bound becomes
$$P\left(\sum_{i=1}^{n} X_i \ge t\right) \le e^{-st}\, E\left[e^{s \sum_{i=1}^{n} X_i}\right] = e^{-st} \prod_{i=1}^{n} E\left[e^{s X_i}\right].$$
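For instance, for a sum of Bernoulli($p$) variables we have $E[e^{sX_i}] = 1 - p + p e^{s}$, and the minimization over $s$ can be carried out numerically. This is a sketch: the grid search below is a crude stand-in for the closed-form minimizer.

```python
import math

# Chernoff bound for S = X_1 + ... + X_n, X_i i.i.d. Bernoulli(p):
# P(S >= t) <= min_{s > 0} e^{-st} (1 - p + p e^s)^n,
# since E[e^{s X_i}] = 1 - p + p e^s.
def chernoff_bound(n, p, t):
    grid = [k * 0.01 for k in range(1, 500)]  # crude grid over s > 0
    return min(math.exp(-s * t) * (1 - p + p * math.exp(s))**n for s in grid)

def binomial_tail(n, p, t):
    # Exact P(S >= t), for comparison.
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

bound = chernoff_bound(100, 0.5, 70)   # valid upper bound
exact = binomial_tail(100, 0.5, 70)    # true tail probability
```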
Thus, the problem of finding a tight bound boils down to finding a
good bound for the moment generating function $E\left[e^{sX_i}\right]$. Chernoff ('52) first
studied this situation for binary random variables. Then, Hoeffding ('63) derived a more general result for arbitrary
bounded random variables.
Hoeffding's inequality
Theorem
Hoeffding's inequality
Let $X_1, X_2, \ldots, X_n$ be independent bounded random variables such that
$X_i \in [a_i, b_i]$ with probability 1. Let $S_n = \sum_{i=1}^{n} X_i$. Then for any $t > 0$, we have
$$P\left(\left|S_n - E[S_n]\right| \ge t\right) \le 2\, e^{-2t^2 / \sum_{i=1}^{n} (b_i - a_i)^2}.$$
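Hoeffding's bound $P(|S_n - E[S_n]| \ge t) \le 2e^{-2t^2/\sum_i (b_i - a_i)^2}$ can be sanity-checked by Monte Carlo. This is a sketch with hypothetical parameters: $X_i \sim \mathrm{Uniform}[0,1]$, so $a_i = 0$, $b_i = 1$ and the bound reads $2e^{-2t^2/n}$.

```python
import math
import random

# Monte Carlo check of Hoeffding's inequality for X_i ~ Uniform[0, 1]:
# P(|S_n - E[S_n]| >= t) <= 2 exp(-2 t^2 / n)  (since each b_i - a_i = 1).
random.seed(0)
n, t, trials = 50, 5.0, 20000
expected_sum = n * 0.5
hits = sum(
    abs(sum(random.random() for _ in range(n)) - expected_sum) >= t
    for _ in range(trials)
)
empirical = hits / trials                    # observed tail frequency
hoeffding = 2 * math.exp(-2 * t**2 / n)      # Hoeffding's upper bound
```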
The key to proving Hoeffding's inequality is the following upper
bound: if $Z$ is a random variable with $E[Z] = 0$ and $a \le Z \le b$, then
$$E\left[e^{sZ}\right] \le e^{s^2 (b-a)^2 / 8}.$$
This
upper bound is derived as follows. By the convexity of the exponential function,
$$e^{sz} \le \frac{z - a}{b - a}\, e^{sb} + \frac{b - z}{b - a}\, e^{sa}, \quad a \le z \le b.$$
Thus, using $E[Z] = 0$,
$$E\left[e^{sZ}\right] \le \frac{-a}{b - a}\, e^{sb} + \frac{b}{b - a}\, e^{sa} = \left(1 - \theta + \theta e^{s(b-a)}\right) e^{-\theta s (b-a)}, \quad \text{where } \theta = \frac{-a}{b - a}.$$
Now let $u = s(b-a)$ and define
$$\phi(u) = -\theta u + \log\left(1 - \theta + \theta e^{u}\right).$$
Then we have
$$E\left[e^{sZ}\right] \le \left(1 - \theta + \theta e^{u}\right) e^{-\theta u} = e^{\phi(u)}.$$
To minimize the upper bound let's express $\phi(u)$
in a Taylor series with remainder:
$$\phi(u) = \phi(0) + u\,\phi'(0) + \frac{u^2}{2}\,\phi''(v) \quad \text{for some } v \in [0, u].$$
A direct calculation shows that $\phi(0) = 0$, $\phi'(0) = 0$, and
$$\phi''(v) = \frac{\theta e^{v} (1 - \theta)}{\left(1 - \theta + \theta e^{v}\right)^2} = \rho (1 - \rho), \quad \text{where } \rho = \frac{\theta e^{v}}{1 - \theta + \theta e^{v}}.$$
Now, $\rho(1 - \rho)$ is maximized by $\rho = 1/2$, so $\phi''(v) \le 1/4$. So,
$$\phi(u) \le \frac{u^2}{8} = \frac{s^2 (b-a)^2}{8}, \quad \text{which gives } E\left[e^{sZ}\right] \le e^{s^2 (b-a)^2 / 8}.$$
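The lemma is easy to check numerically for a concrete zero-mean variable. The example is hypothetical: $Z \in \{-1, 2\}$ with $P(Z = -1) = 2/3$, so $E[Z] = 0$, $a = -1$, $b = 2$.

```python
import math

# Numeric check of the key lemma: if E[Z] = 0 and a <= Z <= b, then
# E[e^{sZ}] <= e^{s^2 (b-a)^2 / 8}.
values, probs = (-1.0, 2.0), (2 / 3, 1 / 3)  # E[Z] = -2/3 + 2/3 = 0
a, b = -1.0, 2.0

def mgf(s):
    # E[e^{sZ}] for the discrete Z above
    return sum(p * math.exp(s * v) for v, p in zip(values, probs))

def lemma_bound(s):
    return math.exp(s**2 * (b - a)**2 / 8)

for s in (0.1, 0.5, 1.0, 2.0, 5.0):
    assert mgf(s) <= lemma_bound(s)
```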
Now, we can apply this upper bound to derive Hoeffding's inequality. Applying Chernoff's bound to $S_n - E[S_n] = \sum_{i=1}^{n} (X_i - E[X_i])$, and noting that each $X_i - E[X_i]$ is zero-mean and confined to an interval of length $b_i - a_i$, we get
$$P\left(S_n - E[S_n] \ge t\right) \le e^{-st} \prod_{i=1}^{n} E\left[e^{s(X_i - E[X_i])}\right] \le e^{-st} \prod_{i=1}^{n} e^{s^2 (b_i - a_i)^2 / 8} = e^{-st + s^2 \sum_{i=1}^{n} (b_i - a_i)^2 / 8}.$$
The exponent is minimized by $s = 4t / \sum_{i=1}^{n} (b_i - a_i)^2$, which gives
$$P\left(S_n - E[S_n] \ge t\right) \le e^{-2t^2 / \sum_{i=1}^{n} (b_i - a_i)^2}.$$
Similarly,
$$P\left(S_n - E[S_n] \le -t\right) \le e^{-2t^2 / \sum_{i=1}^{n} (b_i - a_i)^2},$$
and combining the two via the union bound yields the two-sided statement.
This completes the proof of Hoeffding's theorem.
Application
Let $\ell(f(X), Y) = \mathbf{1}\{f(X) \ne Y\} \in \{0, 1\}$, as in the
classification problem, so that $\hat{R}_n(f)$ is an average of independent random variables bounded between 0 and 1. Then for a fixed $f$, it follows from Hoeffding's inequality (i.e., Chernoff's bound in this special case)
that
$$P\left(\left|\hat{R}_n(f) - R(f)\right| \ge \epsilon\right) \le 2\, e^{-2n\epsilon^2}.$$
Now, we want a bound like this to hold uniformly for all $f \in \mathcal{F}$.
Assume that $\mathcal{F}$ is a finite collection of models and let
$|\mathcal{F}|$ denote its cardinality. We would like to bound the
probability that $\sup_{f \in \mathcal{F}} |\hat{R}_n(f) - R(f)| \ge \epsilon$. Note that the event
$$\left\{\sup_{f \in \mathcal{F}} \left|\hat{R}_n(f) - R(f)\right| \ge \epsilon\right\} = \bigcup_{f \in \mathcal{F}} \left\{\left|\hat{R}_n(f) - R(f)\right| \ge \epsilon\right\}.$$
Therefore, by the union bound,
$$P\left(\sup_{f \in \mathcal{F}} \left|\hat{R}_n(f) - R(f)\right| \ge \epsilon\right) \le \sum_{f \in \mathcal{F}} P\left(\left|\hat{R}_n(f) - R(f)\right| \ge \epsilon\right) \le 2\, |\mathcal{F}|\, e^{-2n\epsilon^2}.$$
Thus, we have shown that with probability at
least $1 - 2|\mathcal{F}|\, e^{-2n\epsilon^2}$,
$$\left|\hat{R}_n(f) - R(f)\right| < \epsilon \quad \text{for all } f \in \mathcal{F}.$$
Accordingly, we can be reasonably
confident in selecting a model $f$ from $\mathcal{F}$ based on the empirical risk
function $\hat{R}_n(f)$.
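Inverting the uniform bound gives a sufficient sample size: requiring $2|\mathcal{F}| e^{-2n\epsilon^2} \le \delta$ yields $n \ge \log(2|\mathcal{F}|/\delta) / (2\epsilon^2)$. A small sketch follows; the parameter values are hypothetical.

```python
import math

# Sufficient n so that sup_f |R_hat_n(f) - R(f)| < eps holds for all f in F
# with probability at least 1 - delta, from 2 |F| exp(-2 n eps^2) <= delta.
def sample_size(num_models, eps, delta):
    return math.ceil(math.log(2 * num_models / delta) / (2 * eps**2))

# e.g. a million candidate models, eps = 0.05, 95% confidence:
n = sample_size(10**6, 0.05, 0.05)
```

Note that $n$ grows only logarithmically in the number of models, which is what makes the uniform bound useful even for very large finite classes.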