for some $h > 0$.
The moment condition can be difficult to verify in general, but it
does hold, for example, for bounded random variables. If
[link] holds, then the Craig-Bernstein (CB) inequality states:
$$P\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mathbb{E}[X_i]\right) \ge \frac{t}{n\epsilon} + \frac{\epsilon \sum_{i=1}^{n}\mathrm{var}(X_i)}{2n(1-\zeta)}\right) \le e^{-t}$$
for all $t > 0$ and all $\epsilon, \zeta$ satisfying $0 < \epsilon h \le \zeta < 1$.
This shows that the tail decays exponentially in $t$, rather than exponentially in $t^2$.
Recall Hoeffding's inequality: if $P(X_i \in [a_i, b_i]) = 1$ for $i = 1, \dots, n$, then
$$P\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mathbb{E}[X_i]\right) \ge s\right) \le e^{-2 n^2 s^2 / \sum_{i=1}^{n}(b_i - a_i)^2},$$
which implies that, with probability at least $1 - e^{-t}$,
$$\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mathbb{E}[X_i]\right) \le \sqrt{\frac{t \sum_{i=1}^{n}(b_i - a_i)^2}{2 n^2}}.$$
This indicates that the CB inequality may be much tighter than Hoeffding's: when the variance is small, its deviation term scales like $t/n$ rather than $\sqrt{t/n}$.
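To make the comparison concrete, consider an illustrative special case (the quantities $B$, $\sigma^2$ and the parameter choices below are for illustration only): i.i.d. variables taking values in an interval of length $B$ with common variance $\sigma^2$; for such bounded variables the moment condition holds with $h = B/3$. Taking $\zeta = 1/2$ and $\epsilon = \zeta/h = 3/(2B)$, the two bounds give, each with probability at least $1 - e^{-t}$,
$$\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mathbb{E}[X_i]\right) \le B\sqrt{\frac{t}{2n}} \quad \text{(Hoeffding)}, \qquad \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mathbb{E}[X_i]\right) \le \frac{2Bt}{3n} + \frac{3\sigma^2}{2B} \quad \text{(CB)}.$$
Roughly, when $t \ll n$ and $\sigma^2 \ll B^2\sqrt{t/n}$, the CB deviation is much smaller than Hoeffding's.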
To use the CB inequality here, we need to bound the variance of the random variables to which it is applied, namely
$$U_i = \left(f^*(X_i) - Y_i\right)^2 - \left(f(X_i) - Y_i\right)^2, \qquad i = 1, \dots, n,$$
where $f^*(x) = \mathbb{E}[Y \mid X = x]$. Note that
$$\frac{1}{n}\sum_{i=1}^{n}\left(U_i - \mathbb{E}[U_i]\right) = r(f) - \hat{r}_n(f),$$
where $r(f) = R(f) - R(f^*)$ is the excess risk of $f$ and $\hat{r}_n(f) = \hat{R}_n(f) - \hat{R}_n(f^*)$ is its empirical counterpart.
Assumption 1
The support of $Y$ and the range of $f(X)$ are contained in a known interval of length $b$.
Proposition 1
With the above assumption, [link] holds with $h = 2 b^2 / 3$.
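A short justification, assuming the moment condition takes the standard Bernstein form $\mathbb{E}\left[\left|X_i - \mathbb{E}[X_i]\right|^k\right] \le \frac{k!}{2}\,\mathrm{var}(X_i)\, h^{k-2}$ for all $k \ge 2$: under Assumption 1 both squared errors lie in $[0, b^2]$, so $\left|U_i - \mathbb{E}[U_i]\right| \le 2b^2$, and for any random variable $Z$ with $\left|Z - \mathbb{E}[Z]\right| \le B$,
$$\mathbb{E}\left[\left|Z - \mathbb{E}[Z]\right|^k\right] \le B^{k-2}\,\mathrm{var}(Z) \le \frac{k!}{2}\,\mathrm{var}(Z)\left(\frac{B}{3}\right)^{k-2}, \qquad k \ge 2,$$
since $3^{k-2} \le k!/2$ for all $k \ge 2$. Taking $B = 2b^2$ gives $h = 2b^2/3$.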
Proposition 2
Again, with the above assumption, it may be shown that
$$\mathrm{var}(U_i) \le 5\, b^2\, r(f).$$
You can write $-U_i = \left(f(X_i) - Y_i\right)^2 - \left(f^*(X_i) - Y_i\right)^2$ as the sum of two terms:
$$-U_i = \left(f(X_i) - f^*(X_i)\right)^2 + 2\left(f(X_i) - f^*(X_i)\right)\left(f^*(X_i) - Y_i\right).$$
Note that the variance of each term is upper-bounded by its second moment. Also note that the covariance of the two terms above is zero:
$$\mathrm{cov}\left(\left(f(X_i) - f^*(X_i)\right)^2,\ 2\left(f(X_i) - f^*(X_i)\right)\left(f^*(X_i) - Y_i\right)\right) = 0.$$
This is evident when you recall that $f^*(X) = \mathbb{E}[Y \mid X]$, so that $\mathbb{E}\left[f^*(X) - Y \mid X\right] = 0$. Now we can bound the second moments of the two terms: since $\left|f(X) - f^*(X)\right| \le b$ and $\left|f^*(X) - Y\right| \le b$,
$$\mathbb{E}\left[\left(f(X) - f^*(X)\right)^4\right] \le b^2\, \mathbb{E}\left[\left(f(X) - f^*(X)\right)^2\right]$$
and
$$\mathbb{E}\left[4\left(f(X) - f^*(X)\right)^2\left(f^*(X) - Y\right)^2\right] \le 4\, b^2\, \mathbb{E}\left[\left(f(X) - f^*(X)\right)^2\right].$$
So
$$\mathrm{var}(U_i) \le 5\, b^2\, \mathbb{E}\left[\left(f(X) - f^*(X)\right)^2\right].$$
The final step is to see that
$$r(f) = \mathbb{E}\left[\left(f(X) - f^*(X)\right)^2\right].$$
Thus, $\mathrm{var}(U_i) \le 5\, b^2\, r(f)$.
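One way to verify the final step, using only the fact that $f^*(X) = \mathbb{E}[Y \mid X]$:
$$r(f) = \mathbb{E}\left[\left(f(X) - Y\right)^2 - \left(f^*(X) - Y\right)^2\right] = \mathbb{E}\left[\left(f(X) - f^*(X)\right)^2\right] + 2\,\mathbb{E}\left[\left(f(X) - f^*(X)\right)\,\mathbb{E}\left[f^*(X) - Y \mid X\right]\right] = \mathbb{E}\left[\left(f(X) - f^*(X)\right)^2\right],$$
since the inner conditional expectation is zero.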
And therefore, we can say that, with probability at least $1 - e^{-t}$,
$$r(f) - \hat{r}_n(f) \le \frac{t}{n\epsilon} + \frac{5\,\epsilon\, b^2\, r(f)}{2(1-\zeta)}.$$
In other words, with probability at least $1 - \delta$ (where $\delta = e^{-t}$),
$$r(f) - \hat{r}_n(f) \le \frac{\log(1/\delta)}{n\epsilon} + \frac{5\,\epsilon\, b^2\, r(f)}{2(1-\zeta)}.$$
Now, suppose we have assigned positive numbers $c(f)$ to each $f \in \mathcal{F}$ satisfying the Kraft inequality:
$$\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1.$$
Note that [link] holds for each fixed $f \in \mathcal{F}$ and for any choice of $\delta \in (0,1)$. In particular, we let $\delta$ be a function of $f$:
$$\delta(f) = 2^{-c(f)}\,\delta.$$
So we can use this $\delta(f)$ along with the procedure introduced in Lecture 9 (i.e., the union of events bound followed by the Kraft inequality) to obtain the following. For all $f \in \mathcal{F}$, with probability at least $1 - \delta$,
$$r(f) - \hat{r}_n(f) \le \frac{c(f)\log 2 + \log(1/\delta)}{n\epsilon} + \frac{5\,\epsilon\, b^2\, r(f)}{2(1-\zeta)}.$$
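The step above can be spelled out as follows. Let $\Omega_f$ denote the event that the bound for the fixed function $f$ fails (this notation is introduced only for this sketch); applying the fixed-$f$ bound at confidence level $\delta(f)$ gives $P(\Omega_f) \le \delta(f)$, and $\log\left(1/\delta(f)\right) = c(f)\log 2 + \log(1/\delta)$, which is where the $c(f)\log 2$ term comes from. Then
$$P\left(\bigcup_{f \in \mathcal{F}} \Omega_f\right) \le \sum_{f \in \mathcal{F}} P\left(\Omega_f\right) \le \delta \sum_{f \in \mathcal{F}} 2^{-c(f)} \le \delta,$$
so with probability at least $1 - \delta$ the bound holds for all $f \in \mathcal{F}$ simultaneously.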
Now set $\zeta = \epsilon h = 2\epsilon b^2/3$ and assume $0 < \epsilon < \frac{6}{19\, b^2}$. Then define
$$a = \frac{5\,\epsilon\, b^2}{2(1-\zeta)},$$
which satisfies $a < 1$ for this range of $\epsilon$. Now, after using the bound above and rearranging terms (moving the $a\, r(f)$ term to the left-hand side), we have: for all $f \in \mathcal{F}$, with probability at least $1 - \delta$,
$$r(f) \le \frac{1}{1-a}\left(\hat{r}_n(f) + \frac{c(f)\log 2 + \log(1/\delta)}{n\epsilon}\right).$$
We want to choose $f$ to minimize this upper bound. So take
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}}\left\{\hat{r}_n(f) + \frac{c(f)\log 2}{n\epsilon}\right\}.$$
So, with probability at least $1 - \delta$,
$$r(\hat{f}_n) \le \frac{1}{1-a}\left(\hat{r}_n(f) + \frac{c(f)\log 2 + \log(1/\delta)}{n\epsilon}\right) \quad \text{for all } f \in \mathcal{F},$$
where the inequality for arbitrary $f$ follows because $\hat{f}_n$ minimizes $\hat{r}_n(f) + c(f)\log 2/(n\epsilon)$ over $\mathcal{F}$.
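Although $\hat{r}_n(f)$ involves the unknown $f^*$, the term $\hat{R}_n(f^*)$ is the same for every $f$, so (with $\hat{r}_n(f) = \hat{R}_n(f) - \hat{R}_n(f^*)$ as above) the minimization defining $\hat{f}_n$ can be carried out without knowing $f^*$:
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}}\left\{\hat{r}_n(f) + \frac{c(f)\log 2}{n\epsilon}\right\} = \arg\min_{f \in \mathcal{F}}\left\{\hat{R}_n(f) + \frac{c(f)\log 2}{n\epsilon}\right\},$$
i.e., $\hat{f}_n$ is a complexity-regularized empirical risk minimizer.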
Now we use the Craig-Bernstein inequality to bound the difference between $\hat{r}_n(f)$ and $r(f)$ for a fixed $f \in \mathcal{F}$, this time applying it to the variables $-U_i$ (which satisfy the same moment condition and variance bound). With probability at least $1 - \delta$,
$$\hat{r}_n(f) - r(f) \le \frac{\log(1/\delta)}{n\epsilon} + \frac{5\,\epsilon\, b^2\, r(f)}{2(1-\zeta)} = \frac{\log(1/\delta)}{n\epsilon} + a\, r(f).$$
Now we can again use the union bound to combine [link] and [link]: With probability at least $1 - 2\delta$,
$$r(\hat{f}_n) \le \frac{1}{1-a}\left((1+a)\, r(f) + \frac{c(f)\log 2 + 2\log(1/\delta)}{n\epsilon}\right) = \frac{1+a}{1-a}\, r(f) + \frac{c(f)\log 2 + 2\log(1/\delta)}{n\epsilon\,(1-a)}.$$
Now set $\delta = e^{-t}$; then we have, for each fixed $f \in \mathcal{F}$ and all $t > 0$,
$$P\left(r(\hat{f}_n) - \frac{1+a}{1-a}\, r(f) - \frac{c(f)\log 2}{n\epsilon\,(1-a)} \ge \frac{2t}{n\epsilon\,(1-a)}\right) \le 2\, e^{-t}.$$
Integrating, we get
$$\mathbb{E}\left[r(\hat{f}_n)\right] \le \frac{1+a}{1-a}\, r(f) + \frac{c(f)\log 2 + 4}{n\epsilon\,(1-a)}.$$
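The integration step is the usual conversion of an exponential tail bound into a bound in expectation. Writing $Z$ for the random quantity inside the probability above (a temporary name for this sketch), the tail bound says $P(Z \ge u) \le 2\, e^{-n\epsilon(1-a)u/2}$ for all $u > 0$, so
$$\mathbb{E}[Z] \le \mathbb{E}\left[\max(Z, 0)\right] = \int_{0}^{\infty} P\left(Z \ge u\right) du \le \int_{0}^{\infty} 2\, e^{-n\epsilon(1-a)u/2}\, du = \frac{4}{n\epsilon\,(1-a)},$$
which is exactly the extra additive term in the expectation bound.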
To sum up, we have shown that for $0 < \epsilon < \frac{6}{19\, b^2}$,
$$\mathbb{E}\left[r(\hat{f}_n)\right] \le \min_{f \in \mathcal{F}}\left\{\frac{1+a}{1-a}\, r(f) + \frac{c(f)\log 2 + 4}{n\epsilon\,(1-a)}\right\},$$
or, since $r(f) = R(f) - R(f^*)$ and $\mathbb{E}\left[r(\hat{f}_n)\right] = \mathbb{E}\left[R(\hat{f}_n)\right] - R(f^*)$, in expanded form:
$$\mathbb{E}\left[R(\hat{f}_n)\right] - R(f^*) \le \min_{f \in \mathcal{F}}\left\{\frac{1+a}{1-a}\left(R(f) - R(f^*)\right) + \frac{c(f)\log 2 + 4}{n\epsilon\,(1-a)}\right\}.$$
Notice that if $f^* \in \mathcal{F}$ and if $c(f^*)$ is not too large (e.g., $c(f^*) = O(\log n)$), then we have
$$\mathbb{E}\left[R(\hat{f}_n)\right] - R(f^*) = O\!\left(\frac{\log n}{n}\right),$$
within a logarithmic factor of the parametric rate of convergence!
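For instance, taking $f = f^*$ in the minimum (so that $R(f) - R(f^*) = 0$) and holding $\epsilon$ fixed, a codelength $c(f^*) \le C \log n$ for some constant $C$ (used here only for illustration) gives
$$\mathbb{E}\left[R(\hat{f}_n)\right] - R(f^*) \le \frac{c(f^*)\log 2 + 4}{n\epsilon\,(1-a)} \le \frac{C \log 2\, \log n + 4}{n\epsilon\,(1-a)} = O\!\left(\frac{\log n}{n}\right),$$
i.e., within a logarithmic factor of the $1/n$ parametric rate.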