Error bound
Suppose that we have a pool of candidate functions $\mathcal{F}$, and we want to select a function $f$ from $\mathcal{F}$ using the training data. Our usual approach is to show that the distribution of the empirical risk $\hat{R}_n(f)$ concentrates about its mean as the sample size $n$ grows. First, we assign a complexity $c(f) > 0$ to each $f \in \mathcal{F}$ so that
$$\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1 .$$
Then, we apply the union bound to get a uniform concentration inequality holding for all models in $\mathcal{F}$. Finally, we use this concentration inequality to bound the expected risk of our selected model.
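As a quick aside (a standard example of a valid complexity assignment, not spelled out in the notes): if $\mathcal{F}$ is finite, the uniform choice satisfies this summability condition,
$$c(f) = \log_2 |\mathcal{F}| \quad \text{for all } f \in \mathcal{F} \qquad \Longrightarrow \qquad \sum_{f \in \mathcal{F}} 2^{-c(f)} = |\mathcal{F}| \cdot 2^{-\log_2 |\mathcal{F}|} = 1,$$
and, more generally, $c(f)$ can be taken to be the length in bits of any prefix codeword assigned to $f$.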
We will essentially accomplish the same result here, but avoid the need for explicit concentration inequalities and instead make use of the information-theoretic bounds.
We would like to select an $f \in \mathcal{F}$ so that the excess risk is small. In this setting the excess risk is
$$R(f) - R(f^*) = \frac{1}{n}\sum_{i=1}^n K\big(p_{f^*(x_i)}, p_{f(x_i)}\big),$$
where $K(p,q)$ is again the KL divergence. Unfortunately, as mentioned before, the KL divergence is not a true distance. So instead we will focus on the expected squared Hellinger distance as our measure of performance. We will get a bound on
$$E\left[\frac{1}{n}\sum_{i=1}^n H^2\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right].$$
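For reference, the quantities involved are (presumably these are the definitions and information-theoretic bounds from earlier in the notes; the Hellinger affinity $A$ is used heavily in the proof below):
$$K(p,q) = \int p(y)\,\log\frac{p(y)}{q(y)}\,dy, \qquad H^2(p,q) = \int \Big(\sqrt{p(y)} - \sqrt{q(y)}\Big)^2\, dy, \qquad A(p,q) = \int \sqrt{p(y)\,q(y)}\,dy,$$
which satisfy
$$H^2(p,q) = 2\big(1 - A(p,q)\big) \;\le\; -2\log A(p,q) \;\le\; K(p,q).$$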
Maximum complexity-regularized likelihood estimation
Theorem
(Li-Barron 2000, Kolaczyk-Nowak 2002)
Let $\{x_i, Y_i\}_{i=1}^n$ be a random sample of training data with the $Y_i$ independent,
$$Y_i \sim p_{f^*(x_i)}, \qquad i = 1, \dots, n,$$
for some unknown function $f^*$.
Suppose we have a collection of candidate functions $\mathcal{F}$, and complexities $c(f) > 0$, $f \in \mathcal{F}$, satisfying
$$\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1 .$$
Define the complexity-regularized estimator
$$\hat{f}_n \equiv \arg\min_{f \in \mathcal{F}} \left\{ -\frac{1}{n}\sum_{i=1}^n \log p_{f(x_i)}(Y_i) + \frac{2\,c(f)\log 2}{n} \right\}.$$
Then,
$$E\left[\frac{1}{n}\sum_{i=1}^n H^2\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right] \;\le\; \min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^n K\big(p_{f^*(x_i)}, p_{f(x_i)}\big) + \frac{2\,c(f)\log 2}{n} \right\}.$$
Before proving the theorem, let's look at a special case.
Gaussian noise
Suppose
$$Y_i = f^*(x_i) + W_i, \qquad W_i \stackrel{iid}{\sim} \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n,$$
so that $p_{f(x_i)}(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - f(x_i))^2 / (2\sigma^2)}$.
Using results from Example 1, we have
$$K\big(p_{f^*(x_i)}, p_{f(x_i)}\big) = \frac{\big(f^*(x_i) - f(x_i)\big)^2}{2\sigma^2}.$$
Then,
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{2\sigma^2 n}\sum_{i=1}^n \big(Y_i - f(x_i)\big)^2 + \frac{2\,c(f)\log 2}{n} \right\} = \arg\min_{f \in \mathcal{F}} \left\{ \sum_{i=1}^n \big(Y_i - f(x_i)\big)^2 + 4\sigma^2\, c(f)\log 2 \right\},$$
a complexity-penalized least squares estimator.
We also have
$$H^2\big(p_{f^*(x_i)}, p_{f(x_i)}\big) = 2\left(1 - \exp\left(-\frac{\big(f^*(x_i) - f(x_i)\big)^2}{8\sigma^2}\right)\right).$$
Combine everything together to get
$$E\left[\frac{2}{n}\sum_{i=1}^n \left(1 - \exp\left(-\frac{\big(f^*(x_i) - \hat f_n(x_i)\big)^2}{8\sigma^2}\right)\right)\right] \;\le\; \min_{f \in \mathcal{F}} \left\{ \frac{1}{2\sigma^2 n}\sum_{i=1}^n \big(f^*(x_i) - f(x_i)\big)^2 + \frac{2\,c(f)\log 2}{n} \right\}.$$
The theorem tells us that the expected squared Hellinger error of this complexity-penalized least squares estimator is bounded by the best trade-off in $\mathcal{F}$ between the average squared approximation error and the complexity penalty, or, equivalently (multiplying through by $2\sigma^2$),
$$2\sigma^2\, E\left[\frac{1}{n}\sum_{i=1}^n H^2\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right] \;\le\; \min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^n \big(f^*(x_i) - f(x_i)\big)^2 + \frac{4\sigma^2\, c(f)\log 2}{n} \right\},$$
which parallels the complexity-regularized squared error bounds obtained earlier via concentration inequalities.
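To make the Gaussian case concrete, here is a minimal sketch (not from the original notes) of complexity-regularized selection over a small finite pool. The pool, the complexity assignment, and the known noise level sigma are illustrative assumptions; in the theorem the pool and complexities are fixed in advance, whereas here the candidates are least squares polynomial fits computed from the same data, purely for convenience of illustration.

```python
import numpy as np

def penalized_select(x, y, candidates, complexities, sigma):
    """Pick the candidate minimizing the (Gaussian) penalized criterion
    (1 / (2 sigma^2 n)) * sum_i (y_i - f(x_i))^2 + 2 c(f) log(2) / n,
    i.e. the complexity-penalized least squares form sketched above."""
    n = len(y)
    best = None
    for f, c in zip(candidates, complexities):
        score = np.sum((y - f(x)) ** 2) / (2 * sigma**2 * n) + 2 * c * np.log(2) / n
        if best is None or score < best[1]:
            best = (f, score)
    return best

# Toy data: noisy samples of a quadratic.
rng = np.random.default_rng(0)
sigma = 0.5
x = np.linspace(0.0, 1.0, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + sigma * rng.standard_normal(x.size)

# Candidate pool: polynomial fits of degree 0..5.
degrees = range(6)
candidates = [np.poly1d(np.polyfit(x, y, d)) for d in degrees]
# Illustrative complexities: c(f) grows with the number of coefficients,
# so that sum_f 2^{-c(f)} <= 1 (a crude Kraft-style assignment).
complexities = [2 * (d + 1) for d in degrees]

f_hat, score = penalized_select(x, y, candidates, complexities, sigma)
print("selected degree:", f_hat.order, "penalized score:", round(score, 4))
```

Larger complexities penalize higher-degree candidates, so the selection trades residual error against model complexity in the same spirit as the bound above.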
Now let's come back to the proof.
Proof
Now, define the theoretical analog of $\hat{f}_n$:
$$\tilde{f}_n \equiv \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^n K\big(p_{f^*(x_i)}, p_{f(x_i)}\big) + \frac{2\,c(f)\log 2}{n} \right\},$$
the minimizer of the right-hand side of the theorem's bound. It cannot be computed (since $f^*$ is unknown), but unlike $\hat{f}_n$ it is deterministic: it does not depend on the data.
Since
$$H^2(p, q) = 2\big(1 - A(p, q)\big) \le -2\log A(p, q),$$
we can see that
$$E\left[\frac{1}{n}\sum_{i=1}^n H^2\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right] \;\le\; 2\, E\left[-\frac{1}{n}\sum_{i=1}^n \log A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right].$$
Then we can write
$$2\, E\left[-\frac{1}{n}\sum_{i=1}^n \log A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right] = 2\, E\left[-\frac{1}{n}\log \prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right].$$
Now, simply multiply the argument inside the $\log$ by
$$1 = \frac{2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{p_{\hat f_n(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}{2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{p_{\hat f_n(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}$$
to get
$$2\, E\left[-\frac{1}{n}\log \prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)\right] = 2\, E\left[-\frac{1}{n}\log\left(2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{\frac{p_{\hat f_n(x_i)}(Y_i)}{p_{f^*(x_i)}(Y_i)}}\right)\right] + 2\, E\left[\frac{1}{n}\log\left(\frac{2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{p_{\hat f_n(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}{\prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)}\right)\right].$$
The first term satisfies
$$2\, E\left[-\frac{1}{n}\log\left(2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{\frac{p_{\hat f_n(x_i)}(Y_i)}{p_{f^*(x_i)}(Y_i)}}\right)\right] = E\left[-\frac{1}{n}\sum_{i=1}^n \log p_{\hat f_n(x_i)}(Y_i) + \frac{2\,c(\hat f_n)\log 2}{n} + \frac{1}{n}\sum_{i=1}^n \log p_{f^*(x_i)}(Y_i)\right]$$
$$\le\; E\left[-\frac{1}{n}\sum_{i=1}^n \log p_{\tilde f_n(x_i)}(Y_i) + \frac{2\,c(\tilde f_n)\log 2}{n} + \frac{1}{n}\sum_{i=1}^n \log p_{f^*(x_i)}(Y_i)\right] = \frac{1}{n}\sum_{i=1}^n K\big(p_{f^*(x_i)}, p_{\tilde f_n(x_i)}\big) + \frac{2\,c(\tilde f_n)\log 2}{n},$$
where the inequality holds because $\hat f_n$ minimizes the penalized negative log-likelihood over $\mathcal{F}$, and the final equality holds because $\tilde f_n$ is deterministic.
The terms
$$\frac{1}{n}\sum_{i=1}^n K\big(p_{f^*(x_i)}, p_{\tilde f_n(x_i)}\big) + \frac{2\,c(\tilde f_n)\log 2}{n} \;=\; \min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^n K\big(p_{f^*(x_i)}, p_{f(x_i)}\big) + \frac{2\,c(f)\log 2}{n} \right\}$$
are precisely what we wanted for the upper bound of the theorem. So, to finish the proof we only need to show that the last term is non-positive.
Applying Jensen's inequality (the $\log$ is concave), we get
$$E\left[\frac{1}{n}\log\left(\frac{2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{p_{\hat f_n(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}{\prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)}\right)\right] \;\le\; \frac{1}{n}\log E\left[\frac{2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{p_{\hat f_n(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}{\prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)}\right].$$
Both $\hat f_n$ and the $Y_i$ are random, which makes the expectation difficult to compute. However, we can simplify the problem using the union bound, which eliminates the dependence on $\hat f_n$:
$$E\left[\frac{2^{-c(\hat f_n)} \prod_{i=1}^n \sqrt{p_{\hat f_n(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}{\prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{\hat f_n(x_i)}\big)}\right] \;\le\; E\left[\sum_{f \in \mathcal{F}} \frac{2^{-c(f)} \prod_{i=1}^n \sqrt{p_{f(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}}{\prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{f(x_i)}\big)}\right]$$
$$=\; \sum_{f \in \mathcal{F}} 2^{-c(f)}\, \frac{\prod_{i=1}^n E\left[\sqrt{p_{f(x_i)}(Y_i) / p_{f^*(x_i)}(Y_i)}\right]}{\prod_{i=1}^n A\big(p_{f^*(x_i)}, p_{f(x_i)}\big)}$$
$$=\; \sum_{f \in \mathcal{F}} 2^{-c(f)} \;\le\; 1,$$
where the last two lines come from the independence of the $Y_i$ together with
$$E\left[\sqrt{\frac{p_{f(x_i)}(Y_i)}{p_{f^*(x_i)}(Y_i)}}\right] = \int \sqrt{p_{f(x_i)}(y)\, p_{f^*(x_i)}(y)}\, dy = A\big(p_{f^*(x_i)}, p_{f(x_i)}\big),$$
and from the Kraft inequality $\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1$. Hence the last term is at most $\frac{2}{n}\log 1 = 0$, which completes the proof.