In the last lecture, we studied bounds of the following form: for any $\delta > 0$, with probability at least $1-\delta$,
$$R(f) \le \hat{R}_n(f) + \sqrt{\frac{\log|\mathcal{F}| + \log(1/\delta)}{2n}} \quad \text{for all } f \in \mathcal{F},$$
which led to upper bounds on the estimation error of the form
$$\mathbb{E}[R(\hat{f}_n)] - \min_{f \in \mathcal{F}} R(f) \le C\sqrt{\frac{\log|\mathcal{F}|}{n}}$$
for a constant $C > 0$.
The key assumptions made in deriving the error bounds were: (1) the loss function is bounded, so that Hoeffding's inequality applies, and (2) the collection of candidate functions $\mathcal{F}$ is finite. The bounds hold for every distribution of the data, and are therefore called distribution-free.
In this lecture we will generalize the previous results in a powerful way by developing bounds applicable to possibly infinite collections of candidates. To start, let us suppose that $\mathcal{F}$ is a countable, possibly infinite, collection of candidate functions. Assign a positive number $c(f)$ to each $f \in \mathcal{F}$, such that
$$\sum_{f \in \mathcal{F}} e^{-c(f)} \le 1.$$
The numbers $c(f)$ can be interpreted as measures of the complexity of the candidates, or as negative log prior probabilities. In particular, if $p(f)$ is the prior probability of $f$, then
$$\sum_{f \in \mathcal{F}} p(f) = 1,$$
so the choice $c(f) = -\log p(f)$ produces
$$\sum_{f \in \mathcal{F}} e^{-c(f)} = \sum_{f \in \mathcal{F}} p(f) = 1.$$
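As a concrete illustration (the geometric prior below is a made-up example, not from the lecture), take candidates $f_1, f_2, \ldots$ with prior $p(f_k) = 2^{-k}$, so that $c(f_k) = -\log p(f_k) = k \log 2$; any finite truncation of $\sum_k e^{-c(f_k)}$ stays below 1, and the full series sums to 1:

```python
import math

# Hypothetical geometric prior over a countable class: p(f_k) = 2**(-k), k = 1, 2, ...
# The induced complexities are c(f_k) = -log p(f_k) = k * log(2).
def c(k):
    return k * math.log(2)

# A finite truncation of sum_k e^{-c(f_k)} = sum_k 2^{-k} stays below 1,
# and the full series converges to exactly 1.
partial = sum(math.exp(-c(k)) for k in range(1, 40))
assert partial < 1.0
assert abs(partial - 1.0) < 1e-11  # truncation error is 2**(-39), about 1.8e-12
```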
Now recall Hoeffding's inequality: for each $f$ and every $\epsilon > 0$,
$$P\left(R(f) - \hat{R}_n(f) \ge \epsilon\right) \le e^{-2n\epsilon^2},$$
or equivalently (set $\delta = e^{-2n\epsilon^2}$ and solve for $\epsilon$), for every $\delta > 0$,
$$P\left(R(f) - \hat{R}_n(f) \ge \sqrt{\frac{\log(1/\delta)}{2n}}\right) \le \delta.$$
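The equivalence of the two forms of the inequality is just algebra. A quick numerical sanity check (the values of $n$ and $\delta$ are arbitrary illustrative choices):

```python
import math

def hoeffding_tail(n, eps):
    # Hoeffding's bound on P(R(f) - Rhat_n(f) >= eps), for losses in [0, 1]
    return math.exp(-2 * n * eps ** 2)

def eps_at_confidence(n, delta):
    # Invert the bound: the deviation eps at which the tail probability equals delta
    return math.sqrt(math.log(1 / delta) / (2 * n))

n, delta = 1000, 0.05
eps = eps_at_confidence(n, delta)
# Plugging the inverted eps back into the tail bound recovers delta exactly.
assert abs(hoeffding_tail(n, eps) - delta) < 1e-12
```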
Suppose $\delta > 0$ is specified. Using the values $c(f)$ for $f \in \mathcal{F}$, define
$$\delta(f) = \delta \, e^{-c(f)}.$$
Then we have
$$P\left(R(f) - \hat{R}_n(f) \ge \sqrt{\frac{c(f) + \log(1/\delta)}{2n}}\right) \le e^{-c(f)}\delta = \delta(f).$$
Furthermore, we can apply the union bound as follows:
$$P\left(\exists f \in \mathcal{F} : R(f) - \hat{R}_n(f) \ge \sqrt{\frac{c(f) + \log(1/\delta)}{2n}}\right) \le \sum_{f \in \mathcal{F}} \delta(f) = \delta \sum_{f \in \mathcal{F}} e^{-c(f)} \le \delta.$$
So for any $\delta > 0$, with probability at least $1-\delta$, we have that, for all $f \in \mathcal{F}$,
$$R(f) \le \hat{R}_n(f) + \sqrt{\frac{c(f) + \log(1/\delta)}{2n}}.$$
Moreover, if $\mathcal{F}$ is finite, we can take $c(f) = \log|\mathcal{F}|$ for every $f \in \mathcal{F}$, so that
$$\sum_{f \in \mathcal{F}} e^{-c(f)} = |\mathcal{F}| \, e^{-\log|\mathcal{F}|} = 1,$$
which implies that for any $\delta > 0$, with probability at least $1-\delta$, we have, for all $f \in \mathcal{F}$,
$$R(f) \le \hat{R}_n(f) + \sqrt{\frac{\log|\mathcal{F}| + \log(1/\delta)}{2n}}.$$
Note that this is precisely the bound we derived in the last lecture.
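To see the reduction numerically, here is a small sketch (the class size, sample size, confidence level, and empirical risk are invented for illustration): with $c(f) = \log|\mathcal{F}|$, the penalized bound coincides term-by-term with the finite-class bound.

```python
import math

def penalized_bound(r_hat, c_f, n, delta):
    # Generalized bound: Rhat_n(f) + sqrt((c(f) + log(1/delta)) / (2n))
    return r_hat + math.sqrt((c_f + math.log(1 / delta)) / (2 * n))

def finite_class_bound(r_hat, size_f, n, delta):
    # Last lecture's bound: Rhat_n(f) + sqrt((log|F| + log(1/delta)) / (2n))
    return r_hat + math.sqrt((math.log(size_f) + math.log(1 / delta)) / (2 * n))

n, delta, size_f = 500, 0.05, 100
r_hat = 0.12  # hypothetical empirical risk of some candidate f
# Choosing c(f) = log|F| recovers the finite-class bound exactly.
assert math.isclose(penalized_bound(r_hat, math.log(size_f), n, delta),
                    finite_class_bound(r_hat, size_f, n, delta))
```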
Choosing c(f)
The generalized bounds allow us to handle countably infinite collections of candidate functions, but we require that
$$\sum_{f \in \mathcal{F}} e^{-c(f)} \le 1.$$
Of course, if $c(f) = -\log p(f)$, where $p$ is a proper prior probability distribution over $\mathcal{F}$, then we have
$$\sum_{f \in \mathcal{F}} e^{-c(f)} = \sum_{f \in \mathcal{F}} p(f) = 1.$$
However, it may be difficult to design a probability distribution over an infinite class of candidates. The coding perspective provides a very practical means to this end.
Assume that we have assigned a uniquely decodable binary code to each $f \in \mathcal{F}$, and let $c(f)$ denote the codelength for $f$; that is, the code for $f$ is $c(f)$ bits long. A very useful class of uniquely decodable codes are called prefix codes: codes in which no codeword is a prefix of any other codeword. For any binary prefix code, the codeword lengths $c_1, c_2, \ldots$ satisfy the Kraft inequality
$$\sum_{i \ge 1} 2^{-c_i} \le 1.$$
Conversely, given any $c_1, c_2, \ldots$ satisfying the inequality above, we can construct a prefix code with these codeword lengths. We will prove this result a bit later, but for now let's see how it is useful in our learning problem.
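The converse direction can be made constructive. The sketch below (an illustrative canonical-code construction, not the lecture's proof) assigns codewords greedily in order of increasing length; it succeeds whenever the lengths satisfy the Kraft inequality:

```python
def prefix_code(lengths):
    # Build a binary prefix code with the given codeword lengths,
    # assuming sum_i 2**(-c_i) <= 1 (the Kraft inequality).
    assert sum(2.0 ** -c for c in lengths) <= 1.0
    codewords, next_val, prev_len = [], 0, 0
    for c in sorted(lengths):
        next_val <<= c - prev_len          # pad with zeros to the new length
        codewords.append(format(next_val, "0{}b".format(c)))
        next_val += 1                      # next available value at this length
        prev_len = c
    return codewords

codes = prefix_code([1, 2, 3, 3])          # lengths meeting Kraft with equality
# No codeword is a prefix of any other, so the code is uniquely decodable.
assert all(not b.startswith(a) for a in codes for b in codes if a != b)
```

For the lengths $1, 2, 3, 3$ this yields the codewords $0, 10, 110, 111$.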
Assume that we have assigned a binary prefix codeword to each $f \in \mathcal{F}$, and let $b(f)$ denote the bit-length of the codeword for $f$. Set $c(f) = b(f) \log 2$. Then
$$\sum_{f \in \mathcal{F}} e^{-c(f)} = \sum_{f \in \mathcal{F}} 2^{-b(f)} \le 1$$
by the Kraft inequality, so these values of $c(f)$ satisfy the condition required by the generalized bounds.
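As a final check in code (the codeword lengths below are illustrative), rescaling bit-lengths by $\log 2$ puts the Kraft inequality in exactly the form $\sum_f e^{-c(f)} \le 1$ required above:

```python
import math

# Hypothetical prefix codeword bit-lengths b(f) for four candidates,
# e.g. the codewords 0, 10, 110, 111.
bit_lengths = [1, 2, 3, 3]

# Set c(f) = b(f) * log(2); then sum_f e^{-c(f)} = sum_f 2**(-b(f)) <= 1
# by the Kraft inequality.
c_values = [b * math.log(2) for b in bit_lengths]
total = sum(math.exp(-c) for c in c_values)
assert total <= 1.0 + 1e-9  # small tolerance for floating-point roundoff
```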