In our past lectures we considered collections of candidate functions $\mathcal{F}$ that were either finite or countable. We then constructed penalties, usually codelengths, $c(f)$ for each candidate $f \in \mathcal{F}$, such that
$$\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1.$$
This allowed us to derive uniform concentration inequalities over the entire set using the union bound. However, in many cases the collections of interest are uncountably infinite. A simple example is the collection of single-threshold classifiers in 1-d having the form
$$f_t(x) = \mathbf{1}_{\{x \ge t\}}, \qquad t \in [0,1],$$
and their complements
$$\bar{f}_t(x) = \mathbf{1}_{\{x < t\}}, \qquad t \in [0,1].$$
Thus, $\mathcal{F} = \{f_t, \bar{f}_t : t \in [0,1]\}$ contains an uncountable number of classifiers, and we cannot apply the union bound argument in such cases.
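To see exactly where countability enters, recall the shape of the guarantee from those lectures (a sketch, assuming the 0/1 loss and Hoeffding's inequality, with $R(f)$ the true risk and $\hat{R}_n(f)$ the empirical risk; the constants may differ slightly from the earlier statement). Setting $\epsilon_f = \sqrt{\frac{c(f)\log 2 + \log(1/\delta)}{2n}}$,
$$\Pr\Big(\exists f \in \mathcal{F} :\; R(f) - \hat{R}_n(f) > \epsilon_f\Big) \;\le\; \sum_{f \in \mathcal{F}} e^{-2n\epsilon_f^2} \;=\; \sum_{f \in \mathcal{F}} 2^{-c(f)}\,\delta \;\le\; \delta,$$
so with probability at least $1-\delta$ we have $R(f) \le \hat{R}_n(f) + \epsilon_f$ simultaneously for all $f \in \mathcal{F}$. The sum over $\mathcal{F}$ is precisely where a countable collection is needed.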
One way around this problem is to quantize the collection. That is, we restrict the threshold to a finite grid of $q$ values, where $q$ is a positive integer, chosen so that every classifier in $\mathcal{F}$ has a nearby quantized counterpart. The risk of that counterpart then exceeds the risk of the original classifier by at most $C/q$ if the density of $x$ is bounded by a constant $C$. The construction is carried out in the example below.
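To see where the $C/q$ figure comes from, note that two thresholds can only disagree on points lying between them; a one-line sketch, assuming the density bound:
$$|R(f_{t'}) - R(f_t)| \;\le\; \Pr\big(f_{t'}(x) \ne f_t(x)\big) \;=\; \Pr\big(\min(t,t') \le x < \max(t,t')\big) \;\le\; C\,|t'-t|,$$
which is at most $C/q$ whenever the quantized threshold $t'$ is within $1/q$ of $t$.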
There is, however, an alternative to quantization. Consider the fact that given only $n$ training data, many of the classifiers in such a collection produce identical empirical errors; indeed, many produce identical label assignments on the data. We will have at most $2^n$ unique labelings of the $n$ points. Although $\mathcal{F}$ is uncountable, the set of labelings it induces on the data is finite and bounded by $2(n+1)$: the threshold falls into one of the $n+1$ intervals determined by the ordered data points, with 2 classifiers (a threshold and its complement) per interval.
The number of distinct labeling assignments that a class $\mathcal{F}$ can produce on a set of $n$ points is called the shatter coefficient and is denoted $S(\mathcal{F}, n)$.
The VC dimension is the largest number of points that $\mathcal{F}$ can label in all possible ways. Specifically, the VC dimension is denoted $V(\mathcal{F})$, where $V(\mathcal{F})$ is the largest integer $k$ such that $S(\mathcal{F}, k) = 2^k$. Ex. For the threshold classifiers and their complements above, $S(\mathcal{F}, 1) = 2$ and $S(\mathcal{F}, 2) = 4$, but $S(\mathcal{F}, 3) = 6 < 2^3$, so $V(\mathcal{F}) = 2$.
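As an illustration (not part of the original notes), the shatter coefficients of the threshold class above can be checked by brute force for small $k$; the sketch below, in Python with illustrative function names, also confirms the $2(n+1)$ bound mentioned earlier.

```python
def labelings(points):
    """Distinct labelings that the thresholds 1{x >= t} and their
    complements 1{x < t} can produce on a sorted sequence of points."""
    # One candidate threshold per interval: below all points, between each
    # consecutive pair, and above all points (n + 1 intervals in total).
    cuts = [points[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(points, points[1:])]
    cuts += [points[-1] + 1.0]
    labels = set()
    for t in cuts:
        thresh = tuple(int(x >= t) for x in points)
        labels.add(thresh)                         # labeling of 1{x >= t}
        labels.add(tuple(1 - b for b in thresh))   # labeling of the complement
    return labels

def shatter_coefficient(k):
    """S(F, k) for the threshold class; in 1-d the count is the same for
    any k distinct points, so one configuration suffices."""
    return len(labelings([float(i) for i in range(k)]))

for k in range(1, 6):
    print(k, shatter_coefficient(k), 2 ** k)
# Output: S(F,1)=2, S(F,2)=4, S(F,3)=6, S(F,4)=8, S(F,5)=10, versus
# 2^k = 2, 4, 8, 16, 32; the largest k with S(F,k) = 2^k is k = 2.
```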
Ex. To make the quantization approach concrete, consider the threshold classifiers and their complements above, and suppose the density of $x$ is bounded above by a constant $C$. Let $q$ be a positive integer and define the quantized threshold set
$$T_q = \Big\{\tfrac{1}{q}, \tfrac{2}{q}, \ldots, 1\Big\},$$
and the corresponding finite collection
$$\mathcal{F}_q = \{f_t, \bar{f}_t : t \in T_q\},$$
which contains $2q$ classifiers. Moreover, for any $f \in \mathcal{F}$ there exists an $f' \in \mathcal{F}_q$ whose threshold differs from that of $f$ by at most $1/q$, such that
$$R(f') \;\le\; R(f) + \frac{C}{q}.$$
Now suppose we have $n$ training data and suppose $q = \sqrt{n}$. We know that, in general, the minimum empirical risk classifier will converge to the Bayes classifier at the rate of $n^{-1/2}$ or slower. Therefore, it is unnecessary to drive the approximation error introduced by quantization down faster than $n^{-1/2}$. So, we can restrict our attention to $\mathcal{F}_{\sqrt{n}}$ and, provided that the density of $x$ is bounded above, we have
$$\min_{f \in \mathcal{F}_{\sqrt{n}}} R(f) \;\le\; \min_{f \in \mathcal{F}} R(f) + \frac{C}{\sqrt{n}}.$$
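A minimal numerical sketch of this quantized empirical risk minimization (the data-generating model, the use of NumPy, and all variable names are illustrative assumptions, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x uniform on [0,1]; true label 1{x >= 0.35}, flipped with prob. 0.1.
n = 400
x = rng.uniform(0.0, 1.0, size=n)
y = (x >= 0.35).astype(int)
flip = rng.uniform(size=n) < 0.1
y[flip] = 1 - y[flip]

# Quantized collection F_q with q ~ sqrt(n): thresholds on the grid {1/q, ..., 1},
# each used as 1{x >= t} and as its complement 1{x < t}.
q = int(np.ceil(np.sqrt(n)))
grid = np.arange(1, q + 1) / q

best_err, best_rule = np.inf, None
for t in grid:
    for use_complement in (False, True):
        pred = (x < t).astype(int) if use_complement else (x >= t).astype(int)
        err = np.mean(pred != y)          # empirical risk of this classifier
        if err < best_err:
            best_err, best_rule = err, (t, use_complement)

print("q =", q, "| selected (threshold, complement?):", best_rule,
      "| empirical risk:", best_err)
```

With $q \approx \sqrt{n}$ the quantization bias $C/q$ is of the same order as the $n^{-1/2}$ estimation error, which is the point of the choice $q = \sqrt{n}$ above.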
Vapnik-Chervonenkis theory is based not on explicitly quantizing the collection of candidate functions, but rather on recognizing that the richness of $\mathcal{F}$ is limited in a certain sense by the number of training data. Indeed, given $n$ i.i.d. training data, there are at most $2^n$ different binary labelings. Therefore, any collection $\mathcal{F}$ may be divided into at most $2^n$ subsets of classifiers that are "equivalent" with respect to the training data. In many cases a collection $\mathcal{F}$ may not even be capable of producing $2^n$ different labelings.
Consider again the threshold classifiers and their complements, $\mathcal{F} = \{f_t, \bar{f}_t : t \in [0,1]\}$. Suppose we have $n$ training data $x_1 < x_2 < \cdots < x_n$, where $x_i$ denotes the location of each training point in $[0,1]$. Associated with each $x_i$ is a label $y_i \in \{0,1\}$. Any classifier in $\mathcal{F}$ will label all points to the left of a threshold $t$ as "1" or "0", and points to the right as "0" or "1", respectively. For $t \le x_1$, all points are either labeled "0" or "1". For $x_1 < t \le x_2$, the point $x_1$ is labeled "0" or "1" and $x_2, \ldots, x_n$ are labeled "1" or "0", and so on. We see that there are at most $2(n+1)$ different labelings (in fact exactly $2n$, since the first and last intervals yield the same two labelings); far less than the $2^n$ possible labelings of $n$ points!
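The equivalence-class picture can also be simulated directly: drawing a large number of classifiers from $\mathcal{F}$ at random and grouping them by the labeling they induce on the data yields only a handful of groups (a sketch with illustrative names; NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x = np.sort(rng.uniform(0.0, 1.0, size=n))   # training point locations in [0,1]

# Sample many classifiers from F: a random threshold t in [0,1] together with
# a random orientation (1{x >= t} or its complement 1{x < t}).
groups = {}
for _ in range(100_000):
    t = rng.uniform(0.0, 1.0)
    complement = rng.integers(0, 2)
    lab = (x >= t).astype(int)
    key = tuple(1 - lab) if complement else tuple(lab)
    groups[key] = groups.get(key, 0) + 1

print(len(groups), "distinct labelings among 100000 sampled classifiers;",
      "2^n =", 2 ** n)
# Typically prints 20 (= 2n) distinct labelings, versus 2^10 = 1024.
```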