
Review of past lecture

In our past lectures we considered collections of candidate functions F that were either finite or countable. We then constructed penalties, usually codelengths, c(f) for each candidate f ∈ F, such that

∑_{f ∈ F} 2^{−c(f)} ≤ 1.

This allowed us to derive uniform concentration inequalities over the entire set F using the union bound. However, in many cases the collection F may be uncountably infinite. A simple example is the collection F of threshold classifiers in one dimension, having the form

f_t(x) = 1{x ≥ t}

and their complements

f_s(x) = 1{x < s}.

Thus, F contains an uncountable number of classifiers, and we cannot apply the union bound argument in such cases.

Two ways to proceed

Discretize or quantize the collection

To quantize F, let q be a positive integer and define

F_q = { f : f(x) = 1{x ≥ i/q}, i ∈ {0, 1, …, q} }.

Then for every f ∈ F there is an f_q ∈ F_q such that

∫ |f(x) − f_q(x)| p(x) dx ≤ c/q,

provided the density p of x is bounded above by c > 0. Taking q ≈ n^{1/2} suffices, as discussed below.
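The quantization error bound can be sanity-checked numerically. The following is a minimal sketch, not from the original notes: it assumes x is uniform on [0, 1] (so its density is bounded by c = 1), and the helper names f_t and quantize are illustrative.

```python
import numpy as np

def f_t(x, t):
    """Threshold classifier f_t(x) = 1{x >= t}."""
    return (x >= t).astype(int)

def quantize(t, q):
    """Snap a threshold t in [0, 1] to the nearest grid point i/q."""
    return round(t * q) / q

rng = np.random.default_rng(0)
q = 100
x = rng.uniform(size=200_000)  # uniform density on [0, 1], bounded by c = 1

# P(f_t(x) != f_{t_q}(x)) = |t - t_q| <= 1/(2q) <= c/q for uniform x
for t in (0.137, 0.5, 0.999):
    t_q = quantize(t, q)
    disagreement = np.mean(f_t(x, t) != f_t(x, t_q))
    assert disagreement <= 1.0 / q + 0.005  # c/q plus Monte Carlo slack
```

The disagreement probability is exactly the mass between t and its grid neighbor, which is what the c/q bound controls.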

Identical empirical errors

Consider the fact that, given only n training data, many of the classifiers in such a collection may produce identical empirical errors, since many f ∈ F produce identical label assignments on the data. At most 2^n distinct labelings of n points are possible.

Although F is uncountable, the set of distinct labelings it induces on the data is finite. For the threshold classifiers above, the n data points split [0, 1] into intervals; all thresholds within one interval give the same labeling, and each labeling has a complement, so the count is bounded by roughly 2n: on the order of n intervals with 2 classifiers per interval.

The number of distinct labeling assignments that a class F can produce on a set of n points is denoted S(F, n), and it always satisfies

S(F, n) ≤ 2^n.

The VC dimension is defined through S(F, n). Specifically, VC(F) = k, where k is the largest integer such that S(F, k) = 2^k. For example, for the threshold classifiers above S(F, n) = 2n, and 2n = 2^n holds only for n ≤ 2, so VC(F) = 2.
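The counts S(F, n) = 2n and VC(F) = 2 for the threshold class can be verified by brute-force enumeration. A minimal sketch, not from the notes (the helper threshold_labelings is my own name): it lists every labeling that 1{x ≥ t} or its complement can induce on n sorted points.

```python
def threshold_labelings(points):
    """Distinct labelings of sorted points by f(x)=1{x>=t} and f(x)=1{x<t}."""
    pts = sorted(points)
    # One representative threshold from each interval between (and outside) the points
    thresholds = [pts[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    thresholds += [pts[-1] + 1.0]
    labelings = set()
    for t in thresholds:
        lab = tuple(int(x >= t) for x in pts)
        labelings.add(lab)                        # f(x) = 1{x >= t}
        labelings.add(tuple(1 - b for b in lab))  # complement f(x) = 1{x < t}
    return labelings

for n in range(1, 6):
    pts = [i / (n + 1) for i in range(1, n + 1)]
    S = len(threshold_labelings(pts))
    assert S == 2 * n            # S(F, n) = 2n for this class
    assert (S == 2 ** n) == (n <= 2)  # shattering only up to n = 2, so VC(F) = 2
```

The 2(n + 1) candidate labelings collapse to 2n because the all-ones and all-zeros labelings are each produced twice.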

Ex. Consider

F = { f : f(x) = 1{x ≥ t} or f(x) = 1{x < t}, t ∈ [0, 1] }

Let q be a positive integer and

F_q = { f : f(x) = 1{x ≥ i/q} or f(x) = 1{x < i/q}, i ∈ {0, 1, …, q} }

and,

|F_q| = 2(q + 1).

Moreover, for any f ∈ F there exists an f_q ∈ F_q such that

∫ |f(x) − f_q(x)| dx ≤ ∫_{(i−1)/q}^{i/q} 1 dx = 1/q.

Now suppose we have n training data and suppose f* ∈ F. We know that, in general, the minimum empirical risk classifier will converge to the Bayes classifier at a rate of n^{−1/2} or slower. Therefore, it is unnecessary to drive the approximation error down faster than n^{−1/2}. So we can restrict our attention to the finite class F_q with q = n^{1/2} and, provided that the density of x is bounded above, we have

min_{f ∈ F_q} R(f) − R(f*) ≤ c min_{f ∈ F_q} ∫ |f*(x) − f(x)| dx ≤ c/n^{1/2}.
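To illustrate this trade-off, here is a sketch of empirical risk minimization over the finite class F_q with q ≈ n^{1/2}. Everything in it is an illustrative assumption rather than part of the notes: the Bayes threshold t_star = 0.3, the noiseless labels, and the helper name erm_threshold.

```python
import math
import numpy as np

def erm_threshold(x, y, q):
    """Minimize empirical risk over F_q: the 2(q+1) classifiers 1{x>=i/q} and 1{x<i/q}."""
    best_err, best_t = 1.0, None
    for i in range(q + 1):
        for flip in (False, True):
            pred = (x >= i / q).astype(int)
            if flip:
                pred = 1 - pred          # the complement classifier 1{x < i/q}
            err = float(np.mean(pred != y))
            if err < best_err:
                best_err, best_t = err, i / q
    return best_t, best_err

rng = np.random.default_rng(1)
t_star = 0.3                         # hypothetical Bayes threshold, f* = f_{t_star}
for n in (100, 400, 1600):
    x = rng.uniform(size=n)
    y = (x >= t_star).astype(int)    # noiseless labels for simplicity
    q = math.isqrt(n)                # quantization level matched to the n^{-1/2} rate
    t_hat, err = erm_threshold(x, y, q)
    # Some grid threshold lies within 1/q of t_star, so ERM does at least this well
    assert err <= 1.0 / q
```

The point of the sketch is that F_q is finite, so the union-bound machinery from earlier lectures applies, while the approximation error it sacrifices is no larger than the unavoidable n^{−1/2} estimation error.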

Vapnik-Chervonenkis theory is based not on explicitly quantizing the collection of candidate functions, but rather on recognizing that the richness of F is limited, in a certain sense, by the number of training data. Indeed, given n i.i.d. training data, there are at most 2^n different binary labelings. Therefore, any collection F may be divided into at most 2^n subsets of classifiers that are "equivalent" with respect to the training data. In many cases a collection may not even be capable of producing 2^n different labelings.

Example

Consider X = [0, 1] and

F = { f : f(x) = 1{x ≥ t} or f(x) = 1{x < t}, t ∈ [0, 1] }.

Suppose we have n training data x_1, …, x_n ∈ [0, 1], where x_i denotes the location of the i-th training point, and assume the points are ordered, x_1 < x_2 < ⋯ < x_n. Associated with each x_i is a label y_i ∈ {0, 1}. Any classifier in F labels all points to the left of a number t ∈ [0, 1] as "1" or "0", and points to the right as "0" or "1", respectively. For t ∈ [0, x_1), all points are labeled "0" or all are labeled "1". For t ∈ (x_1, x_2), x_1 is labeled "0" or "1" and x_2, …, x_n are labeled "1" or "0", and so on. We see that there are exactly 2n different labelings; far fewer than 2^n!


Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3
