So in order to extend these learning theory results to infinite hypothesis classes, here's one more definition. Given a set S of d points, we say a hypothesis class H shatters the set S if H can realize any labeling on it; okay? And what I mean by realizing any labeling on it – the informal way of thinking about this is: if a hypothesis class shatters the set S, that means I can take these d points and associate them with any set of labels y; right? So choose any labeling y for these d points. And if your hypothesis class shatters S, then that means there will be a hypothesis that labels those d examples perfectly; okay? That's what shattering means.
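As a concrete aside (my own illustration, not from the lecture): here's a minimal sketch of how you might check shattering numerically for linear classifiers, by enumerating all 2^d labelings and testing each for linear separability with a small linear program. The names `linearly_separable` and `shatters` are my own, and the sketch assumes numpy and scipy are available.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(X, y):
    """Can some linear classifier sign(w.x + b) realize labels y in {-1,+1}
    on the rows of X?  Phrased as an LP feasibility problem: find (w, b)
    with y_i * (w.x_i + b) >= 1 for every i (margin 1 is just a scaling)."""
    n, d = X.shape
    # Variables are [w_1, ..., w_d, b]; rewrite each constraint as
    # -y_i * (w.x_i + b) <= -1 to match linprog's A_ub @ z <= b_ub form.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return res.success

def shatters(X):
    """True if linear classifiers shatter the point set X, i.e. every one
    of the 2^n possible labelings is realized by some linear classifier."""
    n = len(X)
    return all(linearly_separable(X, np.array(labels))
               for labels in itertools.product([-1, 1], repeat=n))
```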
So let me just illustrate this with an example. Let's say H is the class of all linear classifiers in two dimensions, and let's say that S is this set comprising two points; okay? Then there are four possible labelings of these two points: you can label both positive; one positive, one negative; one negative, one positive; or both of them negative. And if the hypothesis class H is all linear classifiers in two dimensions, then for each of these labelings, I can find a linear classifier that attains zero training error – on every possible labeling of this set of two points. And so I'll say that the hypothesis class H shatters this set S of two points; okay?
One more example – let me show you a larger one. Suppose my set S is now this set of three points; right? Then I now have eight possible labelings for these three points; okay? And once again, for each of these labelings, I can find a hypothesis in the hypothesis class that labels these examples correctly (so long as the three points aren't collinear). And so once again, I'd say – by definition – that my hypothesis class also shatters this set S.
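Continuing the sketch above (again, my own illustration): both of these examples check out – every labeling of the two-point set, and of a non-collinear three-point set, is realizable by some linear classifier.

```python
import numpy as np

# Two points, and three non-collinear points, in two dimensions.
X2 = np.array([[0., 0.], [1., 0.]])
X3 = np.array([[0., 0.], [1., 0.], [0., 1.]])

print(shatters(X2))  # True: all 4 labelings are linearly separable
print(shatters(X3))  # True: all 8 labelings are linearly separable
```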
Student: Right.
Instructor (Andrew Ng): And that's the terminology – H can realize any labeling on S. That's obviously [inaudible]. Give it any set of labels, and you can find a hypothesis that perfectly separates the positive and negative examples; okay? So how about this set? Suppose S is now this set of four points. There are now 16 labelings to choose from; right? That's one, for instance, and this is another one; right? And so I can realize some labelings, but there's no linear decision boundary that can realize this labeling, and so H does not shatter this set of four points; okay? And I'm not really going to prove it here, but it turns out you can show that in two dimensions, there is no set of four points that the class of all linear classifiers can shatter; okay?
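The classic labeling that no line can realize is the "XOR" configuration – four points where the two diagonal pairs share a label. Checking it with the same sketch (again, my own illustration):

```python
import numpy as np

# Four corners of a square, with diagonal pairs sharing a label.
X4 = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y_xor = np.array([1, 1, -1, -1])

print(linearly_separable(X4, y_xor))  # False: no line realizes this labeling
print(shatters(X4))                   # False: so H does not shatter X4
```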
So here's another definition. It's called the VC dimension, after these two people, Vapnik and Chervonenkis. Given a hypothesis class, the Vapnik-Chervonenkis dimension of H, which we usually write as VC(H), is the size of the largest set that is shattered by H. And if a hypothesis class can shatter arbitrarily large sets, then the VC dimension is infinite. So just as a good example: if H is the class of all linear classifiers in two dimensions, then the VC dimension of H is equal to three, because we saw just now that there was a set S of size three that it could shatter, and – I won't really prove it – but it turns out there is no set of size four that it can shatter. And therefore, the VC dimension of this class is three. Yeah?
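One last sketch of my own: a brute-force search over random point sets. Note this can only certify a lower bound on the VC dimension – finding a shattered set of size n shows VC(H) >= n, but failing to find one proves nothing, so the "no set of four points" direction really does need the proof the lecture alludes to.

```python
import numpy as np

rng = np.random.default_rng(0)

# For each size n, look for some random point set in R^2 that is shattered.
for n in range(1, 6):
    found = any(shatters(rng.standard_normal((n, 2))) for _ in range(20))
    print(f"n={n}: {'found a shattered set' if found else 'none found'}")
# Typically prints: shattered sets found for n = 1, 2, 3 but not 4 or 5,
# consistent with the VC dimension of linear classifiers in 2D being 3.
```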