<< Chapter < Page | Chapter >> Page > |
I’d like to say, back when I was a Ph.D. student, when I was working through this proof, there was sort of a solid week where I would wake up, and go to the office at 9:00 a.m. Then I’d start reading the book that led up to this proof. And then I’d read from 9:00 a.m. to 6:00 p.m. And then I’d go home, and then the next day, I’d pick up where I left off. And it sort of took me a whole week that way, to understand this proof, so I thought I would inflict that on you.
Just to tie a couple of loose ends: what I’m about to do is, I’m about to just mention a few things that will maybe, feel a little bit like random facts. But I’m just gonna tie up just a couple of loose ends. And so let’s see, it turns out that – just so it will be more strong with you – so this bound was proved for an algorithm that uses empirical risk minimization, for an algorithm that minimizes 0-1 training error. So one question that some of you ask is how about support vector machines; right? How come SVM’s don’t over fit? And in the sequel of – remember our discussion on support vector machines said that you use kernels, and map the features in infinite dimensional feature space. And so it seems like the VC dimension should be infinite; n plus one and n is infinite.
So it turns out that the class of linear separators with large margin actually has low VC dimension. I wanna say this very quickly, and informally. It’s actually, not very important for you to understand the details, but I’m going to say it very informally. It turns out that I will give you a set of points. And if I ask you to consider only the course of lines that separate these points of a large margin [inaudible], so my hypothesis class will comprise only the linear position boundaries that separate the points of a large margin. Say with a margin, at least gamma; okay. And so I won’t allow a point that comes closer. Like, I won’t allow that line because it comes too close to one of my points.
It turns out that if I consider my data points all lie within some sphere of radius r, and if I consider only the course of linear separators is separate to data with a margin of at least gamma, then the VC dimension of this course is less than or equal to r squared over four gamma squared plus one; okay? So this funny symbol here, that just means rounding up. This is a ceiling symbol; it means rounding up x. And it turns out you prove – and there are some strange things about this result that I’m deliberately not gonna to talk about – but turns they can prove that the VC dimension of the class of linear classifiers with large margins is actually bounded. The surprising thing about this is that this is the bound on VC dimension that has no dependents on the dimension of the points x.
So in other words, your data points x combine an infinite dimensional space, but so long as you restrict attention to the class of your separators with large margin, the VC dimension is bounded. And so in trying to find a large margin separator – in trying to find the line that separates your positive and your negative examples with large margin, it turns out therefore, that the support vector machine is automatically trying to find a hypothesis class with small VC dimension. And therefore, it does not over fit. Alex?
Notification Switch
Would you like to follow the 'Machine learning' conversation and receive update notifications?