So for example, this is a very nice data set. It looks like there’s a great decision boundary that separates the two classes. Well, what if I had just one outlier down here? I could still linearly separate this data set with something like that, but I’m somehow letting one slightly suspicious example skew my entire decision boundary by a lot, and so what I’m going to talk about now is the L1 norm soft margin SVM, which is a slightly modified formulation of the SVM optimization problem.
That will let us deal with both of these cases – one, where the data is just not linearly separable, and two, where you have some examples in the training set that you’d rather not try so hard to classify correctly. Maybe with an outlier here, you actually prefer to choose that original decision boundary and not try so hard to get that training example right. Here’s the formulation. Our SVM primal problem was to minimize one-half the norm of w squared.
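For reference, the hard-margin primal being referred to here, written out in the notation of the earlier lectures, is roughly:

\[
\min_{w,b}\ \frac{1}{2}\|w\|^2
\quad \text{s.t.}\quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge 1,\quad i = 1,\dots,m.
\]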
So this is our original problem, and I’m going to modify it by adding the following. In other words, I’m going to add these slack variables ξi with a penalty term, and I’m going to demand that each of my training examples is separated with functional margin greater than or equal to one minus ξi. And you remember – was it two lectures ago? – that I said if the functional margin is greater than zero, that implies you classified the example correctly, and if it’s less than zero, then you misclassified it.
By setting some of the ξi to be larger than one, I can actually have some examples with negative functional margin, and therefore I’m allowing my algorithm to misclassify some of the examples in the training set. However, I’ll encourage the algorithm not to do that by adding to the optimization objective this penalty term that penalizes setting the ξi to be large. This is an optimization problem where the parameters are w, b, and all of the ξi, and this is also a convex optimization problem. It turns out that, similar to how we worked out the dual of the support vector machine, we can also work out the dual for this optimization problem.
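Putting the modification just described into symbols, the L1 norm soft margin primal is roughly:

\[
\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i
\quad \text{s.t.}\quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m,
\]

where C is the constant that trades off the penalty on the slack variables against the margin term.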
I won’t actually do it, but just to show you the steps: what you do is you construct the Lagrangian, and I’m going to use α and r to denote the Lagrange multipliers corresponding to this set of constraints that we had previously and this new set of constraints that the ξi be greater than or equal to zero. This gives us two sets of Lagrange multipliers. The Lagrangian will be the optimization objective – one-half the norm of w squared plus C times the sum of the ξi – minus α times each of the margin constraints minus r times each of the ξi constraints, each of which is greater than or equal to zero.
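Written out, the Lagrangian being sketched here is roughly:

\[
\mathcal{L}(w, b, \xi, \alpha, r) = \frac{1}{2} w^T w + C\sum_{i=1}^{m}\xi_i
- \sum_{i=1}^{m}\alpha_i\left[y^{(i)}\left(w^T x^{(i)} + b\right) - 1 + \xi_i\right]
- \sum_{i=1}^{m} r_i \xi_i,
\]

with \(\alpha_i \ge 0\) and \(r_i \ge 0\) as the two sets of Lagrange multipliers.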
I won’t rederive the entire dual again, but it’s really the same, and when you derive the dual of this optimization problem and simplify, you find that you get the following. You have to maximize W(α), which is actually the same as before. So it turns out that when you derive the dual and simplify, the only way the dual changes compared to the previous one is that rather than the constraint that the αi’s are greater than or equal to zero, we now have the constraint that the αi’s are between zero and C.
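Written out, the resulting dual is roughly:

\[
\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{m}\alpha_i
- \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} y^{(i)} y^{(j)}\,\alpha_i \alpha_j \left\langle x^{(i)}, x^{(j)}\right\rangle
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{m}\alpha_i y^{(i)} = 0.
\]

In practice, this C is the regularization parameter exposed by off-the-shelf SVM solvers. As a minimal sketch of the outlier effect described at the start of this section – assuming scikit-learn, which is not part of the lecture, and a made-up toy data set – one might compare a very large C (essentially the hard margin) against a small C:

from sklearn.svm import SVC
import numpy as np

# Toy 2-D data: two well-separated clusters plus one suspicious outlier.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],   # class -1
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0],   # class +1
              [1.5, 1.0]])                          # outlier labeled +1
y = np.array([-1, -1, -1, 1, 1, 1, 1])

# Very large C: slack is expensive, so the boundary swings to accommodate the outlier.
nearly_hard = SVC(kernel="linear", C=1e6).fit(X, y)
# Small C: slack is cheap, so the outlier is allowed to violate the margin and the
# boundary stays close to the one suggested by the two main clusters.
soft = SVC(kernel="linear", C=0.1).fit(X, y)

print("large C:", nearly_hard.coef_, nearly_hard.intercept_)
print("small C:", soft.coef_, soft.intercept_)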