So if yi is +1, we would like w transpose xi + b to be very large. And if yi is -1, we'd like w transpose xi + b to be a large negative number. So we'd like the functional margins to be large.
We also said the functional margin has a strange property: you can increase the functional margin just by, say, taking your parameters w and b and multiplying them by 2.
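To make that concrete, here is a minimal NumPy sketch of the functional margin (the function name and all values are hypothetical, chosen only for illustration); it also shows the scaling property, that doubling w and b doubles the margin:

```python
import numpy as np

def functional_margin(w, b, x, y):
    # Functional margin of a single example: y * (w^T x + b).
    return y * (np.dot(w, x) + b)

# Hypothetical parameters and one training example (values made up).
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])
y = -1  # label in {-1, +1}

print(functional_margin(w, b, x, y))          # 0.5
print(functional_margin(2 * w, 2 * b, x, y))  # 1.0: doubling (w, b) doubles it
```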
And then we also defined the geometric margin, which was essentially the functional margin divided by the norm of w.
And so the geometric margin has the interpretation of being the distance between a training example and the separating hyperplane.
And it's actually a signed distance, so this distance will be positive if you're classifying the example correctly, and if you misclassify the example, it'll be the negative of the distance between the training example and the separating hyperplane, where the separating hyperplane is defined by the equation w transpose x + b = 0.
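As a rough sketch of that signed-distance interpretation (again with made-up values and a function name of my own), the geometric margin is the functional margin divided by the norm of w, positive when the example is on the correct side of the hyperplane and negative when it is not:

```python
import numpy as np

def geometric_margin(w, b, x, y):
    # Signed distance from x to the hyperplane w^T x + b = 0:
    # positive if the example is classified correctly, negative if misclassified.
    return y * (np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([2.0, -1.0])
b = 0.5
x_correct = np.array([3.0, 1.0])  # w^T x + b = 5.5 > 0, and its label is +1
x_wrong = np.array([0.0, 2.0])    # w^T x + b = -1.5 < 0, yet its label is +1

print(geometric_margin(w, b, x_correct, +1))  # positive: correctly classified
print(geometric_margin(w, b, x_wrong, +1))    # negative: misclassified
```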
Oh, and I guess we also defined these things with respect to a training set: the functional margin and geometric margin of a training set are defined as the worst case, that is, the minimum functional or geometric margin over the training examples.
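A sketch of that training-set version, assuming hypothetical arrays X (one example per row) and y of labels in {-1, +1}: the margin with respect to the set is just the minimum margin over the examples.

```python
import numpy as np

def training_set_margin(w, b, X, y):
    # Geometric margin with respect to the whole training set:
    # the worst-case (minimum) geometric margin over all examples.
    margins = y * (X @ w + b) / np.linalg.norm(w)
    return margins.min()

X = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])   # hypothetical training inputs, one per row
y = np.array([+1, -1, -1])   # hypothetical labels
print(training_set_margin(np.array([2.0, -1.0]), 0.5, X, y))
```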
So in our development of the optimal margin classifier, our learning algorithm will choose parameters w and b so as to maximize the geometric margin. So our goal is to find the separating hyperplane that separates the positive and negative examples with as large a distance as possible between the hyperplane and the positive and negative examples.
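One standard way of writing down the problem being described here is the following sketch, where gamma is the geometric margin being maximized and the norm-1 constraint makes the functional and geometric margins coincide:

```latex
\max_{\gamma,\, w,\, b} \ \gamma
\quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \ge \gamma, \quad i = 1, \dots, m,
\qquad \|w\| = 1
```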
And if you go to choose parameters w and b to maximize this, one property of the geometric margin is that you can actually scale w and b arbitrarily. So if you look at this definition of the geometric margin, I can choose to multiply my parameters w and b by 2 or by 10 or any other constant, and it doesn't change my geometric margin.
And one way of interpreting that is that you're just looking at the separating hyperplane: you look at this line separating the positive and negative training examples. If I scale w and b, that doesn't change the position of this plane, because the equation w transpose x + b = 0 is the same as the equation 2 w transpose x + 2b = 0; it's the same straight line.
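A quick numeric check of that point, with the same hypothetical values as before: rescaling (w, b) by any positive constant leaves the hyperplane, and hence the geometric margin, unchanged.

```python
import numpy as np

w = np.array([2.0, -1.0])
b = 0.5
x = np.array([3.0, 1.0])
y = +1

for c in [1.0, 2.0, 10.0]:
    # (c*w)^T x + c*b = 0 defines the same set of points as w^T x + b = 0,
    # so the geometric margin y * (w^T x + b) / ||w|| is unchanged by rescaling.
    print(c, y * (np.dot(c * w, x) + c * b) / np.linalg.norm(c * w))
```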
And what that means is that I can actually choose whatever scaling for w and b is convenient for me. And in particular, as we'll use in a minute, I can impose a constraint like the norm of w being equal to 1, because given any solution for w and b, by rescaling the parameters you can easily meet this condition that the rescaled w has norm 1. And so I can add a condition like this and essentially not change the problem.
Or I can add other conditions. I can actually add a condition that, say, the absolute value of w1 equals 1, meaning the absolute value of the first component of w must be equal to 1 (I'd impose only one of these conditions at a time). And again, you can find the optimal solution and just rescale w and b to meet this condition.
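As a minimal sketch of that rescaling trick (hypothetical numbers again): given any pair (w, b), dividing both by the norm of w enforces the norm-1 condition, while dividing both by the absolute value of the first component enforces the |w1| = 1 condition, and neither rescaling moves the hyperplane.

```python
import numpy as np

w = np.array([2.0, -1.0])  # hypothetical solution
b = 0.5

# Rescale so that the norm of w equals 1.
w_unit, b_unit = w / np.linalg.norm(w), b / np.linalg.norm(w)

# Alternatively, rescale so that |w[0]| = 1 (the first-component condition).
w_first, b_first = w / abs(w[0]), b / abs(w[0])

# Either rescaled pair defines exactly the same separating hyperplane as (w, b).
print(w_unit, b_unit)
print(w_first, b_first)
```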