So let's look at the implications of this in terms of the KKT dual complementarity condition again, $\alpha_i \, g_i(w, b) = 0$. So if alpha i is greater than 0, that necessarily implies that gi of w, b is equal to 0. In other words, this is an active constraint.
And what does this mean? It turns out that gi of w, b equal to 0 means exactly that the training example (xi, yi) has functional margin equal to 1. That's because this constraint, $g_i(w, b) = -y^{(i)}(w^T x^{(i)} + b) + 1 \le 0$, says that the functional margin of every example has to be greater than or equal to 1. So if this is an active constraint, the inequality holds with equality, which means that training example i must have functional margin equal to exactly 1.
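To make the constraint concrete, here is a minimal sketch in Python that computes the functional margin y^(i)(w^T x^(i) + b) and the constraint value gi(w, b) for each example, and flags which constraints are active. The five-point data set and the hyperplane w = (0.5, 0.5), b = 0 are hypothetical, chosen by hand so the numbers are easy to check; the function names are likewise made up for illustration.

```python
# Hypothetical toy data set: five 2-D examples with labels in {-1, +1}.
X = [(1.0, 1.0), (-1.0, -1.0), (0.0, 2.0), (3.0, 2.0), (-2.0, -3.0)]
Y = [1, -1, 1, 1, -1]
# Assumed separating hyperplane w^T x + b = 0 (picked by hand for illustration).
w, b = (0.5, 0.5), 0.0


def functional_margin(w, b, x, y):
    """Return y * (w^T x + b) for a single example (x, y)."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)


def g(w, b, x, y):
    """Constraint g_i(w, b) = -y (w^T x + b) + 1; the example is feasible iff g_i <= 0."""
    return 1.0 - functional_margin(w, b, x, y)


def is_active(w, b, x, y, tol=1e-9):
    """Active constraint: g_i(w, b) == 0 (within tol), i.e. functional margin exactly 1."""
    return abs(g(w, b, x, y)) <= tol


for i, (x, y) in enumerate(zip(X, Y)):
    print(i, functional_margin(w, b, x, y), is_active(w, b, x, y))
```

On this toy set, examples 0, 1 and 2 sit at functional margin exactly 1 (active constraints), while examples 3 and 4 have functional margin 2.5, so their constraints are inactive. The tolerance `tol` is an assumption meant to absorb floating-point error.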
And so, actually, yeah, I'll do this on a different board, I guess. So in pictures, what that means is that you have some training set, and you'll have some separating hyperplane. And so the examples with functional margin equal to 1 will be exactly those that are closest to my separating hyperplane.
So that's my equation, w transpose x plus b equal to 0. And so in this cartoon example that I've drawn, it'll be exactly these three examples that have functional margin equal to 1, and all of the other examples, being further away than these three, will have functional margin that is strictly greater than 1.
And the examples with functional margin equal to 1 will usually correspond to the ones where the corresponding Lagrange multipliers are also not equal to 0. Again, this may not always hold: it may be the case that gi and alpha i are both equal to 0. But usually, when gi is 0, alpha i will be non-zero.
And so the examples with functional margin equal to 1 will usually be the ones where alpha i is not equal to 0.
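The relationship between alpha i and gi can be summarized case by case. The sketch below, assuming hypothetical values for gi(w, b) and for the multipliers (not the output of any solver), checks complementary slackness, alpha_i * g_i(w, b) = 0, and labels each example. The label strings and the helper name `kkt_case` are our own invention.

```python
def kkt_case(alpha, g_val, tol=1e-9):
    """Classify one example by its multiplier alpha_i and constraint value g_i(w, b).

    At the optimum, KKT requires alpha_i >= 0, g_i <= 0 and alpha_i * g_i = 0.
    """
    if alpha < -tol or g_val > tol:
        return "infeasible"   # dual or primal feasibility is violated
    if abs(alpha * g_val) > tol:
        return "violation"    # complementary slackness fails
    if alpha > tol:
        return "active"       # alpha_i > 0 forces g_i = 0: functional margin exactly 1
    if abs(g_val) <= tol:
        return "degenerate"   # g_i = 0 and alpha_i = 0: allowed, but less usual
    return "inactive"         # g_i < 0 forces alpha_i = 0: functional margin > 1


# Hypothetical values for five examples; the alphas are assumed optimal.
g_vals = [0.0, 0.0, 0.0, -1.5, -1.5]
alphas = [0.25, 0.25, 0.0, 0.0, 0.0]
print([kkt_case(a, gv) for a, gv in zip(alphas, g_vals)])
```

This prints `['active', 'active', 'degenerate', 'inactive', 'inactive']`; the third example illustrates the less common case where gi and alpha i are both 0.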
One useful property of this, as suggested by this picture and true in general as well, is that when you find a solution to the optimization problem, you find that relatively few training examples have functional margin equal to 1.
In this picture I've drawn, there are three examples with functional margin equal to 1. There are just a few examples at this minimum possible distance to your separating hyperplane. These examples with functional margin equal to 1 are what we're going to call the support vectors, and this is where the name support vector machine comes from. So these three points with functional margin 1 are what we're calling support vectors.
And the fact that there are relatively few support vectors also means that usually, most of the alpha i's are equal to 0: alpha i is equal to 0 for examples that are not support vectors.
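In practice the multipliers come back from a numerical solver, and a common convention is to read off the support vectors as the examples with alpha i > 0. A minimal sketch, assuming a hypothetical multiplier vector and a hand-picked tolerance for treating tiny solver noise as zero (the helper name is also hypothetical):

```python
def support_vector_indices(alphas, tol=1e-8):
    """Indices i with alpha_i > tol; values at or below tol are treated as zero."""
    return [i for i, a in enumerate(alphas) if a > tol]


# Hypothetical multipliers for ten examples, as a solver might return them
# (the 1e-12 entry stands in for numerical noise around zero).
alphas = [0.0, 0.4, 0.0, 1e-12, 0.15, 0.0, 0.0, 0.25, 0.0, 0.0]

sv = support_vector_indices(alphas)
n_zero = len(alphas) - len(sv)
print(sv, n_zero / len(alphas))
```

This prints `[1, 4, 7] 0.7`: only three of the ten multipliers are non-zero. Note that this alpha-based test can disagree with the margin-based definition in the degenerate case where an example has functional margin 1 but alpha i equal to 0.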
Let's go ahead and work out the actual optimization problem. So we have the optimal margin optimization problem, and let's write down the Lagrangian. Because we only have inequality constraints, that is, gi constraints and no hi constraints, I'll only have Lagrange multipliers of type alpha, and no betas in my generalized Lagrangian. So my Lagrangian will be

$$\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)} \left( w^T x^{(i)} + b \right) - 1 \right],$$

where m is the number of training examples. That's my Lagrangian.
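As a sanity check on the formula, here is a sketch that evaluates this Lagrangian numerically for given w, b and alpha. The data, hyperplane and multipliers are hypothetical illustration values, and `lagrangian` is a made-up helper name, not a library function.

```python
def lagrangian(w, b, alphas, X, Y):
    """L(w, b, alpha) = 1/2 ||w||^2 - sum_i alpha_i [y_i (w^T x_i + b) - 1]."""
    half_norm_sq = 0.5 * sum(wj * wj for wj in w)
    penalty = sum(
        a * (y * (sum(wj * xj for wj, xj in zip(w, x)) + b) - 1.0)
        for a, x, y in zip(alphas, X, Y)
    )
    return half_norm_sq - penalty


# Hypothetical toy data, hyperplane and multipliers (illustration only).
X = [(1.0, 1.0), (-1.0, -1.0), (0.0, 2.0), (3.0, 2.0), (-2.0, -3.0)]
Y = [1, -1, 1, 1, -1]
w, b = (0.5, 0.5), 0.0
print(lagrangian(w, b, [0.25, 0.25, 0.0, 0.0, 0.0], X, Y))
```

This prints `0.25`. Every term in the sum is zero here, either because alpha i is 0 or because the functional margin is exactly 1, so the Lagrangian reduces to one-half w squared, as complementary slackness predicts.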