That last constraint is the one I got from this: $\sum_i \alpha_i y^{(i)} = 0$. That's where that constraint came from.
Let me take two minutes to say what the real interpretation of that is, because in previous years when I taught this, where this constraint comes from was slightly confusing. And if you don't understand it, it's not a big deal.
So when we took the partial derivative of the Lagrangian with respect to b, we ended up with this constraint that $\sum_i \alpha_i y^{(i)}$ must be equal to 0. The interpretation of that, it turns out, is that if $\sum_i \alpha_i y^{(i)}$ is not equal to 0, then $\theta_D(\alpha)$, the minimum of the Lagrangian over w and b, is equal to minus infinity.
In other words, it turns out my Lagrangian is actually a linear function of the parameter b. And so the interpretation of that constraint we worked out previously is that if $\sum_i \alpha_i y^{(i)}$ is not equal to 0, then $\theta_D(\alpha)$ is equal to minus infinity. So if your goal is to maximize $\theta_D(\alpha)$ as a function of $\alpha$, then you've got to choose values of $\alpha$ for which $\sum_i \alpha_i y^{(i)} = 0$.
And when $\sum_i \alpha_i y^{(i)} = 0$, then $\theta_D(\alpha)$ is equal to $W(\alpha)$. And so that's why we ended up deciding to maximize $W(\alpha)$ subject to $\sum_i \alpha_i y^{(i)} = 0$.
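To write that step out concretely – a sketch using the Lagrangian of the primal problem from the earlier boards, in the usual notation with training examples $(x^{(i)}, y^{(i)})$:
\[
\mathcal{L}(w,b,\alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i\big[y^{(i)}(w^\top x^{(i)} + b) - 1\big],
\]
so the only term involving $b$ is $-\,b\sum_i \alpha_i y^{(i)}$, which is linear in $b$. Hence
\[
\theta_D(\alpha) = \min_{w,b}\, \mathcal{L}(w,b,\alpha) =
\begin{cases}
W(\alpha) & \text{if } \sum_i \alpha_i y^{(i)} = 0,\\[4pt]
-\infty & \text{otherwise, since } b \text{ can be sent to } \pm\infty.
\end{cases}
\]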
Yeah, unfortunately the fact that b is a separate parameter adds just a little bit of extra notation to our derivation of the dual. But, by the way, all the action in the optimization problem is with w, because b is just one parameter.
So let's check. Are there any questions about this?
Okay, cool. So we've derived the dual optimization problem – and really, don't worry about it if you're not quite sure where this came from. Just think of it as: we took the partial derivative with respect to b, worked out that this constraint has to hold, and I just copied it over here.
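For reference, the dual problem being referred to – stated here in the standard notation from the preceding boards rather than read off this excerpt – is
\[
\max_\alpha \; W(\alpha) = \sum_{i=1}^m \alpha_i - \tfrac{1}{2}\sum_{i,j=1}^m y^{(i)} y^{(j)} \alpha_i \alpha_j \,\langle x^{(i)}, x^{(j)} \rangle
\]
\[
\text{subject to } \alpha_i \ge 0,\ \ i = 1,\dots,m, \qquad \sum_{i=1}^m \alpha_i y^{(i)} = 0.
\]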
So, having worked out the dual optimization problem, our approach to deriving the optimal margin classifier, or support vector machine, will be to solve this dual optimization problem for the parameters $\alpha^*$. And then, if you want – this is the equation we worked out on the previous board – we said that w, in terms of $\alpha$, must be equal to that. So once you solve for $\alpha$, you can go back and quickly derive w, the parameters of your primal problem. We worked this out earlier.
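The equation from the previous board being referred to is, in the standard notation, the stationarity condition from $\partial\mathcal{L}/\partial w = 0$:
\[
w = \sum_{i=1}^m \alpha_i y^{(i)} x^{(i)},
\]
so once $\alpha^*$ is known, $w^*$ follows immediately as a weighted sum of the training examples.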
And moreover, once you've solved for $\alpha$ and w, it's really easy to solve for b. The interpretation is that from the training set you've found the direction of w, so you know the direction of your separating hyperplane – you know it's got to be one of the hyperplanes with that orientation – and you just have to decide where to place the hyperplane. That's what solving for b is.
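One way to write that placement out – a sketch of the intercept computation under the usual optimal margin classifier setup, not something stated in this excerpt – is to put the hyperplane halfway between the closest positive and closest negative examples along the direction $w^*$:
\[
b^* = -\,\frac{\displaystyle\max_{i:\,y^{(i)}=-1} {w^*}^{\!\top} x^{(i)} \;+\; \min_{i:\,y^{(i)}=1} {w^*}^{\!\top} x^{(i)}}{2}.
\]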