Whereas in contrast, if I choose to change Alpha one and Alpha two at the same time, then I still have a constraint, and so I know Alpha one and Alpha two must satisfy that linear constraint. But at least this way, I can change Alpha one if I also change Alpha two accordingly to make sure the constraint is still satisfied.
Student: [Inaudible].
Instructor (Andrew Ng): So Zeta was defined [inaudible]. So on each iteration, I have some setting of the parameters, Alpha one, Alpha two, Alpha three and so on, and I want to change Alpha one and Alpha two, say. So from the previous iteration, let's say I had not violated the constraint, so that holds true, and so I'm just defining Zeta to be equal to this, because Alpha one Y1 plus Alpha two Y2 must be equal to the sum from I equals [inaudible] to M of that, and so I'm just defining this to be Zeta.
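The coupled update the instructor describes can be sketched numerically. This is a minimal illustration with made-up toy values (the labels `y` and alphas here are hypothetical, not from the lecture): holding Alpha three onward fixed, the constraint pins Alpha one Y1 plus Alpha two Y2 to a constant Zeta, so choosing a new Alpha two forces the value of Alpha one.

```python
# Toy values (hypothetical); in SMO these would be the current iterate.
y = [1.0, -1.0, 1.0, -1.0]        # labels y_i in {-1, +1}
alpha = [0.3, 0.5, 0.2, 0.4]      # current dual variables alpha_i

# The dual constraint is sum_i alpha_i * y_i == 0.  Holding alpha_3..alpha_m
# fixed, alpha_1*y_1 + alpha_2*y_2 must equal the constant zeta:
zeta = -sum(a * yi for a, yi in zip(alpha[2:], y[2:]))

# Choosing a new alpha_2 forces alpha_1.  Since y_1 is +/-1, 1/y_1 == y_1.
alpha[1] = 0.25
alpha[0] = (zeta - alpha[1] * y[1]) * y[0]

# The full constraint still holds after the coupled update.
assert abs(sum(a * yi for a, yi in zip(alpha, y))) < 1e-12
```

(The actual SMO update also clips the new alphas to the box constraint 0 ≤ alpha ≤ C, which is omitted here for brevity.)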
Student: [Inaudible].
Instructor (Andrew Ng): On every iteration, you change maybe a different pair of Alphas to update. The way you do this is something I don't want to talk about. I'll say a couple more words about that, but the basic outline of the algorithm is on every iteration of the algorithm, you're going to select some Alpha I and Alpha J to update like on this board.
So that's an Alpha I and an Alpha J to update via some [inaudible], and then you use the procedure I just described to actually update Alpha I and Alpha J. What I actually just talked about was the procedure to optimize W with respect to Alpha I and Alpha J. I didn't actually talk about the [inaudible] for choosing Alpha I and Alpha J.
Student: What is the function W?
Instructor (Andrew Ng): W is way up there. I'll just write it again. W of Alpha is that function we had previously. W of Alpha was the sum over I – this is about solving the – it was that thing. All of this is about solving the optimization problem for the SVM, so this is the objective function we had, so that's W of Alpha.
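For reference, the objective the instructor is pointing back to is the standard SVM dual; this is a reconstruction from the usual form, not a verbatim transcription of the board:

```latex
\max_{\alpha}\; W(\alpha)
  = \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
      y^{(i)} y^{(j)} \alpha_i \alpha_j
      \left\langle x^{(i)}, x^{(j)} \right\rangle
\quad \text{s.t.} \quad
  0 \le \alpha_i \le C, \qquad
  \sum_{i=1}^{m} \alpha_i y^{(i)} = 0 .
```

The second constraint is the linear constraint discussed above that forces SMO to update two Alphas at a time.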
Student: [Inaudible]? Exchanging one of the Alphas – optimizing that one, you can make the other one that you have to change work, right?
Instructor (Andrew Ng): What do you mean works?
Student: It will get farther from its optimal.
Instructor (Andrew Ng): Let me translate it differently. What we're trying to do is we're trying to optimize the objective function W of Alpha, so the metric of progress that we care about is whether W of Alpha is getting better on every iteration, and so what is true for coordinate ascent and for SMO is that on every iteration, W of Alpha can only increase. It may stay the same or it may increase. It can't get worse.
It's true that eventually, the Alphas will converge to some value. It's true that in intervening iterations, one of the Alphas may move further away and then closer and further and closer to its final value, but what we really care about is that W of Alpha is getting better every time, which is true.
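The monotonicity claim is easy to see in a toy run of coordinate ascent. Below is a minimal sketch on a made-up concave two-variable objective (not the SVM dual itself): each step maximizes W exactly in one coordinate with the other held fixed, and the objective value never decreases even though individual coordinates can move back and forth.

```python
# Toy concave objective (hypothetical, for illustration only):
# W(a1, a2) = -a1^2 - a2^2 + a1 + a2 + 0.5*a1*a2
def W(a1, a2):
    return -a1**2 - a2**2 + a1 + a2 + 0.5 * a1 * a2

a1, a2 = 0.0, 0.0
history = [W(a1, a2)]
for _ in range(20):
    a1 = (1 + 0.5 * a2) / 2   # exact maximizer in a1, with a2 held fixed
    history.append(W(a1, a2))
    a2 = (1 + 0.5 * a1) / 2   # exact maximizer in a2, with a1 held fixed
    history.append(W(a1, a2))

# The objective is nondecreasing across every update.
assert all(b >= a - 1e-12 for a, b in zip(history, history[1:]))
```

SMO is this same idea, except that the equality constraint forces it to move a pair of variables along a line rather than one variable at a time.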
Just a couple more words on SMO before I wrap up on this. One is that the [inaudible] John Platt's original algorithm uses for choosing which pair of values, Alpha I and Alpha J, to update next is one of those things that's not conceptually complicated but is very complicated to explain in words.