If I just try to change Alpha one by itself, I can't, because Alpha one is exactly determined as a function of the other Alphas by the constraint that the sum over I of Alpha I YI is equal to zero. The SMO algorithm, by the way, is due to John Platt, a colleague at Microsoft. So instead of trying to change one Alpha at a time, the SMO algorithm changes two Alphas at a time. SMO stands for sequential minimal optimization, and the term minimal refers to the fact that we're choosing the smallest number of Alpha I's to change at a time, which in this case means we need to change at least two at a time.
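As a quick check of that claim, here is the one-line derivation in the notation of the dual problem – a sketch, assuming m training examples with labels in {-1, +1}:

```latex
% Holding \alpha_2, \dots, \alpha_m fixed, the dual constraint
% \sum_{i=1}^{m} \alpha_i y_i = 0 pins down \alpha_1 exactly:
\alpha_1 y_1 = -\sum_{i=2}^{m} \alpha_i y_i
\quad\Longrightarrow\quad
\alpha_1 = -y_1 \sum_{i=2}^{m} \alpha_i y_i ,
% where we used y_1 \in \{-1, +1\}, so 1/y_1 = y_1.
```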
So let me go ahead and outline the algorithm. We will select two Alphas to change, some Alpha I and Alpha J, using some heuristic – heuristic just means a rule of thumb. We'll hold all the Alpha K's fixed except Alpha I and Alpha J and optimize W of Alpha with respect to Alpha I and Alpha J, subject to all the constraints. It turns out the key step, which I'm going to work out, is this one: how do you optimize W of Alpha with respect to the two parameters that you just chose to update, subject to the constraints? I'll do that in a second.
You would keep running this algorithm until you have satisfied the convergence criteria up to Epsilon. What I want to do now is describe how to do this key step – how to optimize W of Alpha with respect to Alpha I and Alpha J – and it turns out that it's because you can do this extremely efficiently that the SMO algorithm works well. The update step for the SMO algorithm can be done extremely efficiently, so it may take a large number of iterations to converge, but each iteration is very cheap.
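To make that outline concrete, here is a minimal sketch of the outer loop in Python. Everything named here is an assumption for illustration – select_pair, optimize_pair, and kkt_violation are hypothetical callables standing in for the pair-selection heuristic, the two-variable update derived below, and the convergence check up to Epsilon:

```python
def smo_outer_loop(alpha, X, y, C, eps, select_pair, optimize_pair, kkt_violation):
    """Sketch of the SMO outer loop: repeatedly pick a pair (i, j) by some
    heuristic, re-optimize W(alpha) over just alpha[i] and alpha[j] with all
    the other alphas held fixed, and stop once the convergence criteria
    are satisfied up to eps."""
    while kkt_violation(alpha, X, y, C) > eps:       # convergence check, up to eps
        i, j = select_pair(alpha, X, y, C)           # heuristic: a rule of thumb
        alpha = optimize_pair(alpha, i, j, X, y, C)  # cheap two-variable update
    return alpha
```

Each pass through the loop is very cheap because the inner update has a closed form, which is why the algorithm can afford many iterations.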
Let's talk about that. In order to derive that step where we update with respect to Alpha I and Alpha J, I'm actually going to use Alpha one and Alpha two as my example. I'm going to update Alpha one and Alpha two. In general, this could be any Alpha I and Alpha J, but just to make my notation on the board easier, I'm going to do the derivation for Alpha one and Alpha two, and the general case is completely analogous.
On every step of the algorithm, we must respect the constraint that the sum over I of Alpha I YI is equal to zero. This is one of the constraints we had previously for our dual optimization problem. Since Alpha three through Alpha M are held fixed, this means that Alpha one Y1 plus Alpha two Y2 must be equal to some constant, which I'm going to denote by Zeta. We also have the constraint that the Alpha I's must be between zero and C. We had two constraints on our dual: this was one of the constraints, and this was the other one.
In pictures, the constraint that the Alpha I's lie between zero and C is often called the box constraint, and so if I draw Alpha one and Alpha two, then I have a constraint that the values of Alpha one and Alpha two must lie within this box that ranges from zero to C. And so the picture of the algorithm is as follows. I have the constraint that Alpha one Y1 plus Alpha two Y2 must be equal to Zeta, and so this implies that Alpha one must be equal to Zeta minus Alpha two Y2, all over Y1. What I want to do is substitute this expression for Alpha one into the objective and optimize W with respect to Alpha two alone.
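For concreteness, here is a minimal sketch of that two-variable step in Python. It follows the standard clipped update from Platt's SMO paper rather than only the substitution shown so far: the bounds L and H are where the constraint line Alpha one Y1 plus Alpha two Y2 equals Zeta enters and exits the box, eta is the curvature of the resulting one-variable quadratic, and the bias term is omitted because it cancels in the error difference. The kernel matrix K and the handling of the degenerate case are assumptions of this sketch:

```python
import numpy as np

def clip(a, lo, hi):
    return max(lo, min(a, hi))

def optimize_pair(alpha, y, K, i, j, C):
    """Two-variable SMO step for the pair (alpha[i], alpha[j]), i.e. the
    (Alpha one, Alpha two) update from the lecture. Substituting
    alpha[i] = (zeta - alpha[j]*y[j]) / y[i] into W leaves a one-variable
    quadratic in alpha[j], which we maximize and then clip to the box."""
    # Errors E_k = f(x_k) - y_k; any bias term cancels in Ei - Ej below.
    f = (alpha * y) @ K
    Ei, Ej = f[i] - y[i], f[j] - y[j]
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]  # curvature of the 1-D quadratic
    if eta <= 0.0:
        return alpha  # degenerate pair; Platt evaluates the endpoints, this sketch just skips
    # Feasible segment [L, H] for alpha[j]: the constraint line intersected with the box.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    aj_new = clip(alpha[j] + y[j] * (Ei - Ej) / eta, L, H)  # maximize, then clip
    alpha = alpha.copy()
    alpha[i] += y[i] * y[j] * (alpha[j] - aj_new)  # keeps alpha[i]*y[i] + alpha[j]*y[j] = zeta
    alpha[j] = aj_new
    return alpha
```

The final line of the update recovers Alpha one from the new Alpha two so the pair stays on the constraint line, which is exactly the picture of sliding along a segment inside the box.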