This is coordinate ascent. One picture that's kind of useful for coordinate ascent is if you imagine you're trying to optimize a quadratic function, it really looks like that. These are the contours of the quadratic function, and the minimum is here. This is what coordinate ascent would look like. These are my axes: I'll call this one Alpha two and this one Alpha one, so this is my Alpha one, Alpha two plane. Let's say I start down here. Then I'm going to begin by minimizing this with respect to Alpha one, so I go there. And then at my new point, I'll minimize with respect to Alpha two, and so I might go to someplace like that.
Then I'll minimize with respect to Alpha one again, then Alpha two, and so on. You're always taking these axis-aligned steps to get to the minimum. It turns out that there's a modification to this; there are variations of this as well. The way I described the algorithm, we're always doing this in alternating order: we optimize with respect to Alpha one, then Alpha two, then Alpha one, then Alpha two. What I'm about to say applies only in higher dimensions, but it turns out that if you have a lot of parameters, Alpha one through Alpha M, you may not choose to always visit them in a fixed order.
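To make that picture concrete, here is a minimal sketch of cyclic coordinate ascent on a concave quadratic objective. The names (coordinate_ascent, A, b, num_sweeps) are illustrative, not from the lecture; the point is just that each inner step holds every parameter fixed except one and optimizes over that single coordinate in closed form, which is exactly the axis-aligned step in the picture.

```python
import numpy as np

def coordinate_ascent(A, b, num_sweeps=50):
    """Maximize the concave quadratic W(alpha) = -0.5 * alpha^T A alpha + b^T alpha
    by cycling through the coordinates and exactly optimizing one at a time.
    A is assumed symmetric positive definite, so W is strictly concave."""
    n = len(b)
    alpha = np.zeros(n)
    for _ in range(num_sweeps):
        for i in range(n):  # one axis-aligned step along coordinate i
            # Setting dW/dalpha_i = 0 with all other coordinates held fixed gives
            # alpha_i = (b_i - sum_{j != i} A_ij alpha_j) / A_ii
            residual = b[i] - A[i, :] @ alpha + A[i, i] * alpha[i]
            alpha[i] = residual / A[i, i]
    return alpha

# Two-parameter example matching the contour picture:
A = np.array([[2.0, 0.8],
              [0.8, 1.0]])
b = np.array([1.0, 1.0])
print(coordinate_ascent(A, b))   # approaches the true optimum A^{-1} b
print(np.linalg.solve(A, b))
```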
You may choose which Alpha to update next depending on which one you think will allow you to make the most progress. If you have only two parameters, it makes sense to alternate between them. If you have higher-dimensional parameters, sometimes you may choose to update them in a different order if you think doing so would let you make faster progress towards the maximum.
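Under the same illustrative setup as above, one plausible way to "pick the coordinate that makes the most progress" is to update whichever parameter currently has the largest partial derivative in magnitude. This greedy rule is just one heuristic sketch, not the specific rule from the lecture.

```python
import numpy as np

def greedy_coordinate_ascent(A, b, num_updates=200):
    """Same quadratic objective as before, but instead of a fixed cyclic order,
    update the coordinate whose partial derivative is currently largest."""
    n = len(b)
    alpha = np.zeros(n)
    for _ in range(num_updates):
        grad = b - A @ alpha                  # dW/dalpha at the current point
        i = int(np.argmax(np.abs(grad)))      # coordinate promising most progress
        alpha[i] += grad[i] / A[i, i]         # exact 1-D optimization along axis i
    return alpha
```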
It turns out that compared to some of the algorithms we saw previously, say, Newton's method, coordinate ascent will usually take a lot more steps. But the chief advantage of coordinate ascent, when it works well, is that it is sometimes very inexpensive to optimize the objective W with respect to any one of your parameters. So even though coordinate ascent has to take many more iterations than, say, Newton's method in order to converge, each of those iterations can be very cheap.
It turns out that there are many optimization problems for which it's particularly easy to fix all but one of the parameters and optimize with respect to just that one parameter, and if that's true, then the inner loop of coordinate ascent, optimizing with respect to Alpha I, can be done very quickly, which makes the overall algorithm efficient. It turns out that this will be true when we modify this algorithm to solve the SVM optimization problem. Questions about this? Okay.
Let's go ahead and apply this to our support vector machine dual optimization problem. It turns out that coordinate ascent in its basic form does not work, for the following reason: we have constraints on the Alpha Is. Namely, as we worked out previously, we have the constraint that the sum over I of Alpha I times Y I must be equal to zero, and so if you fix all the Alphas except for one, then you can't change that one Alpha without violating the constraint.
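For reference, here is that constraint written out, along with why fixing all but one Alpha pins the remaining one down completely (this assumes the usual SVM labels Y I in {-1, +1}, so Y I squared is one):

```latex
% Equality constraint from the SVM dual:
\sum_{i=1}^{m} \alpha_i y^{(i)} = 0

% Holding \alpha_2, \ldots, \alpha_m fixed and using (y^{(1)})^2 = 1:
\alpha_1 y^{(1)} = -\sum_{i=2}^{m} \alpha_i y^{(i)}
\quad\Longrightarrow\quad
\alpha_1 = -\,y^{(1)} \sum_{i=2}^{m} \alpha_i y^{(i)}
```

So the value of the one free Alpha is already determined by the others, which is why a coordinate ascent step over a single Alpha cannot move at all.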