
This derivation isn’t very hard, and you’re encouraged to go home and try to do it yourself. It’s really essentially the same math, and when you simplify, it turns out you can simplify the r Lagrange multipliers away and you end up with just these constraints on the Alphas.
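(For reference, the dual this leads to is the standard L1-norm soft-margin dual — writing it out here, with inner-product notation that may differ slightly from what was on the board:)

\[
\max_{\alpha}\;\; W(\alpha) = \sum_{i=1}^{m} \alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle
\]
\[
\text{subject to}\quad 0 \le \alpha_i \le C,\;\; i = 1,\dots,m, \qquad \sum_{i=1}^{m} \alpha_i y^{(i)} = 0.
\]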

Just as an aside, I won’t derive these, either. It turns out that – remember, I wrote down the KKT conditions in the last lecture, the necessary conditions for something to be an optimal solution to a constrained optimization problem. So if you use the KKT conditions, it turns out you can actually derive convergence conditions. We want to solve this optimization problem – when do we know the Alphas have converged to the global optimum?

It turns out you can use the following. I don’t want to say a lot about these. It turns out from the KKT conditions you can derive these as the convergence conditions for an algorithm that you might choose to use to try to solve the optimization problem in terms of the Alphas.
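(Writing those out — these are the standard KKT dual-complementarity conditions for the soft-margin SVM; the exact form stated in lecture may differ slightly:)

\[
\alpha_i = 0 \;\Rightarrow\; y^{(i)}\big(w^{T} x^{(i)} + b\big) \ge 1
\]
\[
\alpha_i = C \;\Rightarrow\; y^{(i)}\big(w^{T} x^{(i)} + b\big) \le 1
\]
\[
0 < \alpha_i < C \;\Rightarrow\; y^{(i)}\big(w^{T} x^{(i)} + b\big) = 1
\]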

That’s the L1 norm soft margin SVM, and this is the change to the algorithm that lets us handle non-linearly separable data sets, as well as data sets with outliers that may still be linearly separable but that you may choose not to separate [inaudible]. Questions about this? Raise your hand if this stuff makes sense at all. Great.

So the last thing I want to do is talk about an algorithm for actually solving this optimization problem. We wrote down this dual optimization problem with convergence criteria, so let’s come up with an efficient algorithm to actually solve this optimization problem. I want to do this partly to give me an excuse to talk about an algorithm called coordinate ascent, which is useful to know about.

What I actually want to do is tell you about an algorithm called coordinate ascent, which is a useful algorithm to know about, and it’ll turn out that it won’t apply in its simplest form to this problem, but we’ll then be able to modify it slightly and then it’ll give us a very efficient algorithm for solving this dual optimization problem. That was the other reason that I had to derive the dual: not only so that we could use kernels, but also so that we can apply an algorithm like the SMO algorithm.

First, let’s talk about coordinate ascent, which is another [inaudible] optimization algorithm. To describe coordinate ascent, I just want you to consider the problem of maximizing some function W, which is a function of Alpha one through Alpha M, with no constraints. So for now, forget about the constraint that the Alpha I’s must be between zero and C. Forget about the constraint that the sum of YI Alpha I must be equal to zero. Then this is the coordinate ascent algorithm.

It will repeat until convergence and will do, for I equals one to M: the inner loop of coordinate ascent essentially holds all the parameters except Alpha I fixed and then it just maximizes this function with respect to just one of the parameters. Let me write that as Alpha I gets updated as the arg max over Alpha I hat of W of Alpha one through Alpha I minus one, Alpha I hat, Alpha I plus one, through Alpha M. This is really a fancy way of saying hold everything except Alpha I fixed, and just optimize W, my optimization objective, with respect to only Alpha I. This is just a fancy way of writing it.
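(Here is a minimal sketch of that loop in Python for an unconstrained maximization. The quadratic example objective, the scipy one-dimensional maximizer, and the tolerance are my own illustrative choices, not something specified in the lecture.)

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_ascent(W, alpha, max_sweeps=100, tol=1e-8):
    """Maximize W(alpha) by optimizing one coordinate at a time.

    W          : callable taking a length-m vector and returning a scalar
    alpha      : initial guess, shape (m,)
    max_sweeps : maximum number of full passes over the coordinates
    tol        : stop when a full sweep improves W by less than tol
    """
    alpha = alpha.astype(float)
    prev = W(alpha)
    for _ in range(max_sweeps):          # "repeat until convergence"
        for i in range(len(alpha)):      # "for i = 1 to m"
            # Hold every coordinate except alpha_i fixed; maximize over alpha_i.
            def neg_W_along_i(a_i, i=i):
                trial = alpha.copy()
                trial[i] = a_i
                return -W(trial)         # minimize the negative = maximize W
            alpha[i] = minimize_scalar(neg_W_along_i).x
        cur = W(alpha)
        if cur - prev < tol:             # a full sweep barely helped: converged
            break
        prev = cur
    return alpha

# Example: maximize the concave quadratic W(a) = -||a - target||^2.
target = np.array([1.0, -2.0, 0.5])
W = lambda a: -np.sum((a - target) ** 2)
print(coordinate_ascent(W, np.zeros(3)))   # approaches [1.0, -2.0, 0.5]
```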

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
