And so if we just take what I wrote down as No. 2 of our previous optimization problem and add the scaling constraint, we then get the following optimization problem: minimize over w and b.
I guess previously, we had a maximization of gamma hat divided by the norm of w. With the scaling constraint setting gamma hat to 1, that's maximizing 1 over the norm of w, and that's the same as minimizing the norm of w, or equivalently minimizing the norm of w squared. And then these are our constraints: because I've added the scaling constraint, the functional margin of every training example has to be at least 1.
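Written out, the problem we've arrived at looks like this (I've put a factor of one-half in front of the norm squared, which is just for later convenience and doesn't change the minimizer):

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{s.t.}\quad  & y^{(i)}\bigl(w^{\top}x^{(i)} + b\bigr) \ge 1, \qquad i = 1,\dots,m
\end{aligned}
```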
And this is actually my final formulation of the optimal margin classifier problem, at least for now.
So the picture to keep in mind for this, I guess, is that our optimization objective is to minimize the norm of w squared, and so our optimization objective is just a convex quadratic function. Let me see if I can draw the picture. So if the axes are w1 and w2, and you want to minimize a quadratic function like this, then the quadratic function just has contours that look like this.
And moreover, you have a number of linear constraints on your parameters. So you may have one linear constraint that eliminates that half space, another linear constraint that eliminates this other half space, and so on.
And so the picture is that you have a quadratic function, and you're ruling out various half spaces, one for each of these linear constraints. If you can picture this in 3D, since I can't really draw in 3D here, I hope you can convince yourself that this is a convex problem that has no local optima. If you run gradient descent, or any other local search, within the set of points that hasn't been ruled out, then you'll converge to the global optimum.
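Just to make that picture a bit more precise (this is the standard convexity argument, spelled out rather than taken from the lecture itself): the objective is a convex quadratic, each constraint keeps a half space, and the intersection of half spaces is convex, so there are no non-global local optima.

```latex
f(w,b) = \tfrac{1}{2}\lVert w\rVert^{2} \ \text{is convex}, \qquad
H_i = \bigl\{(w,b) : y^{(i)}(w^{\top}x^{(i)} + b) \ge 1\bigr\} \ \text{is a half space},
\\[4pt]
C = \bigcap_{i=1}^{m} H_i \ \text{is convex}
\;\Longrightarrow\;
\text{every local minimum of } f \text{ over } C \text{ is a global minimum.}
```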
And so that's a convex optimization problem. Questions about this?
Actually, just raise your hand if this makes sense. Okay, cool.
So this gives you the optimal margin classifier algorithm. And it turns out that this is a convex optimization problem, so you can actually take this formulation of the problem and throw it at off-the-shelf software for what's called a QP, a quadratic program.
This type of optimization problem is called a quadratic program: you have a convex quadratic objective function and linear constraints. So you can actually download software to solve these optimization problems for you. Usually you want to use a QP solver because you have constraints like these, although you could actually modify gradient descent to work with this, too.
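As a concrete illustration of "throwing it at a QP solver", here is a minimal sketch in Python using the cvxopt package; the choice of library, the helper name optimal_margin_classifier, and the toy data are my own for illustration, not something from the lecture. It stacks w and b into one variable z, rewrites each constraint y(i)(wᵀx(i) + b) ≥ 1 in the Gz ≤ h form that cvxopt expects, and assumes the training set is linearly separable:

```python
import numpy as np
from cvxopt import matrix, solvers

def optimal_margin_classifier(X, y):
    """Solve the primal optimal margin classifier QP:
        min_{w,b} (1/2)||w||^2   s.t.   y_i (w^T x_i + b) >= 1  for all i.
    X is an (m, d) array of inputs; y is an (m,) array of labels in {-1, +1}.
    Assumes the data are linearly separable (otherwise the QP is infeasible)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, d = X.shape

    # Decision variable z = [w; b] in R^(d+1).
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = np.eye(d)      # (1/2) z^T P z = (1/2) ||w||^2
    P[d, d] = 1e-8             # tiny term on b: keeps P positive definite for the solver
    q = np.zeros(d + 1)

    # cvxopt solves: min (1/2) z^T P z + q^T z  subject to  G z <= h.
    # Rewrite y_i (w^T x_i + b) >= 1 as -y_i (w^T x_i + b) <= -1.
    G = -y[:, None] * np.hstack([X, np.ones((m, 1))])
    h = -np.ones(m)

    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:d], z[d]         # w, b

# Tiny linearly separable example in 2D.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = optimal_margin_classifier(X, y)
print("w =", w, "b =", b)
print("functional margins:", y * (X @ w + b))  # each should be >= 1, up to solver tolerance
```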
So we could just declare success and say that we're done with this formulation of the problem. But what I'm going to do now is take a digression to talk about primal and dual optimization problems. And in particular, I'm going to come back later and derive yet another, very different form of this optimization problem. And the reason we'll do that is because it turns out this optimization problem has certain properties that make it amenable to very efficient algorithms.