So you can actually think of logistic regression as trying to approximate empirical risk minimization: instead of using this step function, which is non-convex and gives you a hard optimization problem, it uses this curve above. By approximating the step function, you get a convex optimization problem, and you can find the maximum likelihood estimates of the parameters for logistic regression. And it turns out the support vector machine can also be viewed as approximating this step function, with a slightly different approximation that's piecewise linear – let's see, it's flat at zero over there, and then it goes up as a linear function there. And that is called the hinge loss.
And so you can think of logistic regression and the support vector machine as different approximations for trying to minimize this step function; okay? And that's why, I guess, even though SVMs and logistic regression aren't exactly doing empirical risk minimization, the theory we developed often gives completely appropriate intuitions for SVMs and logistic regression; okay? So that was the last of the loose ends. And if you didn't get this, don't worry too much about it. The high-level message is just that SVMs and logistic regression are reasonable to think of as approximations to empirical risk minimization algorithms. What I want to do next is move on to talk about model selection. Before I do that, let me just check for questions about this. Okay. Cool.
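To make the picture concrete, here is a small sketch (not from the lecture itself) of the three losses as functions of the margin z = y·θᵀx: the non-convex 0-1 step function, and the two convex surrogates that logistic regression and the SVM minimize instead:

```python
import math

def zero_one_loss(z):
    # The step function: counts a mistake (loss 1) whenever the margin is non-positive
    return 1.0 if z <= 0 else 0.0

def logistic_loss(z):
    # Smooth convex surrogate minimized by logistic regression
    return math.log(1.0 + math.exp(-z))

def hinge_loss(z):
    # Piecewise-linear convex surrogate minimized by the SVM:
    # zero for margins >= 1, then increases linearly as the margin shrinks
    return max(0.0, 1.0 - z)

for z in [-2.0, 0.0, 1.0, 3.0]:
    print(f"z={z:5.1f}  0-1={zero_one_loss(z):.1f}  "
          f"logistic={logistic_loss(z):.3f}  hinge={hinge_loss(z):.3f}")
```

Both surrogates roughly track the step function but are convex, which is what makes the resulting optimization problems tractable.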
Okay. So in the theory that we started to develop in the previous lecture, and that we sort of wrapped up with a discussion on VC dimension, we saw that there's often a trade-off between bias and variance. And in particular, it is important not to choose a hypothesis that's either too simple or too complex. So if your data has sort of a quadratic structure to it, and you choose a linear function to try to approximate it, then you would underfit; you'd have a hypothesis with high bias. And conversely, if we choose a hypothesis that's too complex, then you have high variance; you would overfit the data, and you'd also fail to generalize well. So model selection algorithms provide a class of methods to automatically make these tradeoffs between bias and variance; right?
So remember the cartoon of generalization error I drew last time? Where on the x-axis was model complexity, meaning the degree of the polynomial in the regression function, or whatever. And if you have too simple a model, you have high generalization error; that's underfitting. And if you have too complex a model, like a 14th or 15th-degree polynomial fit to five data points, then you also have high generalization error, and you're overfitting. So what I want to do now is actually just talk about model selection in the abstract; all right? Some examples of model selection problems will include – well, I'll run with the example of choosing the degree of a polynomial; right? What degree polynomial do you want to choose?
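As a minimal sketch of this kind of model selection (the data and split here are my own illustration, not from the lecture), one can fit polynomials of increasing degree on a training split and pick the degree with the lowest error on a held-out validation split:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with quadratic structure plus noise (hypothetical example)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.2, size=x.shape)

# Hold out every third point as a validation set
hold_out = np.arange(len(x)) % 3 == 0
x_tr, y_tr = x[~hold_out], y[~hold_out]
x_va, y_va = x[hold_out], y[hold_out]

best_deg, best_err = None, float("inf")
for degree in range(10):
    coeffs = np.polyfit(x_tr, y_tr, degree)              # fit on training split
    err = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)  # validation MSE
    if err < best_err:
        best_deg, best_err = degree, err

print("selected degree:", best_deg, " validation MSE:", best_err)
```

Degrees 0 and 1 underfit the quadratic trend (high bias), very high degrees chase the noise (high variance), and the validation error picks out something in between – exactly the tradeoff the cartoon depicts.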