Another example of a model selection problem would be if you’re trying to choose the parameter tau, which was the bandwidth parameter in locally weighted linear regression, or in some other sort of locally weighted regression. Yet another model selection problem is if you’re trying to choose the parameter C in the soft margin SVM; right? And so in the soft margin SVM we had this optimization objective; right? And the parameter C controls the tradeoff between how much you want a large margin on your examples versus how much you want to penalize misclassified examples. So these are three specific examples of model selection problems.
And let’s come up with a method for automatically choosing among them. Let’s say you have some finite set of models, and let’s write these as M1, M2, M3, and so on. For example, one of these may be the linear classifier, another may be the quadratic classifier, and so on; okay? Or you may also take the bandwidth parameter tau and discretize it into a range of values, and you’re trying to choose from that discrete set of values. So let’s talk about how you would select an appropriate model; all right? Well, one thing you could do is you can take all of these models, train them on your training set, and then see which model has the lowest training error. So that would be a terrible idea, and why’s that?
Student: [Inaudible].
Instructor (Andrew Ng): Right. Cool. Because of overfitting; right. And some of you are laughing that I even asked that. So it would be a terrible idea to choose a model by looking at your training error because, well, obviously, you end up choosing the most complex model; right? You’d choose the 10th degree polynomial because that’s what fits the training set best.
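To make the “finite set of models” concrete before we get to the selection procedure, here is a minimal sketch in Python; the particular grids of values below are illustrative assumptions of mine, not values from the lecture.

```python
# A minimal sketch (not from the lecture) of the finite set of models
# M1, M2, M3, ... that we will be selecting among.

# Candidate polynomial models: linear, quadratic, ..., up to 10th degree.
candidate_degrees = list(range(1, 11))

# A continuous hyperparameter such as the bandwidth tau can be handled the
# same way: discretize it into a finite grid and treat each value as a model.
candidate_bandwidths = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]   # assumed tau grid

# Likewise for the soft-margin SVM parameter C.
candidate_C_values = [0.01, 0.1, 1.0, 10.0, 100.0]         # assumed C grid
```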
So this brings us to model selection, and there are several standard procedures for doing this. One is hold-out cross validation, and in hold-out cross validation, we take the training set and we randomly split it into two subsets. So you take all the data you have and randomly split it into two subsets, and we’ll call them the training subset and the hold-out cross validation subset. And then, you know, you train each model on just the training subset, and test it on your hold-out cross validation subset. And you pick the model with the lowest error on the hold-out cross validation subset; okay?
So this is sort of a relatively straightforward procedure, and it’s commonly used: you train on, say, 70 percent of the data, then test all of your models on the remaining 30 percent, and you pick whichever has the smallest hold-out cross validation error. And after this you actually have a choice. Having taken all of these hypotheses trained on 70 percent of the data, you can actually just output the hypothesis that has the lowest error on your hold-out cross validation set. And optionally, you can actually take the model that you selected and go back, and retrain it on all 100 percent of the data; okay?
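As a rough illustration of this procedure, here is a sketch of hold-out cross validation over the SVM parameter C, assuming scikit-learn is available; the synthetic dataset, the linear kernel, and the C grid are my own assumptions rather than anything specified in the lecture.

```python
# Hold-out cross validation sketch: 70/30 split, pick the model with the
# lowest hold-out error, then optionally retrain the winner on all the data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the labeled training data (X, y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Randomly split: 70% training subset, 30% hold-out cross validation subset.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.30, random_state=0)

candidate_C_values = [0.01, 0.1, 1.0, 10.0, 100.0]   # assumed grid of models

# Train each candidate model on the 70% subset, score it on the 30% subset.
best_C, best_holdout_accuracy = None, -1.0
for C in candidate_C_values:
    model = SVC(C=C, kernel="linear").fit(X_train, y_train)
    accuracy = model.score(X_holdout, y_holdout)      # hold-out accuracy
    if accuracy > best_holdout_accuracy:
        best_C, best_holdout_accuracy = C, accuracy

# Option 1: keep the hypothesis trained on just 70% of the data.
selected_model = SVC(C=best_C, kernel="linear").fit(X_train, y_train)

# Option 2: retrain the selected model on all 100% of the data.
final_model = SVC(C=best_C, kernel="linear").fit(X, y)
```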