So when I say output the best hypothesis found, I mean of all of the hypotheses you’ve seen during this entire procedure, pick the one with the lowest cross validation error that you saw; okay? So that’s forward selection. So let’s see, just to give this a name, this is an instance of what’s called wrapper feature selection. And the term wrapper comes from the fact that this feature selection algorithm that I just described, forward selection or forward search, is a piece of software that you write that wraps around your learning algorithm, in the sense that to perform forward selection, you need to repeatedly make calls to your learning algorithm to train your model, using different subsets of features; okay? So this is called wrapper model feature selection.
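As a rough illustration of the wrapper idea, here is a minimal Python sketch of forward search. It is not code from the lecture: the helper names (`cv_error`, `forward_selection`), the choice of a least-squares classifier with labels in {-1, +1}, and the use of 5-fold cross validation are all assumptions made for the example. Any learning algorithm could be wrapped in the same way.

```python
import numpy as np

def cv_error(X, y, features, k=5):
    """k-fold cross validation error rate of a least-squares classifier
    (an assumed stand-in learner) trained on the given feature columns."""
    n = len(y)
    cols = sorted(features)
    mistakes = 0
    for held_out in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), held_out)
        # Add an intercept column, then fit by least squares.
        A_train = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(len(held_out)), X[np.ix_(held_out, cols)]])
        w, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        pred = np.where(A_test @ w >= 0, 1, -1)
        mistakes += np.sum(pred != y[held_out])
    return mistakes / n

def forward_selection(X, y, max_features=None):
    """Greedy forward search: start from the empty set, repeatedly add the
    single feature that gives the lowest cross validation error, and return
    the best subset seen anywhere during the search."""
    n_features = X.shape[1]
    max_features = n_features if max_features is None else max_features
    selected = []
    best_subset, best_error = [], np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        # One call to the learning algorithm (times k folds) per candidate.
        err, j = min((cv_error(X, y, selected + [j]), j) for j in candidates)
        selected = selected + [j]
        if err < best_error:
            best_error, best_subset = err, list(selected)
    return best_subset, best_error
```

Note that the subset returned is the best one seen over the whole search, not necessarily the last one, which matches the "output best hypothesis found" step described above.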
And it tends to be somewhat computationally expensive because as you’re performing the search process, you’re repeatedly training your learning algorithm over and over and over on all of these different subsets of features. Let’s just mention also, there is a variation of this called backward search or backward selection, which is where you start with f equal to the entire set of features, and you delete features one at a time; okay? So that’s backward search or backward selection. And this is another feature selection algorithm that you might use. Whether this makes sense depends on the problem: there will be problems where it really doesn’t even make sense to initialize f to be the set of all features.
So if you have 100 training examples and 10,000 features, which may well happen, say 100 emails and 10,000 features per email, then depending on the learning algorithm you’re using, it may or may not make sense to initialize the set f to be all features, and train the model using all features. And if it doesn’t make sense to train the model using all features, then forward selection would be more common.
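Backward search can be sketched in the same way. Again this is only an illustrative sketch under assumptions: the function names, the least-squares classifier with labels in {-1, +1}, the 5-fold cross validation, and the tie-breaking rule that prefers smaller subsets are choices made for the example, not details from the lecture.

```python
import numpy as np

def cv_error(X, y, features, k=5):
    """k-fold cross validation error rate of a least-squares classifier
    (an assumed stand-in learner) trained on the given feature columns."""
    n = len(y)
    cols = sorted(features)
    mistakes = 0
    for held_out in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), held_out)
        A_train = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(len(held_out)), X[np.ix_(held_out, cols)]])
        w, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        pred = np.where(A_test @ w >= 0, 1, -1)
        mistakes += np.sum(pred != y[held_out])
    return mistakes / n

def backward_selection(X, y, min_features=1):
    """Greedy backward search: start from all features, repeatedly delete the
    single feature whose removal gives the lowest cross validation error, and
    return the best subset seen anywhere during the search."""
    selected = list(range(X.shape[1]))
    # The very first step already trains on every feature at once, which is
    # the step that may not make sense with far more features than examples.
    best_subset, best_error = list(selected), cv_error(X, y, selected)
    while len(selected) > min_features:
        err, j = min(
            (cv_error(X, y, [f for f in selected if f != j]), j) for j in selected
        )
        selected.remove(j)
        if err <= best_error:  # on ties, prefer the smaller subset (assumed rule)
            best_error, best_subset = err, list(selected)
    return best_subset, best_error
```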
So let’s see. Wrapper model feature selection algorithms tend to work well. And in particular, they actually often work better than a different class of algorithms I’m gonna talk about now. But their main disadvantage is that they’re computationally very expensive. Do you have any questions about this before I talk about the other? Yeah?
Student: [Inaudible].
Instructor (Andrew Ng): Yeah, yes, you’re actually right. So forward search and backward search, both of these are search heuristics, and for either of these you cannot guarantee they’ll find the best subset of features. It actually turns out that under many formulations of the feature selection problem, finding the best subset of features is an NP-hard problem. But in practice, forward selection and backward selection work fine, and you can also envision other search algorithms where you sort of have other methods to search through the space of 2 to the n possible feature subsets.
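To make the size of that search space concrete, the short sketch below compares the number of feature subsets an exhaustive search would have to consider with the number of candidate subsets a complete run of forward search evaluates. The function names are made up for the illustration, and the forward search count assumes the search runs until every feature has been added.

```python
def exhaustive_subsets(n):
    """Number of distinct feature subsets of n features."""
    return 2 ** n

def forward_search_evaluations(n):
    """Candidate subsets scored by a full forward search:
    n in the first round, n - 1 in the second, ..., 1 in the last."""
    return n * (n + 1) // 2

print(exhaustive_subsets(20))          # 1048576
print(forward_search_evaluations(20))  # 210
```

Each evaluation still means training the model (once per cross validation fold), so the greedy count is far smaller than 2 to the n but still grows quadratically in the number of features.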
So let’s see. Wrapper feature selection tends to work well when you can afford to do it computationally. But for problems such as text classification, where you have so many features and can easily have 50,000 features, forward selection would be very, very expensive. So there’s a different class of algorithms that tends not to do as well in the sense of generalization error, so you tend to learn a hypothesis that works less well, but that is computationally much less expensive. And these are called the filter feature selection methods. And the basic idea is that for each feature i, we’ll compute some measure of how informative x_i is about y; okay?
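As a rough sketch of what a filter method could look like, the Python below scores every feature independently and keeps the k highest-scoring ones. The passage above does not say which measure of informativeness to use, so the absolute correlation between x_i and y used here is an assumption for the sake of the example, as are the function names.

```python
import numpy as np

def correlation_scores(X, y):
    """Score each feature by the absolute correlation between that column
    of X and the labels y (an assumed choice of informativeness measure)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # A constant feature (or constant labels) carries no information: score 0.
    denom = np.where(denom == 0, np.inf, denom)
    return np.abs(Xc.T @ yc) / denom

def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = correlation_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

Unlike the wrapper approach, nothing here trains the learning algorithm: each feature is scored once from the data, which is why this kind of method stays cheap even with tens of thousands of features.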