
So for example, if you see the word "buy" or "Viagra", those are words that are very useful for telling you an email is spam. And if you see the word "Stanford" or "machine learning" or your own personal name, those are other words that are useful for telling you whether something is spam or non-spam. So in feature selection, we would like to select a subset of the features, hopefully the most relevant ones for a specific learning problem, so as to give ourselves a simpler hypothesis class to choose from, and therefore reduce the risk of overfitting, even when we may have had 50,000 features originally.

So how do you do this? Well, if you have n features, then there are two to the n possible subsets; right? Because each of your n features can either be included or excluded, so there are two to the n possibilities. And this is a huge space, far too large to enumerate all possible feature subsets. So in feature selection, what we most commonly do is use various search heuristics, sort of simple search algorithms, to search through this space of two to the n possible subsets of features and try to find a good subset.
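To make the size of this search space concrete, here is a small Python sketch that enumerates every feature subset; the function name `all_feature_subsets` is my own illustrative choice, not from the lecture. It works fine for a handful of features, but the count doubles with every feature added, which is why exhaustive search is hopeless at the 50,000-feature scale of the spam example.

```python
from itertools import combinations

def all_feature_subsets(n):
    """Yield every subset of n feature indices; there are 2**n of them."""
    features = list(range(n))
    for k in range(n + 1):
        yield from combinations(features, k)

# Exhaustive enumeration is fine for small n...
print(sum(1 for _ in all_feature_subsets(10)))  # 2**10 = 1024 subsets

# ...but the space doubles with each feature, so it blows up quickly.
print(2 ** 20)  # already over a million subsets at just n = 20
```

This is exactly why the lecture turns to greedy heuristics such as forward search instead of enumerating subsets.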

As a concrete example, this is the forward search algorithm; it's also called the forward selection algorithm. It's actually pretty simple, but I'll just write it out; my writing it out will make it look more complicated than it really is. It starts by initializing the set script F to be the empty set, and then it repeats the following: for i equals one to n, if feature i is not already in script F, try adding feature i to script F and evaluate the resulting model using cross validation. And by cross validation, I mean any of the three flavors, be it simple hold-out cross validation or k-fold cross validation or leave-one-out cross validation. Then set F to be equal to F union the single best feature found, and repeat; okay?

So forward selection's procedure is: start with the empty set of features. Then on each iteration, take each of your features that isn't already in your set script F and try adding that feature to the set; you train a model with each of these candidate sets and evaluate them all using cross validation. Basically, you figure out what is the best single feature to add to your set script F, and in step two here, you go ahead and add that feature to script F, and repeat. And when I say best feature or best model here, by best, I really mean the single feature addition that results in the lowest hold-out cross validation error, or the lowest cross validation error generally. So you do this adding one feature at a time.

As for when you terminate this: if you've added all the features to F, so that F is now the entire set of features, you can terminate. Or if by some rule of thumb you know that you probably don't ever want more than k features, you can also terminate once F exceeds some threshold number of features. So maybe if you have 100 training examples and you're fitting logistic regression, you probably know you won't want more than 100 features, and so you stop after you've added 100 features to the set F; okay? And then finally, having done this, you output the best hypothesis found; again, by best I mean that in running this algorithm you'll be training lots of hypotheses and testing them using cross validation, and you output the one with the lowest cross validation error.
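The forward search loop described above can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: `cv_error(subset)` stands in for "train a model on that feature subset and return its hold-out cross validation error", and the toy scoring function at the bottom is a made-up stand-in, not a real learning problem.

```python
def forward_selection(n_features, cv_error, max_features=None):
    """Greedy forward search: start from the empty feature set and, on each
    iteration, add the single feature whose inclusion gives the lowest
    cross-validation error. Stops after max_features additions (a rule-of-
    thumb cap, as in the lecture) or once every feature has been added."""
    selected = []                      # the set script F, built up greedily
    best_subset, best_err = [], float("inf")
    limit = max_features if max_features is not None else n_features
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # Try adding each remaining feature one at a time; score each by CV.
        scores = {f: cv_error(selected + [f]) for f in candidates}
        best_f = min(scores, key=scores.get)
        selected.append(best_f)
        # Remember the best subset seen over the whole run.
        if scores[best_f] < best_err:
            best_err, best_subset = scores[best_f], list(selected)
    return best_subset, best_err

# Toy illustration: pretend features 2 and 5 are the only useful ones,
# with a small penalty per extra feature (mimicking overfitting).
useful = {2, 5}
toy_error = lambda s: 1.0 - 0.4 * len(useful & set(s)) + 0.01 * len(s)
subset, err = forward_selection(8, toy_error, max_features=4)
print(sorted(subset))  # -> [2, 5]: the useful features are picked up first
```

Note the greedy structure: each iteration commits to one feature and never revisits the choice, which is what keeps the search to at most n squared model evaluations rather than two to the n.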





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
