
All right. So that, more or less, wraps up what I wanted to say about Naïve Bayes. It turns out that for text classification, the Naïve Bayes algorithm with the second event model, the last Naïve Bayes model I presented, the multinomial event model, almost always does better than the first Naïve Bayes model I talked about when you apply it to the specific case of text classification. One of the reasons hypothesized for this is that the second model, the multinomial event model, takes into account the number of times a word appears in a document, whereas the former model doesn't.
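To make that distinction concrete, here is a minimal sketch, not from the lecture itself, with a made-up vocabulary and email: the first (multivariate Bernoulli) model only records whether each vocabulary word appears at all, while the multinomial event model records how many times each word appears.

```python
# Illustrative sketch of the two Naive Bayes event models' feature representations.
# The vocabulary and email below are made up for this example.

vocab = ["buy", "cheap", "meeting", "now"]
email = "buy cheap cheap now now now".split()

# Multivariate Bernoulli event model: one binary feature per vocabulary word,
# indicating whether the word appears anywhere in the email.
bernoulli_features = [1 if w in email else 0 for w in vocab]
print(bernoulli_features)    # [1, 1, 0, 1]

# Multinomial event model: the email is a sequence of word tokens, so the
# sufficient statistics are word counts; repeated words count multiple times.
multinomial_features = [email.count(w) for w in vocab]
print(multinomial_features)  # [1, 2, 0, 3]
```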

I should say that, in truth, it's not completely understood why the latter model does better than the former one for text classification, and researchers are still debating it a little bit. But if you ever have a text classification problem, the Naïve Bayes classifier is probably not, by far, the best learning algorithm out there, but it is relatively straightforward to implement, and it's a very good algorithm to try if you have a text classification problem, okay? Still a question? Yeah.

Student: So the second model is still position invariant, right? It doesn't actually care where the words are.

Instructor (Andrew Ng): Yes, all right.

Student: And, I mean, a model that did take that into account, like the ordering of the words, does that usually do better if you have enough data?

Instructor (Andrew Ng): Yeah, so the question is about the second model, right? The second model, the multinomial event model, actually doesn't care about the ordering of the words. You can shuffle all the words in the email, and it does exactly the same thing. So in natural language processing there's actually another name for it; it's called a unigram model in natural language processing, and there are some other models, like, say, higher-order Markov models, that take into account some of the ordering of the words. It turns out that for text classification, models like the bigram models or trigram models, I believe they do only very slightly better, if at all, but that's when you're applying them to text classification, okay?
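As a quick illustration (not from the lecture; the email text and helper names are made up), unigram features ignore word order entirely, while bigram features count adjacent word pairs and therefore change when the words are shuffled:

```python
import random

# Made-up example email for illustrating unigram vs. bigram features.
email = "please buy cheap meds now".split()

def unigram_counts(tokens):
    # Unigram (bag-of-words) features: counts of individual words; order is ignored.
    counts = {}
    for w in tokens:
        counts[w] = counts.get(w, 0) + 1
    return counts

def bigram_counts(tokens):
    # Bigram features: counts of adjacent word pairs; these depend on word order.
    counts = {}
    for pair in zip(tokens, tokens[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

shuffled = email[:]
random.shuffle(shuffled)

# Unigram counts are identical for the original and shuffled email...
assert unigram_counts(email) == unigram_counts(shuffled)
# ...whereas bigram counts generally differ once the order changes.
print(bigram_counts(email))
print(bigram_counts(shuffled))
```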

All right. So the next thing I want to talk about is to start a discussion of non-linear classifiers. So it turns out, well, the very first classification algorithm we talked about was logistic regression, which has the following form for the hypothesis, and you can think of this as predicting one when this estimated probability is greater than or equal to 0.5 and predicting zero when this is less than 0.5. And given a training set, logistic regression will maybe run gradient descent or something, or use Newton's method, to find a straight line that reasonably separates the positive and negative classes.
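For reference, the hypothesis form being pointed to here was written on the board rather than transcribed; the standard logistic regression hypothesis is:

```latex
% Standard logistic regression hypothesis (the form referred to above).
h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}},
\qquad
\text{predict } y = 1 \text{ if } h_\theta(x) \ge 0.5,\quad y = 0 \text{ otherwise.}
```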

But sometimes a data set just can't be separated by a straight line, so is there an algorithm that will let you start to learn these sorts of non-linear decision boundaries? And so how do you go about getting a non-linear classifier? And, by the way, one cool result is that, remember, when we talked about generative learning algorithms, I said that if you assume x given y equals one is exponential family with natural parameter eta one, and x given y equals zero is exponential family with natural parameter eta zero, then the posterior p of y equals one given x turns out to be a logistic function, right.
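A brief sketch of the result being referenced (the algebra below is the standard derivation under the usual exponential-family setup, not a transcription of the lecture): if both class-conditional distributions are exponential family with the same sufficient statistic T(x), the posterior is a logistic (sigmoid) function of T(x).

```latex
% Assumed setup (standard, not transcribed from the lecture):
%   p(x \mid y=1) = b(x)\exp\!\big(\eta_1^T T(x) - a(\eta_1)\big),
%   p(x \mid y=0) = b(x)\exp\!\big(\eta_0^T T(x) - a(\eta_0)\big),
%   p(y=1) = \phi.
% By Bayes' rule,
p(y=1 \mid x)
  = \frac{p(x \mid y=1)\,\phi}{p(x \mid y=1)\,\phi + p(x \mid y=0)\,(1-\phi)}
  = \frac{1}{1 + \exp\!\big(-\big[(\eta_1 - \eta_0)^T T(x)
        + a(\eta_0) - a(\eta_1) + \log\tfrac{\phi}{1-\phi}\big]\big)},
% which is a logistic function of T(x).
```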
