There are, by the way, other ways to select these features, but this is one way to think about creating your feature vector, right, as zero and one values, okay? Moving on, yeah. Okay. You have a question?
Student: I’m getting, kind of, confused on how you compute all those parameters.
Instructor (Andrew Ng): On how I came up with the parameters?
Student: Correct.
Instructor (Andrew Ng): Let’s see. So the question was how did I come up with the parameters, right? In Naive Bayes, I need to build a model for P of X given Y and for P of Y, right? I mean, in generative learning algorithms, I need to come up with models for these. So how’d I model P of Y? Well, I just chose to model it using a Bernoulli distribution, and so P of Y will be parameterized by that, all right?
Student: Okay.
Instructor (Andrew Ng): And then how’d I model P of X given Y? Well, let’s keep changing bullets. Under the Naive Bayes assumption, I assume that P of X given Y is the product of these probabilities, and so I’m going to need parameters to tell me the probability of each word occurring, you know, of each word occurring or not occurring, conditioned on the email being spam or non-spam, okay?
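Written out, the model described in this answer can be sketched as follows. The symbols \phi_y, \phi_{i|y=1} and \phi_{i|y=0}, and the dictionary size n, are notational assumptions introduced here for illustration; the spoken answer does not name them.

```latex
% Class prior: y is 0 or 1, so it is Bernoulli with one parameter \phi_y = p(y = 1).
p(y) = \phi_y^{\,y} \, (1 - \phi_y)^{1 - y}

% Naive Bayes assumption: given the class, the word indicators x_1, ..., x_n
% are modeled as independent, so the joint is a product of per-word terms.
p(x \mid y) = \prod_{i=1}^{n} p(x_i \mid y)

% Each per-word term is Bernoulli, with one parameter per word and per class.
p(x_i \mid y = 1) = \phi_{i \mid y=1}^{\,x_i} \, (1 - \phi_{i \mid y=1})^{1 - x_i}
p(x_i \mid y = 0) = \phi_{i \mid y=0}^{\,x_i} \, (1 - \phi_{i \mid y=0})^{1 - x_i}
```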
Student: How is that Bernoulli?
Instructor (Andrew Ng): Oh, because X is either zero or one, right? By the way I defined the feature vectors, X_i is either one or zero, depending on whether word i appears in the email, right? So X_i is always zero or one, and by definition, if X_i is either zero or one, then it has to have a Bernoulli distribution, right? If X_i were continuous, then you might model this as a Gaussian, and you’d end up with something like what we did in Gaussian discriminant analysis. It’s just that, the way I constructed my features for email, X_i is always binary valued, and so you end up with a Bernoulli here, okay? All right. I should move on.
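As a rough illustration of where the parameters come from, here is a minimal sketch of maximum likelihood estimation for this kind of model, where each estimate is just an observed fraction. The function name `fit_bernoulli_naive_bayes`, the variable names, and the toy data are hypothetical and not from the lecture; the sketch assumes both classes appear at least once in the training set.

```python
def fit_bernoulli_naive_bayes(X, y):
    """Maximum likelihood estimates for a Naive Bayes model with 0/1 features.

    X: list of feature vectors; X[k][j] is 1 if word j appears in email k.
    y: list of labels; 1 = spam, 0 = non-spam.
    Assumes each class has at least one training email.
    """
    m = len(y)
    n = len(X[0])

    # P(y = 1): fraction of training emails that are spam.
    phi_y = sum(y) / m

    def word_rates(label):
        # P(x_j = 1 | y = label): fraction of emails with this label
        # that contain word j.
        rows = [x for x, label_k in zip(X, y) if label_k == label]
        return [sum(row[j] for row in rows) / len(rows) for j in range(n)]

    return phi_y, word_rates(1), word_rates(0)


# Hypothetical toy data: 4 emails, a 3-word dictionary.
X = [[1, 0, 1],
     [1, 1, 0],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]

phi_y, phi_spam, phi_ham = fit_bernoulli_naive_bayes(X, y)
print(phi_y)     # 0.5
print(phi_spam)  # [1.0, 0.5, 0.5]
print(phi_ham)   # [0.0, 0.5, 1.0]
```

Each number is a plain count ratio: for example, word 0 appears in both spam emails, so its estimated probability given spam is 1.0.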
So it turns out that this idea almost works. Now, here’s the problem. So let’s say you complete this class and you start to do, maybe, the class project, and you keep working on your class project for a bit, and it becomes really good, and you want to submit your class project to a conference, right? So, you know, around, I don’t know, June every year is the deadline for the NIPS conference. NIPS is just the name of the conference; it’s an acronym.
And so maybe you email your project partners, or your friends even, and say, “Hey, let’s work on a project and submit it to the NIPS conference.” And so you’re getting these emails with the word “NIPS” in them, which you’ve probably never seen before, and so a piece of email comes from your project partner, and it says, “Let’s send a paper to the NIPS conference.”
And then your spam classifier will say, let’s say NIPS is the 30,000th word in your dictionary, okay? So P of X_30,000 = 1 given Y = 1 will be equal to 0. That’s the maximum likelihood estimate of this, right? Because you’ve never seen the word NIPS before in your training set, the maximum likelihood estimate of the parameter is that the probability of seeing the word NIPS is zero, and, similarly, you know, in, I guess, non-spam mail, the chance of seeing the word NIPS is also estimated as zero.
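The zero estimate can be reproduced on a small scale. The following is a minimal sketch with a hypothetical three-word dictionary, where the last column stands in for the word “NIPS” and appears in no training email of either class. All names and data are illustrative assumptions, not taken from the lecture.

```python
# Hypothetical toy training set. Column 2 plays the role of the word "NIPS":
# no training email, spam or non-spam, contains it.
X_train = [[1, 0, 0],
           [0, 1, 0],
           [1, 1, 0],
           [0, 0, 0]]
y_train = [1, 1, 0, 0]  # 1 = spam, 0 = non-spam


def ml_word_rates(X, y, label):
    """Maximum likelihood estimate of P(x_j = 1 | y = label) for each word j."""
    rows = [x for x, label_k in zip(X, y) if label_k == label]
    n = len(X[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(n)]


def likelihood(x, rates):
    """P(x | y) under the Naive Bayes assumption: a product over words."""
    p = 1.0
    for x_j, rate in zip(x, rates):
        p *= rate if x_j == 1 else (1.0 - rate)
    return p


spam_rates = ml_word_rates(X_train, y_train, 1)
ham_rates = ml_word_rates(X_train, y_train, 0)
print(spam_rates[2], ham_rates[2])  # 0.0 0.0

# A new email that contains the unseen word.
new_email = [1, 0, 1]
print(likelihood(new_email, spam_rates))  # 0.0
print(likelihood(new_email, ham_rates))   # 0.0
```

Because the estimate for the unseen word is exactly zero in both classes, that single factor forces the whole product to zero for either label, no matter what the other words say.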