
So the numerator says: sum over all your emails, and take into account only the emails that had class label one, only the emails that were spam, because if Y equals zero, then this indicator is zero and the term goes away; and then sum over all the words in your spam email, counting up the number of times you observed the word K. So, in other words, the numerator says: look at all the spam emails in your training set and count up the total number of times the word K appeared across those emails. The denominator then says: sum over I in your training set, and whenever one of your examples is spam, you know, sum up the length of that spam email, and so the denominator is the total length of all of the spam emails in your training set.
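In symbols, this is the maximum likelihood estimate of the multinomial event model parameter. The following is a reconstruction of the board work (the transcript itself shows no equations), writing m for the number of training examples, n_i for the length of the i-th email, and x_j^{(i)} for the identity of the j-th word of the i-th email:

\[
\phi_{k \mid y=1} \;=\; \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\} \sum_{j=1}^{n_i} 1\{x_j^{(i)} = k\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, n_i}
\]

The indicator 1\{y^{(i)} = 1\} is what makes the non-spam emails drop out of both the numerator and the denominator.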

And so the ratio is just: out of all the words in all your spam emails, what fraction of them were word K? And that's your estimate for the probability of the next piece of spam mail generating the word K in any given position, okay? At the end of the previous lecture, I talked about Laplace smoothing, and so when you do that as well, you add one to the numerator and K to the denominator, and this is the Laplace-smoothed estimate of this parameter, okay? And similarly, you do the same thing for the other parameters; you can work out those estimates yourself, okay? So it's very similar. Yeah?

Student: I’m sorry. On the right on the top, I was just wondering what the X of I is, and what the N of –

Instructor (Andrew Ng): Right. So in this second event model, the definitions of XI and of N are different, right? So here, well, this is for one example (X, Y). Here, N is the number of words in a given email, right? And if it's the Ith email, we subscript it, so this is N subscript I, and so N will be different for different training examples. And here XI takes, you know, values from 1 to 50,000, and XI is essentially the identity of the Ith word in a given piece of email, okay? So that's why this is a product, over all the different words of your email, of the probability of the Ith word in your email conditioned on Y. Yeah?
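For reference, here is the likelihood the instructor is pointing at, reconstructed in standard notation for this second (multinomial) event model, with n the number of words in the email and x_j the identity of the j-th word:

\[
p(x, y) \;=\; p(y) \prod_{j=1}^{n} p(x_j \mid y)
\]

Each factor p(x_j \mid y) is a multinomial distribution over the 50,000-word vocabulary, shared across positions j.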

Student: [Off mic].

Instructor (Andrew Ng): Oh, no, actually, you know what, I apologize. I just realized that I overloaded the notation, right, and I shouldn't have used K here. Let me use a different letter and see if that makes sense; does that make sense? Oh, you know what, I'm sorry. You're absolutely right. Thank you. All right. So in Laplace smoothing, that shouldn't be K. This should be, you know, 50,000, if you have 50,000 words in your dictionary. Yeah, thanks. Great. I stole notation from the previous lecture and didn't translate it properly. So in Laplace smoothing, this is the number of possible values that the random variable XI can take on. Cool. Raise your hand if this makes sense. Okay. Some of you. Are there more questions about this? Yeah.
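With that correction, the Laplace-smoothed estimate presumably on the board reads (the 50,000 in the denominator is the number of values each word variable can take, i.e., the vocabulary size):

\[
\phi_{k \mid y=1} \;=\; \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\} \sum_{j=1}^{n_i} 1\{x_j^{(i)} = k\} \;+\; 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, n_i \;+\; 50{,}000}
\]

As a minimal sketch of how this estimate could be computed, assuming emails are represented as lists of word indices into the vocabulary (all names below are hypothetical, not from the lecture):

```python
import numpy as np

def estimate_phi_spam(spam_emails, vocab_size=50000):
    """Laplace-smoothed estimate of phi[k] = P(word k | y = 1).

    spam_emails: list of emails, each a list of word indices in
    {0, ..., vocab_size - 1}; only the spam (y = 1) emails are passed in.
    """
    counts = np.zeros(vocab_size)   # numerator: occurrences of each word k
    total_words = 0                 # denominator: total length of all spam
    for email in spam_emails:
        for word in email:
            counts[word] += 1
        total_words += len(email)
    # add 1 to every count and vocab_size to the total (Laplace smoothing)
    return (counts + 1) / (total_words + vocab_size)

# Tiny example over a 5-word vocabulary: the result sums to 1 and gives
# nonzero probability even to words never seen in any spam email.
phi = estimate_phi_spam([[0, 2, 2], [1, 2]], vocab_size=5)
```

Without the +1 and +vocab_size terms, a word that never appears in the spam training emails would get probability zero, which is exactly the problem Laplace smoothing fixes.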

Student: On Laplace smoothing, the denominator, the term you add, is that the number of values that Y could take?

Source: OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4