So, great. Oh, it turns out, just to mention one more thing that’s, kind of, cool. I said that if X given Y is Poisson, you also get a logistic posterior. It actually turns out there’s a more general version of this. If you assume X given Y = 1 is exponential family with natural parameter η1, and then you assume X given Y = 0 is exponential family with natural parameter η0, then this implies that P(Y = 1 | X) is also logistic, okay? And that’s, kind of, cool. It means that X given Y could be, I don’t know, some strange thing. It could be Gaussian, which we’ve seen, or, I don’t know, gamma, exponential. There’s actually a beta.
I’m just rattling off my mental list of exponential family distributions.
It could be any one of those things, so [inaudible] the same exponential family distribution for the two classes with different natural parameters, then the posterior P(Y = 1 | X) would be logistic, and so this shows the robustness of logistic regression to the choice of modeling assumptions because it could be that the data was actually, you know, gamma distributed, and it still turns out to be logistic. So it’s the robustness of logistic regression to modeling assumptions.
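As a quick numerical check of this claim, here is a minimal sketch for the Poisson case. It computes P(Y = 1 | X) once directly from Bayes' rule and once as a sigmoid of a linear function of X, and the two should agree. The rates 4.0 and 1.5, the class prior 0.3, and all function names are made-up values for illustration, not anything from the lecture.

```python
import math

def poisson_pmf(x, lam):
    # P(X = x) for a Poisson distribution with rate lam
    return math.exp(-lam) * lam ** x / math.factorial(x)

def posterior_bayes(x, lam1, lam0, prior1):
    # P(Y = 1 | X = x) computed directly from Bayes' rule
    num = poisson_pmf(x, lam1) * prior1
    den = num + poisson_pmf(x, lam0) * (1.0 - prior1)
    return num / den

def posterior_logistic(x, lam1, lam0, prior1):
    # The same posterior written as a sigmoid of theta * x + theta0.
    # The log-odds of two Poissons is linear in x, which gives these values.
    theta = math.log(lam1 / lam0)
    theta0 = -(lam1 - lam0) + math.log(prior1 / (1.0 - prior1))
    return 1.0 / (1.0 + math.exp(-(theta * x + theta0)))

# Hypothetical parameters, chosen only for the demonstration
LAM1, LAM0, PRIOR1 = 4.0, 1.5, 0.3

for x in range(6):
    a = posterior_bayes(x, LAM1, LAM0, PRIOR1)
    b = posterior_logistic(x, LAM1, LAM0, PRIOR1)
    print(x, round(a, 6), round(b, 6))
```

The two columns match because the factorial terms cancel in the ratio of the two Poisson densities, leaving a log-odds that is linear in x.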
And this is the density. I think, early on I promised two justifications for where I pulled the logistic function out of the hat, right? So one was the exponential family derivation we went through last time, and this is, sort of, the second one. That all of these modeling assumptions also lead to the logistic function. Yeah?
Student: [Off mic].
Instructor (Andrew Ng): Oh, if P(Y = 1 | X) is logistic, does that imply this? No. The reverse is not true, right? Yeah, so this exponential family assumption implies P(Y = 1 | X) is logistic, but the reverse implication is not true. There are actually all sorts of really bizarre distributions for X that would give rise to a logistic function, okay?
Okay. So let’s talk about – that was our first generative learning algorithm. Now I’ll talk about the second generative learning algorithm. This is called the Naive Bayes algorithm, and the motivating example that I’m gonna use will be spam classification.
All right. So let’s say that you want to build a spam classifier to take your incoming stream of email and decide if it’s spam or not. So let’s see. Y will be 0 or 1, with 1 being spam email and 0 being non-spam, and the first decision we need to make is, given a piece of email, how do you represent a piece of email using a feature vector X, right? So email is just a piece of text, right? Email is like a list of words or a list of ASCII characters.
So I can represent email as a feature vector X. So we’ll use a couple of different representations, but the one I’ll use today is we will construct the vector X as follows. I’m gonna go through my dictionary, and, sort of, make a listing of all the words in my dictionary, okay?
So the first word is “a.” The second word in my dictionary is “aardvark,” then “aardwolf,” okay? You know, and somewhere along the way you see the word “buy” in the spam email telling you to buy stuff. Depending on how you collect your list of words, you know, you won’t find CS229, right, the course number, in a dictionary, but if you collect a list of words via other emails you’ve gotten, you have this in the list somewhere as well, and then the last word in my dictionary was “zymurgy,” which pertains to the technological chemistry that deals with the fermentation process in brewing.
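A minimal sketch of this word-list idea, under one stated assumption: each entry of X is 1 if the corresponding dictionary word appears in the email and 0 otherwise. That is one common choice, not something spelled out above. The two example emails and the helper names `build_vocabulary` and `to_feature_vector` are hypothetical, and the word list here is collected from the emails themselves rather than from a printed dictionary.

```python
def build_vocabulary(emails):
    # Collect every distinct lowercase word seen in the emails, in sorted order
    words = set()
    for text in emails:
        words.update(text.lower().split())
    return sorted(words)

def to_feature_vector(text, vocabulary):
    # Assumed encoding: entry i is 1 if vocabulary[i] occurs in the email, else 0
    present = set(text.lower().split())
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical training emails, used only to build the word list
emails = ["Buy cheap pills now", "CS229 homework due now"]
vocab = build_vocabulary(emails)

x = to_feature_vector("buy now", vocab)
print(vocab)
print(x)
```

Collecting the word list from real emails means that a token like "cs229" gets its own position in X even though it would never show up in a printed dictionary.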