Student: Can you explain argmax?
Instructor (Andrew Ng): Oh, let's see. So if you take – actually, let me. So argmax means the value of Y that maximizes this.
Student: Oh, okay.
Instructor (Andrew Ng): So just as an example, the min over X of (X - 5) squared is 0, because by choosing X = 5 you can make this zero, and the argmin over X of (X - 5) squared is equal to 5, because 5 is the value of X that minimizes this, okay? Cool. Thanks for asking that.
Instructor (Andrew Ng): Okay. Actually, any other questions about this? Yeah?
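To make the min/argmin distinction concrete, here is a minimal sketch that searches a coarse grid of candidate X values for the function (X - 5) squared. The grid and the names `f`, `candidates`, `best_x`, and `best_value` are hypothetical, chosen only for illustration; a grid search is just the simplest way to show the two quantities side by side.

```python
# Contrast min (the smallest value of a function) with argmin (the input
# that achieves it), using a simple grid search over candidate x values.
def f(x):
    return (x - 5) ** 2

candidates = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0

best_x = min(candidates, key=f)  # argmin over the grid: the x that minimizes f
best_value = f(best_x)           # min over the grid: the smallest value of f

print("argmin:", best_x)
print("min:", best_value)
```

The two printed values are 5.0 (the argmin) and 0.0 (the min); argmax is the same idea with `max` in place of `min`.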
Student: Why is the distribution uniform? Why isn't [inaudible] –
Instructor (Andrew Ng): Oh, I see. By uniform I meant – I was being loose here. I meant that P(Y = 0) is equal to P(Y = 1), or that Y is uniformly distributed over the set {0, 1}.
Student: Oh.
Instructor (Andrew Ng): I just meant – yeah, that P(Y = 0) = P(Y = 1). That's all I mean, see? Anything else?
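One way to see why a uniform P(Y) matters: when P(Y = 0) = P(Y = 1), multiplying each class's likelihood P(X | Y) by the prior scales every class by the same constant, so the argmax over Y is unchanged. The sketch below checks this numerically. The Gaussian class-conditional densities, their means and standard deviations, and the function names are all hypothetical values chosen for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density of a 1D Gaussian with mean mu and standard deviation sigma.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities p(x | y): (mean, std dev) per class.
params = {0: (-2.0, 1.0), 1: (2.0, 1.0)}
prior = {0: 0.5, 1: 0.5}  # uniform prior: P(y = 0) = P(y = 1)

def predict_with_prior(x):
    # argmax over y of p(x | y) * P(y)
    return max(params, key=lambda y: gaussian_pdf(x, *params[y]) * prior[y])

def predict_likelihood_only(x):
    # argmax over y of p(x | y), ignoring the prior
    return max(params, key=lambda y: gaussian_pdf(x, *params[y]))

for x in [-3.0, -0.5, 0.7, 4.0]:
    print(x, predict_with_prior(x), predict_likelihood_only(x))
```

With a non-uniform prior the two rules can disagree near the boundary between the classes, which is why the uniform case is singled out.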
All right. Okay. So it turns out Gaussian discriminant analysis has an interesting relationship to logistic regression. Let me illustrate that. So let's say you have a training set – actually, let me just go ahead and draw a 1D training set, and that will kind of work, okay?
So let's say we have a training set comprising a few negative and a few positive examples, and let's say I run Gaussian discriminant analysis. So I'll fit a Gaussian density to each of my two classes, the positive and the negative training examples. So maybe my positive examples, the X's, are fit with a Gaussian like this, and my negative examples I'll fit with a Gaussian that looks like that, okay?
Now, I hope this [inaudible]. Now, let's vary along the X axis. What I want to do is overlay on top of this plot – I'm going to plot P(Y = 1 | X) for a variety of values of X, okay? Actually, I realize what I should have done: I'm gonna call the X's the negative examples, and I'm gonna call the O's the positive examples. It just makes this part come out better.
So let's take a value of X that's fairly small. Let's say X is this value here on the horizontal axis. Then what's the probability of Y being equal to one conditioned on X? Well, the way you calculate that is you write P(Y = 1 | X), and then you plug in all these formulas as usual, right? It's P(X | Y = 1), which is your Gaussian density, times P(Y = 1), which is just going to be equal to phi, divided by P(X), and that shows you how you can calculate this.
By using these two Gaussians and my phi for P(Y), I can actually compute what P(Y = 1 | X) is, and in this case, if X is this small, it clearly belongs to the left Gaussian. It's very unlikely to belong to the positive class, so the probability will be very small, very close to zero, say, okay? Then we can increment the value of X a bit, study a different value of X, and plot P(Y = 1 | X), and, again, it'll be pretty small.
Let's use a point like that, right? At this point, the two Gaussian densities have equal value, and if I ask, when X is this value, shown by the arrow, what's the probability of Y being equal to one? Well, you really can't tell, so maybe it's about 0.5, okay?
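Written out, the calculation is Bayes' rule. Expanding the denominator P(X) with the law of total probability over the two classes gives a form that uses only the two fitted Gaussians and phi:

```latex
P(y = 1 \mid x)
  = \frac{p(x \mid y = 1)\, P(y = 1)}{p(x)}
  = \frac{p(x \mid y = 1)\, \phi}{p(x \mid y = 1)\, \phi + p(x \mid y = 0)\, (1 - \phi)}
```

At a point where the two densities are equal, p(x | y = 1) = p(x | y = 0), the expression reduces to phi, which is 0.5 when the two classes are equally likely.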
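The sweep along the X axis can be sketched by evaluating P(Y = 1 | X) with Bayes' rule at several points. This assumes 1D Gaussians with a shared variance; the parameter values (phi = 0.5, means at -2 and +2, variance 1) and the function names are hypothetical, picked so the two densities cross at X = 0.

```python
import math

def gaussian_pdf(x, mu, var):
    # Density of a 1D Gaussian with mean mu and variance var.
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_y1(x, phi, mu0, mu1, var):
    """P(y = 1 | x) via Bayes' rule, with p(x) expanded over both classes."""
    num = gaussian_pdf(x, mu1, var) * phi
    den = num + gaussian_pdf(x, mu0, var) * (1 - phi)
    return num / den

# Hypothetical parameters: negatives centred at -2, positives at +2, shared variance.
phi, mu0, mu1, var = 0.5, -2.0, 2.0, 1.0

for x in [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]:
    print(f"x = {x:5.1f}   P(y=1|x) = {posterior_y1(x, phi, mu0, mu1, var):.4f}")
```

The printed values start near 0 on the far left, pass through exactly 0.5 at X = 0 where the densities are equal, and approach 1 on the right. With these particular values the log-odds work out to 4x, so each printed value equals 1 / (1 + exp(-4x)), an S-shaped logistic curve in x.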