And then I’ll look at the negative examples, the O’s in this figure, and I’ll fit a Gaussian to those, and maybe I get a Gaussian centered over there. That defines my second Gaussian, and together – we’ll see how later – these two Gaussian densities will define a separator for these two classes, okay?
And it turns out that this separator will be a little bit different from what logistic regression gives you. If you run logistic regression, you actually get the decision boundary shown by the green line, whereas Gaussian discriminant analysis gives you the blue line, okay?
Switch back to chalkboard, please. All right. Here’s the Gaussian discriminant analysis model. I’m going to model p(y) as a Bernoulli random variable as usual, parameterized by the parameter phi; you’ve seen this before. Model p(x | y = 0) as a Gaussian – oh, you know what?
Yeah, yes, excuse me. I thought this looked strange. This should be a sigma – the determinant of sigma to the one-half – in the denominator there. It’s no big deal. Right. I had written the determinant of sigma to the one-half on a previous board, excuse me.
Okay, and so I model p(x | y = 0) as a Gaussian with mean mu_0 and covariance sigma, and p(x | y = 1) as a Gaussian with mean mu_1 and that same covariance sigma, okay? And so the parameters of this model are phi, mu_0, mu_1, and sigma, and so I can now write down the likelihood of the parameters as – oh, excuse me, actually, the log likelihood of the parameters as the log of that, right?
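For readers following along without the board, here is one standard way to write the three densities just described, in LaTeX; this is a reconstruction from the spoken description, not a transcription of the actual board:

\begin{align*}
p(y) &= \phi^{y}(1-\phi)^{1-y} \\
p(x \mid y = 0) &= \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_0)^T \Sigma^{-1} (x-\mu_0)\right) \\
p(x \mid y = 1) &= \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_1)^T \Sigma^{-1} (x-\mu_1)\right)
\end{align*}

The |\Sigma|^{1/2} in the denominator is exactly the term being corrected in the aside above.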
So, in other words, if I’m given the training set, then I can write down the log likelihood of the parameters as the log of, you know, the product of the probabilities p(x^(i), y^(i)), right? And this is just equal to that, where each of these terms, p(x^(i) | y^(i)) or p(y^(i)), is then given by one of those three equations on top, okay?
And I just want to contrast this again with discriminative learning algorithms, right? So to give this a name, I guess, this is sometimes called the joint data likelihood – the joint likelihood – and let me just contrast this with what we had previously when we were talking about logistic regression, where I said the log likelihood of the parameters theta was the log of a product from i = 1 to m of p(y^(i) | x^(i)), parameterized by theta, right?
So back when we were fitting logistic regression models or generalized linear models, we were always modeling p(y^(i) | x^(i)), parameterized by theta, and that was the conditional likelihood, okay, in which we’re modeling p(y^(i) | x^(i)), whereas, now, with generative learning algorithms, we’re going to look at the joint likelihood, which is p(x^(i), y^(i)), okay?
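Written out, the contrast being drawn here is between these two objectives (again a LaTeX reconstruction of what is being said, not the board itself):

\begin{align*}
\text{joint (generative):} \quad \ell(\phi, \mu_0, \mu_1, \Sigma) &= \log \prod_{i=1}^{m} p\big(x^{(i)}, y^{(i)}\big) = \log \prod_{i=1}^{m} p\big(x^{(i)} \mid y^{(i)}\big)\, p\big(y^{(i)}\big) \\
\text{conditional (discriminative):} \quad \ell(\theta) &= \log \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}; \theta\big)
\end{align*}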
So let’s see. So given the training set, and using the Gaussian discriminant analysis model, to fit the parameters of the model we’ll do maximum likelihood estimation as usual, and so you maximize the log likelihood with respect to the parameters phi, mu_0, mu_1, and sigma, and if you find the maximum likelihood estimates of the parameters, you find that phi is – the maximum likelihood estimate is actually no surprise, and I’m writing this down mainly as practice with the indicator notation, all right?
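The estimates the lecture is starting to write out are the standard closed-form GDA maximum likelihood estimates; reconstructed in LaTeX (the indicator notation 1{·} is what the phi estimate exercises):

\begin{align*}
\phi &= \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 1\} \\
\mu_0 &= \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} \qquad
\mu_1 = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}} \\
\Sigma &= \frac{1}{m} \sum_{i=1}^{m} \big(x^{(i)} - \mu_{y^{(i)}}\big)\big(x^{(i)} - \mu_{y^{(i)}}\big)^T
\end{align*}

And, purely as an illustrative sketch (function and variable names are mine, not from the lecture), the same estimates in NumPy:

import numpy as np

def fit_gda(X, y):
    # X: (m, n) design matrix, y: (m,) array of 0/1 labels.
    # Returns the GDA maximum likelihood estimates (phi, mu_0, mu_1, Sigma).
    m = X.shape[0]
    phi = np.mean(y == 1)              # fraction of positive examples
    mu0 = X[y == 0].mean(axis=0)       # mean of the negative class
    mu1 = X[y == 1].mean(axis=0)       # mean of the positive class
    # Shared covariance: average outer product of each example's residual
    # from its own class mean.
    residuals = X - np.where((y == 1)[:, None], mu1, mu0)
    Sigma = residuals.T @ residuals / m
    return phi, mu0, mu1, Sigma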