<< Chapter < Page | Chapter >> Page > |
And then when you need to classify a new example, when you have a new patient, and you want to decide is this cancer malignant or benign, you then take your new cancer, and you match it to your model of malignant cancers, and you match it to your model of benign cancers, and you see which model it matches better, and depending on which model it matches better to, you then predict whether the new cancer is malignant or benign, okay?
So what I just described, just this cross of methods where you build a second model for malignant cancers and a separate model for benign cancers is called a generative learning algorithm, and let me just, kind of, formalize this. So in the models that we’ve been talking about previously, those were actually all discriminative learning algorithms, and studied more formally, a discriminative learning algorithm is one that either learns PFY given X directly, or even learns a hypothesis that outputs value 0, 1 directly, okay? So logistic regression is an example of a discriminative learning algorithm.
In contrast, a generative learning algorithm of models PFX given Y. The probability of the features given the class label, and as a technical detail, it also models PFY, but that’s a less important thing, and the interpretation of this is that a generative model builds a probabilistic model for what the features looks like, conditioned on the class label, okay? In other words, conditioned on whether a cancer is malignant or benign, it models probability distribution over what the features of the cancer looks like.
Then having built this model – having built a model for PFX given Y and PFY, then by Bayes rule, obviously, you can compute PFY given 1, conditioned on X. This is just PFX given Y = 1 × PFX ÷ PFX, and, if necessary, you can calculate the denominator using this, right? And so by modeling PFX given Y and modeling PFY, you can actually use Bayes rule to get back to PFY given X, but a generative model – generative learning algorithm starts in modeling PFX given Y, rather than PFY given X, okay?
We’ll talk about some of the tradeoffs, and why this may be a better or worse idea than a discriminative model a bit later. Let’s go for a specific example of a generative learning algorithm, and for this specific motivating example, I’m going to assume that your input feature is X and RN and are continuous values, okay?
And under this assumption, let me describe to you a specific algorithm called Gaussian Discriminant Analysis, and the, I guess, core assumption is that we’re going to assume in the Gaussian discriminant analysis model of that PFX given Y is Gaussian, okay?
So actually just raise your hand, how many of you have seen a multivariate Gaussian before – not a 1D Gaussian, but the higher range though? Okay, cool, like maybe half of you, two-thirds of you. So let me just say a few words about Gaussians, and for those of you that have seen it before, it’ll be a refresher.
So we say that a random variable Z is distributed Gaussian, multivariate Gaussian as – and the script N for normal with parameters mean U and covariance sigma squared. If Z has a density 1 over 2 Pi, sigma 2, okay? That’s the formula for the density as a generalization of the one dimension of Gaussians and no more the familiar bell-shape curve. It’s a high dimension vector value random variable Z.
Notification Switch
Would you like to follow the 'Machine learning' conversation and receive update notifications?