And again, just to continue summarizing what we did last time: in factor analysis, this is an unsupervised learning problem, and so we're given an unlabeled training set where each x(i) is a vector in R^n as usual. We want to model the density of X, and our model for X is that we imagine there's a latent random variable Z that's distributed Gaussian with zero mean and identity covariance. And Z will be some low-dimensional thing, so Z lives in R^d with d smaller than n. And we imagine that X is generated as mu plus lambda Z plus epsilon, where epsilon is a Gaussian random variable with mean zero and covariance matrix psi. And so the parameters of this model are mu, which is an n-dimensional vector; lambda, which is an n-by-d matrix; and psi, which is n-by-n and diagonal. So the cartoon I drew last time for factor analysis was: maybe these are typical examples of data points z(i), and in this example I had d equals one, n equals two, so Z in this example is one-dimensional. And so you take this data, map it to mu plus lambda Z, and that may give you some set of points there.
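To make this generative story concrete, here is a minimal numpy sketch of sampling from the model; the particular values of mu, lambda, and psi are hypothetical, chosen purely for illustration, with d equals one and n equals two matching the cartoon:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions matching the cartoon: latent dim d = 1, observed dim n = 2.
n, d, m = 2, 1, 500
mu = np.array([1.0, 2.0])          # hypothetical mean vector in R^n
Lam = np.array([[2.0], [1.0]])     # hypothetical n x d loading matrix lambda
Psi = np.diag([0.1, 0.2])          # diagonal n x n noise covariance psi

z = rng.standard_normal((m, d))                      # z ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(n), Psi, m)   # epsilon ~ N(0, psi)
x = mu + z @ Lam.T + eps                             # x = mu + lambda z + epsilon
```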
And lastly, this model was envisioning that you'd place a little Gaussian bump around each of these and sample, and so that would be a typical sample of the X's under this model. So how do you fit the parameters of this model? Well, the joint distribution of Z and X is actually Gaussian, with parameters given by some mean vector, call it mu of zx, and some covariance matrix sigma. And to say what those two things are: this mean vector is a vector of zeros stacked on top of the vector mu, and the matrix sigma is this partitioned matrix, with the identity in the upper left, lambda transpose in the upper right, lambda in the lower left, and lambda lambda transpose plus psi in the lower right.
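Continuing the same hypothetical sketch, the joint mean and partitioned covariance of (Z, X) can be assembled directly from those formulas:

```python
# Mean of (z, x): zeros for z stacked on top of mu for x.
mu_zx = np.concatenate([np.zeros(d), mu])

# Partitioned covariance: [[I, lambda^T], [lambda, lambda lambda^T + psi]].
Sigma = np.block([
    [np.eye(d), Lam.T],
    [Lam,       Lam @ Lam.T + Psi],
])
```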
We also worked this out last time. So you can ask: what is the distribution of X under this model? And the answer is that under this model, X is Gaussian with mean mu and covariance lambda lambda transpose plus psi. So you just take the second block of the mean vector and take that lower right-hand corner block of the covariance matrix. This is really my formula for computing the marginal distribution of a Gaussian, except that I'm computing the marginal distribution of the second half of the vector rather than the first half. So this is the marginal distribution of X under my model. And so if you want to learn –
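In the sketch, reading off that marginal is just slicing out the corresponding blocks, and the samples drawn earlier should agree with it for large m:

```python
# Marginal of x: second block of the mean, lower-right block of the covariance.
marg_mean = mu_zx[d:]      # equals mu
marg_cov = Sigma[d:, d:]   # equals lambda lambda^T + psi

# Sanity check: the sample covariance of the x's drawn above should be
# close to lambda lambda^T + psi when m is large.
print(np.cov(x, rowvar=False))
print(marg_cov)
```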
Student: [Inaudible] the conditional distribution [inaudible]?
Instructor (Andrew Ng): Let's see. Oh, yes. Yes, so in this one I'm really specifying the conditional distribution of X given Z. So the conditional distribution of X given Z is Gaussian, with mean mu plus lambda Z and covariance psi; that's what this notation means. So since this is the marginal distribution of X, given my training set of m unlabeled examples, I can actually write down the log likelihood of my training set. So the log likelihood of my training set – actually, no, let's just write down the likelihood. So the likelihood of my parameters given my training set is the product from i equals one to m of p of x(i) given the parameters.
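In the same hypothetical sketch, that likelihood can be evaluated on the sampled x's (in log form, for numerical stability):

```python
from scipy.stats import multivariate_normal

# log of the product over i of p(x(i); mu, lambda, psi), i.e. the sum of
# log densities under the marginal x ~ N(mu, lambda lambda^T + psi).
log_likelihood = multivariate_normal(mean=marg_mean, cov=marg_cov).logpdf(x).sum()
print(log_likelihood)
```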