And it turns out all I did here was use the fact that P of X given Z times P of Z equals P of X comma Z. Right? Combining this term and that term just gives you the numerator in the original. And it turns out that for factor analysis, of these two terms, this is the only one that depends on the parameters. The distribution P of ZI has no parameters, because ZI was just drawn from a Gaussian with zero mean and identity covariance, and QI of Z was this fixed Gaussian, so it doesn't depend on the parameters theta. And so in the M step we really just need to maximize this one term with respect to all the parameters: mu, lambda, and psi. So let's see. There's one more key step I want to show, but to get there I have to write down an unfortunately large amount of math, so let's go into it. Okay.
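The decomposition being described can be sketched as follows, using the standard factor analysis notation (this is a reconstruction from context, not the lecturer's board work): the M-step objective splits into a term that depends on the parameters and terms that do not,

```latex
\sum_{i} \mathbb{E}_{z_i \sim Q_i}\!\left[ \log p\!\left(x_i \mid z_i; \mu, \Lambda, \Psi\right) \right]
\;+\; \sum_{i} \mathbb{E}_{z_i \sim Q_i}\!\left[ \log \frac{p(z_i)}{Q_i(z_i)} \right],
```

where only the first sum involves mu, lambda, and psi, since p(z_i) is the fixed N(0, I) prior and Q_i was fixed in the E-step.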
So in the M step, we want to maximize this, and all expectations are with respect to ZI drawn from the distributions QI; sometimes I'll be sloppy and just omit this. Now this distribution P of XI given ZI is a Gaussian density, because XI given ZI is Gaussian with mean mu plus lambda Z and covariance psi. And so in this step of the derivation, I will actually go ahead and substitute in the formula for the Gaussian density – one over two pi to the N over two, determinant of psi to the one half, E to the dot, dot, dot. Excuse me – to keep the derivation from getting too complicated, I'm actually just going to maximize this with respect to the parameters lambda. You want to maximize this with respect to lambda, psi, and mu, but just to keep the amount of math I'm gonna do in class sane, I'm just going to show how to maximize this with respect to the matrix lambda, and I'll pretend that psi and mu are fixed. And so if you substitute in the Gaussian density, you get the expected value of a constant – the constant may depend on psi but not on lambda – minus this quadratic term. The quadratic term essentially came from the exponent in your Gaussian density: when I take the log of the exponential, I end up with this quadratic term.
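The substitution being described can be written out explicitly (again, a reconstruction using the standard factor analysis model, not the lecturer's exact board notation). Taking the log of the Gaussian density for x given z gives

```latex
\log p\!\left(x_i \mid z_i\right)
= -\frac{n}{2}\log(2\pi) \;-\; \frac{1}{2}\log\lvert\Psi\rvert
\;-\; \frac{1}{2}\,\bigl(x_i - \mu - \Lambda z_i\bigr)^{\!\top} \Psi^{-1} \bigl(x_i - \mu - \Lambda z_i\bigr),
```

so the M-step objective, as a function of Lambda alone, is a constant (depending on psi but not lambda) minus the expectation of that quadratic term.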
And so if you take the derivative of the expression above with respect to the matrix lambda and set that to zero – we want to maximize this expression with respect to the parameters lambda, so you take the derivative with respect to lambda – excuse me, that's a minus sign – and set the derivative of this expression to zero, because you set derivatives to zero to maximize things, right? When you do that and simplify, you end up with the following. And so in the M step, this is the value you should use to update your parameter lambda. And again, the expectations are with respect to ZI drawn from the distributions QI. So the very last step of this derivation is that we need to work out what these two expectations are. The very first term, E of ZI transpose, is just mu of ZI given XI, transpose, because the QI distribution has mean mu of ZI given XI. To work out the other term, let me just remind you that if you have a random variable Z that's Gaussian with mean mu and covariance sigma, then the covariance sigma is E of ZZ transpose minus E of Z times E of Z transpose. That's one of the definitions of the covariance, and so this implies that E of ZZ transpose equals sigma plus E of Z times E of Z transpose. And so this second term here becomes sigma of ZI given XI, plus mu of ZI given XI times mu of ZI given XI transpose. Okay?
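The full M-step update for Lambda that this derivation arrives at can be sketched in code. The following is a minimal NumPy sketch under the standard factor analysis model; the function name and variable layout are my own choices, not from the lecture, and it assumes the usual E-step posterior formulas for this model.

```python
import numpy as np

def factor_analysis_lambda_update(X, mu, Lam, Psi):
    """One M-step update of Lambda with mu and Psi held fixed.

    X:   (m, n) data matrix, one example per row
    mu:  (n,)   mean parameter
    Lam: (n, k) current Lambda
    Psi: (n, n) diagonal noise covariance
    """
    m, n = X.shape
    k = Lam.shape[1]

    # E-step quantities: the posterior of z_i given x_i is Gaussian with
    #   mu_{z|x}    = Lam^T (Lam Lam^T + Psi)^{-1} (x_i - mu)
    #   Sigma_{z|x} = I - Lam^T (Lam Lam^T + Psi)^{-1} Lam   (same for all i)
    G = np.linalg.inv(Lam @ Lam.T + Psi)     # (n, n)
    Xc = X - mu                              # centered data, (m, n)
    Mu_zx = Xc @ G @ Lam                     # (m, k); row i is mu_{z_i|x_i}^T
    Sigma_zx = np.eye(k) - Lam.T @ G @ Lam   # (k, k)

    # M-step: Lambda = (sum_i (x_i - mu) E[z_i]^T) (sum_i E[z_i z_i^T])^{-1},
    # using E[z_i] = mu_{z_i|x_i} and
    # E[z_i z_i^T] = Sigma_{z|x} + mu_{z_i|x_i} mu_{z_i|x_i}^T
    # (the covariance identity from the lecture).
    A = Xc.T @ Mu_zx                         # (n, k)
    B = m * Sigma_zx + Mu_zx.T @ Mu_zx       # (k, k)
    return A @ np.linalg.inv(B)
```

The two expectations worked out at the end of the derivation appear directly: `Mu_zx` carries E of ZI, and `m * Sigma_zx + Mu_zx.T @ Mu_zx` accumulates E of ZI ZI transpose over the training set.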