So let’s see. I want to find the direction U. Just as a reminder, if I have a vector U of norm one, then the vector XI projected onto U has length XI transpose U. That is, to project X onto a unit vector, the length of the projection is just XI transpose U. And so to formalize my PCA problem, I’m going to choose a vector U, subject to the constraint that the norm of U, the length of U, is one, and I’m going to maximize, let’s see, the sum from I equals one to M of the squared length of the projection of the vectors XI onto U. In particular, I want the projections to be far from the origin, so I want the sum of their squared distances from the origin to be large. I want the projections of X onto U to have large variance.
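As a quick illustration of the reminder above, here is a minimal NumPy sketch of the projection length XI transpose U. The function name `projection_length` and the example vectors are assumptions made for illustration; they do not come from the lecture. The sketch normalizes U first, since the formula only holds for a vector of norm one.

```python
import numpy as np

def projection_length(x, u):
    """Signed length of the projection of x onto the direction u (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)  # the formula x^T u assumes u has norm one
    return float(np.dot(x, u))

# Illustrative vectors, not taken from the lecture.
x = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])
length = projection_length(x, u)
```

The result can be negative when X points away from U, which is why the objective sums the squared lengths.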
And just to simplify some of the math later, let me put a one over M in front. And so that quantity on the right is equal to one over M times, let’s see, the sum from I equals one to M of U transpose XI times XI transpose U, and so I can simplify and I get: this is U transpose, times one over M times the sum over I of XI XI transpose, times U. So I want to maximize U transpose times some matrix times U subject to the constraint that the length of U must be equal to one. And so some of you will recognize that this means U must be the principal eigenvector of this matrix in the middle. So let me just write that down and say a few more words about it.
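Written out, the simplification described above is probably the following. Here x_i, u, and m are assumed to be the written forms of the spoken "XI", "U", and "M"; the board work itself is not in the transcript, so this is a reconstruction rather than a copy.

```latex
\begin{aligned}
\frac{1}{m}\sum_{i=1}^{m}\left(x_i^{\top}u\right)^{2}
  &= \frac{1}{m}\sum_{i=1}^{m} u^{\top}x_i\,x_i^{\top}u \\
  &= u^{\top}\left(\frac{1}{m}\sum_{i=1}^{m} x_i x_i^{\top}\right)u ,
  \qquad \text{subject to } \lVert u\rVert = 1 .
\end{aligned}
```

The first step uses the fact that x_i^T u is a scalar, so it equals its own transpose u^T x_i; the second pulls u out of the sum because it does not depend on i.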
So this implies that U is the principal eigenvector of the matrix, and I’ll just call this matrix sigma. It actually turns out to be a covariance matrix: sigma equals one over M times the sum from I equals one to M of XI XI transpose. Actually, let’s check. How many of you are familiar with eigenvectors? Oh, cool. Lots of you, almost all of you. Great. What I’m about to say is probably very familiar, but it’s worth saying anyway. So if you have a matrix A and a vector U, and they satisfy AU equals lambda U, then this is what it means for U to be an eigenvector of the matrix A. And the value lambda here is called an eigenvalue. And so the principal eigenvector is just the eigenvector that corresponds to the largest eigenvalue. One thing that some of you may have seen, but I’m not sure that you have, and I just wanna relate this to stuff that you already know as well, is that this optimization problem is really: maximize U transpose sigma U subject to the norm of U being equal to one, and let me write that constraint as U transpose U equals one.
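To make the definition AU equals lambda U concrete, here is a small NumPy sketch that picks out the principal eigenvector of a symmetric matrix and checks the defining equation. The 2-by-2 matrix is an arbitrary example chosen for illustration, not something from the lecture. `np.linalg.eigh` is used because it is meant for symmetric matrices and returns eigenvalues in ascending order, so the last column is the principal eigenvector.

```python
import numpy as np

# Arbitrary symmetric example matrix (an assumption for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)

lam = eigenvalues[-1]     # largest eigenvalue
u = eigenvectors[:, -1]   # principal eigenvector, returned with norm one

# Check the defining property A u = lambda u.
residual = np.linalg.norm(A @ u - lam * u)
```

The sign of the returned eigenvector is arbitrary: if U is an eigenvector then so is minus U, and both describe the same direction.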
And so to solve this constrained optimization problem, you write down the Lagrangian, L of U and lambda equals U transpose sigma U minus lambda times the quantity U transpose U minus one, where lambda is the Lagrange multiplier, because this is a constrained optimization problem. And so to actually solve this optimization, you take the derivative of L with respect to U and that gives you, up to a constant factor of two, sigma U minus lambda U. You set the derivative equal to zero and this shows that sigma U equals lambda U, and therefore the value of U that solves this constrained optimization problem that we care about must be an eigenvector of sigma. And in particular it turns out to be the principal eigenvector, because at an eigenvector of norm one the objective U transpose sigma U is just lambda, so the largest eigenvalue gives the largest value. So just to summarize, what have we done? We’ve shown that given a training set, if you want to find the principal axis of variation of the data, you want to find the 1-D axis on which the data really varies the most, what we do is we construct the covariance matrix sigma, the matrix sigma that I wrote down just now, and then you would find the principal eigenvector of the matrix sigma. And this gives you the best 1-D subspace onto which to project the data.
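The summary above can be sketched in a few lines of NumPy: build sigma from the data, then take its principal eigenvector. This is a minimal sketch under stated assumptions, not the lecture's own code. The function name `first_principal_direction` and the synthetic data are hypothetical, and the sketch subtracts the mean first, since interpreting sigma as a covariance matrix assumes zero-mean data.

```python
import numpy as np

def first_principal_direction(X):
    """Return (u, lam, sigma) for data X of shape (m, n), one example per row.

    Hypothetical helper. Assumption: the data is centered here so that
    sigma = (1/m) * sum_i x_i x_i^T is a covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)        # center the data (assumption, see above)
    m = Xc.shape[0]
    sigma = (Xc.T @ Xc) / m        # (1/m) * sum_i x_i x_i^T
    eigenvalues, eigenvectors = np.linalg.eigh(sigma)  # ascending order
    return eigenvectors[:, -1], eigenvalues[-1], sigma

# Synthetic data lying close to the line through the direction (1, 2).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=200)])

u, lam, sigma = first_principal_direction(X)

# Variance of the data projected onto u; this equals the top eigenvalue.
projected_variance = float(u @ sigma @ u)
```

For this data, U should come out roughly parallel to the direction (1, 2), up to sign, and no other unit vector should give a larger projected variance.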