And more generally, if you want a K-dimensional subspace onto which to project your data, you would choose U1 through UK to be the top K eigenvectors of sigma. And by top K eigenvectors, I just mean the eigenvectors corresponding to the K largest eigenvalues. I showed this only for the case of a 1-D subspace, but it holds more generally. And now the eigenvectors U1 through UK give you a new basis with which to represent your data.
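As a minimal NumPy sketch of that step (the names X, K, Sigma, and top_k_eigenvectors are illustrative assumptions, not from the lecture; the data is assumed to be already normalized as described earlier):

```python
import numpy as np

def top_k_eigenvectors(X, K):
    # X: (m, n) data matrix, one n-dimensional training example per row,
    # assumed to be zero-mean (and, if desired, variance-normalized).
    m, n = X.shape
    Sigma = (X.T @ X) / m                      # empirical covariance matrix, n x n
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh is for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues, largest first
    U = eigvecs[:, order[:K]]                  # columns are u_1, ..., u_K
    return U, eigvals[order[:K]]
```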
Student: [Inaudible] diagonal?
Instructor (Andrew Ng): Let's see. So by convention, PCA will choose orthogonal axes, so I think this is what I'm saying. Here's one more example. Imagine that you have a three-dimensional data set. It's very hard to draw in 3-D on the board, so just let me try this. Imagine that my X1 and X2 axes lie in the plane of the board, and the X3 axis points directly out of the board. Imagine that you have a data set where most of the data lies in the plane of the board, but there's just a tiny little bit of fuzz, so that some of the data lies just a couple of millimeters off the board.
So you run PCA on this data. You find that U1 and U2 will be some pair of basis vectors that essentially lie in the plane of the board, and U3 will be an orthogonal axis that points roughly out of the plane of the board. So if I reduce this data to two dimensions, then the basis vectors U1 and U2 give me a new basis with which to construct my lower-dimensional representation of the data.
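A quick synthetic version of this picture (my own illustration under assumed scales, not code from the lecture): generate points that lie almost exactly in the X1-X2 plane of R^3 with a few "millimeters" of fuzz along X3, and check that PCA assigns two large eigenvalues and one tiny one.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
# Data spread out in the x1-x2 plane, with only tiny "fuzz" along x3.
X = np.column_stack([
    rng.normal(scale=3.0, size=m),     # x1: large spread
    rng.normal(scale=2.0, size=m),     # x2: large spread
    rng.normal(scale=0.01, size=m),    # x3: a couple of millimeters of noise
])
X = X - X.mean(axis=0)                 # zero-mean the data

Sigma = (X.T @ X) / m
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals[::-1])   # roughly [9, 4, 0.0001]: two big eigenvalues, one tiny
# u_1, u_2 (the top two eigenvectors) approximately span the x1-x2 plane;
# u_3 points roughly along x3, out of the "board".
```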
Just to be complete about what that means: previously we had fairly high-dimensional input data, XI in R^N, and now if I want to represent my data in this new basis given by U1 up to UK, I would take each of my original training examples XI and replace it with a different vector, which I'll call YI, computed as the vector (U1 transpose XI, U2 transpose XI, ..., UK transpose XI). So the YIs are going to be K-dimensional, where your choice of K will be less than N, so this gives a lower-dimensional representation of your data, and serves as an approximate representation of your original data, where you're using only K numbers to represent each training example rather than N numbers. Let's see.
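In code, stacking U1 through UK as the columns of a matrix U, this replacement is just a matrix product. A hedged sketch (X and U are assumed to come from something like the earlier top_k_eigenvectors sketch, which is my own naming):

```python
import numpy as np

def project(X, U):
    # X: (m, n) data matrix; U: (n, K) matrix with columns u_1, ..., u_K.
    # Row i of the result is y^(i) = (u_1^T x^(i), ..., u_K^T x^(i)) in R^K.
    return X @ U

def reconstruct(Y, U):
    # Approximate reconstruction x^(i) ~ U y^(i), using only the top-K
    # directions of variation.
    return Y @ U.T
```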
Student: [Inaudible] to have eigenvectors [inaudible] trivial eigenvectors?
Instructor (Andrew Ng): Is it possible not to have eigenvectors but have trivial eigenvectors?
Student: [Inaudible] determined by [inaudible]? Is it a condition wherein [inaudible].
Instructor (Andrew Ng): Let's see. What do you mean by trivial eigenvectors?
Student: [Inaudible] linear algebra from [inaudible].
Instructor (Andrew Ng): Oh, okay. Yes, I see. Let me see if I can get this right. So there are some matrices that are – I think the term is degenerate – that don't have a full set of eigenvectors. Sorry, deficient, I think, is what it's called. Is that right, Ziko? Yeah. So some matrices are deficient and don't have a full set of eigenvectors, like an N by N matrix that does not have N linearly independent eigenvectors. But that turns out not to be possible in this case, because the covariance matrix sigma is symmetric, and symmetric matrices are never deficient, so the matrix sigma will always have a full set of eigenvectors.
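To illustrate this point (my own example, not from the lecture): a classic deficient (defective) matrix such as [[1, 1], [0, 1]] has a repeated eigenvalue with only one independent eigenvector, while a symmetric matrix like a covariance matrix always gets a complete orthonormal set of eigenvectors, which np.linalg.eigh returns directly.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # non-symmetric and deficient (defective)
vals, vecs = np.linalg.eig(A)     # eigenvalue 1 is repeated, and the two
print(vecs)                       # returned eigenvectors are numerically
                                  # parallel: no full independent set

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric, like a covariance matrix
vals, vecs = np.linalg.eigh(S)    # eigh gives real eigenvalues and an
print(vecs @ vecs.T)              # orthonormal eigenvector basis: this
                                  # prints (approximately) the identity
```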