There's one other issue, which is repeated eigenvalues. So for example, it turns out that if my covariance matrix looks like this, the identity of the first two eigenvectors is ambiguous. And by that I mean you can choose this to be U1 and this to be U2, or I can just as well rotate these eigenvectors and choose that to be U1 and that to be U2, or even choose that to be U1 and that to be U2, and so on.
And so when you apply PCA, one thing to keep in mind is that when you have repeated eigenvalues, or close to repeated eigenvalues, the eigenvectors can rotate freely within their subspaces. So the way to think about the vectors U is as a basis with which to represent your data, but the basis vectors can sometimes rotate freely, and so it's not always useful to look at the eigenvectors one at a time and say this is my first eigenvector, capturing whatever, the height of a person, or this is my second eigenvector, and it captures their skill at [inaudible] or whatever. That's a very dangerous thing to do when you do PCA. What is meaningful is the subspace spanned by the eigenvectors, but looking at the eigenvectors one at a time is sometimes a dangerous thing to do because they can rotate freely. Tiny numerical changes can cause the eigenvectors to change a lot, but the subspace spanned by the top K eigenvectors will usually be about the same.
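To make that concrete, here is a small numerical sketch in NumPy (my own illustration, not something from the lecture): the covariance below has a repeated top eigenvalue, so a tiny perturbation rotates the individual top eigenvectors by a lot, while the projection onto the subspace they span barely moves.

```python
import numpy as np

np.random.seed(0)

# Covariance with a repeated top eigenvalue: the top two eigenvectors
# are only determined up to a rotation within the plane they span.
Sigma = np.diag([2.0, 2.0, 0.5])

# Add a tiny symmetric perturbation, e.g. from numerical or sampling noise.
E = 1e-6 * np.random.randn(3, 3)
Sigma_perturbed = Sigma + (E + E.T) / 2

def top_eigvecs(S, k):
    """Return the k eigenvectors of S with the largest eigenvalues, as columns."""
    vals, vecs = np.linalg.eigh(S)   # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :k]

U = top_eigvecs(Sigma, 2)
U_pert = top_eigvecs(Sigma_perturbed, 2)

# Individual eigenvectors can change a lot under the tiny perturbation ...
print("change in first eigenvector:", np.linalg.norm(U[:, 0] - U_pert[:, 0]))

# ... but the projection onto the span of the top two is essentially unchanged.
P, P_pert = U @ U.T, U_pert @ U_pert.T
print("change in top-2 subspace:   ", np.linalg.norm(P - P_pert))
```

The second number comes out on the order of the perturbation itself, which is the sense in which the top-K subspace is stable even when the individual eigenvectors are not.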
It actually turns out there are multiple possible interpretations of PCA. I'll just give one more without proof. Given a training set like this, another view of PCA is the following: let me just choose a direction and project my data onto it. This is clearly not the direction PCA would choose, but what you can think of PCA as doing is choosing a subspace onto which to project your data so as to minimize the sum of squared differences between the projections and the original points. So in other words, another way to think of PCA is that it tries to minimize the sum of squares of these distances between the points X and the dots onto which I'm projecting the data. It turns out there are, I don't know, something like nine or ten different interpretations of PCA. This is another one, and there are a bunch of ways to derive PCA. You'll get to play with some PCA ideas more in the next problem set.
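As a quick sanity check of that minimum-reconstruction-error view (again an illustrative sketch, not part of the lecture), you can compare the sum of squared projection distances for the top principal component against some other direction:

```python
import numpy as np

np.random.seed(1)

# Some zero-mean 2-D training data, stretched along one axis.
X = np.random.randn(500, 2) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X = X - X.mean(axis=0)

def reconstruction_error(X, u):
    """Sum of squared distances between points and their projections onto direction u."""
    u = u / np.linalg.norm(u)
    proj = np.outer(X @ u, u)          # projection of each point onto the line spanned by u
    return np.sum((X - proj) ** 2)

# Top principal component = top eigenvector of the empirical covariance.
Sigma = X.T @ X / X.shape[0]
vals, vecs = np.linalg.eigh(Sigma)
u_pca = vecs[:, -1]                    # eigenvector with the largest eigenvalue

u_other = np.array([1.0, 1.0])         # a direction PCA would not choose here

print("error along PCA direction:   ", reconstruction_error(X, u_pca))
print("error along other direction: ", reconstruction_error(X, u_other))
# The PCA direction gives the smaller sum of squared projection distances.
```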
What I want to do next is talk about a few applications of PCA. Here are some ways that PCA is used. One is visualization. Very often you have high-dimensional data sets. Someone gives you a 50-dimensional data set, and it's very hard for you to look at a data set that's 50-dimensional and understand what's going on, because you can't plot something in 50 dimensions. So a common practice, if you want to visualize a very high-dimensional data set, is to take your data and project it into, say, a 2-D plot, or project it into a 3-D plot that you can render as a 3-D display on a computer, so you can better visualize the data and look for structure. One particular example of this that I learned about recently was in Krishna Shenoy's lab here at Stanford, which had readings from tens of different neurons in a monkey brain (I don't remember if the number was exactly 50), so you'd have these, say, 50-dimensional vectors corresponding to different amounts of electrical activity in different parts of the monkey brain, a 50-dimensional time series, and it's very hard to visualize very high-dimensional data like that.
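For the visualization use case, a minimal sketch of the kind of thing you would do (with synthetic 50-dimensional data standing in for something like those neural recordings) is to project onto the top two principal components and scatter-plot the result:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2)

# Synthetic stand-in for a 50-dimensional data set: 1000 examples, 50 features.
X = np.random.randn(1000, 50) @ np.random.randn(50, 50)

# Center the data, form the empirical covariance, take the top 2 eigenvectors.
X = X - X.mean(axis=0)
Sigma = X.T @ X / X.shape[0]
vals, vecs = np.linalg.eigh(Sigma)
U2 = vecs[:, -2:]                      # eigenvectors for the two largest eigenvalues

# Project each 50-dimensional point down to 2 dimensions and plot.
Z = X @ U2
plt.scatter(Z[:, 0], Z[:, 1], s=5)
plt.xlabel("first principal component")
plt.ylabel("second principal component")
plt.title("50-D data projected to 2-D with PCA")
plt.show()
```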