But what he did was use PCA to project this 50-dimensional data down to three dimensions, and then you can visualize the data just using a 3-D plot, so you can visualize what the monkey is thinking over time. Another common application of PCA is compression, so if you have high dimensional data and you want to store it with fewer numbers, clearly PCA is a great way to do this. It turns out also that sometimes in machine learning you're just given extremely high dimensional input data, and for computational reasons you don't want to deal with such high dimensional data. So one common use of PCA is to take very high dimensional data and represent it with a low dimensional subspace (that's the y(i) representation I wrote down just now) so that you can work with much lower dimensional data. And it just seems to be a fact of life that when you're given extremely high dimensional data, almost all the time in practice, of all the high dimensional data sets I've ever seen, the points lie on or near much lower dimensional subspaces, so very often you can dramatically reduce the dimension of your data without really throwing away too much of the information. So let's see. If your learning algorithm takes a long time to run on very high dimensional data, you can often use PCA to compress the data to lower dimensions so that your learning algorithm runs much faster, and you can often do this while sacrificing almost no performance in your learning algorithm.
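To make the compression idea concrete, here is a minimal sketch, assuming scikit-learn and a made-up synthetic data set (neither is specified in the lecture): project the high dimensional x(i)'s down to a low dimensional y(i) representation, then train on the compressed data.

```python
# A minimal sketch of PCA for compression / speeding up learning.
# The data set and dimensions below are illustrative assumptions,
# not taken from the lecture.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# High-dimensional data (n examples x d features) that actually lies
# near a much lower-dimensional subspace, plus a little noise.
n, d, true_k = 500, 1000, 20
latent = rng.normal(size=(n, true_k))
mixing = rng.normal(size=(true_k, d))
X = latent @ mixing + 0.01 * rng.normal(size=(n, d))
y = (latent[:, 0] > 0).astype(int)   # labels depend on the latent factors

# Compress each x(i) in R^1000 down to a y(i) in R^20, then train on
# the compressed representation instead of the raw data.
pca = PCA(n_components=20)
X_low = pca.fit_transform(X)         # each row is the 20-number representation

clf = LogisticRegression(max_iter=1000).fit(X_low, y)
print("variance retained:", pca.explained_variance_ratio_.sum())
print("training accuracy on compressed data:", clf.score(X_low, y))
```

Because the data really does lie near a 20-dimensional subspace, almost all of the variance is retained, so the classifier trained on the compressed data loses essentially nothing while working with far fewer numbers.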
There's one other use of PCA for learning, which is, you remember when we talked about learning theory, we said that the more features you have, the more complex your hypothesis class, if, say, you're doing linear classification. If you have lots of features, then you may be more prone to overfitting. So one other thing you could do with PCA is just use it to reduce the dimension of your data so that you have fewer features and you may be slightly less prone to overfitting. This particular application of PCA to learning, I should say, sometimes works; it often works. I'll actually show a couple of examples later where you do this and it works. But this particular application of PCA, I find, looking at the industry, also seems to be a little bit overused; let me say more about that later. We'll come back to it. There are a couple of other applications to talk about. One is outlier detection, or anomaly detection. The idea is, suppose I give you a data set. You may then run PCA to find roughly the subspace on which your data lies. Then if you want to find anomalies in future examples, you just look at your future examples and see if they lie very far from your subspace.
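The anomaly detection idea just described could be sketched roughly as follows. This assumes scikit-learn; the synthetic data and the 99th-percentile threshold are illustrative choices, not part of the lecture.

```python
# A rough sketch of PCA-based anomaly detection: fit a low-dimensional
# subspace to the data, then flag future points whose distance to that
# subspace (reconstruction error) is unusually large.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Training data lying close to a 5-dimensional subspace of R^50.
X_train = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 50))
X_train += 0.05 * rng.normal(size=X_train.shape)

pca = PCA(n_components=5).fit(X_train)

def distance_to_subspace(x):
    """Euclidean distance from each row of x to the fitted PCA subspace."""
    x = np.atleast_2d(x)
    x_reconstructed = pca.inverse_transform(pca.transform(x))
    return np.linalg.norm(x - x_reconstructed, axis=1)

# Pick a threshold from the training data, e.g. the 99th percentile
# of distances (an arbitrary illustrative choice).
threshold = np.percentile(distance_to_subspace(X_train), 99)

x_new = rng.normal(size=(1, 50))   # a random point, far from the subspace
print("anomaly?", distance_to_subspace(x_new)[0] > threshold)
```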
This isn't a fantastic anomaly detection algorithm; it's not a very good one. But the idea is, if I give you a data set, you may find the low dimensional subspace on which it lies, and then if you ever find a point that's far from your subspace, you can flag it as an anomaly. So this really isn't the best anomaly detection algorithm, but sometimes this is done. And the last application, which I want to go into in a little bit more detail, is matching, or finding better distance calculations. So let me say what I mean by this. I'll go into much more detail on this last one. So here's the idea. Let's say you want to do face recognition [inaudible]. So let's say you want to do face recognition and you have 100 by 100 pixel images. So a picture of a face is, whatever, some array of pixels, and the pixels have different grayscale values, and depending on the different grayscale values, you get different pictures of people's faces. And so [inaudible] you have 100 by 100 pixel images, and you think of each face as a 10,000-dimensional vector, which is very high dimensional. [Inaudible] cartoon to keep in mind: here's my plot of my 10,000-dimensional space, and if you look at a lot of different pictures of faces, each face will be some point in this 10,000-dimensional space. And in this cartoon, I want you to think of it as, I mean, most of this data lies on a relatively low dimensional subspace, because in this 10,000-dimensional space, really not all points correspond to valid images of faces. The vast majority of values, the vast majority of 10,000-dimensional images, just correspond to random noise-like looking things and don't correspond to valid images of faces. Instead, the space of possible face images probably spans a much lower dimensional subspace. [Crosstalk]
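As a hypothetical illustration of this face recognition cartoon (random arrays stand in for face images, and the 50-dimensional subspace size is an arbitrary choice, not something from the lecture), one could flatten 100 x 100 images into 10,000-dimensional vectors, fit a PCA subspace, and compare faces by distance in the low-dimensional representation.

```python
# A hypothetical sketch: treat each 100 x 100 grayscale image as a
# 10,000-dimensional vector, find a low-dimensional subspace with PCA,
# and compare faces by distance in that subspace. The random "images"
# below are placeholders; no real face data is used.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

n_faces = 200
faces = rng.random(size=(n_faces, 100, 100))   # stand-ins for face images
X = faces.reshape(n_faces, -1)                 # each row is a 10,000-dim vector

# Project the 10,000-dimensional face vectors onto a 50-dimensional subspace.
pca = PCA(n_components=50).fit(X)
X_low = pca.transform(X)

def face_distance(i, j):
    """Distance between two faces in the low-dimensional PCA representation."""
    return np.linalg.norm(X_low[i] - X_low[j])

# Matching: find the closest face to face 0 in the low-dimensional space.
dists = np.linalg.norm(X_low - X_low[0], axis=1)
dists[0] = np.inf                              # exclude face 0 itself
best = int(np.argmin(dists))
print("closest match to face 0:", best, "distance:", face_distance(0, best))
```

The design choice here is that distances are computed in the low-dimensional face subspace rather than in the raw 10,000-dimensional pixel space, which is the sense in which PCA can give a better distance calculation for matching.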