So here's the algorithm. Before running PCA, you will normally preprocess the data as follows. First, compute the mean of the training set and subtract it out, so that the training set now has zero mean. Second, compute the variance of each feature after zeroing out the mean, and then divide each feature by its standard deviation, so that each feature now has unit variance. So these are standard preprocessing steps we'll often do for PCA.
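As a minimal sketch of those two preprocessing steps (my own NumPy code, not from the lecture; the function name and the (m, n) array layout are assumptions):

```python
import numpy as np

def preprocess(X):
    """Standard PCA preprocessing for an (m, n) array X of m examples.

    Step 1: subtract the per-feature mean so each feature has zero mean.
    Step 2: divide by the per-feature standard deviation so each feature
            has unit variance (this step can be skipped when the features
            already share a common scale).
    """
    X = X - X.mean(axis=0)   # zero out the mean of each feature
    X = X / X.std(axis=0)    # normalize each feature's variance to 1
    return X
```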
I'll just mention that the second step, normalizing the variance, is usually done only when your different features are on different scales. So if you're taking measurements of people, one feature may be the height, another the weight, another the strength, another the age, and so on; all of these quantities are on very different scales, so you'd normally normalize the variance. If all the features are the same type of thing, for example if you're working with images and the features are pixel intensity values ranging from 0 to 255, then they're all on the same scale and you may omit this step. So after preprocessing, let's talk about how we would find the main axes along which the data varies. How would you find the principal axes of variation? To do that, let me go through one specific example, and then we'll formalize the algorithm. Here's my training set comprising five examples. It has roughly zero mean, and the variance along the x1 and x2 axes is the same. So the principal axis of variation of this data is roughly the positive 45-degree axis, and what I'd like is for my algorithm to conclude that this direction, the one I've drawn with the arrow u, is the best direction onto which to project the data, the axis along which my data really varies.
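To make that five-example picture concrete, here's a hypothetical zero-mean dataset (made-up numbers, not the lecture's actual figure) that varies mainly along the 45-degree direction, comparing the variance of its projections onto that direction and onto the orthogonal one:

```python
import numpy as np

# A hypothetical five-example training set, roughly zero-mean,
# varying mainly along the 45-degree direction.
X = np.array([[-2.0, -1.9],
              [-1.0, -1.1],
              [ 0.1, -0.1],
              [ 1.0,  1.1],
              [ 2.0,  1.9]])

u_good = np.array([1.0, 1.0]) / np.sqrt(2)   # the 45-degree direction
u_bad  = np.array([1.0, -1.0]) / np.sqrt(2)  # the orthogonal direction

proj_good = X @ u_good   # projection lengths onto the 45-degree axis
proj_bad  = X @ u_bad    # projection lengths onto the orthogonal axis

print(proj_good.var())   # large variance along the principal axis
print(proj_bad.var())    # much smaller variance along the orthogonal axis
```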
So let's think about how to formalize this. Suppose I take that axis, the one onto which I want to project the data to capture most of its variation, and I project my training set onto that line; then I get that set of points. You'll notice that those dots, the projections of my training set onto this axis, have very large variance. In contrast, suppose I choose a different direction, say roughly the worst direction onto which to project my data. If I project all my data onto this other purple line, I find that the projections have much smaller variance; my points are clustered together much more tightly. So one way to formalize this notion of finding the main axis of variation of the data is this: I would like to find the direction u so that when I project my data onto it, the projected points vary as much as possible. In other words, I want to find a direction such that when I project my data onto it, the projections are widely spaced out; there's large variance.
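Formally, this is the problem of choosing a unit vector u to maximize (1/m) * sum_i (u^T x^(i))^2, the variance of the projections; the maximizer turns out to be the top eigenvector of the empirical covariance matrix (1/m) * sum_i x^(i) x^(i)^T. A minimal sketch of that computation (my own code, assuming X has already been preprocessed as above):

```python
import numpy as np

def principal_direction(X):
    """Return the unit vector u maximizing the variance of the
    projections u^T x^(i), i.e. the top eigenvector of the empirical
    covariance matrix (1/m) * X^T X. Assumes X is (m, n), zero-mean."""
    m = X.shape[0]
    Sigma = (X.T @ X) / m                     # empirical covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh: Sigma is symmetric;
                                              # eigenvalues come back ascending
    return eigvecs[:, -1]                     # eigenvector of the largest eigenvalue

# On the toy dataset above, this returns roughly [0.707, 0.707]:
# the 45-degree direction the lecture is pointing at.
```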