If you’re faced with a new learning problem – if I give you some random thing to classify and you want to decide how to come up with a kernel – one way is to try to come up with a function K of X, Z that is large if you want the learning algorithm to think of X and Z as similar, and small if you want it to think of them as dissimilar. Again, this isn’t always true, but this is one of several intuitions. So if you’re trying to classify some brand new thing – you’re trying to classify [inaudible] or DNA sequences or something, some strange thing you want to classify – one thing you could do is try to come up with a kernel that’s large when you want the algorithm to think these are similar things, and small when you want it to think they are dissimilar.
And so this brings up a question. Let’s say I have something I want to classify, and let’s say I write down a function that I think is a good measure of how similar or dissimilar X and Z are for my specific problem. Let’s say I write down K of X, Z equals E to the minus. Let’s say I write down this function, and I think this is a good measure of how similar X and Z are. The question, then, is: is this really a valid kernel? In other words, to understand how we can construct kernels – if I write down a function like that, the question is, does there really exist some Phi such that K of X, Z is equal to the inner product of Phi of X and Phi of Z?
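To make the "large when similar, small when dissimilar" intuition concrete, here is a minimal sketch. The transcript cuts off at "E to the minus", so the exact function written in the lecture is not recorded; the Gaussian form below, the name `gaussian_kernel`, and the bandwidth parameter `sigma` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Assumed similarity measure: exp(-||x - z||^2 / (2 * sigma^2)).

    Returns 1.0 when x equals z and decays toward 0 as x and z move apart.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2)))

# Made-up example points: one close to x, one far away.
x = [1.0, 2.0]
near = [1.1, 2.1]
far = [5.0, -3.0]

k_same = gaussian_kernel(x, x)
k_near = gaussian_kernel(x, near)
k_far = gaussian_kernel(x, far)
print(k_same, k_near, k_far)
```

Whether a hand-written function like this is a valid kernel is a separate matter from whether it feels like a sensible similarity score, and that is the question the lecture turns to.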
And that’s the question of whether K is a valid kernel. It turns out that there is a result that characterizes necessary and sufficient conditions for when a function K that you might choose is a valid kernel. Let me go ahead and show part of that result now.
Suppose K is a valid kernel, and when I say K is a kernel, what I mean is there does indeed exist some function Phi for which K of X, Z equals the inner product of Phi of X and Phi of Z. Then let any set of points X1 up to XM be given. Let me define a matrix K. I apologize for overloading notation. I’m going to use K to denote both the kernel function, which is a function of X and Z, and a matrix. Unfortunately, there aren’t enough letters in the alphabet. Well, that’s not true.
I’m going to define the kernel matrix to be an M-by-M matrix such that K subscript IJ is equal to the kernel function applied to two of my examples, XI and XJ. Then it turns out that for any vector Z that’s M-dimensional, I want you to consider Z transpose KZ. By definition of matrix multiplication, this is the sum over I and the sum over J of ZI times KIJ times ZJ, and KIJ is the kernel function between XI and XJ, so that must equal Phi of XI transpose Phi of XJ. I assumed that K is a valid kernel function, and so there must exist such a Phi. This is the inner product between two feature vectors, so let me just make that inner product explicit.
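As a concrete aside, the kernel matrix and the quantity Z transpose KZ can be sketched numerically. The feature map `phi` below (all pairwise products of the coordinates) is a hypothetical choice made only for illustration, not one given in the lecture, and the points and the vector `z` are random made-up data.

```python
import numpy as np

def phi(x):
    # Hypothetical feature map: all pairwise products x_i * x_j, flattened.
    x = np.asarray(x, dtype=float)
    return np.outer(x, x).ravel()

def kernel(x, z):
    # Kernel induced by phi: K(x, z) = <phi(x), phi(z)>, which equals (x . z)^2 here.
    return float(np.dot(phi(x), phi(z)))

def kernel_matrix(points):
    # M-by-M matrix with entry (i, j) equal to kernel(points[i], points[j]).
    m = len(points)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(points[i], points[j])
    return K

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))   # M = 5 made-up examples in 3 dimensions
K = kernel_matrix(points)
z = rng.normal(size=5)             # an arbitrary M-dimensional vector

# z^T K z computed two ways: as a matrix product and as the explicit double sum.
quad_form = float(z @ K @ z)
double_sum = sum(z[i] * K[i, j] * z[j] for i in range(5) for j in range(5))
print(quad_form, double_sum)
```

The two numbers agree up to floating-point error, which is just the "by definition of matrix multiplication" step written out.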
I’m going to sum over the elements of this vector, and I’m going to use Phi XI subscript K just to denote the K-th element of this vector. Just rearrange sums. You get sum over K. This next step may look familiar to some of you, which is just – right? Therefore, this is a sum of squares, and it must therefore be greater than or equal to zero. Do you want to take a minute to look over all the steps and just make sure you buy them all? Oh, this is the inner product between the vectors Phi of XI and Phi of XJ, so the inner product between two vectors is the sum over all the elements of the vectors of the product of the corresponding elements.
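Since the board work is not captured in the transcript, here is a reconstruction of the steps just described, written out as one chain of equalities. The notation is an assumption: $x^{(i)}$ stands for the I-th example, $\phi_k(x^{(i)})$ for the K-th element of Phi of XI, and $m$ for the number of examples.

```latex
\begin{align*}
z^\top K z
  &= \sum_{i=1}^{m} \sum_{j=1}^{m} z_i K_{ij} z_j \\
  &= \sum_{i=1}^{m} \sum_{j=1}^{m} z_i \, \phi(x^{(i)})^\top \phi(x^{(j)}) \, z_j \\
  &= \sum_{i=1}^{m} \sum_{j=1}^{m} z_i \Big( \sum_{k} \phi_k(x^{(i)}) \, \phi_k(x^{(j)}) \Big) z_j \\
  &= \sum_{k} \sum_{i=1}^{m} \sum_{j=1}^{m} z_i \, \phi_k(x^{(i)}) \, \phi_k(x^{(j)}) \, z_j \\
  &= \sum_{k} \Big( \sum_{i=1}^{m} z_i \, \phi_k(x^{(i)}) \Big)^{2} \\
  &\ge 0 .
\end{align*}
```

The second-to-last line follows because, for each fixed K, the double sum over I and J factors into the product of the same single sum with itself.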