Student: [Inaudible].
Instructor (Andrew Ng): Oh, yes it is. This is just A transpose B equals sum over K of AK BK, so that’s just this. This is the sum over K of the K-th elements of this vector. Take a look at this and make sure it makes sense. Questions about this? So just to summarize, what we showed was that for any vector Z, Z transpose KZ is greater than or equal to zero, and this is one of the standard definitions of the matrix K being positive semidefinite, which is written as K greater than or equal to zero.
Just to summarize, what was shown is this. Suppose K is a valid kernel, in other words, K is a function for which there exists some Phi such that K of XI, XJ is the inner product between Phi of XI and Phi of XJ. So if K is a valid kernel, we showed, then, that the kernel matrix must be positive semidefinite. It turns out that the converse [inaudible] and so this gives you a test for whether a function K is a valid kernel.
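As a small numerical illustration of the direction that was shown, the sketch below builds a kernel from an explicit feature map and checks that Z transpose KZ equals the squared norm of the sum over I of ZI times Phi of XI, which cannot be negative. The feature map `phi`, the points, and the vector `z` are all hypothetical choices made only for this example.

```python
# Hypothetical feature map, chosen only for illustration: phi(x) = (x, x^2, 1).
def phi(x):
    return [x, x * x, 1.0]

def dot(a, b):
    # a^T b = sum over k of a_k * b_k
    return sum(ak * bk for ak, bk in zip(a, b))

def kernel(x, z):
    # Valid by construction: an inner product of feature vectors.
    return dot(phi(x), phi(z))

points = [-1.5, 0.0, 0.7, 2.0]   # any set of m points
z = [0.3, -2.0, 1.1, 0.5]        # any vector z of length m
m = len(points)

# Kernel matrix: K[i][j] = kernel(x_i, x_j)
K = [[kernel(xi, xj) for xj in points] for xi in points]

# Left side: z^T K z
quad_form = sum(z[i] * K[i][j] * z[j] for i in range(m) for j in range(m))

# Right side: squared norm of sum_i z_i * phi(x_i)
combo = [sum(z[i] * phi(points[i])[k] for i in range(m)) for k in range(3)]
squared_norm = dot(combo, combo)

print(quad_form, squared_norm)
```

The two printed numbers should agree up to floating-point rounding, and both are nonnegative because the second one is a sum of squares.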
So this is a theorem due to Mercer, and so kernels are also sometimes called Mercer kernels. It means the same thing. It just means it’s a valid kernel. Let K of X, Z be given. Then K is a valid kernel, in other words, it’s a Mercer kernel, i.e., there exists a Phi such that K of X, Z equals Phi of X transpose Phi of Z, if and only if for any set of M examples (and this really means for any set of M points; it’s not necessarily a training set, it’s just any set of M points you may choose) it holds true that the kernel matrix, capital K that I defined just now, is symmetric positive semidefinite.
And so I proved only one direction of this result. I proved that if it’s a valid kernel, then K is symmetric positive semidefinite, but the converse I didn’t show. It turns out that this is a necessary and sufficient condition. And so this gives you a useful test for whether any function that you might want to choose is a kernel.
A concrete example of something that’s clearly not a valid kernel would be if you find an input X such that K of X, X is, say, minus one. Then this is clearly not a valid kernel, because minus one cannot possibly be equal to Phi of X transpose Phi of X. This would be one of many examples of functions that fail to meet the conditions of this theorem, because the inner product of a vector with itself is always greater than or equal to zero.
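The test described above can be sketched numerically: build the kernel matrix on some set of points and check that it is symmetric with no negative eigenvalues. This is only a sketch under assumptions. The helper names `kernel_matrix` and `is_symmetric_psd`, the tolerance, and the random points are all hypothetical, and `bad_kernel` is a made-up function with K of X, X equal to minus one.

```python
import numpy as np

def kernel_matrix(kernel, points):
    # K[i, j] = kernel(x_i, x_j) for the given set of m points.
    return np.array([[kernel(xi, xj) for xj in points] for xi in points], dtype=float)

def is_symmetric_psd(K, tol=1e-9):
    # Symmetric, and smallest eigenvalue not below zero (up to a tolerance).
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

def linear_kernel(x, z):
    # Inner product with Phi(x) = x, so this is a valid kernel.
    return float(np.dot(x, z))

def bad_kernel(x, z):
    # Hypothetical function with K(x, x) = -1 for every x.
    return -1.0

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))   # any set of 5 points in 3 dimensions

print(is_symmetric_psd(kernel_matrix(linear_kernel, points)))
print(is_symmetric_psd(kernel_matrix(bad_kernel, points)))
```

A failure on any one set of points is enough to rule a function out. Passing on one particular set is not a proof of validity, since the theorem requires the condition to hold for every set of points.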
So just to tie this back explicitly to an SVM, let’s say you want to use a support vector machine with a kernel. What you do is you choose some function K of X, Z, and it turns out that the function I wrote down just now is, indeed, a valid kernel. It is called the Gaussian kernel because of the similarity to Gaussians. So you choose some kernel function like this, or you may choose X transpose Z plus C, to the power D.
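For reference, here is a minimal sketch of the two kernel functions mentioned, assuming the usual form of the Gaussian kernel, exp of minus the squared distance between X and Z over two sigma squared. The parameter names `sigma`, `c`, and `d` and their default values are assumptions made for this example.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    # exp(-||x - z||^2 / (2 * sigma^2)); sigma is a bandwidth parameter.
    sq_dist = sum((xk - zk) ** 2 for xk, zk in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def polynomial_kernel(x, z, c=1.0, d=2):
    # (x^T z + c)^d
    inner = sum(xk * zk for xk, zk in zip(x, z))
    return (inner + c) ** d

x = [1.0, 2.0]
near = [1.0, 2.0]
far = [10.0, -10.0]

print(gaussian_kernel(x, near))   # identical points give the largest value
print(gaussian_kernel(x, far))    # distant points give a value close to zero
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0], c=1.0, d=2))
```

Both functions act as similarity measures: the Gaussian kernel is largest when the two inputs coincide and decays as they move apart.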
To apply a support vector machine with a kernel, you choose one of these functions, and the choice would depend on your problem. It depends on what is a good measure of how similar or how different two examples are for your problem. Then you go back to our formulation of the support vector machine, and you have to use the dual formulation, and everywhere you see the inner product between XI and XJ, you replace it with K of XI, XJ.
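The substitution can be sketched as follows, assuming the standard form of the SVM dual: the sum of the alphas, minus one half of the sum over I and J of YI YJ alpha I alpha J times the inner product of XI and XJ. The code writes that inner product as a call to `kernel`, and does the same in the prediction. The function names, the toy data, and the hand-picked `alpha` and `b` are hypothetical, and no optimizer is run here.

```python
def dual_objective(alpha, y, X, kernel):
    # W(alpha) = sum_i alpha_i
    #            - 1/2 * sum_{i,j} y_i y_j alpha_i alpha_j K(x_i, x_j)
    m = len(alpha)
    linear_term = sum(alpha)
    quad_term = sum(
        y[i] * y[j] * alpha[i] * alpha[j] * kernel(X[i], X[j])
        for i in range(m)
        for j in range(m)
    )
    return linear_term - 0.5 * quad_term

def decision_value(x, alpha, y, X, b, kernel):
    # sum_i alpha_i y_i K(x_i, x) + b; its sign is the predicted label.
    return sum(alpha[i] * y[i] * kernel(X[i], x) for i in range(len(alpha))) + b

def linear_kernel(x, z):
    # Plain inner product; swap in any other valid kernel here.
    return sum(xk * zk for xk, zk in zip(x, z))

# Toy data and hand-picked multipliers, for illustration only.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
alpha = [0.5, 0.5]
b = 0.0

print(dual_objective(alpha, y, X, linear_kernel))
print(decision_value([2.0, 0.0], alpha, y, X, b, linear_kernel))
```

Because the data enter only through calls to `kernel`, changing the kernel changes the feature space without Phi ever being computed explicitly.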