
I won’t talk about that here. If you want to learn about it, go ahead and look up John Platt’s paper on the SMO algorithm. The paper is pretty easy to read, and later on, we’ll also be posting a handout on the course homepage with a simplified version of the SMO algorithm that you can use in the problem sets. You can also look at the course readings for more detail.

One other thing that I didn’t talk about was how to compute the intercept parameter b. Solving the dual gives you all the alphas, and the alphas are also what allow us to get w. Computing b turns out, again, to be not very difficult; I’ll let you read about that yourself in the notes that we’ll post along with the next problem set.
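For reference, here is a minimal sketch of the standard formulas for recovering w and b from the alphas, in the notation of the lecture notes; the explicit expression for b shown here is the one for the separable, hard-margin case, so treat it as an illustration rather than the general update used inside SMO:

```latex
% Once the dual is solved for the alphas, the weight vector is a weighted
% combination of the training examples, and (in the separable case) the
% intercept centers the margin between the two classes:
\[
  w = \sum_{i=1}^{m} \alpha_i \, y^{(i)} x^{(i)}, \qquad
  b = -\frac{1}{2}\left( \max_{i : y^{(i)} = -1} w^{T} x^{(i)}
                        + \min_{i : y^{(i)} = +1} w^{T} x^{(i)} \right).
\]
```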

To wrap up today’s lecture, what I want to do is just tell you briefly about a couple of examples of applications of SVMs. Let’s consider the problem of handwritten digit recognition. In handwritten digit recognition, you’re given a pixel array with a scanned image of, say, a handwritten zip code. This is an array of pixels, and some of these pixels will be on and other pixels will be off. A particular combination of pixels being on might represent the character one. The question is, given an input feature vector like this (if you have, say, ten pixels by ten pixels, then you have a hundred-dimensional feature vector), how do you recognize which digit was written?

If you have ten pixels by ten pixels, you have 100 features, and maybe these are binary features where each x_j is 0 or 1, or maybe the x’s are gray-scale values corresponding to how dark each of these pixels was. It turns out that for many years, a neural network was the champion algorithm for handwritten digit recognition. And it turns out that you can apply an SVM with the following kernel: either the polynomial kernel or the Gaussian kernel works fine for this problem, and just by writing down this kernel and throwing an SVM at it, the SVM gave performance comparable to the very best neural networks.
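As a rough illustration of what “writing down the kernel and throwing an SVM at it” might look like in practice, here is a minimal sketch assuming scikit-learn; the digits dataset (8x8 images rather than the 10x10 in the lecture) and the hyperparameter values are my own choices for illustration, not something specified in the lecture:

```python
# Sketch: SVM with a Gaussian (RBF) or polynomial kernel on raw pixel features.
# Assumes scikit-learn is available; hyperparameters are illustrative, not tuned.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                                   # each image is an 8x8 grid of gray-scale values
X = digits.images.reshape(len(digits.images), -1)        # flatten each image into a 64-dimensional vector
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gaussian kernel: K(x, z) = exp(-gamma * ||x - z||^2)
rbf_svm = SVC(kernel="rbf", gamma=0.001, C=10.0).fit(X_train, y_train)

# Polynomial kernel: K(x, z) = (gamma * <x, z> + coef0)^degree
poly_svm = SVC(kernel="poly", degree=3, gamma=0.01, coef0=1.0).fit(X_train, y_train)

print("RBF test accuracy: ", rbf_svm.score(X_test, y_test))
print("Poly test accuracy:", poly_svm.score(X_test, y_test))
```

Note that the classifier only ever sees each image as a flat vector of pixel intensities, which is exactly the point made next.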

This is surprising because the support vector machine doesn’t take into account any knowledge about the pixels; in particular, it doesn’t know that this pixel is next to that pixel, because it’s just representing the pixel intensity values as a vector. And so the performance of the SVM would be the same even if you were to shuffle all the pixels around. Yet its performance was comparable to the very best neural networks, which had been under very careful development for many years.

I want to tell you about one other cool example, which is that SVMs are also used to classify other fairly esoteric objects. So for example, let’s say you want to classify protein sequences into different classes of proteins. Every time I do this, I suspect that the biologists in the room cringe, so I apologize for that. There are 20 amino acids, and proteins in our bodies are made up of sequences of amino acids. Even though there are 20 amino acids and 26 letters in the alphabet, I’m going to denote amino acids by the letters A through Z, with apologies to the biologists.

Here’s an amino acid sequence, represented as a string of letters. So suppose I want to assign this protein to one of a few classes depending on what type of protein it is. The question is, how do I construct my feature vector? This is challenging for many reasons, one of which is that protein sequences can be of different lengths. There are some very long protein sequences and some very short ones, and so you can’t have a feature saying what the amino acid in the 100th position is, because maybe there is no 100th position in this protein sequence. Some are very long; some are very short.

Here’s my feature representation: I’m going to write down all possible strings of four letters. I’m going to write down AAAA, AAAB, AAAC, down to AAAZ, and then AABA, and so on; you get the idea. Write down all possible strings of four letters, and my feature representation will be this: I’m going to scan through the sequence of amino acids and count how often each of these substrings occurs. So, for example, BAJT occurs twice, so I’ll put a two there; none of these other strings occur, so I’ll put a zero there. I guess I have a one here and a zero there.
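To make that feature map concrete, here is a small sketch of how one might count length-4 substrings. The sequence and counts below are made up for illustration, and the dictionary stores only the substrings that actually appear, which is just an implementation convenience:

```python
from collections import Counter

def four_mer_counts(sequence):
    """Count how often each length-4 substring occurs in an amino acid sequence.

    Conceptually this is the 20^4-dimensional feature vector from the lecture;
    only the nonzero entries are stored explicitly.
    """
    return Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))

# Example with an arbitrary made-up sequence:
phi = four_mer_counts("BAJTSTAIBAJT")
print(phi["BAJT"])   # 2 -- this substring occurs twice
print(phi["AAAA"])   # 0 -- substrings that never occur implicitly have count zero
```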

This very long vector will be my feature representation for a protein. This representation applies no matter how long my protein sequence is. How large is it? Well, it turns out this is going to live in R^(20^4), so you have a 160,000-dimensional feature vector, which is reasonably large even by modern computer standards. Clearly, we don’t want to explicitly represent these high-dimensional feature vectors. Imagine you have 1,000 examples and you store each of these as double-precision floating point numbers; even on modern-day computers, this is big.

It turns out that there’s an efficient dynamic programming algorithm that can compute inner products between these feature vectors, and so we can use this feature representation, even though it’s a ridiculously high-dimensional feature vector, to classify protein sequences. I won’t talk about that dynamic programming algorithm here. If any of you have seen dynamic programming algorithms for finding subsequences, it’s kind of reminiscent of those. You can look those up if you’re interested.
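The efficient algorithm itself isn’t spelled out in the lecture, but the quantity it computes is just the inner product of two such count vectors. A naive version of that inner product, reusing the four_mer_counts sketch above (and deliberately not the fast dynamic programming approach the lecture alludes to), could look like this:

```python
def string_kernel(s, t):
    """K(s, t) = <phi(s), phi(t)>, where phi counts length-4 substrings.

    This naive version materializes the sparse counts for each string; the
    dynamic programming algorithm mentioned in the lecture computes the same
    inner product more efficiently, without that intermediate step.
    """
    phi_s = four_mer_counts(s)
    phi_t = four_mer_counts(t)
    # Only substrings present in both strings contribute to the inner product.
    return sum(count * phi_t[sub] for sub, count in phi_s.items())

print(string_kernel("BAJTSTAIBAJT", "QQBAJTQQ"))  # 2: "BAJT" appears twice in one string, once in the other
```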

This is just another example of a cool kernel, and more generally, if you’re faced with some new machine learning problem, sometimes you can choose a standard kernel like the Gaussian kernel, and sometimes there are research papers written on how to come up with a new kernel for a new problem. Two last things I want to say. Where are we now? That wraps up SVMs, which many people would consider one of the most effective off-the-shelf learning algorithms, and so as of today, you’ve actually seen a lot of learning algorithms.

I want to close this class by saying congratulations. You’re now well qualified to actually go and apply learning algorithms to a lot of problems. We’re still in week four of the quarter, so there’s more to come. In particular, what I want to do next is talk about how to really understand learning algorithms, when they work well and when they work poorly, and how to take the tools you now have and use them really well. We’ll start to do that in the next lecture. Thanks.

[End of Audio]

Duration: 78 minutes

Source: OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4