Microphone 1:
One, two, three, four, five, six, seven, eight, nine, ten.
Instructor (Andrew Ng): And that's the second one:
Microphone 2:
[Music playing.]
Instructor (Andrew Ng): Okay. So it turns out that beyond solving the cocktail party problem, this specific class of unsupervised learning algorithms is also applied to a bunch of other problems, like in text processing or understanding functional brain imaging data, like magnetoencephalogram (MEG) or EEG data. We'll talk about that more when we go and describe ICA or independent component analysis algorithms, which is what you just saw.
And as an aside, this algorithm I just showed you, it seems like it must be a pretty complicated algorithm, right, to take these overlapping audio streams and separate them out. It sounds like a pretty complicated thing to do. So you're gonna ask how complicated is it really to implement an algorithm like this? It turns out if you do it in MATLAB, you can do it in one line of code.
So I got this from Sam Roweis at Toronto, U of Toronto, and the example I showed you actually used a more complicated ICA algorithm than this. But nonetheless, I guess this is why for this class I'm going to ask you to do most of your programming in MATLAB and Octave, because if you try to implement the same algorithm in C or Java or something, I can tell you from personal, painful experience, you end up writing pages and pages of code rather than relatively few lines of code. I'll also mention that it did take researchers many, many years to come up with that one line of code, so this is not easy.
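The one-line MATLAB program mentioned above is not reproduced in this transcript, so the following is not that code. It is only a rough Python sketch of the same source-separation idea, assuming NumPy and using a FastICA-style fixed-point update with a tanh nonlinearity (a swapped-in technique, not necessarily the algorithm used in the demo). The function names `whiten`, `sym_decorrelate`, and `fast_ica`, the synthetic sources, and the mixing matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def whiten(x):
    """Center each row of x, then transform so the covariance is the identity."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance: E D^(-1/2) E^T
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ x

def sym_decorrelate(w):
    """Make the rows of w orthonormal: w <- (w w^T)^(-1/2) w."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ w

def fast_ica(x, n_iter=200, seed=0):
    """Estimate independent sources from mixed signals x (one signal per row)."""
    z = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        y = np.tanh(w @ z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, with g = tanh
        w_new = y @ z.T / m - np.diag((1.0 - y ** 2).mean(axis=1)) @ w
        w = sym_decorrelate(w_new)
    return w @ z

# Two synthetic "speakers" (assumed, independent, non-Gaussian sources).
rng = np.random.default_rng(0)
m = 5000
sources = np.vstack([rng.laplace(size=m), rng.uniform(-1.0, 1.0, size=m)])

# Two "microphones", each hearing a different mixture of both speakers.
mixing = np.array([[1.0, 0.5], [0.6, 1.0]])
mixed = mixing @ sources

recovered = fast_ica(mixed)

# Absolute correlation between each true source (rows) and each estimate (columns).
corr = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:2, 2:])
print(np.round(corr, 3))
```

ICA can only recover the sources up to ordering, sign, and scale, which is why the check at the end compares absolute correlations rather than the raw signals.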
So that was unsupervised learning, and then the last of the four major topics I wanna tell you about is reinforcement learning. And this refers to problems where you don't do one-shot decision-making. So, for example, in the supervised learning cancer prediction problem, you have a patient come in, you predict that the cancer is malignant or benign. And then based on your prediction, maybe the patient lives or dies, and then that's it, right? So you make a decision and then there's a consequence. You either got it right or wrong. In reinforcement learning problems, you are usually asked to make a sequence of decisions over time.
So, for example, this is something that my students and I work on. If I give you the keys to an autonomous helicopter (we actually have this helicopter here at Stanford), how do you write a program to make it fly, right? You notice that if you make a wrong decision on a helicopter, the consequence of crashing it may not happen until much later. And in fact, usually you need to make a whole sequence of bad decisions to crash a helicopter. But conversely, you also need to make a whole sequence of good decisions in order to fly a helicopter really well.
So I'm gonna show you some fun videos of learning algorithms flying helicopters. This is a video of our helicopter at Stanford flying using a controller that was learned using a reinforcement learning algorithm. So this was done on the Stanford football field, and we'll zoom out the camera in a second. You'll sort of see the trees planted in the sky. So maybe this is one of the most difficult aerobatic maneuvers flown on any helicopter under computer control. And this controller, which is very, very hard for a human to sit down and write out, was learned using one of these reinforcement learning algorithms.
Just a word about that: The basic idea behind a reinforcement learning algorithm is this idea of what's called a reward function. The way to think about it is to imagine you're trying to train a dog. So every time your dog does something good, you say, "Good dog," and you reward the dog. Every time your dog does something bad, you go, "Bad dog," right? And hopefully, over time, your dog will learn to do the right things to get more of the positive rewards, to get more of the "Good dogs" and to get fewer of the "Bad dogs."
So the way we teach a helicopter to fly or any of these robots is sort of the same thing. Every time the helicopter crashes, we go, "Bad helicopter," and every time it does the right thing, we go, "Good helicopter," and over time it learns how to control itself so as to get more of these positive rewards.
So reinforcement learning, I think of it as a way for you to specify what you want done, so you have to specify what is a "good dog" and what is a "bad dog" behavior. And then it's up to the learning algorithm to figure out how to maximize the "good dog" reward signals and minimize the "bad dog" punishments.
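To make the reward-function idea a little more concrete, here is a minimal Python sketch. It is a toy, not the helicopter controller or any algorithm confirmed by the lecture: it assumes a made-up five-position world where position 0 is a "crash" and position 4 is the "goal", and it uses tabular Q-learning as one possible learning rule. All names (`reward`, `greedy`, `train`) and all parameter values are hypothetical.

```python
import random

N_STATES = 5            # positions 0..4 on a line
CRASH, GOAL = 0, 4      # both ends are terminal
ACTIONS = (-1, +1)      # action 0 = move left, action 1 = move right

def reward(state):
    """Reward function: 'good' for reaching the goal, 'bad' for crashing."""
    if state == GOAL:
        return 1.0
    if state == CRASH:
        return -1.0
    return 0.0

def greedy(q, state):
    """Index of the action with the highest estimated value (ties go left)."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 2                       # always start in the middle
        for _ in range(100):            # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))   # explore
            else:
                action = greedy(q, state)              # exploit
            next_state = state + ACTIONS[action]
            r = reward(next_state)
            done = next_state in (CRASH, GOAL)
            # Value of the best follow-up decision, or nothing if the episode ended
            target = r if done else r + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = train()
policy = [greedy(q, s) for s in range(1, GOAL)]   # non-terminal states 1..3
print(policy)
```

Notice that the code never says "move right". It only says which outcomes are good and which are bad, and the sequence of good decisions is worked out from the rewards, which is the point made above about specifying what you want done rather than how to do it.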
So it turns out reinforcement learning is applied to other problems in robotics. It's applied to things in web crawling and so on. But it's just cool to show videos, so let me just show a bunch of them. This learning algorithm was actually implemented by our head TA, Zico, for programming a four-legged dog. I guess Sam Shriver in this class also worked on the project and Peter Renfrew and Mike and a few others. But I guess this really is a good dog/bad dog since it's a robot dog.
In the second video on the right, some of the students, I guess Peter, Zico, Tonca, are working on a robotic snake, again using learning algorithms to teach a snake robot to climb over obstacles.
Below that, this is kind of a fun example. Ashutosh Saxena and Jeff Michels used learning algorithms to teach a car how to drive off-road at reasonably high speeds while avoiding obstacles.
And on the lower right, that's a robot programmed by PhD student Eva Roshen to teach a sort of somewhat strangely configured robot how to get on top of an obstacle, how to get over an obstacle. Sorry. I know the video's kind of small. I hope you can sort of see it. Okay?
So I think all of these are robots that are very difficult to hand-code a controller for. By using these sorts of learning algorithms, you can in relatively short order get a robot to do often pretty amazing things.
Okay. So that was most of what I wanted to say today. Just a couple more last things, but let me just check what questions you have right now. So if there are no questions, I'll just close with two reminders, which are after class today or as you start to talk with other people in this class, I just encourage you again to start to form project partners, to try to find project partners to do your project with. And also, this is a good time to start forming study groups, so either talk to your friends or post in the newsgroup, but we just encourage you to try to start to do both of those today, okay? Form study groups, and try to find two other project partners.
So thank you. I'm looking forward to teaching this class, and I'll see you in a couple of days.