So I'll show you a specific example in a second, but here's a cocktail party that's, I guess, rather sparsely attended, by just two people. What we're gonna do is put two microphones in the room, okay? And because the microphones are at slightly different distances from the two people, and the two people may speak at slightly different volumes, each microphone will pick up an overlapping combination of these two people's voices, so slightly different overlapping voices. So Speaker 1's voice may be louder on Microphone 1, and Speaker 2's voice may be louder on Microphone 2, whatever.
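The two-microphone setup just described can be sketched as a weighted sum: each microphone records its own blend of the two voices. This is only an illustration under stated assumptions. The stand-in signals, the weights in `A`, and the helper `mix` are all hypothetical and are not taken from the lecture's recordings.

```python
import math

def mix(sources, weights):
    """Return one recording per microphone.

    Recording i at time t is sum_j weights[i][j] * sources[j][t].
    """
    n = len(sources[0])
    return [
        [sum(w * s[t] for w, s in zip(row, sources)) for t in range(n)]
        for row in weights
    ]

# Hypothetical stand-ins for the two voices: a sine tone and a square wave.
n = 1000
speaker1 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
speaker2 = [1.0 if (t // 50) % 2 == 0 else -1.0 for t in range(n)]

# Assumed weights: each speaker is louder on the nearer microphone.
A = [[1.0, 0.6],   # Microphone 1 hears mostly Speaker 1
     [0.5, 1.0]]   # Microphone 2 hears mostly Speaker 2

mic1, mic2 = mix([speaker1, speaker2], A)
```

Both `mic1` and `mic2` contain both voices, just in different proportions, which is what makes the separation question that follows non-trivial.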
But the question is, given these microphone recordings, can you separate out the original speakers' voices? So I'm gonna play some audio clips that were collected by Te-Won Lee at UCSD. I'm gonna actually play for you the original raw microphone recordings from this cocktail party. So this is Microphone 1:
Microphone 1:
One, two, three, four, five, six, seven, eight, nine, ten.
Microphone 2:
Uno, dos, tres, cuatro, cinco, seis, siete, ocho, nueve, diez.
Instructor (Andrew Ng): So it's a fascinating cocktail party with people counting from one to ten. This is the second microphone:
Microphone 1:
One, two, three, four, five, six, seven, eight, nine, ten.
Microphone 2:
Uno, dos, tres, cuatro, cinco, seis, siete, ocho, nueve, diez.
Instructor (Andrew Ng): Okay. So in unsupervised learning, we don't know what the right answer is, right? So what we're going to do is take exactly the two microphone recordings you just heard, give them to an unsupervised learning algorithm, and tell the algorithm to discover structure in the data [inaudible], or ask: what structure is there in this data? And we actually don't know what the right answer is offhand.
So give this data to an unsupervised learning algorithm, and what the algorithm does in this case, it will discover that this data can actually be explained by two independent speakers speaking at the same time, and it can further separate out the two speakers for you. So here's Output 1 of the algorithm:
Microphone 1:
One, two, three, four, five, six, seven, eight, nine, ten.
Instructor (Andrew Ng): And there's the second output:
Microphone 2:
Uno, dos, tres, cuatro, cinco, seis, siete, ocho, nueve, diez.
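For readers who want to see how a separation like this might be computed, here is a minimal sketch for the two-microphone case. It is not necessarily the algorithm that produced the clips above. It is a simplified, kurtosis-based variant of independent component analysis: decorrelate the two recordings, then search for the rotation that makes the outputs least Gaussian. The random stand-in voices, the mixing weights, and every function name here are hypothetical.

```python
import math
import random

def center(x):
    """Subtract the mean so the signal averages to zero."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def rotate(x1, x2, angle):
    """Apply the same 2-D rotation to every (x1[t], x2[t]) sample pair."""
    c, s = math.cos(angle), math.sin(angle)
    y1 = [c * u + s * v for u, v in zip(x1, x2)]
    y2 = [-s * u + c * v for u, v in zip(x1, x2)]
    return y1, y2

def whiten(x1, x2):
    """Make two centered signals uncorrelated, each with unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    d = sum(v * v for v in x2) / n
    # This angle diagonalizes the 2x2 covariance matrix [[a, b], [b, d]].
    y1, y2 = rotate(x1, x2, 0.5 * math.atan2(2 * b, a - d))
    s1 = math.sqrt(sum(v * v for v in y1) / n)
    s2 = math.sqrt(sum(v * v for v in y2) / n)
    return [v / s1 for v in y1], [v / s2 for v in y2]

def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 for a Gaussian)."""
    return sum(v ** 4 for v in y) / len(y) - 3.0

def separate(mic1, mic2, steps=90):
    """Estimate two independent sources from two mixed recordings."""
    z1, z2 = whiten(center(mic1), center(mic2))
    best_score, best = -1.0, None
    # After whitening, the remaining unknown is a rotation; a quarter turn
    # covers every case up to the order and sign of the outputs.
    for k in range(steps):
        y1, y2 = rotate(z1, z2, (math.pi / 2) * k / steps)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best

def corr(x, y):
    """Pearson correlation, used here only to check the result."""
    x, y = center(x), center(y)
    num = sum(u * v for u, v in zip(x, y))
    return num / math.sqrt(sum(u * u for u in x) * sum(v * v for v in y))

# Hypothetical independent "voices": uniform noise and Laplace-like noise.
random.seed(0)
n = 4000
speaker1 = [random.uniform(-1, 1) for _ in range(n)]
speaker2 = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(n)]

# Assumed mixing: each microphone hears both speakers at different volumes.
mic1 = [1.0 * a + 0.6 * b for a, b in zip(speaker1, speaker2)]
mic2 = [0.5 * a + 1.0 * b for a, b in zip(speaker1, speaker2)]

out1, out2 = separate(mic1, mic2)
```

Comparing `out1` and `out2` against `speaker1` and `speaker2` with `corr` shows whether each output lines up with one voice. The outputs may come back in either order and with flipped sign, since nothing in the recordings fixes which source is "first" or how loud it originally was.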
Instructor (Andrew Ng): And so the algorithm discovers that, gee, the structure underlying the data is really that there are two sources of sound, and here they are. I'll show you one more example. This is a, well, this is a second sort of different pair of microphone recordings:
Microphone 1:
One, two, three, four, five, six, seven, eight, nine, ten.
Microphone 2:
[Music playing.]
Instructor (Andrew Ng): So the poor guy is not at a cocktail party. He's talking to his radio. There's the second recording:
Microphone 1:
One, two, three, four, five, six, seven, eight, nine, ten.
Microphone 2:
[Music playing.]
Instructor (Andrew Ng): Right. And we get this data. It's the same unsupervised learning algorithm. The algorithm is actually called independent component analysis, and later in this quarter, you'll see why. And then the output's the following: