And this sort of learning problem, of learning to predict housing prices, is an example of what's called a supervised learning problem. And the reason that it's called supervised learning is because we're providing the algorithm a data set of a bunch of square footages, a bunch of housing sizes, as well as sort of the right answer of what the actual prices of a number of houses were, right?
So we call this supervised learning because we're supervising the algorithm or, in other words, we're giving the algorithm the, quote, right answer for a number of houses. And then we want the algorithm to learn the association between the inputs and the outputs and to sort of give us more of the right answers, okay?
It turns out this specific example that I drew here is an example of something called a regression problem. And the term regression sort of refers to the fact that the variable you're trying to predict is a continuous value, namely the price.
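To make the regression idea concrete, here is a minimal sketch in Python. The square footages and prices are made-up illustration values, not data from the lecture; the point is just that the algorithm fits a line to the training set and outputs a continuous predicted price for a new house.

```python
# A minimal sketch of the housing-price regression problem described above.
# The square footages and prices here are invented for illustration.
import numpy as np

# Training set: input x = size in square feet, output y = price (in $1000s).
sizes = np.array([850.0, 1200.0, 1500.0, 1800.0, 2100.0, 2400.0])
prices = np.array([180.0, 245.0, 312.0, 370.0, 425.0, 488.0])

# Fit a straight line, price ~ theta0 + theta1 * size, by least squares.
X = np.column_stack([np.ones_like(sizes), sizes])   # add an intercept column
theta, *_ = np.linalg.lstsq(X, prices, rcond=None)

# Predict the price of a new house; the output is a continuous value,
# which is what makes this a regression problem.
new_size = 1650.0
predicted_price = theta[0] + theta[1] * new_size
print(f"Predicted price for {new_size:.0f} sq ft: ${predicted_price:.0f}k")
```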
There's another class of supervised learning problems which we'll talk about, which are classification problems. And so, in a classification problem, the variable you're trying to predict is discrete rather than continuous. So as one specific example — there's actually a standard data set you can download online [inaudible] that lots of machine learning people have played with. Let's say you collect a data set on breast cancer tumors, and you want an algorithm to learn to predict whether or not a certain tumor is malignant. Malignant is the opposite of benign, right, so a malignancy is a sort of harmful, bad tumor. So we collect some number of features, some number of properties of these tumors, and for the sake of having a simple [inaudible] explanation, let's just say that we're going to look at the size of the tumor, and depending on the size of the tumor, we'll try to figure out whether or not the tumor is malignant or benign.
So the tumor is either malignant or benign, and so the variable on the Y axis is either zero or 1, and so your data set may look something like that, right? And that's 1 and that's zero, okay? And so this is an example of a classification problem where the variable you're trying to predict is a discrete value. It's either zero or 1.
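Here is a similar sketch of this one-feature classification problem. The tumor sizes and labels below are invented, and the choice of scikit-learn's logistic regression as the classifier is just one convenient option, not something specified in the lecture.

```python
# A minimal sketch of one-feature classification: predict whether a tumor
# is malignant (1) or benign (0) from its size alone. Data is invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

tumor_size_cm = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
is_malignant  = np.array([0,     0,     0,     0,     1,     1,     1,     1])

clf = LogisticRegression().fit(tumor_size_cm, is_malignant)

# The prediction is a discrete value (0 or 1), which is what makes this a
# classification problem rather than a regression problem.
print(clf.predict([[1.2], [3.2]]))   # e.g. [0 1]
```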
And in fact, more generally, there will be many learning problems where we'll have more than one input variable, more than one input feature, and use more than one variable to try to predict, say, whether a tumor is malignant or benign. So, for example, continuing with this, you may instead have a data set that looks like this. I'm gonna plot this data set in a slightly different way now. And I'm making this data set look much cleaner than it really is in reality for illustration, okay?
For example, maybe the crosses indicate malignant tumors and the "O"s indicate benign tumors. And so you may have a data set comprising patients of different ages who have different tumor sizes, where a cross indicates a malignant tumor and an "O" indicates a benign tumor. And you may want an algorithm to learn to predict, given a new patient, whether their tumor is malignant or benign.
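And here is a sketch of the two-feature version, again with invented, unrealistically clean numbers: each input now has two features, the patient's age and the tumor size, and the output is still a discrete malignant-or-benign label.

```python
# A sketch of two-feature classification: predict malignant vs. benign from
# both the patient's age and the tumor size. The numbers are invented and
# much cleaner than real data would be.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is [age in years, tumor size in cm]; label 1 = malignant, 0 = benign.
X = np.array([[35, 1.0], [42, 1.4], [50, 1.8], [47, 2.2],
              [55, 2.8], [61, 3.1], [68, 3.6], [73, 4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Predict for a new patient given both input features.
new_patient = np.array([[58, 2.5]])
print("malignant" if clf.predict(new_patient)[0] == 1 else "benign")
```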