So if it's the case that theta transpose X is much greater than zero (the double greater-than sign means "much greater than"), then think of that as a very confident prediction that Y is equal to one. If theta transpose X is much greater than zero, then not only are we going to predict one, we're also very confident that it's one. The picture for that is that if theta transpose X is way out to the right, then our estimate of the probability of Y being equal to one on the sigmoid function will be very close to one. And in the same way, if theta transpose X is much less than zero, then we're very confident that Y is equal to zero.
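As a concrete illustration (not part of the lecture), here is a minimal Python sketch of the sigmoid used by logistic regression; the parameter vector and example points are made up, chosen only so that theta transpose X comes out strongly positive or strongly negative:

import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z)); its output is the estimated P(y = 1 | x).
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([2.0, -1.0])                # made-up parameters
x_strongly_positive = np.array([4.0, 0.5])   # theta^T x = 7.5, much greater than zero
x_strongly_negative = np.array([-3.0, 2.0])  # theta^T x = -8.0, much less than zero

print(sigmoid(theta @ x_strongly_positive))  # ~0.999: very confident y = 1
print(sigmoid(theta @ x_strongly_negative))  # ~0.0003: very confident y = 0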
So wouldn't it be nice if, when we fit logistic regression or some other classifier to your training set, for all I such that YI is equal to one we have theta transpose XI much greater than zero, and for all I such that YI is equal to zero we have theta transpose XI much less than zero? In other words, wouldn't it be nice if we could find parameters theta so that our learning algorithm not only makes correct classifications on all the examples in the training set, but is also very confident about all of those correct classifications? That is the first intuition that I want you to have, and we'll come back to it in a second when we talk about functional margins; we'll define those later. A small sketch of this condition follows.
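Here is a rough sketch of that "confidently correct" condition, assuming labels in {0, 1}; the margin threshold below is an arbitrary stand-in for "much greater than / much less than zero", not something defined in the lecture yet:

import numpy as np

def confidently_correct(theta, X, y, margin=2.0):
    # For every training example i, require theta^T x_i to be well above zero
    # when y_i = 1 and well below zero when y_i = 0.
    z = X @ theta
    return bool(np.all(np.where(y == 1, z > margin, z < -margin)))

# Tiny made-up training set: two features per example, labels in {0, 1}.
X = np.array([[4.0, 0.5], [3.0, 1.0], [-3.0, 2.0]])
y = np.array([1, 1, 0])
print(confidently_correct(np.array([2.0, -1.0]), X, y))  # True for these examples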
The second intuition that I want to convey: for the rest of today's lecture I'm going to assume that the training set is linearly separable. By that I mean I'm going to assume that there is indeed a straight line that can separate your training set. We'll remove this assumption later, but just to develop the algorithm, let's assume a linearly separable training set. And there's a sense that, out of all the straight lines that separate the training set, maybe this straight line isn't such a good one, and that one isn't such a great one either, but the line in the middle is a much better linear separator than the others, right?
And one reason that, when you and I look at it, this one seems best is because this line is further from the data. That is, it separates the data with a greater distance between your positive examples, your negative examples, and the decision boundary. This is the second intuition I want to convey, and we'll come back to it shortly: that the final line I drew is maybe the best line because of this notion of distance from the training examples. We'll formalize it later when we talk about the geometric margins of our classifiers.
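To make the distance idea concrete, here is a small sketch (again, not from the lecture) assuming a decision boundary of the form theta transpose X equals zero with no separate intercept term; the geometric distance from a point to that hyperplane is the absolute value of theta transpose X divided by the norm of theta:

import numpy as np

def distance_to_boundary(theta, x):
    # Geometric distance from point x to the hyperplane theta^T x = 0
    # (assumes no separate intercept term).
    return abs(theta @ x) / np.linalg.norm(theta)

def worst_case_margin(theta, X):
    # The separator we prefer is the one whose closest training example is
    # farthest away, i.e. the one with the largest smallest distance.
    return min(distance_to_boundary(theta, x) for x in X)

X = np.array([[4.0, 0.5], [3.0, 1.0], [-3.0, 2.0]])  # same made-up points as above
print(worst_case_margin(np.array([2.0, -1.0]), X))   # smallest distance to the boundary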
So in order to describe the support vector machine, unfortunately, I'm going to have to make a notation change. It was, sort of, impossible to present logistic regression, support vector machines, and all the other algorithms using one completely consistent notation, so I'm actually going to change notation slightly for linear classifiers. That will make it much easier, later today and in next week's lectures, to talk about the support vector machine.