But the notation that I’m gonna use for the rest of today and for most of next week will be that my labels will be Y, and instead of being zero and one, they’ll be minus one and plus one. And in developing a support vector machine, we’ll have H, the hypothesis, output values that are either plus one or minus one, and so we’ll let G of Z be equal to one if Z is greater than or equal to zero, and minus one otherwise, right? So just rather than zero and one, we change everything to plus one and minus one.
And, finally, whereas previously I wrote H subscript theta of X equals G of theta transpose X and we had the convention that X zero is equal to one, right? And so X is in RN plus one. I’m gonna drop this convention of letting X zero equal one and letting X be in RN plus one, and instead I’m going to parameterize my linear classifier as H subscript W, B of X equals G of W transpose X plus B, okay? And so B now just plays the role of theta zero, and W now plays the role of the rest of the parameters, theta one through theta N, okay? So just by separating out the intercept term B rather than lumping it together with W, it’ll make it easier for us to develop support vector machines. So – yes.
Student: [Off mic].
Instructor (Andrew Ng): Oh, yes. Right, yes. So W is – right. So W is a vector in RN, and X is now a vector in RN rather than RN plus one, and lowercase B is a real number. Okay.
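[To make the new notation concrete, here is a minimal sketch – my own illustration, not part of the lecture – of the classifier H subscript W, B of X equals G of W transpose X plus B with labels in {minus one, plus one}; the names w, b, and x simply mirror the lecture’s notation, and the numbers are made up.]

```python
import numpy as np

def g(z):
    """Threshold function: +1 if z >= 0, else -1 (labels are now {-1, +1})."""
    return 1 if z >= 0 else -1

def h(w, b, x):
    """Linear classifier h_{w,b}(x) = g(w^T x + b); w and x are in R^n, b is a real number."""
    return g(np.dot(w, x) + b)

# Hypothetical parameters and a single input vector.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])
print(h(w, b, x))  # prints -1, since 2*1 - 1*3 + 0.5 = -0.5 < 0
```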
Now, let’s formalize the notion of functional margin and geometric margin. Let me make a definition. I’m going to say that the functional margin of the hyperplane W, B with respect to a specific training example XI, YI – WRT stands for with respect to – is defined as Gamma Hat I equals YI times W transpose XI plus B, okay?
And so a set of parameters W, B defines a classifier – it, sort of, defines a linear separating boundary, and so when I say hyperplane, I just mean the decision boundary that’s defined by the parameters W, B. You know what, if you’re confused by the hyperplane term, just ignore it. The functional margin of a classifier with parameters W, B with respect to a training example is given by this formula, okay? And the interpretation of this is that if YI is equal to one, then in order to have a large functional margin, you want W transpose XI plus B to be large, right? And if YI is equal to minus one, then in order for the functional margin to be large – we, sort of, want the functional margins to be large, but in order for the functional margin to be large, if YI is equal to minus one, then the only way for this to be big is if W transpose XI plus B is much less than zero, okay?
So this captures the intuition that we had earlier about functional margins – the intuition we had earlier that if YI is equal to one, we want this to be big, and if YI is equal to minus one, we want this to be small, and this, sort of, collapses the two cases into one statement that we’d like the functional margin to be large. And notice also that so long as YI times W transpose XI plus B – so long as this is greater than zero, that means we classified it correctly, okay?
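[As a quick illustration – again my own sketch, not from the lecture – the functional margin Gamma Hat I equals YI times W transpose XI plus B can be computed per training example, and a positive value means that example is classified correctly; the data below is hypothetical.]

```python
import numpy as np

def functional_margin(w, b, x_i, y_i):
    """Functional margin gamma_hat_i = y_i * (w^T x_i + b) for one training example."""
    return y_i * (np.dot(w, x_i) + b)

# Hypothetical data: labels are in {-1, +1}.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 0.0],    # w^T x + b =  2.5
              [1.0, 3.0]])   # w^T x + b = -0.5
y = np.array([1, -1])

for x_i, y_i in zip(X, y):
    m = functional_margin(w, b, x_i, y_i)
    print(m, "correct" if m > 0 else "misclassified")
# Both margins are positive (2.5 and 0.5), so both examples are classified correctly.
```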