All right. So the second step of the overall proof I want to execute is the following. The result of the previous step is essentially that uniform convergence holds with high probability. What I want to show now is: let's assume that uniform convergence holds. So let's assume that for all hypotheses h in H, the difference between epsilon of h and epsilon hat of h is at most gamma in absolute value. Okay? What I want to do now is use this to see what we can prove about the generalization error. So, supposing this holds true, I want to know whether we can prove something about the generalization error of h hat, where again, h hat was the hypothesis selected by empirical risk minimization. Okay?
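In symbols, one way to write out this assumption and the empirical risk minimizer, consistent with the definitions above:

\forall\, h \in \mathcal{H}: \quad \lvert \epsilon(h) - \hat{\epsilon}(h) \rvert \le \gamma \tag{1}

\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{\epsilon}(h)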
So in order to show this, let me make one more definition. Let me define h star to be the hypothesis in my class script H that has the smallest generalization error. So this is the hypothesis I would get if I had an infinite amount of training data, or if I could really go in and find the best possible hypothesis, best in the sense of minimizing generalization error. Okay? So it makes sense to compare the performance of our learning algorithm to the performance of h star, because we clearly can't hope to do better than h star. For example, if your hypothesis class is the class of all linear decision boundaries but the data just can't be separated by any linear function, then even h star is really bad, and there's just not much hope that your learning algorithm could do better than h star. So I'll actually prove this result in three steps. First, the generalization error of h hat, the hypothesis I chose, is going to be less than or equal to the training error of h hat plus gamma. Let me number these equations: this step is because of equation one, because epsilon of h hat and epsilon hat of h hat are within gamma of each other. Next, by the definition of empirical risk minimization, h hat was chosen to minimize training error, and so there can't be any hypothesis with lower training error than h hat. So the training error of h hat must be less than or equal to the training error of h star. This step is, sort of, by the definition of h hat as the hypothesis that minimizes training error.
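Writing out the definition and these first two steps in symbols:

h^{*} = \arg\min_{h \in \mathcal{H}} \epsilon(h)

\epsilon(\hat{h}) \le \hat{\epsilon}(\hat{h}) + \gamma \quad \text{(by (1))}

\hat{\epsilon}(\hat{h}) \le \hat{\epsilon}(h^{*}) \quad \text{(by the definition of } \hat{h} \text{)}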
And the final step is to apply this uniform convergence result again. We know that epsilon hat of h star must be within gamma of epsilon of h star. And so this is at most epsilon of h star plus gamma, and then I have my original gamma there. Okay? This step is by equation one again, because the training error of h star must be within gamma of the generalization error of h star. And so, putting these together, I'll just write this as epsilon of h star plus two gamma. Okay? Yeah?
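Chaining all three steps together gives the result of this part of the proof:

\epsilon(\hat{h}) \;\le\; \hat{\epsilon}(\hat{h}) + \gamma \;\le\; \hat{\epsilon}(h^{*}) + \gamma \;\le\; \bigl(\epsilon(h^{*}) + \gamma\bigr) + \gamma \;=\; \epsilon(h^{*}) + 2\gamma

To make this concrete, here is a small simulation sketch, not from the lecture, that checks the chain of inequalities numerically. The finite class of threshold classifiers and the data-generating distribution are illustrative assumptions chosen for this example:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the lecture): a finite hypothesis
# class of threshold classifiers h_t(x) = 1{x >= t} on x ~ Uniform[0, 1],
# with true labels generated by threshold 0.5 plus 10% label noise.
thresholds = np.linspace(0.0, 1.0, 21)   # the finite class H
true_t, noise, m = 0.5, 0.1, 500

x = rng.uniform(0.0, 1.0, m)
y = (x >= true_t).astype(int)
flip = rng.uniform(0.0, 1.0, m) < noise
y[flip] = 1 - y[flip]

def gen_error(t):
    # Closed-form generalization error of h_t under this distribution:
    # the disagreement interval |t - 0.5|, scaled by (1 - 2*noise),
    # plus the baseline label-noise rate.
    return abs(t - true_t) * (1.0 - 2.0 * noise) + noise

train_err = np.array([np.mean((x >= t).astype(int) != y) for t in thresholds])
true_err = np.array([gen_error(t) for t in thresholds])

h_hat = thresholds[np.argmin(train_err)]       # ERM: minimize training error
h_star = thresholds[np.argmin(true_err)]       # best hypothesis in the class
gamma = np.max(np.abs(train_err - true_err))   # realized uniform-convergence gap

print(f"eps(h_hat)            = {gen_error(h_hat):.3f}")
print(f"eps(h_star) + 2*gamma = {gen_error(h_star) + 2.0 * gamma:.3f}")
# The three-step argument guarantees this holds for the realized gamma:
assert gen_error(h_hat) <= gen_error(h_star) + 2.0 * gamma

Because gamma here is the realized maximum gap between training and generalization error over the whole class, the final assertion holds by exactly the three-step argument above.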
Student: [Inaudible] notation, is epsilon hat of [inaudible] h hat, that's not the training error, that's the generalization error with the estimate of the hypothesis?