And therefore, by the Hoeffding inequality, we have that the probability that the difference between training and generalization error is greater than gamma is less than or equal to two E to the negative two gamma squared M. Okay? Exactly by the Hoeffding inequality.
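
In symbols, the bound just stated can be written as below. This is a reconstruction of what the spoken formula says, assuming $\hat{\varepsilon}(h_j)$ denotes the training error of the fixed hypothesis $h_j$ on $m$ examples and $\varepsilon(h_j)$ its generalization error; the exact board notation is not recoverable from the transcript.

```latex
P\bigl( \lvert \varepsilon(h_j) - \hat{\varepsilon}(h_j) \rvert > \gamma \bigr)
  \le 2 \exp\bigl( -2 \gamma^2 m \bigr)
```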
And what this proves is that, for my fixed hypothesis HJ, my training error, epsilon hat, will with high probability be close to the generalization error, assuming M is large. If M is large, then this thing on the right hand side will be small, because this is two E to the negative two gamma squared M. So this says that if my training set is large enough, then the probability that my training error is far from generalization error, meaning that it is more than gamma away, will be small, will be bounded by this thing on the right hand side. Okay?
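
To get a feel for how quickly the right hand side shrinks as M grows, here is a minimal Python sketch. The function names, the true error of 0.3, and the choice to model each training mistake as an independent coin flip are all assumptions made for illustration; they are not from the lecture.

```python
import math
import random


def hoeffding_bound(gamma: float, m: int) -> float:
    """Right hand side of the bound for one fixed hypothesis: 2 * exp(-2 * gamma^2 * m)."""
    return 2.0 * math.exp(-2.0 * gamma ** 2 * m)


def deviation_frequency(true_error: float, gamma: float, m: int,
                        trials: int, seed: int = 0) -> float:
    """Estimate how often training error lands more than gamma from the true error.

    Assumption for illustration: the fixed hypothesis errs on each of the m
    training examples independently with probability true_error.
    """
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        # Count mistakes on one simulated training set of size m.
        mistakes = sum(rng.random() < true_error for _ in range(m))
        training_error = mistakes / m
        if abs(training_error - true_error) > gamma:
            far += 1
    return far / trials


if __name__ == "__main__":
    gamma = 0.1
    for m in (50, 200, 800):
        print(m, hoeffding_bound(gamma, m), deviation_frequency(0.3, gamma, m, trials=2000))
```

The simulated frequency should sit below the bound for each M, and both fall as M grows. The bound is typically loose, which is expected since it holds for any true error rate.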
Now, here’s the [inaudible] tricky part. What we’ve done is prove this bound for one fixed hypothesis, for HJ. What I want to prove is that training error will be a good estimate for generalization error, not just for this one hypothesis HJ, but actually for all K hypotheses in my hypothesis class script H. So let’s do it on a new board.
So in order to show that, let me define a random event. Let me define AJ to be the event that the difference between training and generalization error is more than gamma on hypothesis HJ. And so what we put on the previous board was that the probability of AJ is less than or equal to two E to the negative two gamma squared M, and this is pretty small.
Now, what I want to bound is the probability that there exists some hypothesis in my class script H such that I make a large error in my estimate of generalization error. Okay? Such that this holds true. So this is really just the probability that there exists a hypothesis for which this holds. This is really the probability that A one or A two, or up to AK holds. The chance there exists a hypothesis is just, well, the probability that for hypothesis one I make a large error in estimating the generalization error, or for hypothesis two I make a large error in estimating generalization error, and so on. And so by the union bound, this is less than or equal to the sum of the probabilities of the AJs, which is therefore less than or equal to the sum of K copies of two E to the negative two gamma squared M, which is equal to two K E to the negative two gamma squared M. Okay?
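
The union bound step can be written out as the following chain. This is a reconstruction from the spoken description, assuming $A_j$ is the event $\lvert \varepsilon(h_j) - \hat{\varepsilon}(h_j) \rvert > \gamma$ and $k$ is the number of hypotheses in $\mathcal{H}$; the board itself is not part of the transcript.

```latex
\begin{align*}
P\bigl( \exists\, h \in \mathcal{H} : \lvert \varepsilon(h) - \hat{\varepsilon}(h) \rvert > \gamma \bigr)
  &= P(A_1 \cup A_2 \cup \cdots \cup A_k) \\
  &\le \sum_{j=1}^{k} P(A_j) \\
  &\le \sum_{j=1}^{k} 2 \exp\bigl( -2 \gamma^2 m \bigr) \\
  &= 2k \exp\bigl( -2 \gamma^2 m \bigr)
\end{align*}
```

The first inequality is the union bound, and it does not need the events $A_j$ to be independent. The second applies the Hoeffding bound to each of the $k$ terms.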
So let me just take one minus both sides of the inequality on the previous board. Let me take one minus both sides, so I get the probability that there does not exist a hypothesis such that that holds. The probability that there does not exist a hypothesis on which I make a large error in this estimate, well, this is equal to the probability that for all hypotheses I make a small error, of at most gamma, in my estimate of generalization error. And taking one minus on the right hand side, I get that this is greater than or equal to one minus two K E to the negative two gamma squared M. Okay? And the sign of the inequality flipped because I took one minus both sides. The minus sign flips the sign of the inequality.
So what we’ve shown is that with probability, which abbreviates to WP, with probability at least one minus two K E to the negative two gamma squared M, we have that epsilon hat of H will be within gamma of epsilon of H, simultaneously for all hypotheses in our class script H.
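
As a rough numerical check of this simultaneous guarantee, here is a small Python sketch. The function names, the ten made-up true error rates, and the simplification that each hypothesis's mistakes are drawn independently are assumptions for illustration only. In practice all hypotheses are evaluated on the same training set, which the union bound also covers.

```python
import math
import random


def uniform_failure_bound(k: int, gamma: float, m: int) -> float:
    """Bound 2 * k * exp(-2 * gamma^2 * m) on the chance that any of k hypotheses is off by more than gamma."""
    return 2.0 * k * math.exp(-2.0 * gamma ** 2 * m)


def any_far_frequency(true_errors, gamma: float, m: int,
                      trials: int, seed: int = 0) -> float:
    """Estimate how often at least one hypothesis has training error more than gamma from its true error.

    Assumption for illustration: each hypothesis errs on each example
    independently with its own (hypothetical) true error rate.
    """
    rng = random.Random(seed)
    bad_trials = 0
    for _ in range(trials):
        for err in true_errors:
            mistakes = sum(rng.random() < err for _ in range(m))
            if abs(mistakes / m - err) > gamma:
                bad_trials += 1
                break  # one bad hypothesis is enough for this trial
    return bad_trials / trials


if __name__ == "__main__":
    true_errors = [0.05 * (j + 1) for j in range(10)]  # hypothetical error rates for k = 10
    k, gamma, m = len(true_errors), 0.1, 300
    bound = uniform_failure_bound(k, gamma, m)
    print("failure bound:", bound)
    print("guaranteed probability all are within gamma:", 1.0 - bound)
    print("simulated failure frequency:", any_far_frequency(true_errors, gamma, m, trials=300))
```

Note that the bound grows linearly in K but shrinks exponentially in M, so a modest increase in training set size can pay for a much larger hypothesis class.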