And here’s the strategy. The first step in this proof, I’m going to show that training error is a good approximation to generalization error, and then I’m going to show that this implies a bound on the generalization error of the hypothesis of [inaudible] empirical risk minimization. And I just realized, this class I guess is also maybe a slightly notation-heavy class, since I’m introducing a reasonably large set of new symbols, so if again, in the course of today’s lecture, you’re looking at some symbol and you don’t quite remember what it is, please raise your hand and ask. [Inaudible] What’s that, what was that, was that a generalization error or was it something else? So raise your hand and ask if you don’t understand the notation I was defining.
Okay. So let me introduce this in two steps. And the strategy is I’m gonna show that training error gives a good approximation to generalization error, and this will imply that minimizing training error will also do pretty well in terms of minimizing generalization error. And this will give us a bound on the generalization error of the hypothesis output by empirical risk minimization. Okay? So here’s the idea. So let’s not even consider all the hypotheses at once, let’s pick any hypothesis, h_j in the class script H, and so until further notice let’s just consider that one fixed hypothesis. So pick any one hypothesis and let’s talk about that one.
Let me define Z_i to be the indicator function for whether this hypothesis misclassifies the i-th example, excuse me, Z subscript i. Okay? So Z_i would be zero or one depending on whether this one hypothesis, which is the only one I’m gonna even consider now, misclassifies that example. And so my training set is drawn randomly from some distribution script D, and depending on what training examples I’ve got, these Z_i’s would be either zero or one. So let’s figure out what the probability distribution of Z_i is. Well, so Z_i takes on the value of either zero or one, so clearly it’s a Bernoulli random variable, it can only take on these values.
Well, what’s the probability that Z_i is equal to one? In other words, for a fixed hypothesis h_j, when I sample my training set IID from distribution D, what is the chance that my hypothesis will misclassify the i-th example? Well, by definition, that’s just the generalization error of my hypothesis h_j. So Z_i is a Bernoulli random variable with mean given by the generalization error of this hypothesis. Raise your hand if that made sense. Oh, cool. Great.
And moreover, all the Z_i’s have the same probability of being one, and all my training examples I’ve drawn are IID, and so the Z_i’s are also independent, and therefore the Z_i’s themselves are IID random variables. Okay? Because my training examples were drawn independently of each other, by assumption.
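Written out in symbols, the setup so far can be sketched as follows. The symbols here are assumptions about the notation on the board rather than something stated in the transcript: $(x^{(i)}, y^{(i)})$ stands for the i-th training example, $m$ for the number of training examples, and $\varepsilon(h_j)$ for the generalization error of the one fixed hypothesis.

```latex
% Assumed notation: (x^{(i)}, y^{(i)}) is the i-th training example drawn from \mathcal{D},
% and \varepsilon(h_j) is the generalization error of the fixed hypothesis h_j.
\begin{align*}
Z_i &= 1\{h_j(x^{(i)}) \neq y^{(i)}\}, \quad i = 1, \dots, m, \\
P(Z_i = 1) &= P_{(x,y)\sim\mathcal{D}}\big(h_j(x) \neq y\big) = \varepsilon(h_j), \\
Z_1, \dots, Z_m &\overset{\text{iid}}{\sim} \mathrm{Bernoulli}\big(\varepsilon(h_j)\big).
\end{align*}
```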
If you read this as the definition of training error, the training error of my hypothesis h_j, that’s just that. That’s just the average of my Z_i’s, which was, well, I previously defined it like this. Okay? And so epsilon hat of h_j is exactly the average of m IID Bernoulli random variables, drawn from a Bernoulli distribution with mean given by the generalization error, so this is, well, this is the average of m IID Bernoulli random variables, each of which has mean given by the generalization error of h_j.
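To make the "average of m IID Bernoulli random variables" picture concrete, here is a small simulation sketch. Everything in it is a made-up illustration rather than something from the lecture: the toy distribution in `sample_example`, the threshold hypothesis `h_j`, and the resulting true error of 0.2 are all assumptions chosen so the generalization error is known in advance.

```python
import random


def sample_example(rng):
    """Draw one (x, y) pair from a toy distribution D (hypothetical)."""
    x = rng.random()          # x is uniform on [0, 1)
    y = 1 if x > 0.5 else 0   # the true label is a threshold at 0.5
    return x, y


def h_j(x):
    """One fixed hypothesis (hypothetical): a threshold at 0.3."""
    return 1 if x > 0.3 else 0


def training_error(hypothesis, examples):
    """Average of the indicators Z_i = 1{hypothesis(x_i) != y_i}."""
    z = [1 if hypothesis(x) != y else 0 for x, y in examples]
    return sum(z) / len(z)


# Under this toy D, h_j is wrong exactly when 0.3 < x <= 0.5,
# so its generalization error is 0.2 (an assumption of the toy setup).
TRUE_ERROR = 0.2

rng = random.Random(0)
for m in (10, 100, 10_000):
    examples = [sample_example(rng) for _ in range(m)]
    print(m, training_error(h_j, examples), TRUE_ERROR)
```

As m grows, the printed training error should tend to settle near 0.2, which is the behavior the next step of the argument makes precise.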