
And then, finally, there's actually a version of this that you can take even further, which is when you set k equals m. That's when you take your training set and split it into as many pieces as you have training examples. This procedure is called leave-one-out cross validation. What you do is you take out the first training example, train on the rest, and test on the first example. Then you take out the second training example, train on the rest, and test on the second example. Then you take out the third example, train on everything but your third example, test on the third example, and so on. And so with this many pieces you are now making maybe even more effective use of your data than k-fold cross validation.
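(A minimal sketch of the procedure just described, in Python. The names X, y, and train_model are illustrative and not from the lecture; it assumes NumPy arrays and a model with a predict method, and scores each held-out example with a simple 0-1 error.)

```python
import numpy as np

def leave_one_out_cv(X, y, train_model):
    """Average held-out error over m single-example test sets (k = m)."""
    m = len(y)
    errors = []
    for i in range(m):
        # Leave out example i; train on the remaining m - 1 examples.
        train_idx = [j for j in range(m) if j != i]
        model = train_model(X[train_idx], y[train_idx])
        # Test on the single held-out example i.
        prediction = model.predict(X[i:i + 1])[0]
        errors.append(float(prediction != y[i]))
    return float(np.mean(errors))
```

With a scikit-learn style estimator, train_model could be, for example, lambda Xt, yt: LogisticRegression().fit(Xt, yt). Note that this trains the model m separate times, which is exactly why the procedure becomes expensive as m grows.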

But leave-one-out cross validation is computationally very expensive, because now you need to repeatedly leave one example out and then run your learning algorithm on m minus one training examples. You need to do this m times, and so this is computationally very expensive. And typically, this is done only when you're extremely data scarce. So if you have a learning problem where you have, say, 15 training examples or something, then if you have very few training examples, leave-one-out cross validation is maybe preferred. Yeah?

Student: You know, earlier you proved that the difference between the generalization [inaudible] by the number of examples in your training set and the VC dimension. So maybe [inaudible] examples into different groups, we can use that for [inaudible].

Instructor (Andrew Ng): Yeah, I mean –

Student: – compute the training error, and use that for computing [inaudible] for the generalization error.

Instructor (Andrew Ng): Yeah, that's done, but – yeah, in practice, I personally tend not to do that. It tends not to be – the VC dimension bounds are somewhat loose bounds. And so there are people in structural risk minimization who propose that you do that, but I personally tend not to do that, though. Questions on cross validation? Yeah.

Student: This is kind of far from there, because we spend all this time [inaudible], but how many data points do you sort of need to get within a certain margin [inaudible]?

Instructor (Andrew Ng): Right.

Student: So it seems like I'd be able to use that instead of doing this – more analytically, I guess. I mean –

Instructor (Andrew Ng): Yeah.

Student: [Inaudible].

Instructor (Andrew Ng): No – okay. So it turns out that when you're proving learning theory bounds, very often the bounds will be extremely loose, because you're proving a worst-case upper bound that holds true even for very bad distributions. So the bounds I proved just now, right? Those hold true for absolutely any probability distribution over training examples. We just assume the training examples were drawn i.i.d. from some distribution script D, and the bounds I proved hold true for absolutely any such distribution script D. And chances are whatever real-life distribution you get over, you know, houses and their prices or whatever, is probably not as bad as the very worst one you could've gotten, okay? And so it turns out that if you actually plug in the constants of learning theory bounds, you often get extremely large numbers.
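(To see concretely why these constants get large, here is a rough numeric sketch in Python. It assumes the finite-hypothesis-class sample complexity bound m ≥ (1/(2γ²)) log(2k/δ) discussed in the earlier learning theory lectures; the particular values of γ, δ, and k below are purely illustrative.)

```python
import math

gamma = 0.01   # desired gap between training error and generalization error
delta = 0.05   # allowed probability that the guarantee fails
k = 10 ** 6    # number of hypotheses in a finite hypothesis class

# Under the assumed bound, m >= (1 / (2 * gamma^2)) * log(2k / delta)
# examples suffice for the uniform convergence guarantee.
m_required = (1.0 / (2.0 * gamma ** 2)) * math.log(2 * k / delta)
print(round(m_required))  # roughly 87,500 examples just for a 1% gap
```

Even for a modest hypothesis class and a 1% gap, the worst-case guarantee asks for tens of thousands of examples, which is the looseness being described here.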

Source: OpenStax, Machine learning. OpenStax CNX, Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4