So let’s see. What we conclude from this is that for empirical risk minimization algorithms, in other words, less formally, for learning algorithms that try to minimize training error, the intuition to take away is that the number of training examples you need is roughly linear in the VC dimension of the hypothesis class. And more formally, this shows that sample complexity is upper bounded by the VC dimension; okay? It turns out that for most reasonable hypothesis classes, the VC dimension is very similar, I guess, to the number of parameters of your model. So for example, take logistic regression, linear classification.
Logistic regression in n dimensions has n plus one parameters. And the VC dimension of the class of linear classifiers in n dimensions is always n plus one. So it turns out that for most reasonable hypothesis classes, the VC dimension is usually linear in the number of parameters of your model, or at most some low-order polynomial in the number of parameters of your model. And so the takeaway intuition from this is that the number of training examples you need to fit those models is going to be, let’s say, roughly linear in the number of parameters in your model; okay?
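As a rough illustration of the "n plus one" claim for n = 2, the sketch below checks whether linear classifiers can realize every labeling of a small point set. It is only a sketch under stated assumptions: the function names, the point sets, and the epoch cap are all made up for illustration, and the perceptron is used as a stand-in separability test. When the perceptron succeeds, a separator really exists; when it gives up after the cap, that is consistent with no separator existing but is not by itself a proof. Checking one particular set of four points also does not show that no set of four points can be shattered.

```python
from itertools import product


def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if the perceptron finds a strict linear separator
    (weights w and bias b) within max_epochs passes over the data."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # gave up; assumed (not proven) non-separable


def appears_shattered(points, max_epochs=1000):
    """Try every +1/-1 labeling of the points; True only if the
    perceptron separates all of them."""
    return all(
        perceptron_separates(points, labels, max_epochs)
        for labels in product([-1, 1], repeat=len(points))
    )


# Hypothetical example point sets in 2 dimensions.
three_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
four_points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

print(appears_shattered(three_points))
print(appears_shattered(four_points))
```

The three non-collinear points can be labeled in all eight ways by a line, which matches a VC dimension of at least 3 = n + 1. The four points include the XOR-style labeling, which no line can realize, so the check fails on that set.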
There are some somewhat strange examples where what I just said is not true. There are some strange examples where you have very few parameters, but the VC dimension is enormous. But all of the examples I know of that fall into that regime are somewhat strange and degenerate. So somewhat unusual, and not the sort of learning algorithms you usually use.
Let’s see, just a few other things. So this result shows that sample complexity is upper bounded by the VC dimension. But it turns out that in the worst case, sample complexity is also lower bounded by the VC dimension. And what that means is that if you have a perfectly nasty learning problem, say, then if the number of training examples you have is less than on the order of the VC dimension, then it is not possible to prove this bound. So I guess in the worst case, sample complexity, the number of training examples you need, is upper bounded and lower bounded by the VC dimension.
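For reference, the result being discussed is usually written along the following lines. The symbols here are assumptions rather than anything read off the transcript: d is the VC dimension of the hypothesis class H, m the number of training examples, ε(h) the generalization error, ε̂(h) the training error, γ the allowed gap between them, and δ the failure probability. The lower-bound line is only a schematic restatement of the spoken worst-case claim, not a precise theorem.

```latex
% With probability at least 1 - \delta, for all h \in \mathcal{H}:
\[
  \left| \varepsilon(h) - \hat{\varepsilon}(h) \right|
  \le O\!\left( \sqrt{ \frac{d}{m} \log \frac{m}{d}
                       + \frac{1}{m} \log \frac{1}{\delta} } \right)
\]
% Upper bound on sample complexity: to get the gap below \gamma
% with probability at least 1 - \delta, it suffices that
\[
  m = O_{\gamma,\delta}(d)
\]
% Worst-case lower bound (schematic): for some learning problems,
% on the order of d examples are also necessary,
\[
  m = \Omega(d)
\]
```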
Let’s see, questions about this?
Student: Does the proof of this assume any sort of finiteness of, like, finite [inaudible] like you have to just [inaudible] real numbers and [inaudible]?
Instructor (Andrew Ng): Let’s see. The proof does not, no. I’ve actually stated the entirety of the theorem. This is true. It turns out that somewhere inside the proof there’s a construction called an epsilon net, which is a very clever [inaudible]. It’s sort of inside the proof; it is not an assumption that you need. In some way, that’s one step of the proof that uses a very clever [inaudible] to prove this. But that’s not an assumption that’s needed.