If $M \to \infty$ and $n/M \to \infty$ as $n \to \infty$, then the histogram classifier risk converges to the Bayes risk, $E[R(\hat{f}_n)] \to R^*$, for every distribution whose marginal density is bounded away from zero, $p_X(x) \ge c > 0$ for some constant $c$. Actually, the result holds for every distribution of $(X, Y)$. For the more general theorem, refer to the consistency theorem for histogram rules in A Probabilistic Theory of Pattern Recognition by Luc Devroye, László Györfi and Gábor Lugosi.
What the theorem tells us is that the number of partition cells must tend to infinity (to ensure that the bias, i.e., the approximation error, tends to zero), but it must grow more slowly than the number of samples (i.e., we want the number of samples per cell to tend to infinity, which drives the variance, i.e., the estimation error, to zero).
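To make the object of study concrete, here is a minimal sketch (not part of the original notes) of a histogram classifier on $[0,1]^d$ with $m$ bins per dimension, so $M = m^d$ cells; the function and variable names are illustrative choices only.

```python
import numpy as np

def fit_histogram_classifier(X, Y, m):
    """Fit a histogram classifier on [0,1]^d with m bins per dimension (M = m^d cells).

    The cell estimate eta_hat is the fraction of label-1 training points in the cell
    (0 if the cell is empty); the classifier predicts 1 wherever eta_hat >= 1/2.
    """
    n, d = X.shape
    # Index of the cell containing each training point.
    idx = np.minimum((X * m).astype(int), m - 1)
    flat = np.ravel_multi_index(idx.T, (m,) * d)
    N = np.bincount(flat, minlength=m ** d)                # N_j: samples per cell
    K = np.bincount(flat, weights=Y, minlength=m ** d)     # K_j: label-1 samples per cell
    eta_hat = np.divide(K, N, out=np.zeros(m ** d), where=N > 0)

    def predict(X_new):
        idx_new = np.minimum((X_new * m).astype(int), m - 1)
        return (eta_hat[np.ravel_multi_index(idx_new.T, (m,) * d)] >= 0.5).astype(int)

    return predict
```

Choosing $m$ so that $M = m^d \to \infty$ while $n/M \to \infty$ (for example, $m \approx n^{1/(2d)}$, so $M \approx \sqrt{n}$) is exactly the regime the theorem describes.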
Recall that for a plug-in classifier $\hat{f}_n(x) = \mathbf{1}\{\hat{\eta}_n(x) \ge 1/2\}$ the excess risk satisfies $E[R(\hat{f}_n)] - R^* \le 2\,E[\,|\hat{\eta}_n(X) - \eta(X)|\,]$, so it suffices to show that $E[\,|\hat{\eta}_n(X) - \eta(X)|\,] \to 0$. Let $P_j = \int_{Q_j} p_X(t)\,dt$ denote the probability that $X$ falls in cell $Q_j$ (the theoretical analog of the empirical fraction of training samples falling in $Q_j$) and define
$$\bar{\eta}(x) \;=\; \frac{\int_{Q_j} \eta(t)\, p_X(t)\, dt}{P_j} \quad \text{for } x \in Q_j,\ \ j = 1, \dots, M,$$
where $\eta(x) = P(Y = 1 \mid X = x)$. The function $\bar{\eta}$ is the theoretical analog of $\hat{\eta}_n$ (i.e., the function obtained by averaging $\eta$ over the partition cells). By the triangle inequality,
$$E\big[\,|\hat{\eta}_n(X) - \eta(X)|\,\big] \;\le\; \underbrace{E\big[\,|\hat{\eta}_n(X) - \bar{\eta}(X)|\,\big]}_{\text{estimation error}} \;+\; \underbrace{E\big[\,|\bar{\eta}(X) - \eta(X)|\,\big]}_{\text{approximation error}}.$$
Let's first bound the estimation error. For any $x$, let $Q(x)$ denote the histogram bin in which $x$ falls. Define the random variable
$$N(x) \;=\; \sum_{i=1}^{n} \mathbf{1}\{X_i \in Q(x)\},$$
the number of training samples that fall in $Q(x)$.
If $x \in Q_j$, then this random variable is simply $N_j$, the number of training samples in cell $Q_j$. Note that
$$\hat{\eta}_n(x) \;=\; \frac{K(x)}{N(x)} \qquad (\text{with } \hat{\eta}_n(x) := 0 \text{ if } N(x) = 0),$$
where $K(x) = \sum_{i=1}^{n} Y_i \mathbf{1}\{X_i \in Q(x)\}$ is simply the number of samples in cell $Q(x)$ labelled 1. Now $\hat{\eta}_n(x)$ is a fairly complicated random variable, but the conditional distribution of $K(x)$ given $N(x)$ is relatively simple. Note that
$$K(x) \mid N(x) \;\sim\; \mathrm{Binomial}\big(N(x),\, \bar{\eta}(x)\big),$$
since $\bar{\eta}(x)$ is the probability that a sample falling in $Q(x)$ has label 1, and we are conditioning on the event of observing $N(x)$ samples in $Q(x)$.
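This conditional binomial fact is easy to check empirically. The sketch below (illustrative only; the one-dimensional $\eta$, the uniform marginal, and the cell boundaries are arbitrary choices) draws many samples, restricts attention to one cell, and compares the label frequency in that cell with the cell average $\bar{\eta}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 200_000, 10                      # samples and number of cells on [0,1]
eta = lambda x: 0.2 + 0.6 * x           # a smooth choice of eta(x) = P(Y=1 | X=x)

X = rng.uniform(size=n)                 # X uniform on [0,1], so p_X = 1
Y = (rng.uniform(size=n) < eta(X)).astype(int)

j = 3                                   # look at cell Q_j = [j/M, (j+1)/M)
in_cell = (X >= j / M) & (X < (j + 1) / M)
N_j, K_j = in_cell.sum(), Y[in_cell].sum()

# Cell average of eta under p_X = 1: bar_eta = M * integral of eta over Q_j.
a, b = j / M, (j + 1) / M
bar_eta = M * (0.2 * (b - a) + 0.3 * (b ** 2 - a ** 2))

print(f"empirical label-1 fraction in cell: {K_j / N_j:.4f}")
print(f"cell average bar_eta:               {bar_eta:.4f}")
```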
Now consider the conditional expectation
$$E\big[\,|\hat{\eta}_n(x) - \bar{\eta}(x)|\;\big|\; N(x)\,\big].$$
Next note that
$$E\big[\,|\hat{\eta}_n(x) - \bar{\eta}(x)|\;\big|\; N(x)\,\big] \;\le\; \sqrt{E\big[\,(\hat{\eta}_n(x) - \bar{\eta}(x))^2\;\big|\; N(x)\,\big]}$$
by Jensen's inequality, $E[|Z|] \le \sqrt{E[Z^2]}$. Therefore, using the conditional binomial distribution of $K(x)$,
$$E\big[\,(\hat{\eta}_n(x) - \bar{\eta}(x))^2\;\big|\; N(x)\,\big] \;=\; \frac{\bar{\eta}(x)\,(1-\bar{\eta}(x))}{N(x)} \quad \text{when } N(x) > 0,$$
and
$$\bar{\eta}(x)\,(1-\bar{\eta}(x)) \;\le\; \tfrac{1}{4},$$
or in other words (recalling that $\hat{\eta}_n(x) = 0$ when $N(x) = 0$, so the error is at most 1 in that case),
$$E\big[\,|\hat{\eta}_n(x) - \bar{\eta}(x)|\;\big|\; N(x)\,\big] \;\le\; \frac{1}{2\sqrt{N(x)}}\,\mathbf{1}\{N(x) > 0\} \;+\; \mathbf{1}\{N(x) = 0\}.$$
Now taking expectation with respect to $N(x)$,
$$E\big[\,|\hat{\eta}_n(x) - \bar{\eta}(x)|\,\big] \;\le\; E\!\left[\frac{1}{2\sqrt{N(x)}}\,\mathbf{1}\{N(x) > 0\}\right] \;+\; P\big(N(x) = 0\big).$$
Now a key fact is that, for every $x$, $n\,P(X \in Q(x)) \to \infty$ as $n \to \infty$. This follows from the assumption that the marginal density satisfies $p_X(x) \ge c > 0$, for some constant $c$, and $n/M \to \infty$ as $n \to \infty$. This result is easily verified by contradiction: if $n\,P(X \in Q(x))$ remained bounded for some $x$, then, since each cell has volume $1/M$ and hence $P(X \in Q(x)) \ge c/M$, the ratio $n/M$ would remain bounded as well, and $n/M \to \infty$ is contradicted. Since $N(x) \sim \mathrm{Binomial}\big(n, P(X \in Q(x))\big) $ and its mean tends to infinity uniformly in $x$, for any $\epsilon > 0$ there exists an $n_\epsilon$ such that
$$E\!\left[\frac{1}{2\sqrt{N(x)}}\,\mathbf{1}\{N(x) > 0\}\right] \;\le\; \frac{\epsilon}{2} \quad \text{and} \quad P\big(N(x) = 0\big) \;\le\; \frac{\epsilon}{2}$$
for sufficiently large $n \ge n_\epsilon$. Therefore, for sufficiently large $n$ and every $x$,
$$E\big[\,|\hat{\eta}_n(x) - \bar{\eta}(x)|\,\big] \;\le\; \epsilon,$$
where the expectation is with respect to the distribution of the sample $\{X_i, Y_i\}_{i=1}^{n}$. Thus,
$$E\big[\,|\hat{\eta}_n(X) - \bar{\eta}(X)|\,\big] \;\le\; \epsilon,$$
where the expectation is now with respect to the distribution of the sample and the marginal distribution of $X$.
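The $1/(2\sqrt{N(x)})$ behaviour of the estimation error is easy to see numerically. The following sketch (an illustration, not part of the notes; the cell probability and the value of $\bar{\eta}$ are arbitrary) simulates a single cell with $N \sim \mathrm{Binomial}(n, p)$ and $K \mid N \sim \mathrm{Binomial}(N, \bar{\eta})$, and reports the average of $|\hat{\eta} - \bar{\eta}|$ alongside the bound derived above.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 100_000
bar_eta, p = 0.3, 0.01                  # cell average of eta and cell probability P(X in Q)

for n in [1_000, 10_000, 100_000]:      # n*p grows, mimicking n/M -> infinity
    N = rng.binomial(n, p, size=trials)             # samples landing in the cell
    K = rng.binomial(N, bar_eta)                    # label-1 samples among them
    eta_hat = np.divide(K, N, out=np.zeros(trials), where=N > 0)
    err = np.abs(eta_hat - bar_eta).mean()          # Monte Carlo estimate of E|eta_hat - bar_eta|
    bound = np.where(N > 0, 0.5 / np.sqrt(np.maximum(N, 1)), 1.0).mean()
    print(f"n={n:>6}  E|eta_hat - bar_eta| ~ {err:.4f}   bound ~ {bound:.4f}")
```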
Next consider the approximation error $E\big[\,|\bar{\eta}(X) - \eta(X)|\,\big]$, where the expectation is over $X$ alone. The function $\eta$ may not itself be continuous, but there is another function $\tilde{\eta}$ that is uniformly continuous and such that $E\big[\,|\eta(X) - \tilde{\eta}(X)|\,\big] \le \epsilon$ (continuous functions are dense in $L_1(P_X)$). Recall that uniformly continuous functions can be well approximated by piecewise-constant functions. By the triangle inequality,
$$E\big[\,|\bar{\eta}(X) - \eta(X)|\,\big] \;\le\; E\big[\,|\bar{\eta}(X) - \bar{\tilde{\eta}}(X)|\,\big] \;+\; E\big[\,|\bar{\tilde{\eta}}(X) - \tilde{\eta}(X)|\,\big] \;+\; E\big[\,|\tilde{\eta}(X) - \eta(X)|\,\big],$$
where $\bar{\tilde{\eta}}$ is the function obtained by averaging $\tilde{\eta}$ over the partition cells, just as $\bar{\eta}$ is obtained from $\eta$. The first term satisfies $E\big[\,|\bar{\eta}(X) - \bar{\tilde{\eta}}(X)|\,\big] \le E\big[\,|\eta(X) - \tilde{\eta}(X)|\,\big] \le \epsilon$, since averaging over cells can only decrease the $L_1(P_X)$ distance, and the last term is at most $\epsilon$ by the choice of $\tilde{\eta}$; and since $\tilde{\eta}$ is uniformly continuous,
$$E\big[\,|\bar{\tilde{\eta}}(X) - \tilde{\eta}(X)|\,\big] \;\le\; \sup_{x}\; \sup_{t \in Q(x)} \big|\tilde{\eta}(t) - \tilde{\eta}(x)\big|,$$
which tends to zero as the diameter of the partition cells tends to zero. By taking $M$ sufficiently large, this term can be made arbitrarily small. So for large $M$,
$$E\big[\,|\bar{\eta}(X) - \eta(X)|\,\big] \;\le\; 3\epsilon.$$
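A quick numerical illustration of this last step (not from the notes; the choice of $\tilde{\eta}$ and the uniform marginal are arbitrary): averaging a uniformly continuous function over finer and finer cells drives the $L_1$ error to zero.

```python
import numpy as np

# A uniformly continuous "tilde eta" on [0,1]; X taken uniform so E|.| is a plain integral.
tilde_eta = lambda x: 0.5 + 0.4 * np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 1_000_001)     # fine grid for numerical integration
vals = tilde_eta(x)

for M in [4, 16, 64, 256]:
    cells = np.minimum((x * M).astype(int), M - 1)
    # Cell averages of tilde_eta (the piecewise-constant approximation).
    cell_means = np.bincount(cells, weights=vals) / np.bincount(cells)
    err = np.abs(vals - cell_means[cells]).mean()    # approximates integral |tilde_eta - bar_tilde_eta|
    print(f"M={M:>4}   L1 error ~ {err:.5f}")
```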
Thus, we have shown
$$E\big[\,|\hat{\eta}_n(X) - \eta(X)|\,\big] \;\le\; 4\epsilon$$
for sufficiently large $n$. Since $\epsilon > 0$ was arbitrary, and $E[R(\hat{f}_n)] - R^* \le 2\,E\big[\,|\hat{\eta}_n(X) - \eta(X)|\,\big]$, we have shown that taking $M = M_n$ satisfies
$$\lim_{n \to \infty} E[R(\hat{f}_n)] \;=\; R^*$$
if
$$M_n \to \infty \quad \text{and} \quad \frac{n}{M_n} \to \infty \quad \text{as } n \to \infty.$$
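For instance, $M_n = \lceil \sqrt{n}\, \rceil$ satisfies both conditions. The following self-contained sketch (a toy illustration with an arbitrary one-dimensional $\eta$; not from the notes) estimates the risk of the histogram classifier with this choice and compares it with the Bayes risk.

```python
import numpy as np

rng = np.random.default_rng(2)
eta = lambda x: 0.2 + 0.6 * x                    # P(Y=1 | X=x); Bayes rule predicts 1 iff eta(x) >= 1/2

def sample(n):
    X = rng.uniform(size=n)                      # X uniform on [0,1]
    Y = (rng.uniform(size=n) < eta(X)).astype(int)
    return X, Y

xs = np.linspace(0.0, 1.0, 100_001)
bayes_risk = np.mean(np.minimum(eta(xs), 1 - eta(xs)))   # E[min(eta, 1 - eta)] under uniform X

for n in [100, 1_000, 10_000, 100_000]:
    M = int(np.ceil(np.sqrt(n)))                 # M -> infinity and n/M -> infinity
    X, Y = sample(n)
    cells = np.minimum((X * M).astype(int), M - 1)
    N = np.bincount(cells, minlength=M)
    K = np.bincount(cells, weights=Y, minlength=M)
    eta_hat = np.divide(K, N, out=np.zeros(M), where=N > 0)

    Xt, Yt = sample(200_000)                     # held-out test set to estimate the risk
    pred = (eta_hat[np.minimum((Xt * M).astype(int), M - 1)] >= 0.5).astype(int)
    risk = np.mean(pred != Yt)
    print(f"n={n:>6}  M={M:>4}  risk ~ {risk:.4f}  (Bayes risk ~ {bayes_risk:.4f})")
```

The estimated risk should drift toward the Bayes risk as $n$ grows, in line with the theorem.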