E.g. with .
Lemma: Suppose $d$ is a semi-distance on $\Theta$. Also suppose that we have constructed $\theta_0, \theta_1, \ldots, \theta_M \in \Theta$ such that $d(\theta_j, \theta_k) \ge 2s$ for all $j \ne k$. Take any estimator $\hat\theta$ and define the test $\psi$ (a function of the data $X$, taking values in $\{0, 1, \ldots, M\}$) as
\[
\psi \;=\; \arg\min_{0 \le j \le M} d(\hat\theta, \theta_j).
\]
Then $d(\hat\theta, \theta_j) < s$ implies $\psi = j$.
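One way to see this is via the triangle inequality for the semi-distance $d$: if $d(\hat\theta, \theta_j) < s$, then for any $k \ne j$ the $2s$-separation of the hypotheses gives
\[
d(\hat\theta, \theta_k) \;\ge\; d(\theta_j, \theta_k) - d(\hat\theta, \theta_j) \;>\; 2s - s \;=\; s \;>\; d(\hat\theta, \theta_j),
\]
so $\theta_j$ is the unique closest hypothesis to $\hat\theta$ and hence $\psi = j$.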
Suppose $\theta_0, \theta_1, \ldots, \theta_M \in \Theta$, and let $P_j$ denote the distribution of the data when the true parameter is $\theta_j$. Now
\[
\sup_{\theta \in \Theta} P_\theta\big( d(\hat\theta, \theta) \ge s \big) \;\ge\; \max_{0 \le j \le M} P_j\big( d(\hat\theta, \theta_j) \ge s \big).
\]
The previous lemma implies that $\psi \ne j$ forces $d(\hat\theta, \theta_j) \ge s$, so that
\[
P_j\big( d(\hat\theta, \theta_j) \ge s \big) \;\ge\; P_j\big( \psi \ne j \big).
\]
Therefore,
\[
\inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big( d(\hat\theta, \theta) \ge s \big)
\;\ge\; \inf_{\hat\theta} \max_{0 \le j \le M} P_j\big( d(\hat\theta, \theta_j) \ge s \big)
\;\ge\; \inf_{\hat\theta} \max_{0 \le j \le M} P_j\big( \psi \ne j \big)
\;\ge\; \inf_{\psi} \max_{0 \le j \le M} P_j\big( \psi \ne j \big)
\;=:\; p_{e,M}.
\]
The third step follows since we are replacing the class of tests defined by estimators $\hat\theta$ (via $\psi = \arg\min_j d(\hat\theta, \theta_j)$) by the larger class of ALL possible tests $\psi$, and hence the infimum taken over the larger class is smaller.
Now our goal throughout is going to be to find lower bounds for $p_{e,M}$.
So we need to construct $\theta_0, \ldots, \theta_M \in \Theta$ such that $d(\theta_j, \theta_k) \ge 2s$ for all $j \ne k$, and such that $p_{e,M}$ is bounded away from 0. Observe that this requires careful construction, since the first condition necessitates that $\theta_j$ and $\theta_k$ are far from each other, while the second condition requires that the corresponding distributions $P_j$ and $P_k$ are close enough that it is hard to distinguish them based on a given sample of data, and hence the probability of error is bounded away from 0.
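For concreteness, consider as a purely illustrative example the Gaussian location model: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$, $d(\theta, \theta') = |\theta - \theta'|$, with the two hypotheses $\theta_0 = 0$ and $\theta_1 = 2s$. The first condition holds by construction, but the sample distributions $P_0$ and $P_1$ remain hard to distinguish only when $s$ is of order $1/\sqrt{n}$ or smaller; taking $s$ larger makes the separation condition stronger but the testing problem easier, which is exactly the tension described above and is what determines the achievable rate.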
We now try to lower bound the probability of error $p_{e,M}$. We first consider the case $M = 1$, corresponding to binary hypothesis testing.
$M = 1$: Let $P_0$ and $P_1$ denote the two probability measures, i.e. the distributions of the data under models 0 and 1. Clearly, if $P_0$ and $P_1$ are very “close”, then it is hard to distinguish the two hypotheses, and so $p_{e,1}$ is large.
A natural measure of distance between probability measures is the total variation distance $V(P_0, P_1)$, defined as
\[
V(P_0, P_1) \;=\; \sup_{A} \big| P_0(A) - P_1(A) \big| \;=\; \sup_{A} \left| \int_A (p_0 - p_1)\, d\nu \right| ,
\]
where $p_0$ and $p_1$ are the densities of $P_0$ and $P_1$ with respect to a common dominating measure $\nu$, and $A$ is any (measurable) subset of the domain. We will lower bound the probability of error using the total variation distance. But first, we establish the following lemma.
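For example, if $P_0 = \mathrm{Bernoulli}(q_0)$ and $P_1 = \mathrm{Bernoulli}(q_1)$ (a single coin flip), then the supremum is attained at $A = \{1\}$ (or its complement) and $V(P_0, P_1) = |q_0 - q_1|$.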
Lemma: Recall the definition of the total variation distance:
\[
V(P_0, P_1) \;=\; \sup_{A} \big| P_0(A) - P_1(A) \big| .
\]
Then
\[
V(P_0, P_1) \;=\; 1 - \int \min(p_0, p_1)\, d\nu \;=\; \frac{1}{2} \int |p_1 - p_0| \, d\nu .
\]
Observe that the set maximizing the right-hand side of the definition is given by either $A^* = \{x : p_1(x) \ge p_0(x)\}$ or its complement $\{x : p_0(x) > p_1(x)\}$.
Let us pick $A^* = \{x : p_1(x) \ge p_0(x)\}$. Then
\[
V(P_0, P_1) \;=\; P_1(A^*) - P_0(A^*) \;=\; \int_{A^*} (p_1 - p_0)\, d\nu \;=\; \int_{A^*} p_1 \, d\nu - \int_{A^*} \min(p_0, p_1)\, d\nu ,
\]
since $p_0 = \min(p_0, p_1)$ on $A^*$. On the complement of $A^*$ we have $p_1 = \min(p_0, p_1)$, so
\[
V(P_0, P_1) \;=\; 1 - \int_{(A^*)^c} p_1 \, d\nu - \int_{A^*} \min(p_0, p_1)\, d\nu \;=\; 1 - \int \min(p_0, p_1)\, d\nu ,
\]
which proves the first equality.
For the second part, notice that
\[
\min(p_0, p_1) \;=\; \frac{p_0 + p_1 - |p_1 - p_0|}{2} .
\]
Now consider
\[
\int \min(p_0, p_1)\, d\nu \;=\; \frac{1}{2} \int (p_0 + p_1)\, d\nu \;-\; \frac{1}{2} \int |p_1 - p_0|\, d\nu \;=\; 1 - \frac{1}{2} \int |p_1 - p_0|\, d\nu ,
\]
so that $V(P_0, P_1) = 1 - \int \min(p_0, p_1)\, d\nu = \frac{1}{2} \int |p_1 - p_0|\, d\nu$, which is the second equality.
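As a quick numerical sanity check of these two identities, here is a small sketch on a discrete three-point example, with the counting measure playing the role of $\nu$ (the particular distributions below are arbitrary choices):

import numpy as np
from itertools import combinations

# Two discrete distributions on a three-point domain; the dominating
# measure nu is the counting measure, so integrals become sums.
p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.2, 0.3, 0.5])

# V(P0, P1) from the definition: sup over all subsets A of |P0(A) - P1(A)|,
# computed by brute force over all 2^3 subsets.
n = len(p0)
subsets = (A for r in range(n + 1) for A in combinations(range(n), r))
tv_def = max(abs(p0[list(A)].sum() - p1[list(A)].sum()) for A in subsets)

# The two equivalent expressions from the lemma.
tv_min = 1.0 - np.minimum(p0, p1).sum()   # 1 - integral of min(p0, p1)
tv_l1 = 0.5 * np.abs(p0 - p1).sum()       # (1/2) * integral of |p1 - p0|

print(tv_def, tv_min, tv_l1)              # all three evaluate to 0.3

Here a maximizing set is $A^* = \{x : p_1(x) \ge p_0(x)\}$, i.e. the last two points, matching the observation in the proof.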
We are now ready to tackle the lower bound on $p_{e,1}$. In this case, we consider all tests $\psi$ taking values in $\{0, 1\}$. Equivalently, we can define $\psi = \mathbf{1}_{\{X \in A\}}$, where $A$ is any subset of the domain. Since the maximum of two numbers is at least their average,
\[
p_{e,1} \;=\; \inf_{\psi} \max_{j \in \{0,1\}} P_j(\psi \ne j)
\;\ge\; \frac{1}{2} \inf_{A} \big[ P_0(A) + P_1(A^c) \big]
\;=\; \frac{1}{2} \inf_{A} \big[ 1 - \big( P_1(A) - P_0(A) \big) \big]
\;=\; \frac{1}{2} \big( 1 - V(P_0, P_1) \big) .
\]
So if $P_1$ is close to $P_0$, then $V(P_0, P_1)$ is small and the probability of error is large.
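In the Bernoulli example above, for instance, this bound reads $p_{e,1} \ge \frac{1}{2}\big(1 - |q_0 - q_1|\big)$, which stays bounded away from 0 whenever $q_0$ and $q_1$ are close.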
This is interesting, but unfortunately, it is hard to work with total variation, especially for multivariate distributions. Bounds involving the Kullback-Leibler divergence are much more convenient.