and plugging this parameter into the coding length $-\log p_\theta(x)$ minimizes the coding length among all possible parameters $\theta$. It is readily seen that the resulting length is $-\log p_{\hat{\theta}}(x) = n\,h_2(\hat{\theta})$, where $h_2(\cdot)$ denotes the binary entropy function.
Suppose, however, that we were to encode with some other parameter $\theta \neq \hat{\theta}$. Then the coding length would be $-\log p_\theta(x) = n\left[h_2(\hat{\theta}) + D(\hat{\theta}\,\|\,\theta)\right]$, where $D(\cdot\|\cdot)$ denotes relative entropy in bits.
It can be shown that this coding length is suboptimal w.r.t. $-\log p_{\hat{\theta}}(x)$ by $n\,D(\hat{\theta}\|\theta) = O(n(\theta-\hat{\theta})^2)$ bits. Keep in mind that doubling the number of parameter levels used by our universal encoder requires an extra bit to encode the extra factor of 2 in resolution. It makes sense to expend this extra bit only if it buys us at least one other bit, meaning that $n(\theta-\hat{\theta})^2 = \Omega(1)$, which implies that we encode $\theta$ to a resolution of $O(1/\sqrt{n})$, corresponding to $O(\sqrt{n})$ levels. Again, this is a redundancy of approximately $\frac{1}{2}\log(n)$ bits per parameter.
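To make the resolution trade-off concrete, here is a numerical sketch (the sequence length, empirical frequency, and midpoint quantizer below are illustrative assumptions, not from the text): the total two-part code length, $\log_2(\text{levels})$ bits to describe the quantized parameter plus the data cost under that parameter, is minimized when the number of levels is on the order of $\sqrt{n}$.

```python
import math

def two_part_length(n, theta_hat, levels):
    """Two-part code length in bits: log2(levels) to describe the quantized
    parameter, plus n times the cross-entropy of the data under it."""
    # Midpoint quantizer: snap theta_hat to the center of its cell.
    q = (math.floor(theta_hat * levels) + 0.5) / levels
    cross = -theta_hat * math.log2(q) - (1 - theta_hat) * math.log2(1 - q)
    return math.log2(levels) + n * cross

n, theta_hat = 10_000, 0.3
# The ideal (non-universal) length is n * h2(theta_hat); the excess above it
# is smallest near sqrt(n) = 100 levels, i.e. resolution 1/sqrt(n).
for levels in (10, 100, 1000, 10_000):
    print(levels, round(two_part_length(n, theta_hat, levels), 2))
```

Too few levels pay a large $n\,D(\hat{\theta}\|\theta)$ quantization penalty; too many pay for parameter bits that buy nothing, so the printed lengths bottom out near $\sqrt{n}$ levels.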
Having described Rissanen's result intuitively, let us formalize matters. Consider a parametric class of sources $\{p_\theta : \theta \in \Lambda\}$, where $\Lambda \subset \mathbb{R}^d$ is a compact set. Suppose that there exists an estimator $\hat{\theta}$ that converges to $\theta$ at a rate of $O(1/\sqrt{n})$. Then we have the following converse result.
Theorem 6 (Converse to universal coding [link] ) Given a parametric class that satisfies the above condition [link] , for all $\epsilon > 0$ and all codes that do not know $\theta$, the per-symbol redundancy is at least $(1-\epsilon)\,\frac{d}{2}\,\frac{\log(n)}{n}$, except for a set of $\theta \in \Lambda$ whose Lebesgue volume shrinks to zero as $n$ increases.
That is, a universal code cannot compress at a redundancy substantially below $\frac{1}{2}\log(n)$ bits per parameter. Rissanen also proved the following achievability result in his seminal paper.
Theorem 7 (Achievability of universal coding [link] ) If $p_\theta(x)$ is twice differentiable in $\theta$ for every $x$, then there exists a universal code whose per-symbol redundancy is at most $\frac{d}{2}\,\frac{\log(n)}{n}\,(1+o(1))$.
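For the Bernoulli class ($d = 1$), this rate is achieved by mixture codes such as the Krichevsky-Trofimov (KT) probability assignment. A minimal sketch (the sequence below, with empirical frequency 0.3, is an illustrative assumption): the KT code length exceeds the empirical entropy $n\,h_2(\hat{\theta})$ by roughly $\frac{1}{2}\log_2(n)$ bits.

```python
import math

def kt_codelength(bits):
    """Code length in bits under the Krichevsky-Trofimov mixture, computed
    sequentially: P(next symbol = 1) = (n1 + 1/2) / (n0 + n1 + 1)."""
    n0 = n1 = 0
    length = 0.0
    for b in bits:
        p1 = (n1 + 0.5) / (n0 + n1 + 1.0)
        length += -math.log2(p1 if b else 1.0 - p1)
        if b:
            n1 += 1
        else:
            n0 += 1
    return length

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 10_000
x = ([1] * 3 + [0] * 7) * (n // 10)  # empirical frequency exactly 0.3
redundancy = kt_codelength(x) - n * h2(0.3)
print(redundancy, 0.5 * math.log2(n))  # redundancy is (1/2) log2(n) + O(1)
```

Because the KT assignment is exchangeable, the code length depends only on the symbol counts, so the deterministic sequence above is representative of any sequence with the same empirical frequency.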
We have emphasized stationary parametric classes, but a parametric class can be nonstationary. Let us show how universal coding can be achieved for some nonstationary classes of sources by providing an example. Consider the source that generates $x = (x_1, \ldots, x_n)$ with
$x_i \sim p_{\theta_1}$ for $i \le t$ and $x_i \sim p_{\theta_2}$ for $i > t$,
where $p_{\theta_1}$ and $p_{\theta_2}$ are both known i.i.d. sources. This is a piecewise i.i.d. source ; in each segment it is i.i.d., and there is an abrupt transition in statistics when the first segment ends and the second begins.
Here are two approaches to coding this source: a plug-in approach, which estimates the transition time and encodes it explicitly along with the two segments, and a mixture approach, which averages over all possible transition times.
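The two approaches can be sketched numerically. Everything below is an illustrative assumption (Bernoulli segments with known biases $p_1$, $p_2$ and an unknown transition time): the plug-in code sends the best transition time at a cost of $\log_2(n)$ bits, while the mixture assigns a uniform prior over the $n+1$ possible transition times.

```python
import math

def seg_nll(x, p):
    """Code length in bits of x under an i.i.d. Bernoulli(p) model."""
    return sum(-math.log2(p if b else 1 - p) for b in x)

def piecewise_nll(x, t, p1, p2):
    """Bits for x with known segment parameters and transition after index t."""
    return seg_nll(x[:t], p1) + seg_nll(x[t:], p2)

def plugin_length(x, p1, p2):
    """Plug-in: pick the best transition time, spend log2(n) bits to send it."""
    n = len(x)
    best = min(piecewise_nll(x, t, p1, p2) for t in range(n + 1))
    return math.log2(n) + best

def mixture_length(x, p1, p2):
    """Mixture: uniform prior over the n+1 transition times (log-sum-exp)."""
    n = len(x)
    lls = [-piecewise_nll(x, t, p1, p2) for t in range(n + 1)]  # log2-probs
    m = max(lls)
    log_sum = m + math.log2(sum(2.0 ** (ll - m) for ll in lls))
    return -(log_sum - math.log2(n + 1))

p1, p2 = 0.1, 0.8
# 500 symbols at frequency 0.1, then 500 at frequency 0.8.
x = ([1] + [0] * 9) * 50 + ([1] * 4 + [0]) * 100
ideal = piecewise_nll(x, 500, p1, p2)     # transition time known for free
print(plugin_length(x, p1, p2) - ideal)   # ~ log2(n) bits of excess
print(mixture_length(x, p1, p2) - ideal)  # slightly less than the plug-in
```

Both excesses are close to $\log_2(n)$ bits, the price of the unknown transition location; the mixture is marginally cheaper because nearby transition times also contribute probability mass.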
Merhav [link] provided redundancy theorems for this class of sources. Algorithmic approaches to the mixture appear in Shamir and Merhav [link] and Willems [link] .
The theme that is common to both approaches, the plug-in and the mixture, is that they lose approximately $\log(n)$ bits in encoding the location of the transition. Indeed, Merhav showed that the penalty for each transition in universal coding is approximately $\log(n)$ bits [link] . Intuitively, the reason that the redundancy required to encode the location of the transition is larger than the $\frac{1}{2}\log(n)$ from Rissanen [link] is that the location of the transition must be described precisely, to prevent paying a big coding length penalty for encoding segments using the wrong i.i.d. statistics. In contrast, in encoding our Bernoulli example an imprecision of $O(1/\sqrt{n})$ in encoding $\theta$ in the first part of the code yields only an $O(1)$ bit penalty in the second part of the code.
It is well known that mixtures out-compress the plug-in. However, in many cases they do so by only a small amount per parameter. For example, Baron et al. showed that the plug-in for i.i.d. sources loses approximately 1 bit per parameter w.r.t. the mixture.
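A quick numeric comparison for a single Bernoulli parameter (the sequence length, empirical frequency, and midpoint quantizer are illustrative assumptions; the exact gap depends on the quantizer and on where the empirical frequency falls within its cell):

```python
import math

def plugin_bits(n, theta_hat):
    """Two-part plug-in code: describe theta to resolution 1/sqrt(n)
    (sqrt(n) levels), then encode the data with the quantized parameter."""
    levels = int(math.sqrt(n))
    q = (math.floor(theta_hat * levels) + 0.5) / levels
    cross = -theta_hat * math.log2(q) - (1 - theta_hat) * math.log2(1 - q)
    return math.log2(levels) + n * cross

def kt_bits(n, ones):
    """Krichevsky-Trofimov mixture code length in bits, via log-gamma:
    P = Gamma(ones + 1/2) * Gamma(zeros + 1/2) / (pi * Gamma(n + 1))."""
    zeros = n - ones
    ln_p = (math.lgamma(ones + 0.5) + math.lgamma(zeros + 0.5)
            - math.log(math.pi) - math.lgamma(n + 1))
    return -ln_p / math.log(2)

n, ones = 10_000, 3_000
gap = plugin_bits(n, ones / n) - kt_bits(n, ones)
print(gap)  # small positive gap, on the order of a bit for this example
```

The gap is a small constant number of bits per parameter, consistent with the mixture's modest advantage over the plug-in.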