and plugging this parameter into the coding length $-\log p_\theta(x)$ minimizes the coding length among all possible parameters $\theta$. It is readily seen that the resulting length is $-\log p_{\hat{\theta}}(x) = n\,h_2(\hat{\theta})$, where $h_2(\cdot)$ denotes the binary entropy function.
Suppose, however, that we were to encode with some other parameter $\theta \neq \hat{\theta}$. Then the coding length would be $-\log p_\theta(x) = n\left[h_2(\hat{\theta}) + D(\hat{\theta}\,\|\,\theta)\right]$, where $D(\cdot\|\cdot)$ denotes relative entropy in bits.
It can be shown that this coding length is suboptimal w.r.t. $-\log p_{\hat{\theta}}(x)$ by $n\,D(\hat{\theta}\|\theta) = O(n(\theta-\hat{\theta})^2)$ bits. Keep in mind that doubling the number of parameter levels used by our universal encoder requires an extra bit to encode the extra factor of 2 in resolution. It makes sense to expend this extra bit only if it buys us at least one other bit, meaning that $n(\theta-\hat{\theta})^2 = \Omega(1)$, which implies that we encode $\theta$ to a resolution of $O(1/\sqrt{n})$, corresponding to $O(\sqrt{n})$ levels. Again, this is a redundancy of approximately $\frac{1}{2}\log(n)$ bits per parameter.
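To make the resolution trade-off concrete, here is a numerical sketch (the sequence length, empirical frequency, and midpoint quantizer below are illustrative assumptions, not from the text): the total two-part code length, $\log_2(\text{levels})$ bits to describe the quantized parameter plus the data cost under that parameter, is minimized when the number of levels is on the order of $\sqrt{n}$.

```python
import math

def two_part_length(n, theta_hat, levels):
    """Two-part code length in bits: log2(levels) to describe the quantized
    parameter, plus n times the cross-entropy of the data under it."""
    # Midpoint quantizer: snap theta_hat to the center of its cell.
    q = (math.floor(theta_hat * levels) + 0.5) / levels
    cross = -theta_hat * math.log2(q) - (1 - theta_hat) * math.log2(1 - q)
    return math.log2(levels) + n * cross

n, theta_hat = 10_000, 0.3
# The ideal (non-universal) length is n * h2(theta_hat); the excess above it
# is smallest near sqrt(n) = 100 levels, i.e. resolution 1/sqrt(n).
for levels in (10, 100, 1000, 10_000):
    print(levels, round(two_part_length(n, theta_hat, levels), 2))
```

Too few levels pay a large $n\,D(\hat{\theta}\|\theta)$ quantization penalty; too many pay for parameter bits that buy nothing, so the printed lengths bottom out near $\sqrt{n}$ levels.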
Having described Rissanen's result intuitively, let us formalize matters. Consider a parametric class of sources $\{p_\theta : \theta \in \Lambda\}$, where $\Lambda \subset \mathbb{R}^d$ is a compact set. Suppose that there exists an estimator $\hat{\theta}$ that converges to $\theta$ at a rate of $O(1/\sqrt{n})$. Then we have the following converse result.
Theorem 6 (Converse to universal coding [link] ) Given a parametric class that satisfies the above condition [link] , for all $\epsilon > 0$ and all codes that do not know $\theta$, the per-symbol redundancy is at least $(1-\epsilon)\,\frac{d}{2}\,\frac{\log(n)}{n}$, except for a set of $\theta \in \Lambda$ whose Lebesgue volume shrinks to zero as $n$ increases.
That is, a universal code cannot compress at a redundancy substantially below $\frac{1}{2}\log(n)$ bits per parameter. Rissanen also proved the following achievability result in his seminal paper.
Theorem 7 (Achievability of universal coding [link] ) If $p_\theta(x)$ is twice differentiable in $\theta$ for every $x$, then there exists a universal code whose per-symbol redundancy is at most $\frac{d}{2}\,\frac{\log(n)}{n}\,(1+o(1))$.
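For the Bernoulli class ($d = 1$), this rate is achieved by mixture codes such as the Krichevsky-Trofimov (KT) probability assignment. A minimal sketch (the sequence below, with empirical frequency 0.3, is an illustrative assumption): the KT code length exceeds the empirical entropy $n\,h_2(\hat{\theta})$ by roughly $\frac{1}{2}\log_2(n)$ bits.

```python
import math

def kt_codelength(bits):
    """Code length in bits under the Krichevsky-Trofimov mixture, computed
    sequentially: P(next symbol = 1) = (n1 + 1/2) / (n0 + n1 + 1)."""
    n0 = n1 = 0
    length = 0.0
    for b in bits:
        p1 = (n1 + 0.5) / (n0 + n1 + 1.0)
        length += -math.log2(p1 if b else 1.0 - p1)
        if b:
            n1 += 1
        else:
            n0 += 1
    return length

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 10_000
x = ([1] * 3 + [0] * 7) * (n // 10)  # empirical frequency exactly 0.3
redundancy = kt_codelength(x) - n * h2(0.3)
print(redundancy, 0.5 * math.log2(n))  # redundancy is (1/2) log2(n) + O(1)
```

Because the KT assignment is exchangeable, the code length depends only on the symbol counts, so the deterministic sequence above is representative of any sequence with the same empirical frequency.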
We have emphasized stationary parametric classes, but a parametric class can be nonstationary. Let us show how universal coding can be achieved for some nonstationary classes of sources by providing an example. Consider the source that generates $x = (x_1, \ldots, x_n)$ with
$x_i \sim p_{\theta_1}$ for $i \le t$ and $x_i \sim p_{\theta_2}$ for $i > t$,
where $p_{\theta_1}$ and $p_{\theta_2}$ are both known i.i.d. sources. This is a piecewise i.i.d. source ; in each segment it is i.i.d., and there is an abrupt transition in statistics when the first segment ends and the second begins.
Here are two approaches to coding this source: a plug-in approach, which estimates the transition time and encodes it explicitly along with the two segments, and a mixture approach, which averages over all possible transition times.
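The two approaches can be sketched numerically. Everything below is an illustrative assumption (Bernoulli segments with known biases $p_1$, $p_2$ and an unknown transition time): the plug-in code sends the best transition time at a cost of $\log_2(n)$ bits, while the mixture assigns a uniform prior over the $n+1$ possible transition times.

```python
import math

def seg_nll(x, p):
    """Code length in bits of x under an i.i.d. Bernoulli(p) model."""
    return sum(-math.log2(p if b else 1 - p) for b in x)

def piecewise_nll(x, t, p1, p2):
    """Bits for x with known segment parameters and transition after index t."""
    return seg_nll(x[:t], p1) + seg_nll(x[t:], p2)

def plugin_length(x, p1, p2):
    """Plug-in: pick the best transition time, spend log2(n) bits to send it."""
    n = len(x)
    best = min(piecewise_nll(x, t, p1, p2) for t in range(n + 1))
    return math.log2(n) + best

def mixture_length(x, p1, p2):
    """Mixture: uniform prior over the n+1 transition times (log-sum-exp)."""
    n = len(x)
    lls = [-piecewise_nll(x, t, p1, p2) for t in range(n + 1)]  # log2-probs
    m = max(lls)
    log_sum = m + math.log2(sum(2.0 ** (ll - m) for ll in lls))
    return -(log_sum - math.log2(n + 1))

p1, p2 = 0.1, 0.8
# 500 symbols at frequency 0.1, then 500 at frequency 0.8.
x = ([1] + [0] * 9) * 50 + ([1] * 4 + [0]) * 100
ideal = piecewise_nll(x, 500, p1, p2)     # transition time known for free
print(plugin_length(x, p1, p2) - ideal)   # ~ log2(n) bits of excess
print(mixture_length(x, p1, p2) - ideal)  # slightly less than the plug-in
```

Both excesses are close to $\log_2(n)$ bits, the price of the unknown transition location; the mixture is marginally cheaper because nearby transition times also contribute probability mass.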
Merhav [link] provided redundancy theorems for this class of sources. Algorithmic approaches to the mixture appear in Shamir and Merhav [link] and Willems [link] .
The theme that is common to both approaches, the plug-in and the mixture, is that they lose approximately $\log(n)$ bits in encoding the location of the transition. Indeed, Merhav showed that the penalty for each transition in universal coding is approximately $\log(n)$ bits [link] . Intuitively, the reason that the redundancy required to encode the location of the transition is larger than the $\frac{1}{2}\log(n)$ from Rissanen [link] is that the location of the transition must be described precisely, to prevent paying a big coding length penalty for encoding segments using the wrong i.i.d. statistics. In contrast, in encoding our Bernoulli example an imprecision of $O(1/\sqrt{n})$ in encoding $\theta$ in the first part of the code yields only an $O(1)$ bit penalty in the second part of the code.
It is well known that mixtures out-compress the plug-in. However, in many cases they do so by only a small amount per parameter. For example, Baron et al. showed that the plug-in for i.i.d. sources loses approximately 1 bit per parameter w.r.t. the mixture.
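A quick numeric comparison for a single Bernoulli parameter (the sequence length, empirical frequency, and midpoint quantizer are illustrative assumptions; the exact gap depends on the quantizer and on where the empirical frequency falls within its cell):

```python
import math

def plugin_bits(n, theta_hat):
    """Two-part plug-in code: describe theta to resolution 1/sqrt(n)
    (sqrt(n) levels), then encode the data with the quantized parameter."""
    levels = int(math.sqrt(n))
    q = (math.floor(theta_hat * levels) + 0.5) / levels
    cross = -theta_hat * math.log2(q) - (1 - theta_hat) * math.log2(1 - q)
    return math.log2(levels) + n * cross

def kt_bits(n, ones):
    """Krichevsky-Trofimov mixture code length in bits, via log-gamma:
    P = Gamma(ones + 1/2) * Gamma(zeros + 1/2) / (pi * Gamma(n + 1))."""
    zeros = n - ones
    ln_p = (math.lgamma(ones + 0.5) + math.lgamma(zeros + 0.5)
            - math.log(math.pi) - math.lgamma(n + 1))
    return -ln_p / math.log(2)

n, ones = 10_000, 3_000
gap = plugin_bits(n, ones / n) - kt_bits(n, ones)
print(gap)  # small positive gap, on the order of a bit for this example
```

The gap is a small constant number of bits per parameter, consistent with the mixture's modest advantage over the plug-in.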