We have discussed several parametric sources, and will now start developing mathematical tools to investigate the properties of universal codes, that is, codes that offer universal compression w.r.t. a class of parametric sources.
Consider a class $\Lambda$ of parametric models, where the parameter set $\theta$ characterizes the distribution for a specific source within this class, $\{p_\theta(\cdot),\ \theta \in \Lambda\}$.
Consider the class of memoryless sources over an alphabet $\alpha = \{1, 2, \ldots, r\}$. Here we have
$$\theta = \{p(1), p(2), \ldots, p(r-1)\}.$$
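The goal is to find a fixed-to-variable length lossless code that is independent of $\theta$, which is unknown, yet achieves
$$\frac{1}{n} E_\theta\left[ l(X_1^n) \right] \xrightarrow{\,n \to \infty\,} H_\theta(X),$$
where $l(\cdot)$ is the code length function and the expectation is taken w.r.t. the distribution implied by $\theta$. We have seen for
$$p(x) = \tfrac{1}{2}\, p_1(x) + \tfrac{1}{2}\, p_2(x)$$
that a code that is good for two sources (distributions) $p_1$ and $p_2$ exists, modulo the one bit loss [link]. As an expansion beyond this idea, consider
$$p(x) = \int_\Lambda dw(\theta)\, p_\theta(x),$$
where $w(\theta)$ is a prior.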
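Let us revisit the memoryless source, choose $r = 2$ with binary symbols 0 and 1, and define the scalar parameter
$$\theta = \Pr(X_i = 1) = 1 - \Pr(X_i = 0).$$
Then
$$p_\theta(x) = \theta^{n_x(1)} (1-\theta)^{n_x(0)},$$
where $n_x(a)$ is the number of times the symbol $a$ appears in $x$, and, for a uniform prior on $[0,1]$,
$$p(x) = \int_0^1 d\theta\, \theta^{n_x(1)} (1-\theta)^{n_x(0)}.$$
Moreover, it can be shown that
$$p(x) = \frac{n_x(0)!\, n_x(1)!}{(n+1)!};$$
this result appears in Krichevsky and Trofimov [link].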
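The closed form for the binary mixture with a uniform prior on the parameter can be checked with exact rational arithmetic. The following is a minimal sketch, not code from the original notes: the function names are illustrative, and the symbol-by-symbol version relies on the standard fact that a uniform prior leads to the add-one (Laplace) rule for the next-symbol probability.

```python
from fractions import Fraction
from math import factorial


def mixture_probability(x):
    """Closed form n0! * n1! / (n + 1)! for a binary sequence x (list of 0/1)."""
    n1 = sum(x)
    n0 = len(x) - n1
    return Fraction(factorial(n0) * factorial(n1), factorial(len(x) + 1))


def sequential_probability(x):
    """Same probability, built symbol by symbol with the add-one rule."""
    prob = Fraction(1)
    counts = [0, 0]  # counts[a] = occurrences of symbol a seen so far
    for symbol in x:
        prob *= Fraction(counts[symbol] + 1, counts[0] + counts[1] + 2)
        counts[symbol] += 1
    return prob


x = [1, 0, 1, 1, 0]
print(mixture_probability(x))     # 1/60
print(sequential_probability(x))  # 1/60
```

Because the sequential form only needs the running counts, a code built on this mixture can assign probabilities one symbol at a time without knowing the parameter.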
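Is the source $X$ implied by the distribution $p(x)$ an ergodic source? Consider the event $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i \le \frac{1}{2}$. Owing to symmetry, in the limit of large $n$ the probability of this event under $p(x)$ must be $\frac{1}{2}$,
$$\Pr\left\{ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i \le \frac{1}{2} \right\} = \frac{1}{2}.$$
On the other hand, recall that an ergodic source must allocate probability 0 or 1 to this flavor of event. Therefore, the source implied by $p(x)$ is not ergodic.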
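The symmetry argument can be illustrated at finite block lengths. The sketch below assumes the binary source with a uniform prior on the parameter, under which every sequence with $k$ ones out of $n$ has probability $k!\,(n-k)!/(n+1)!$; the function names and the comparison value 0.3 are illustrative choices. It computes the probability that the empirical mean is at most 1/2 under the mixture, and under a single memoryless source for contrast.

```python
from fractions import Fraction
from math import comb, factorial


def prob_mean_at_most_half_mixture(n):
    """Pr(number of ones <= n/2) under the uniform-prior mixture (exact)."""
    total = Fraction(0)
    for k in range(n // 2 + 1):
        per_sequence = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
        total += comb(n, k) * per_sequence
    return total


def prob_mean_at_most_half_iid(n, theta):
    """Same event under one memoryless source with Pr(X_i = 1) = theta."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(n // 2 + 1))


for n in (11, 101, 501):
    print(n, prob_mean_at_most_half_mixture(n), prob_mean_at_most_half_iid(n, 0.3))
```

Under the mixture the probability stays at 1/2 for every odd $n$, whereas for the single source with parameter 0.3 it approaches 1 as $n$ grows, which is the 0-or-1 behavior expected of an ergodic source.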
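Recall the definitions of $p_\theta(x)$ and $p(x)$ in [link] and [link], respectively. Based on these definitions, consider the following,
$$H_\theta(X_1^n) = -\sum_{x_1^n} p_\theta(x_1^n) \log p_\theta(x_1^n) = H(X_1^n \mid \Theta = \theta),$$
$$H(X_1^n) = -\sum_{x_1^n} p(x_1^n) \log p(x_1^n),$$
$$H(X_1^n \mid \Theta) = \int_\Lambda dw(\theta)\, H(X_1^n \mid \Theta = \theta).$$
We get the following quantity for mutual information between the random variable $\Theta$ and random sequence $X_1^n$,
$$I(\Theta; X_1^n) = H(X_1^n) - H(X_1^n \mid \Theta).$$
Note that this quantity represents the gain in bits that knowing the parameter $\theta$ provides; more about this quantity will be mentioned later.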
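For the binary memoryless class with a uniform prior on the parameter, both entropies have simple expressions, so the mutual information between the parameter and the sequence can be evaluated numerically. This is a sketch under those assumptions, with illustrative function names. It uses two facts: all sequences with $k$ ones share the probability $k!\,(n-k)!/(n+1)!$, so each value of $k$ carries total probability $1/(n+1)$; and the binary entropy function integrates to $1/(2 \ln 2)$ bits over $[0,1]$.

```python
from math import lgamma, log

LN2 = log(2.0)


def mixture_entropy_bits(n):
    """H(X_1^n) in bits under the uniform-prior mixture."""
    entropy = 0.0
    for k in range(n + 1):
        # log2 of k!(n-k)!/(n+1)!, the probability of one sequence with k ones
        log2_p = (lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)) / LN2
        # the sequences with k ones have total probability 1/(n+1)
        entropy -= log2_p / (n + 1)
    return entropy


def conditional_entropy_bits(n):
    """H(X_1^n | Theta) = n * integral of the binary entropy over [0, 1]."""
    return n / (2.0 * LN2)


def mutual_information_bits(n):
    """I(Theta; X_1^n) = H(X_1^n) - H(X_1^n | Theta), in bits."""
    return mixture_entropy_bits(n) - conditional_entropy_bits(n)


for n in (1, 10, 100, 1000):
    info = mutual_information_bits(n)
    print(n, info, info / n)
```

The mutual information keeps growing with $n$, but the per-symbol value shrinks, which is what makes universal coding over this class feasible.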
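We now define the conditional redundancy,
$$r_n(l, \theta) = \frac{1}{n} \left[ E_\theta\left( l(X_1^n) \right) - H_\theta(X_1^n) \right];$$
this quantifies how far a coding length function $l$ is from the entropy where the parameter $\theta$ is known. Note that averaging the expected length over the prior gives the expected length under the mixture $p$, which cannot fall below the entropy of the mixture,
$$\int_\Lambda dw(\theta)\, E_\theta\left( l(X_1^n) \right) = \sum_{x_1^n} p(x_1^n)\, l(x_1^n) \ge H(X_1^n).$$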
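Denote by $\mathcal{C}_n$ the collection of lossless codes for length-$n$ inputs, and define the expected redundancy of a code $l \in \mathcal{C}_n$ by
$$R_n^-(w, l) = \int_\Lambda dw(\theta)\, r_n(l, \theta), \qquad R_n^-(w) = \inf_{l \in \mathcal{C}_n} R_n^-(w, l).$$
The asymptotic expected redundancy follows,
$$R^-(w) = \lim_{n \to \infty} R_n^-(w),$$
assuming that the limit exists.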
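To get a feel for these quantities, the sketch below evaluates the conditional redundancy of one particular code for the binary memoryless class: the idealized mixture code with length $-\log_2 p(x)$, where $p$ is the uniform-prior mixture. Several things here are assumptions made for illustration: integer rounding of code lengths is ignored, the prior is uniform, the integral over the prior is approximated with a midpoint rule, and the function names are hypothetical.

```python
from math import comb, lgamma, log, log2

LN2 = log(2.0)


def conditional_redundancy(n, theta):
    """(1/n) [E_theta l(X) - H_theta(X)] for l(x) = -log2 p(x), 0 < theta < 1."""
    total = 0.0
    for k in range(n + 1):
        # probability that a length-n sequence has k ones under theta
        prob_type = comb(n, k) * theta**k * (1 - theta)**(n - k)
        log2_p_theta = k * log2(theta) + (n - k) * log2(1 - theta)
        log2_p_mix = (lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)) / LN2
        total += prob_type * (log2_p_theta - log2_p_mix)
    return total / n


def expected_redundancy(n, grid=500):
    """Average of conditional_redundancy over a uniform prior (midpoint rule)."""
    return sum(conditional_redundancy(n, (i + 0.5) / grid)
               for i in range(grid)) / grid


for n in (10, 100):
    print(n, conditional_redundancy(n, 0.3), expected_redundancy(n))
```

For this idealized code the conditional redundancy is a normalized Kullback-Leibler divergence between the true source and the mixture, so it is nonnegative, and in this example it shrinks as $n$ grows.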
We can also define the minimum redundancy that incorporates the worst prior for the parameter,
$$R_n^- = \sup_{w} R_n^-(w),$$
while keeping the best code. Similarly,
$$R^- = \lim_{n \to \infty} R_n^-.$$
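Let us derive $R_n^-$. Up to the at most one bit that integer code lengths can cost,
$$R_n^- = \sup_{w} \inf_{l} \int_\Lambda dw(\theta)\, \frac{1}{n} \left[ E_\theta\left( l(X_1^n) \right) - H(X_1^n \mid \Theta = \theta) \right] \approx \sup_{w} \frac{1}{n} \left[ H(X_1^n) - H(X_1^n \mid \Theta) \right] = \sup_{w} \frac{1}{n} I(\Theta; X_1^n) = \frac{C_n}{n},$$
where $C_n$ is the capacity of the channel whose input is the parameter $\theta$ and whose output is the sequence $x$ [link]. That is, we try to estimate the parameter from the output of a noisy channel.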
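The capacity of the channel between the parameter and the sequence can be approximated numerically for the binary memoryless class. The sketch below uses the Blahut-Arimoto algorithm, which is not part of the original text, on a discretized parameter grid. Because the prior is restricted to the grid, the result is only a lower bound on the capacity. The grid size, iteration count, and function names are illustrative assumptions. The number of ones $k$ is used as the channel output, since it is a sufficient statistic for the parameter.

```python
from math import comb, log2


def divergences(prior, channel):
    """D(channel row || output marginal) in bits, one value per grid point."""
    n_out = len(channel[0])
    marginal = [sum(w * row[k] for w, row in zip(prior, channel))
                for k in range(n_out)]
    return [sum(p * log2(p / marginal[k]) for k, p in enumerate(row) if p > 0.0)
            for row in channel]


def capacity_lower_bound(n, grid=41, iterations=300):
    """Blahut-Arimoto over a theta grid; lower bound on the capacity in bits."""
    thetas = [i / (grid - 1) for i in range(grid)]
    # channel[i][k] = Pr(k ones among n symbols | theta_i)
    channel = [[comb(n, k) * t**k * (1.0 - t)**(n - k) for k in range(n + 1)]
               for t in thetas]
    prior = [1.0 / grid] * grid
    for _ in range(iterations):
        d = divergences(prior, channel)
        # reweight each grid point by 2^(its divergence), then renormalize
        weights = [w * 2.0**di for w, di in zip(prior, d)]
        total = sum(weights)
        prior = [w / total for w in weights]
    # mutual information achieved by the final prior
    return sum(w * di for w, di in zip(prior, divergences(prior, channel)))


for n in (1, 5, 20):
    c = capacity_lower_bound(n)
    print(n, c, c / n)
```

For $n = 1$ the bound should approach 1 bit, since placing half of the prior mass on each of $\theta = 0$ and $\theta = 1$ makes the single output symbol reveal the parameter exactly.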
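In an analogous manner, we define
$$R_n^+ = \inf_{l \in \mathcal{C}_n} \sup_{\theta \in \Lambda} r_n(l, \theta), \qquad R^+ = \lim_{n \to \infty} R_n^+.$$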