Proof. Part (1): By stationarity and the fact that conditioning reduces entropy, $H(X_{n+1}|X_1^{n}) \le H(X_{n+1}|X_2^{n}) = H(X_n|X_1^{n-1})$, so the sequence $H(X_n|X_1^{n-1})$ is monotone non-increasing in $n$.
Part (2): By the chain rule, $\frac{1}{n}H(X_1^n) = \frac{1}{n}\sum_{i=1}^{n} H(X_i|X_1^{i-1}) \ge H(X_n|X_1^{n-1})$, where the inequality uses part (1).
Part (3): This follows from the first equality in the proof of (2): $\frac{1}{n}H(X_1^n)$ is the average of the first $n$ terms of the monotone non-increasing sequence $H(X_i|X_1^{i-1})$, and the running average of a non-increasing sequence is itself non-increasing.
Part (4): Both sequences are monotone non-increasing (parts (1) and (3)) and bounded below (by zero). Therefore, they both have limits. Denote $H_c \triangleq \lim_{n\to\infty} H(X_n|X_1^{n-1})$ and $\bar{H} \triangleq \lim_{n\to\infty} \frac{1}{n}H(X_1^n)$.
Owing to part (2), $\bar{H} \ge H_c$. Therefore, it suffices to prove $\bar{H} \le H_c$.
For $n > k$, the chain rule together with part (1) gives $\frac{1}{n}H(X_1^n) \le \frac{1}{n}H(X_1^k) + \frac{n-k}{n}H(X_{k+1}|X_1^k)$. Now fix $k$ and take the limit for large $n$; the inequality $\bar{H} \le H(X_{k+1}|X_1^k)$ appears. Letting $k\to\infty$ yields $\bar{H} \le H_c$, which proves that both limits are equal.
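To make the monotonicity in parts (1) and (3) concrete, here is a small numerical sketch (my own illustration; the two-state transition matrix P and all variable names are assumptions, not from the notes) that evaluates $\frac{1}{n}H(X_1^n)$ and $H(X_n|X_1^{n-1})$ in closed form for a stationary binary Markov chain; both columns decrease toward the same limit, as the proof predicts.

```python
# Numerical check of parts (1), (3), (4) for a stationary two-state
# Markov chain (hypothetical example).
import numpy as np

P = np.array([[0.9, 0.1],          # transition probabilities p(b|a)
              [0.3, 0.7]])
# stationary distribution pi solves pi P = pi
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

def H(p):
    """Entropy in bits of a probability vector (ignoring zero entries)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H1 = H(pi)                                                   # H(X_1)
Hcond = np.sum(pi * np.array([H(P[a]) for a in range(2)]))   # H(X_2|X_1)

for n in range(1, 11):
    # chain rule: H(X_1^n) = H(X_1) + (n-1) H(X_2|X_1) for a stationary Markov chain
    block = (H1 + (n - 1) * Hcond) / n        # (1/n) H(X_1^n)
    cond = H1 if n == 1 else Hcond            # H(X_n | X_1^{n-1})
    print(f"n={n:2d}  (1/n)H(X_1^n)={block:.4f}  H(X_n|X_1^{{n-1}})={cond:.4f}")
# Both columns are non-increasing and converge to the entropy rate Hcond.
```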
Coding theorem: For fixed-to-variable length coding, Theorem 3 yields that for a stationary source there exists a lossless code whose compression rate obeys
$$r_n \le \frac{1}{n}H(X_1^n) + \frac{1}{n}.$$
This can be proved, for example, by choosing the length function $l(x_1^n) = \lceil \log_2 \frac{1}{p(x_1^n)} \rceil$, which is a Shannon code. As $n$ is increased, the compression rate converges to the entropy rate.
We also have a converse theorem for lossless coding of stationary sources. That is, the rate of any lossless code must satisfy $r_n \ge \frac{1}{n}H(X_1^n) \ge \bar{H}$.
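The following minimal Python sketch (the block pmf p is a made-up example) illustrates the Shannon-code argument: the lengths $\lceil \log_2 \frac{1}{p(x)} \rceil$ satisfy the Kraft inequality, and the expected length lies between the block entropy and the block entropy plus one bit, which is exactly the achievability bound above.

```python
# Shannon code lengths for a hypothetical block pmf.
import math

p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}      # made-up block probabilities

lengths = {x: math.ceil(-math.log2(px)) for x, px in p.items()}
kraft = sum(2.0 ** -l for l in lengths.values())
H = -sum(px * math.log2(px) for px in p.values())
EL = sum(px * lengths[x] for x, px in p.items())

print("Kraft sum :", kraft)    # <= 1, so a prefix code with these lengths exists
print("entropy H :", H)
print("E[length] :", EL)       # H <= E[length] < H + 1
```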
Consider the two-sided sequence $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$. Let $T$ denote a shift by one step, $(Tx)_i = x_{i+1}$, so that $T^k$ takes $k$ steps. Let $f$ be a function that operates on a finite number of coordinates of $x$. An ergodic source has the property that empirical averages converge to statistical averages,
$$\frac{1}{n}\sum_{i=1}^{n} f(T^i x) \xrightarrow[n\to\infty]{} E[f(X)].$$
In block codes we want the normalized log-probability $-\frac{1}{n}\log p(X_1^n)$ of the observed block to converge to the entropy rate $\bar{H}$.
We will be content with convergence in probability, although almost sure convergence is stronger.
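As an illustration of the ergodic property, here is a simulation sketch under assumed parameters (the two-state chain, the function f, and all names are hypothetical): the empirical average of a function of two consecutive coordinates along one long sample path matches the statistical average under the stationary distribution.

```python
# Empirical vs. statistical average for an ergodic two-state Markov chain.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])           # ergodic two-state Markov chain
pi = np.array([0.75, 0.25])          # its stationary distribution (pi P = pi)

def f(window):
    """A function of two consecutive coordinates: indicator of a 0->1 step."""
    return 1.0 if (window[0] == 0 and window[1] == 1) else 0.0

# generate one long sample path started from the stationary distribution
n = 100000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=pi)
for i in range(1, n):
    x[i] = rng.choice(2, p=P[x[i - 1]])

empirical = np.mean([f(x[i:i + 2]) for i in range(n - 1)])
statistical = pi[0] * P[0, 1]        # Pr(X_i = 0, X_{i+1} = 1)
print("empirical  :", empirical)
print("statistical:", statistical)   # the two agree for an ergodic source
```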
Theorem 4 Let $X$ be a stationary ergodic source with a finite alphabet. Then for every $\epsilon > 0$, there exists $N_\epsilon$ such that for all $n > N_\epsilon$,
$$\Pr\left(\left|-\frac{1}{n}\log p(X_1^n) - \bar{H}\right| > \epsilon\right) < \epsilon,$$
where $\bar{H} = \lim_{n\to\infty}\frac{1}{n}H(X_1^n)$ is the entropy rate of the source.
The proof of this result is quite lengthy. We discussed it in detail, but skip it here.
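Although we skip the proof, the statement of Theorem 4 is easy to check numerically. The sketch below (the same hypothetical two-state chain as in the earlier sketches; all parameters are assumptions of mine) generates one long sample path and compares $-\frac{1}{n}\log p(X_1^n)$ with the entropy rate.

```python
# -(1/n) log p(X_1^n) along one sample path vs. the entropy rate.
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])
Hbar = -sum(pi[a] * P[a, b] * np.log2(P[a, b])
            for a in range(2) for b in range(2))   # entropy rate H(X_2|X_1)

n = 100000
x = [rng.choice(2, p=pi)]
logp = np.log2(pi[x[0]])
for i in range(1, n):
    x.append(rng.choice(2, p=P[x[-1]]))
    logp += np.log2(P[x[-2], x[-1]])

print("-(1/n) log p(X_1^n):", -logp / n)
print("entropy rate       :", Hbar)   # the two are close for large n
```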
Theorem 4 is called the ergodic theorem of information theory or the ergodic theorem of entropy. Shannon (1948) proved convergence in probability for stationary ergodic Markov sources. McMillan (1953) proved convergence in probability for stationary ergodic sources. Breiman (1957/1960) proved convergence with probability 1 for stationary ergodic sources.
In this section, we discuss several parametric models and see what their entropy rates are.
Memoryless sources: We have seen that for memoryless sources the entropy rate equals the single-letter entropy,
$$\bar{H} = H(X_1) = -\sum_{a\in\mathcal{X}} p(a)\log p(a),$$
where there are $|\mathcal{X}|-1$ free parameters in total (the probabilities sum to one),
the parameters are denoted by $p(a)$, $a\in\mathcal{X}$, and $\mathcal{X}$ is the alphabet.
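A tiny sketch (the pmf below is an arbitrary example of mine) of the two facts above: the entropy rate of a memoryless source is the single-letter entropy, and the number of free parameters is one less than the alphabet size.

```python
# Entropy rate and parameter count of a hypothetical memoryless source.
import math

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}          # |X| = 3, so 2 free parameters
entropy_rate = -sum(px * math.log2(px) for px in p.values())
print("entropy rate (bits/symbol):", entropy_rate)
print("free parameters           :", len(p) - 1)
```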
Markov sources: The distribution of a Markov source is defined as
$$p(x_1^n) = p(x_1)\prod_{i=2}^{n} p(x_i|x_{i-1}),$$
where $x_i \in \mathcal{X}$. We must define initial probabilities $p(x_1=a)$ and transition probabilities $p(x_i=b \mid x_{i-1}=a)$, $a,b\in\mathcal{X}$. There are $|\mathcal{X}|-1$ free initial probabilities and $|\mathcal{X}|(|\mathcal{X}|-1)$ free transition probabilities, giving a total of $|\mathcal{X}|^2-1$ parameters. Note that for a stationary Markov source the entropy rate is $\bar{H} = \lim_{n\to\infty} H(X_n|X_1^{n-1}) = H(X_2|X_1)$.
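To illustrate, here is a short sketch (the transition matrix is a made-up example) that computes the entropy rate of a stationary Markov source as $H(X_2|X_1)$, weighting the row entropies of the transition matrix by the stationary distribution, and counts the free parameters.

```python
# Entropy rate and parameter count of a hypothetical first-order Markov source.
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])                   # p(x_i | x_{i-1})
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()                           # stationary distribution

row_H = [-np.sum(P[a][P[a] > 0] * np.log2(P[a][P[a] > 0])) for a in range(2)]
entropy_rate = float(np.dot(pi, row_H))      # H(X_2|X_1)

r = P.shape[0]
print("entropy rate (bits/symbol):", entropy_rate)
print("parameters (initial + transition):", (r - 1) + r * (r - 1))
```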