<< Chapter < Page Chapter >> Page >
θ M L = arg max { θ n x ( 1 ) ( 1 - θ ) n x ( 0 ) } ,

and plugging this parameter θ = θ M L into p θ ( x ) minimizes the coding length among all possible parameters, θ Λ . It is readily seen that

θ M L = n x ( 1 ) n .

Suppose, however, that we were to encode with θ ' = θ M L + Δ . Then the coding length would be

l θ ( x ) = - log ( θ ' ) n x ( 1 ) ( 1 - θ ' ) n x ( 0 ) .

It can be shown that this coding length is suboptimal w.r.t. l θ M L ( x ) by n · O ( Δ 2 ) bits. Keep in mind that doubling the number of parameter levels used by ouruniversal encoder requires an extra bit to encode the extra factor of 2 in resolution. It makes sense to expend this extra bit only if it buys us at least one other bit,meaning that n · O ( Δ 2 ) = 1 , which implies that we encode θ M L to a resolution of 1 / n , corresponding to O ( n ) levels. Again, this is a redundancy of 1 2 log ( n ) bits per parameter.

Having described Rissanen's result intuitively, let us formalize matters. Consider { p θ , θ Λ } , where Λ R K is a compact set. Suppose that there exists an estimator θ ^ such that

n n ( c ) : p θ θ ^ ( x n ) - θ > c n δ ( c ) ,

where lim c δ ( c ) = 0 . Then we have the following converse result.

Theorem 6 (Converse to universal coding  [link] ) Given a parametric class that satisfies the above condition [link] , for all ϵ > 0 and all codes l that do not know θ ,

r n ( l , θ ) ( 1 - ϵ ) K 2 log ( n ) n ,

except for a class of θ in B ϵ ( n ) Λ whose Lebesgue volume shrinks to zero as n increases.

That is, a universal code cannot compress at a redundancy substantialy below 1 2 log ( n ) bits per parameter. Rissanen also proved the following achievable result in his seminal paper.

Theorem 7 (Achievable to universal coding  [link] ) If p θ ( x ) is twice differentiable in θ for every x n , then there exists a universal code such that θ Λ : r n ( l , θ ) ( 1 + ϵ ) K 2 log ( n ) n .

Universal coding for piecewise i.i.d. sources

We have emphasized stationary parametric classes, but a parametric class can be nonstationary. Let us show how universal coding can be achieved for some nonstationary classes of sourcesby providing an example. Consider Λ = { 0 , 1 , . . . , n } where

p θ ( x n ) = Q 1 ( x 1 θ ) · Q 2 ( x θ + 1 n ) ,

where Q 1 and Q 2 are both know i.i.d. sources. This is a piecewise i.i.d. source ; in each segment it is i.i.d., and there is an abrupt transition in statistics when the first segment ends and the second begins.

Here are two approaches to coding this source.

  1. Encode the best index θ M L using log ( n + 1 ) bits, then encode p θ M L ( x n ) . This is known as two-part code or plug-in ; after encoding the index, we plug the best parameter into the distribution. Clearly,
    l ( x ) = min 0 θ n - log p θ ( x ) + log ( n + 1 ) - log p θ ( x ) + log ( n + 1 ) + 2 .
  2. The second approach is a mixture , we allocate weights for all possible parameters,
    l ( x ) = - log 1 n + 1 i = 0 n p i ( x n ) < - log 1 n + 1 p θ M L ( x n ) = - log ( p θ M L ( x ) ) + log ( n + 1 ) .

Merhav  [link] provided redundancy theorems for this class of sources. Algorithmic approaches to the mixture appear in Shamir and Merhav  [link] and Willems  [link] .

The theme that is common to both approaches, the plug-in and the mixture, is that they lose approximately log ( n ) bits in encoding the location of the transition. Indeed, Merhav showed that the penalty for each transition in universal codingis approximately log ( n ) bits  [link] . Intuitively, the reason that the redundancy required to encode the location of the transition is largerthan the 1 2 log ( n ) from Rissanen  [link] is because the location of the transitionmust be described precisely to prevent paying a big coding length penalty in encoding segments using the wrong i.i.d. statistics. In contrast, in encoding our Bernoulli example an imprecision of 1 n in encoding θ M L in the first part of the code yields only an O ( 1 ) bit penalty in the second part of the code.

It is well known that mixtures out-compress the plug-in. However, in many cases they do so by only a small amount per parameter. For example, Baron et al. showed that the plug-in for i.i.d. sourcesloses approximately 1 bit per parameter w.r.t. the mixture.

Questions & Answers

A golfer on a fairway is 70 m away from the green, which sits below the level of the fairway by 20 m. If the golfer hits the ball at an angle of 40° with an initial speed of 20 m/s, how close to the green does she come?
Aislinn Reply
cm
tijani
what is titration
John Reply
what is physics
Siyaka Reply
A mouse of mass 200 g falls 100 m down a vertical mine shaft and lands at the bottom with a speed of 8.0 m/s. During its fall, how much work is done on the mouse by air resistance
Jude Reply
Can you compute that for me. Ty
Jude
what is the dimension formula of energy?
David Reply
what is viscosity?
David
what is inorganic
emma Reply
what is chemistry
Youesf Reply
what is inorganic
emma
Chemistry is a branch of science that deals with the study of matter,it composition,it structure and the changes it undergoes
Adjei
please, I'm a physics student and I need help in physics
Adjanou
chemistry could also be understood like the sexual attraction/repulsion of the male and female elements. the reaction varies depending on the energy differences of each given gender. + masculine -female.
Pedro
A ball is thrown straight up.it passes a 2.0m high window 7.50 m off the ground on it path up and takes 1.30 s to go past the window.what was the ball initial velocity
Krampah Reply
2. A sled plus passenger with total mass 50 kg is pulled 20 m across the snow (0.20) at constant velocity by a force directed 25° above the horizontal. Calculate (a) the work of the applied force, (b) the work of friction, and (c) the total work.
Sahid Reply
you have been hired as an espert witness in a court case involving an automobile accident. the accident involved car A of mass 1500kg which crashed into stationary car B of mass 1100kg. the driver of car A applied his brakes 15 m before he skidded and crashed into car B. after the collision, car A s
Samuel Reply
can someone explain to me, an ignorant high school student, why the trend of the graph doesn't follow the fact that the higher frequency a sound wave is, the more power it is, hence, making me think the phons output would follow this general trend?
Joseph Reply
Nevermind i just realied that the graph is the phons output for a person with normal hearing and not just the phons output of the sound waves power, I should read the entire thing next time
Joseph
Follow up question, does anyone know where I can find a graph that accuretly depicts the actual relative "power" output of sound over its frequency instead of just humans hearing
Joseph
"Generation of electrical energy from sound energy | IEEE Conference Publication | IEEE Xplore" ***ieeexplore.ieee.org/document/7150687?reload=true
Ryan
what's motion
Maurice Reply
what are the types of wave
Maurice
answer
Magreth
progressive wave
Magreth
hello friend how are you
Muhammad Reply
fine, how about you?
Mohammed
hi
Mujahid
A string is 3.00 m long with a mass of 5.00 g. The string is held taut with a tension of 500.00 N applied to the string. A pulse is sent down the string. How long does it take the pulse to travel the 3.00 m of the string?
yasuo Reply
Who can show me the full solution in this problem?
Reofrir Reply
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Universal algorithms in signal processing and communications. OpenStax CNX. May 16, 2013 Download for free at http://cnx.org/content/col11524/1.1
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Universal algorithms in signal processing and communications' conversation and receive update notifications?

Ask