Consider a sequence $x \in \mathcal{X}^n$. The goal of lossy compression [link] is to describe $y$, also of length $n$ but possibly defined over another reconstruction alphabet $\mathcal{Y}$, such that the description requires few bits and the distortion
$$d(x, y) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, y_i)$$
is small, where $d(\cdot,\cdot)$ is some distortion metric. It is well known that for every stationary ergodic source $X$ and distortion level $D$ there is a minimum rate $R(X, D)$, such that $x$ can be described at rate $R(X, D)$ while keeping the distortion below $D$. The rate $R(X, D)$ is known as the rate distortion (RD) function; it is the fundamental information theoretic limit of lossy compression [link], [link].
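To make the setup concrete, the following minimal sketch computes the per-letter distortion between a source sequence and a candidate reconstruction; the choice of Hamming distortion and the function name are illustrative assumptions, not part of the cited work.

```python
# Minimal sketch: per-letter distortion d(x, y) = (1/n) * sum_i d(x_i, y_i),
# using Hamming distortion (0/1 per symbol) purely as an example metric.

def hamming_distortion(x, y):
    """Average Hamming distortion between two equal-length sequences."""
    assert len(x) == len(y), "sequences must have the same length n"
    return sum(xi != yi for xi, yi in zip(x, y)) / len(x)

# Example: a binary source sequence and a candidate reconstruction.
x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 0, 0, 1, 0, 0, 0]
print(hamming_distortion(x, y))  # 0.25 (2 mismatches out of 8 symbols)
```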
The design of lossy compression algorithms has been a challenging problem for decades. Despite numerous applications such as image compression [link], [link], video compression [link], and speech coding [link], [link], [link], there is a significant gap between theory and practice: practical lossy compressors do not achieve the RD function, while theoretical constructions that achieve the RD function are impractical.
A promising recent algorithm by Jalali and Weissman [link] is universal in the limit of infinite runtime, and its RD performance is reasonable even with modest runtime. The main idea is that a distorted version $\hat{y}$ of the input $x$ can be computed as follows,
$$\hat{y} = \arg\min_{y \in \mathcal{Y}^n} \left[ H_k(y) + \alpha\, d(x, y) \right],$$
where $\alpha > 0$ is the slope at the particular point of interest on the RD function, and $H_k(y)$ is the empirical conditional entropy of order $k$,
$$H_k(y) \triangleq \frac{1}{n} \sum_{u \in \mathcal{Y}^k} \sum_{\beta \in \mathcal{Y}} n_y(u, \beta) \log_2\!\left( \frac{\sum_{\beta' \in \mathcal{Y}} n_y(u, \beta')}{n_y(u, \beta)} \right),$$
where $u$ is a context of order $k$, and as before $n_y(u, \beta)$ is the number of times that the symbol $\beta$ appears following the context $u$ in $y$. Jalali and Weissman proved [link] that when $k = o(\log(n))$, the RD pair $(H_k(\hat{y}), d(x, \hat{y}))$ converges to the RD function asymptotically in $n$. Therefore, an excellent lossy compression technique is to compute $\hat{y}$ and then compress it. Moreover, this compression can be universal. In particular, the choice of context order $k = o(\log(n))$ ensures that universal compressors for context tree sources can emulate the coding length of the empirical conditional entropy $H_k(\hat{y})$.
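The sketch below shows how the order-$k$ empirical conditional entropy can be computed from context counts. The helper name and the decision to skip the first $k$ symbols (which lack a full-length context) are assumptions made for the illustration.

```python
from collections import defaultdict
from math import log2

def empirical_conditional_entropy(y, k):
    """Order-k empirical conditional entropy H_k(y) in bits per symbol.

    Builds the counts n_y(u, b): how often symbol b follows the length-k
    context u in y, then accumulates n_y(u, b) * log2(total(u) / n_y(u, b)).
    """
    n = len(y)
    counts = defaultdict(lambda: defaultdict(int))   # context u -> symbol b -> count
    context_totals = defaultdict(int)                # context u -> total occurrences
    for i in range(k, n):                            # first k symbols have no full context
        u = tuple(y[i - k:i])
        counts[u][y[i]] += 1
        context_totals[u] += 1

    h = 0.0
    for u, row in counts.items():
        for b, c in row.items():
            h += c * log2(context_totals[u] / c)
    return h / n

# Example: a strongly patterned sequence has H_k = 0 once k captures the pattern.
y = [0, 1] * 50
print(empirical_conditional_entropy(y, k=1))  # 0.0 bits/symbol
```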
Despite this excellent potential performance, there is still a tremendous challenge. Brute force computation of the globally minimum energy solution involves an exhaustive search over exponentially many sequences and is thus infeasible. Therefore, Jalali and Weissman rely on Markov chain Monte Carlo (MCMC) [link], which is a stochastic relaxation approach to optimization. The crux of the matter is to define an energy function,
$$\mathcal{E}(y) \triangleq n \left[ H_k(y) + \alpha\, d(x, y) \right].$$
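Under these definitions, the energy of a candidate reconstruction is simply a weighted combination of its compressibility and its fidelity to the input. A minimal sketch, assuming the `hamming_distortion` and `empirical_conditional_entropy` helpers introduced above and a user-chosen slope `alpha`:

```python
def energy(x, y, k, alpha):
    """Energy E(y) = n * [H_k(y) + alpha * d(x, y)] for a candidate reconstruction y.

    Lower energy means a better trade-off between compressibility (H_k)
    and fidelity to the input x (distortion), at slope alpha.
    """
    n = len(y)
    return n * (empirical_conditional_entropy(y, k)
                + alpha * hamming_distortion(x, y))
```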
The Boltzmann probability mass function (pmf) is
$$p_t(y) \triangleq \frac{1}{Z_t} \exp\!\left\{ -\frac{1}{t}\, \mathcal{E}(y) \right\},$$
where $t > 0$ is related to the temperature in simulated annealing, and $Z_t$ is the normalization constant, which does not need to be computed.
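For intuition, the snippet below turns a handful of candidate energies into Boltzmann probabilities at a given temperature; the explicit normalization here is only over this small candidate set and is purely for illustration, since the full normalization constant over all $|\mathcal{Y}|^n$ sequences never has to be computed.

```python
from math import exp

def boltzmann_pmf(energies, t):
    """Boltzmann probabilities p(y) proportional to exp(-E(y)/t) over a candidate set.

    Lower temperature t concentrates the mass on the minimum-energy candidate.
    """
    weights = [exp(-e / t) for e in energies]
    z = sum(weights)                      # normalization over these candidates only
    return [w / z for w in weights]

energies = [3.0, 3.5, 6.0]
print(boltzmann_pmf(energies, t=1.0))    # moderately peaked around the lowest energy
print(boltzmann_pmf(energies, t=0.1))    # nearly all mass on the lowest energy
```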
Because it is difficult to sample from the Boltzmann pmf [link] directly, we instead use a Gibbs sampler, which computes the marginal distributions at all $n$ locations conditioned on the rest of $y$ being kept fixed. For each location $i$, the Gibbs sampler resamples $y_i$ from the distribution of $y_i$ conditioned on $y^{\setminus i} \triangleq (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$, as induced by the joint pmf in [link], which is computed as follows,
$$p_t\!\left(y_i = \beta \mid y^{\setminus i}\right) = \frac{\exp\!\left\{ -\frac{1}{t}\, \mathcal{E}(y_1, \ldots, y_{i-1}, \beta, y_{i+1}, \ldots, y_n) \right\}}{\sum_{\beta' \in \mathcal{Y}} \exp\!\left\{ -\frac{1}{t}\, \mathcal{E}(y_1, \ldots, y_{i-1}, \beta', y_{i+1}, \ldots, y_n) \right\}}.$$
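Putting the pieces together, here is a sketch of annealed Gibbs sampling for this problem: at each location the candidate symbol is resampled from its conditional Boltzmann distribution, and only energy differences enter the computation, so no global normalization constant is needed. The alphabet, cooling schedule, and sweep count below are illustrative choices, not the schedule from the cited work, and the `energy` helper from the previous sketch is assumed; a practical implementation would update the context counts incrementally rather than recomputing the energy from scratch.

```python
import random
from math import exp

def mcmc_lossy_compress(x, k, alpha, alphabet=(0, 1), sweeps=50):
    """Annealed Gibbs sampling sketch for minimizing H_k(y) + alpha * d(x, y).

    Starts from y = x and, sweep by sweep at decreasing temperature,
    resamples each y_i from its conditional Boltzmann distribution.
    """
    y = list(x)
    n = len(y)
    for sweep in range(1, sweeps + 1):
        t = 1.0 / sweep                          # illustrative cooling schedule
        for i in range(n):
            # Energy of each candidate value for y_i, with the rest of y fixed.
            cand_energies = []
            for b in alphabet:
                y[i] = b
                cand_energies.append(energy(x, y, k, alpha))
            # Conditional Boltzmann pmf; subtracting the minimum energy
            # avoids overflow and shows that only differences matter.
            e_min = min(cand_energies)
            weights = [exp(-(e - e_min) / t) for e in cand_energies]
            total = sum(weights)
            y[i] = random.choices(alphabet, weights=[w / total for w in weights])[0]
    return y
```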