The KLT, which is the optimal orthogonal transform for transform coding, requires knowledge of the source statistics, which may be unavailable in practice, and an eigendecomposition, which may be too expensive to compute. This motivates the consideration of other orthogonal transforms, like the DFT, DCT, and DHT. Here we discuss these and analyze the resulting performance when possible. In addition, we discuss practical matters like real-valued DFTs and fast DCT implementations.
Goal: Recall that the goal of the optimal orthogonal transform was to minimize the ratio of geometric to arithmetic output variances:

$$\frac{\left( \prod_{k=0}^{N-1} \sigma_{y_k}^2 \right)^{1/N}}{\frac{1}{N} \sum_{k=0}^{N-1} \sigma_{y_k}^2}$$

The ratio [link] attains its maximum value of 1 when the $\sigma_{y_k}^2$ are equal for all k and takes on much smaller values when the difference between the $\sigma_{y_k}^2$ (sometimes called the dynamic range of $\{\sigma_{y_k}^2\}$) is large.
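To make the ratio concrete, here is a minimal numerical sketch in Python/NumPy (the variance values are arbitrary examples): equal variances give a ratio of 1, while a wide dynamic range drives the ratio well below 1.

```python
import numpy as np

def variance_ratio(var):
    """Geometric-to-arithmetic mean ratio of coefficient variances.

    Equals 1 when all variances are identical and shrinks toward 0
    as their dynamic range grows (smaller is better for coding gain).
    """
    var = np.asarray(var, dtype=float)
    geometric = np.exp(np.mean(np.log(var)))
    arithmetic = np.mean(var)
    return geometric / arithmetic

print(variance_ratio([1.0, 1.0, 1.0, 1.0]))  # 1.0 (worst case for TC gain)
print(variance_ratio([3.0, 0.6, 0.3, 0.1]))  # roughly 0.48, i.e., much smaller
```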
Problem with KLT: The KLT, i.e., the orthogonal transform minimizing [link], is a function of the input signal statistics. Specifically, the KLT equals the eigenvector matrix of the input autocorrelation matrix. Unfortunately, realistic signals are non-stationary, requiring continual KLT redesign if optimality is to be preserved, and eigenvector computation is computationally intensive, especially for large N.
Thus, the question becomes: Are there
fixed orthogonal
transforms that do a good job of minimizing the ratio
[link] for “typical” input signals?
As we will see, the answer is yes...
DFT Intuitions: For the sake of intuition, let's first consider choosing T as an orthogonal DFT matrix. In this case, the coefficient variances $\{\sigma_{y_k}^2\}$ would be samples of the power spectrum, and the dynamic range of $\{\sigma_{y_k}^2\}$ would be determined by the relative input power in different frequency bands. Recalling that asymptotic TC ($N \rightarrow \infty$) performance is determined by the spectral flatness measure, which has the same geometric-to-arithmetic-average structure as [link]:

$$\mathrm{SFM}_x = \frac{\exp\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln S_x(e^{j\omega})\, d\omega \right)}{\frac{1}{2\pi} \int_{-\pi}^{\pi} S_x(e^{j\omega})\, d\omega}$$

we might intuit that the DFT is optimal as $N \rightarrow \infty$. The asymptotic optimality of the DFT can, in fact, be proven (see Jayant & Noll). Of course, we don't have much reason to expect that the DFT would be optimal for finite transform dimension N. Still, for many signals, it performs well. (See [link].)
Other Transforms: The most commonly used orthogonal transform in speech, image, audio, and video coding is the discrete cosine transform (DCT). The excellent performance of the DCT follows from the fact that it is especially suited to "lowpass" signals, a feature shared by most signals in the previously mentioned applications. Note that there are plenty of signals for which the DCT performs poorly; it just so happens that such signals are not frequently encountered in speech, image, audio, or video. We will describe the DCT and provide intuition regarding its good "lowpass" performance shortly. Like the DFT, the DCT has fast algorithms which make it extremely practical from an implementation standpoint. Another reasonably popular orthogonal transform, requiring even less in the way of computation, is the discrete Hadamard transform (DHT), also described below.
[link] compares DFT, DCT, DHT, and KLT performance for various transform lengths N, along with asymptotic TC performance. [link](a) shows gain over PCM when using a lowpass autoregressive (AR) source $x(n)$ generated from white Gaussian noise $w(n)$ via an AR recursion. See [link] for the power spectra of these two processes.
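For illustration, such a lowpass AR source can be synthesized as below; the first-order recursion and the pole location 0.9 are assumptions chosen to yield a lowpass power spectrum, not necessarily the parameters used to generate [link].

```python
import numpy as np

def lowpass_ar_source(n_samples, a=0.9, rng=None):
    """Generate a lowpass AR(1) process x(n) = a*x(n-1) + w(n)
    driven by unit-variance white Gaussian noise w(n).
    The pole location a = 0.9 is an assumed value for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(n_samples)
    x = np.empty(n_samples)
    x[0] = w[0]
    for n in range(1, n_samples):
        x[n] = a * x[n - 1] + w[n]
    return x

x = lowpass_ar_source(10_000)
```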
The DHT: The DHT is defined below for power-of-two N:

$$H_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad H_{2N} = \frac{1}{\sqrt{2}} \begin{bmatrix} H_N & H_N \\ H_N & -H_N \end{bmatrix}$$

Note that $H_N$ is orthogonal, i.e., $H_N^T H_N = I$. (Caution: outputs of the Matlab command hadamard must be scaled by $1/\sqrt{N}$ to produce orthogonal DHT matrices!) As an example, [link] illustrates the DHT basis vectors.
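A short sketch of this recursion in NumPy, which also verifies orthogonality (note that, unlike Matlab's hadamard, the construction below already includes the $1/\sqrt{2}$ scaling at each doubling):

```python
import numpy as np

def dht_matrix(N):
    """Orthogonal DHT matrix for power-of-two N, built from the recursion
    H_{2N} = (1/sqrt(2)) * [[H_N, H_N], [H_N, -H_N]]."""
    assert N > 0 and (N & (N - 1)) == 0, "N must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2)
    return H

H8 = dht_matrix(8)
print(np.allclose(H8.T @ H8, np.eye(8)))  # True: H_N is orthogonal
```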
The primary advantage of the DHT is that its implementation can be accomplished very efficiently. [link] suggests that the DHT performs nearly as well as the KLT for N = 2 and 4, but its performance falls well short of optimal for larger N.
The DFT: The normalized DFT, mapping $\{x(n)\}$ to $\{y(k)\}$, is defined below along with its inverse. (Due to the norm-preserving scale factor $1/\sqrt{N}$, these DFT definitions differ from those given in most digital signal processing textbooks.)

$$y(k) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0,\dots,N-1$$

$$x(n) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} y(k)\, e^{j\frac{2\pi}{N}kn}, \qquad n = 0,\dots,N-1$$

The normalized DFT can be represented by a symmetric unitary matrix $W_N$ with entries (note that outputs of the Matlab command dftmtx must be scaled by $1/\sqrt{N}$ to produce unitary DFT matrices):

$$[W_N]_{k,n} = \frac{1}{\sqrt{N}}\, e^{-j\frac{2\pi}{N}kn}$$

By unitary, we mean that $W_N^{*T} W_N = I$, where $*$ denotes complex conjugation. Note that a unitary matrix is the complex-valued equivalent of an orthogonal matrix. In practice, the DFT is implemented using the fast Fourier transform (FFT), which requires on the order of $N \log_2 N$ complex multiply/adds when N is a power of two.
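The sketch below builds the normalized DFT matrix directly from this definition and checks the symmetry and unitarity claims, as well as its agreement with a $1/\sqrt{N}$-scaled FFT:

```python
import numpy as np

def unitary_dft_matrix(N):
    """Symmetric unitary DFT matrix with entries exp(-j*2*pi*k*n/N)/sqrt(N)."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

W = unitary_dft_matrix(8)
x = np.random.randn(8)
print(np.allclose(W, W.T))                              # symmetric
print(np.allclose(W.conj().T @ W, np.eye(8)))           # unitary
print(np.allclose(W @ x, np.fft.fft(x) / np.sqrt(8)))   # matches the scaled FFT
```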
The Real-Valued DFT: Since we assume real-valued $x(n)$, the complex-valued DFT outputs $y(k)$ might seem problematic, since transmitting both real and imaginary components would decrease our transmission efficiency. For a real-valued DFT input, however, the DFT outputs exhibit conjugate symmetry, which allows us to represent the N complex-valued outputs with only N real-valued numbers. More precisely, a real-valued DFT input $\{x(n)\}$ implies that the DFT output $\{y(k)\}$ has the property

$$y(N-k) = y^{*}(k), \qquad k = 1,\dots,N-1,$$

which implies

$$\mathrm{Re}\{y(N-k)\} = \mathrm{Re}\{y(k)\}, \qquad \mathrm{Im}\{y(N-k)\} = -\mathrm{Im}\{y(k)\},$$

with $y(0)$ (and, for even N, $y(N/2)$) purely real. A good method by which to select non-redundant components of the DFT output is:

1. Compute the complex-valued $\{y(k)\}$ using the standard DFT.
2. Construct real-valued $\{y'(k)\}$ from $\{y(k)\}$ as follows (written here for even N):
$$y'(k) = \begin{cases} y(0), & k = 0 \\ \sqrt{2}\, \mathrm{Re}\{y(k)\}, & 1 \le k \le \frac{N}{2}-1 \\ y(N/2), & k = \frac{N}{2} \\ \sqrt{2}\, \mathrm{Im}\{y(N-k)\}, & \frac{N}{2}+1 \le k \le N-1 \end{cases}$$

The method above is convenient because (i) it preserves the frequency ordering of the DFT and (ii) it preserves the norm of the DFT output vector, i.e., $\sum_{k} |y'(k)|^2 = \sum_{k} |y(k)|^2$.
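Here is one possible realization of this packing (a sketch; the exact ordering and sign conventions are assumptions consistent with properties (i) and (ii)), together with a numerical check of norm preservation:

```python
import numpy as np

def real_dft(x):
    """Map a real N-vector to N real coefficients via the normalized DFT.

    Keeps y(0) (and y(N/2) for even N) as-is, and packs sqrt(2)*Re and
    sqrt(2)*Im of the remaining non-redundant bins so that the norm of
    the DFT output vector is preserved.  One possible convention."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    y = np.fft.fft(x) / np.sqrt(N)          # normalized (unitary) DFT
    yp = np.empty(N)
    yp[0] = y[0].real
    for k in range(1, (N + 1) // 2):
        yp[k] = np.sqrt(2) * y[k].real
        yp[N - k] = np.sqrt(2) * y[k].imag
    if N % 2 == 0:
        yp[N // 2] = y[N // 2].real
    return yp

x = np.random.randn(8)
print(np.isclose(np.linalg.norm(real_dft(x)), np.linalg.norm(x)))  # True
```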
Using the conjugate symmetry property of the DFT outputs, we can write the transformation from $\{y(k)\}$ to $\{y'(k)\}$ as a matrix operation $y' = V y$ for an appropriate matrix $V$. The $\sqrt{2}$ normalization guarantees that $V$ is unitary (which is easily checked by inspection). Then $V W_N$, the product of two unitary matrices, is also unitary. Since $V W_N$ is actually real-valued (it takes any real-valued $x$ to a real-valued $y'$), it should be referred to as orthogonal rather than unitary. Henceforth we call $V W_N$ the real-valued DFT matrix. It is easily checked that the basis vectors of the real-valued DFT matrix are sampled sines and cosines at the uniformly spaced frequencies $\omega_k = \frac{2\pi k}{N}$. [link] gives an illustration of the real-valued DFT basis vectors.
[Figure: DFT and real-valued DFT basis vectors]
Recall that when N is a power of 2, an N-dimensional complex-valued FFT requires on the order of $N \log_2 N$ complex multiply/adds. When the input is real-valued, however, an N-dimensional FFT may be computed using roughly half the operations of the complex-valued case (see Sorensen, Jones, Heideman & Burrus, IEEE Trans. ASSP, 1987).
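Most FFT libraries expose this savings directly; for example, NumPy's rfft returns only the $N/2 + 1$ non-redundant bins of a real-valued length-N input (N even here):

```python
import numpy as np

x = np.random.randn(8)                  # real-valued input, N = 8
y_full = np.fft.fft(x)                  # N complex outputs (conjugate-symmetric)
y_real = np.fft.rfft(x)                 # only the N//2 + 1 non-redundant bins
print(len(y_full), len(y_real))         # 8 5
print(np.allclose(y_full[:5], y_real))  # True
```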
The DCT: The DCT is defined below along with its inverse:

$$y(k) = \sqrt{\frac{2}{N}}\, \alpha(k) \sum_{n=0}^{N-1} x(n)\, \cos\!\left( \frac{\pi k (2n+1)}{2N} \right), \qquad k = 0,\dots,N-1$$

$$x(n) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \alpha(k)\, y(k)\, \cos\!\left( \frac{\pi k (2n+1)}{2N} \right), \qquad n = 0,\dots,N-1$$

where $\alpha(0) = \frac{1}{\sqrt{2}}$ and $\alpha(k) = 1$ for $k \ne 0$. The DCT can be represented by an orthogonal matrix $C_N$ with entries

$$[C_N]_{k,n} = \sqrt{\frac{2}{N}}\, \alpha(k)\, \cos\!\left( \frac{\pi k (2n+1)}{2N} \right).$$

See [link] for an illustration of the DCT basis vectors.
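A direct sketch of the orthogonal DCT matrix implied by this definition, with a numerical check of its orthogonality:

```python
import numpy as np

def dct_matrix(N):
    """Orthogonal DCT matrix with entries
    sqrt(2/N)*alpha(k)*cos(pi*k*(2n+1)/(2N)), alpha(0)=1/sqrt(2), alpha(k>0)=1."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    C[0, :] /= np.sqrt(2)
    return C

C = dct_matrix(8)
print(np.allclose(C.T @ C, np.eye(8)))  # True: the DCT matrix is orthogonal
```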
A Fast DCT: There are a number of fast algorithms to compute the DCT. The method presented below is based on the FFT and leads to intuition concerning the good "lowpass" performance of the DCT.

1. Create a 2N-length mirrored version of the N-length block $\{x(n)\}$ (see [link](c)):
$$\tilde{x}(n) = \begin{cases} x(n), & 0 \le n \le N-1 \\ x(2N-1-n), & N \le n \le 2N-1 \end{cases}$$

2. Compute $\{\tilde{y}(k)\}$, the 2N-point DFT of $\{\tilde{x}(n)\}$:
$$\tilde{y}(k) = \frac{1}{\sqrt{2N}} \sum_{n=0}^{2N-1} \tilde{x}(n)\, e^{-j\frac{2\pi}{2N}kn}, \qquad k = 0,\dots,2N-1$$

3. Compute $\{y(k)\}$, the N-point DCT outputs, from $\{\tilde{y}(k)\}$:
$$y(k) = \alpha(k)\, e^{-j\frac{\pi k}{2N}}\, \tilde{y}(k), \qquad k = 0,\dots,N-1$$

Assuming a real-valued input, the scheme outlined above can be implemented using a single 2N-point real-FFT followed by N twiddle-factor multiplications, where we are assuming use of the real-FFT described previously.
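The three steps translate almost literally into code; the sketch below assumes the normalized DFT and DCT conventions used above and checks the result against a direct (slow) DCT:

```python
import numpy as np

def fast_dct(x):
    """DCT of a real N-vector via a 2N-point FFT of the mirrored sequence,
    following the three-step scheme above (one possible normalization)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    x_tilde = np.concatenate([x, x[::-1]])           # 2N-length mirrored block
    y_tilde = np.fft.fft(x_tilde) / np.sqrt(2 * N)   # normalized 2N-point DFT
    k = np.arange(N)
    y = np.real(np.exp(-1j * np.pi * k / (2 * N)) * y_tilde[:N])
    y[0] /= np.sqrt(2)                                # alpha(0) = 1/sqrt(2)
    return y

def direct_dct(x):
    """Direct orthonormal DCT via the matrix definition (for checking)."""
    N = len(x)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    C[0, :] /= np.sqrt(2)
    return C @ x

x = np.random.randn(8)
print(np.allclose(fast_dct(x), direct_dct(x)))        # True
```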
DCT vs. DFT Performance for Lowpass Signals: [link] suggests that DCT and DFT performance both equal KLT performance asymptotically, i.e., as transform dimension $N \rightarrow \infty$. Indeed, this can be proven (see Jayant & Noll). A more practical question is: How do DCT and DFT performances compare for finite N? To answer this question, we will investigate the effect of input data block length on the DCT and DFT.
To start, consider the DFT of an N-length input block $\{x(n)\}$:

$$y(k) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0,\dots,N-1$$

It can be seen that the DFT outputs $\{y(k)\}$ are samples of the discrete-time Fourier transform (DTFT) of the block at the frequencies $\omega_k = \frac{2\pi k}{N}$:

$$y(k) = \frac{1}{\sqrt{N}}\, X(e^{j\omega})\Big|_{\omega = \frac{2\pi k}{N}}, \qquad \text{where } X(e^{j\omega}) = \sum_{n=0}^{N-1} x(n)\, e^{-j\omega n}.$$
Now let's consider a periodic extension of $\{x(n)\}$ which repeats this N-length sequence a total of L times:

$$x_L(n) = x(\langle n \rangle_N), \qquad n = -\tfrac{LN}{2}, \dots, \tfrac{LN}{2}-1.$$

Above, $\langle n \rangle_N$ denotes "n modulo N" and L is assumed even (see [link](a)-(b)). Here is the interesting point: the DTFT of the LN-length periodic extension equals (up to a scale factor) the DTFT of the original N-length data block when sampled at the frequencies $\omega_k = \frac{2\pi k}{N}$! This implies that the overall spectral shape of $X_L(e^{j\omega})$, the DTFT of the periodic extension, will be inherited by the DFT outputs $\{y(k)\}$.
So what is the overall shape of $X_L(e^{j\omega})$? Say that $\{x(n)\}$ was drawn from a lowpass process. Being lowpass, we expect the time-domain sequence to look relatively "smooth." If the starting and ending points of the N-block, $x(0)$ and $x(N-1)$, are different, however, the periodic extension $\{x_L(n)\}$ will exhibit time-domain discontinuities (see [link](b)) that are uncharacteristic of the lowpass process. These discontinuities imply that $X_L(e^{j\omega})$ will contain high-frequency content not present in the power spectrum of the lowpass input process. Based on our previous findings, if artificial high-frequency content exists in $X_L(e^{j\omega})$, it must also exist in the DFT outputs $\{y(k)\}$. In conclusion, the periodic extension $\{x_L(n)\}$ provides an intuitive explanation of why short-block DFT analysis of lowpass signals often seems corrupted by "artificial" high-frequency content.
So why is this important? Recall that transform coding performance improves with the dynamic range of the transform output variances. If the DFT outputs corresponding to otherwise low spectral energy are artificially increased due to short-window effects, the dynamic range of $\{\sigma_{y_k}^2\}$ will decrease, and DFT performance will fall short of optimal.
Now let's consider the DCT. From the derivation of the fast algorithm, we know that the N DCT output magnitudes from the length-N input $\{x(n)\}$ equal (up to the $\alpha(k)$ scaling) the first N DFT output magnitudes from the length-2N input $\{\tilde{x}(n)\}$, a mirrored version of $\{x(n)\}$. (See [link](c).) Due to the mirroring effect, the periodic extension of $\{\tilde{x}(n)\}$ will not have the discontinuities present in the periodic extension of $\{x(n)\}$, and so a 2N-point DFT analysis of $\{\tilde{x}(n)\}$ will not exhibit "artificial" high-frequency enhancement. Assuming that the process from which $\{x(n)\}$ was extracted is lowpass, the DCT outputs will exhibit large dynamic range, and an improvement over DFT coding performance is expected. This is confirmed by [link](a).
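This intuition is easy to check empirically. The sketch below estimates DFT and DCT coefficient variances for length-8 blocks of an AR(1) source (the pole value 0.9 is an assumption, as before) and compares their geometric-to-arithmetic variance ratios; the DCT ratio comes out noticeably smaller, indicating larger dynamic range and better coding performance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, blocks = 8, 10_000

# Lowpass AR(1) source x(n) = 0.9*x(n-1) + w(n), then split into N-blocks
w = rng.standard_normal(N * blocks)
x = np.empty_like(w)
x[0] = w[0]
for t in range(1, len(w)):
    x[t] = 0.9 * x[t - 1] + w[t]
x = x.reshape(blocks, N)

# Empirical coefficient variances under the normalized DFT
y_dft = np.fft.fft(x, axis=1) / np.sqrt(N)
var_dft = np.mean(np.abs(y_dft) ** 2, axis=0)

# Empirical coefficient variances under the orthonormal DCT
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
C[0, :] /= np.sqrt(2)
var_dct = np.mean((x @ C.T) ** 2, axis=0)

ratio = lambda v: np.exp(np.mean(np.log(v))) / np.mean(v)
print("DFT variance ratio:", ratio(var_dft))  # closer to 1 (less coding gain)
print("DCT variance ratio:", ratio(var_dct))  # smaller (more coding gain)
```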
Source: OpenStax CNX, "An introduction to source-coding: quantization, DPCM, transform coding, and sub-band coding," Sep 25, 2009. Download for free at http://cnx.org/content/col11121/1.2