We now wish to consider new model classes for signals.
Towards this end, let
be an orthonormal basis for
.
Thus for
we can write
where
.
We will now build an encoder and decoder and analyze its performance oncompact sets
.
For example, we might want to encode signals in the space
with norm
However, in this space the unit ball,
is not compact. To get a compact set we need more structure on the sequence
.
Hence we define
and we define the norm in this space as
the smallest
such that this holds. We now take
to get a compact set. Notice that when
is small the requirement for membership in
is very mild.
Next, suppose that we choose a target distortion level
.
Given
,
let
for
, where
.
We then choose
as the smallest integer so that
and thus
It follows from the requirement that
that
for each
.
Recall that
Since
,
Hence, the total number of indices in all of the
,
, is
.
To encode, for each
, we can send the following bits:
- Send
bits to identify each index in
, for
. This will require a total of
bits.
- Send one bit to identify the sign of
for each
,
. This will require
bits.
- Send
bits to describe each
, for
. This will require
bits.
Thus the total number of bits used in the encoding is
.
Notice that for each
,
, we can recover each
by
where the sign is given by the sign bit. It follows that
for every such coefficient. Here we have used the fact that knowing that
means that the first nonzero binary bit of
is the
-th bit.
To decode we simply set
We now analyze the error we have incurred in such an encoding. The square of the error will consist of two parts. The first corresponds to the
,
. For each such
we have
and so the total square error for this is
because
. The second part of the error corresponds to all the coefficients which have magnitude
.
We have that this sum does not exceed
Thus the total error we incur is
.
In summary, by allocating
bits we achieve distortion
. Equivalently, by allocating
bits, we achieve distortion
.
This is within a logarithmic factor of the optimal encoding given by Kolmogorov entropy of the class
. A slightly more careful argument can remove this logarithm.
The wavelet basis
In the method above we failed to achieve the optimal performance because of the cost involved in identifying which indices were in each
. We will now describe a method that can do better, using the Haar basis for
. Thus, we first define the scaling function
Next, we define the mother wavelet
We then define the remaining wavelets recursively. They are obtained by dilations and shifts of the mother wavelet on dyadic intervals:
where
are dyadic intervals. We denote by
the collection of all dyadic intervals contained in
. Then, the collection of functions
forms an orthonormal basis for
.
A key property of wavelets is that a tree structure can be placed on the coefficients due to the use of dyadic intervals in their construction. Thus, let
and
We define
as the smallest tree containing
. Given any binary tree of size
, we can encode the tree with at most
bits, in the process outperforming the encoder described in above.