A related question often arises in settings involving signal
dictionaries. Rather than finding the best approximation to asignal
using a fixed collection of
elements from
the dictionary
, one may often seek the best
-term representation to
among all possible
expansions that use
terms from the dictionary.
Compared to linear approximation, this type of nonlinearapproximation
[link] ,
[link] utilizes the ability of
the dictionary to adapt: different elements may be important forrepresenting different signals.
The
-term nonlinear approximation problem corresponds
to the optimization
(For the sake of generality, we consider general
and
norms in this section.) Due to the nonlinearity of the
set
for a given dictionary, solving this
problem can be difficult. Supposing
is an orthonormal basis
and
, the solution to
[link] is easily obtained by
thresholding: one simply computes the coefficients
and keeps the
largest (setting the remaining coefficients to zero).
The approximation error is then given simplyby the coefficients that are discarded:
When
is a redundant dictionary, however, the situation is
much more complicated. We mention more on this below (see also
[link] (b)).
Measuring approximation quality
One common measure for the quality of a dictionary
in
approximating a signal class is the fidelity of its
-term representations. Often one examines theasymptotic rate of decay of the
-term approximation
error as
grows large. Defining
for a given signal
we may consider the asymptotic decay of
as
. (We recall
the dependence of
[link] and hence
[link] on the
dictionary
.) In many cases, the function
will decay as
for some
, and when
represents a harmonic dictionary, faster
decay rates tend to correspond to smoother functions. Indeed, onecan show that when
is an orthonormal basis, then
will decay as
if and only
if
decays as
[link] .
Nonlinear approximation of piecewise smooth functions
Let
be a 1-D function. Supposing the
wavelet dictionary has more than
vanishing moments,
then
can be well approximated using its
largest
coefficients (most of which are at coarse scales). As
grows large, the nonlinear approximation error will
decay
We use the notation
,
or
, if there exists a constant
,
possibly large but not dependent on the argument
, such
that
. as
.