Where $w_{k+1}$ is the new weight vector, $w_k$ is the old weight vector, and $\mu\, e_k x_k$ is a small step along the negative of the instantaneous error gradient (here $e_k = d_k - w_k^T x_k$ is the instantaneous error and $x_k$ is the input vector).
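As a concrete illustration (my own sketch, not from the notes), the update can be coded in a few lines of NumPy; the 4-tap system h, the signals x and d, and the step size mu below are assumptions made only for this example.

```python
import numpy as np

def lms(x, d, n_taps, mu):
    """Minimal LMS sketch: adapt w so that w^T x_k tracks d_k."""
    N = len(x)
    w = np.zeros(n_taps)                      # w_0: initial weight vector
    W = np.zeros((N, n_taps))                 # weight trajectory, for inspection
    e = np.zeros(N)                           # instantaneous errors
    for k in range(n_taps - 1, N):
        x_k = x[k - n_taps + 1:k + 1][::-1]   # regressor [x_k, x_{k-1}, ...]
        e[k] = d[k] - w @ x_k                 # e_k = d_k - w_k^T x_k
        w = w + mu * e[k] * x_k               # w_{k+1} = w_k + mu * e_k * x_k
        W[k] = w
    return W, e

# Example: identify an unknown 4-tap FIR system from noisy measurements.
rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, -0.3, 0.1])           # "true" system, assumed for the demo
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
W, e = lms(x, d, n_taps=4, mu=0.01)
print("final weights:", np.round(W[-1], 3))   # should settle close to h
```

With a small step size the final weights settle close to the true system; the rest of this section quantifies how close and how fast.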
Interpretation in terms of weight error vector
Define the weight error vector
$$v_k = w_k - w^o ,$$
Where $w^o$ is the optimal weight vector, and
$$e^o_k = d_k - x_k^T w^o ,$$
where $e^o_k$ is the minimum error. The stochastic difference equation is:
$$v_{k+1} = \left(I - \mu\, x_k x_k^T\right) v_k + \mu\, e^o_k x_k .$$
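For reference, this difference equation follows directly from the update and the definitions above (a short derivation added here; it uses only $e_k = d_k - x_k^T w_k$):
$$\begin{aligned}
v_{k+1} &= w_{k+1} - w^o = v_k + \mu\, e_k x_k, \qquad e_k = d_k - x_k^T w_k = e^o_k - x_k^T v_k,\\
v_{k+1} &= v_k + \mu\left(e^o_k - x_k^T v_k\right) x_k = \left(I - \mu\, x_k x_k^T\right) v_k + \mu\, e^o_k x_k .
\end{aligned}$$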
Convergence/stability analysis
Show that (tightness)
$$\lim_{B \to \infty}\; \sup_{k}\; \Pr\!\left[\|v_k\| \geq B\right] = 0 ;$$
that is, with probability arbitrarily close to 1, the weight error vector is bounded for all $k$.
Chebyshev's inequality is
$$\Pr\!\left[\|v_k\| \geq B\right] \leq \frac{E\!\left[\|v_k\|^2\right]}{B^2},$$
and
$$E\!\left[\|v_k\|^2\right] = \operatorname{tr}(C_k) + \left\|E[v_k]\right\|^2,$$
where $C_k = E\!\left[\left(v_k - E[v_k]\right)\left(v_k - E[v_k]\right)^T\right]$ is the weight error covariance and $\left\|E[v_k]\right\|^2$ is the squared bias. If $E\!\left[\|v_k\|^2\right]$ is finite for all $k$, then
$$\Pr\!\left[\|v_k\| \geq B\right] \leq \frac{E\!\left[\|v_k\|^2\right]}{B^2} \longrightarrow 0 \quad \text{as } B \to \infty$$
for all $k$. Also,
$$\operatorname{tr}(C_k) = \sum_i \left(C_k\right)_{i,i} .$$
Therefore $E\!\left[\|v_k\|^2\right]$ is finite if the diagonal elements of $C_k$ are bounded.
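A quick Monte-Carlo check of this argument (an illustrative sketch; the 2-tap optimal filter w_opt, the white input, the noise level, the step size mu, and the threshold B are all assumptions, not values from the notes): estimate $E\!\left[\|v_k\|^2\right]$ by averaging over independent runs and compare the empirical $\Pr[\|v_k\| \geq B]$ with the Chebyshev bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n_taps, n_iter, n_runs, mu, B = 2, 1000, 500, 0.02, 0.1
w_opt = np.array([0.8, -0.4])            # assumed optimal weight vector

v_sq_final = np.zeros(n_runs)            # ||v_k||^2 at the last iteration, per run
for r in range(n_runs):
    w = np.zeros(n_taps)
    for k in range(n_iter):
        x_k = rng.standard_normal(n_taps)                 # white input vector
        d_k = w_opt @ x_k + 0.1 * rng.standard_normal()   # optimal error level ~0.1
        e_k = d_k - w @ x_k
        w = w + mu * e_k * x_k                            # LMS update
    v_sq_final[r] = np.sum((w - w_opt) ** 2)              # ||v_k||^2 at k = n_iter

mean_sq = v_sq_final.mean()                               # estimate of E[||v_k||^2]
frac_exceed = (np.sqrt(v_sq_final) >= B).mean()           # empirical P(||v_k|| >= B)
print("E||v_k||^2 ~", round(mean_sq, 5),
      "  P(||v_k|| >= B) ~", round(frac_exceed, 3),
      "  Chebyshev bound E||v_k||^2/B^2 =", round(mean_sq / B**2, 3))
```

If the estimate of $E\!\left[\|v_k\|^2\right]$ settles at a finite value, the Chebyshev bound caps the exceedance probability uniformly in $k$, which is all the tightness argument needs.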
Convergence in mean
Show that $E[v_k] \to 0$ as $k \to \infty$. Take the expectation of the stochastic difference equation, using the smoothing property of conditional expectation (together with the usual independence assumptions) to simplify the calculation. We have
$$E[v_{k+1}] = \left(I - \mu R\right) E[v_k], \qquad R = E\!\left[x_k x_k^T\right],$$
so we have convergence in mean if $R$ is positive definite (invertible) and
$$0 < \mu < \frac{2}{\lambda_{\max}},$$
where $\lambda_{\max}$ is the largest eigenvalue of $R$.
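To make the step-size condition explicit (a standard expansion of the recursion above, added here for clarity): rotating into the eigencoordinates of $R = U \Lambda U^T$ decouples the modes,
$$\tilde{v}_k = U^T E[v_k] \;\Longrightarrow\; \tilde{v}_{k+1,i} = \left(1 - \mu\lambda_i\right)\tilde{v}_{k,i} = \left(1 - \mu\lambda_i\right)^{k+1}\tilde{v}_{0,i},$$
so every mode decays to zero exactly when $\left|1 - \mu\lambda_i\right| < 1$ for all $i$, i.e., $0 < \mu < \frac{2}{\lambda_{\max}}$.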
Bounded variance
Show that $C_k$, the weight vector error covariance, is bounded for all $k$.
We could have $E[v_k] \to 0$ but $\operatorname{tr}(C_k) \to \infty$, in which case the algorithm would not be stable.
Recall that it is fairly straightforward to show that the diagonal elements of the transformed covariance $\tilde{C}_k = U^T C_k U$ tend to zero if $e^o_k = 0$ and $0 < \mu < \frac{2}{\lambda_{\max}}$ ($U$ is the eigenvector matrix of $R$; $R = U \Lambda U^T$ with $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_N)$). The diagonal elements of $\tilde{C}_k$ were denoted by $\gamma_{k,i} = \left(\tilde{C}_k\right)_{i,i}$.
Thus, to guarantee boundedness of $\operatorname{tr}(C_k)$ we need to show that the "steady-state" values
$$\gamma_i \equiv \lim_{k \to \infty} \gamma_{k,i} < \infty .$$
We showed that
$$\gamma_i = \frac{\mu\, \xi_{\min}}{2 - \mu \lambda_i},$$
where $\xi_{\min} = E\!\left[\left(e^o_k\right)^2\right]$ is the minimum mean-squared error, $\lambda_i$ is the $i^{\text{th}}$ eigenvalue of $R$ ($\lambda_i > 0$ since $R$ is positive definite), and $0 < \mu < \frac{2}{\lambda_{\max}}$.
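One way to arrive at this expression (a sketch under the usual independence assumption, neglecting the coupling between modes; it may differ in detail from the derivation in the earlier notes): the rotated diagonal elements approximately obey
$$\gamma_{k+1,i} \approx \left(1 - \mu\lambda_i\right)^2 \gamma_{k,i} + \mu^2 \lambda_i\, \xi_{\min},$$
and setting $\gamma_{k+1,i} = \gamma_{k,i} = \gamma_i$ gives $\gamma_i = \dfrac{\mu^2 \lambda_i \xi_{\min}}{1 - \left(1 - \mu\lambda_i\right)^2} = \dfrac{\mu\, \xi_{\min}}{2 - \mu\lambda_i}$.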
We found a sufficient condition for $\mu$ that guaranteed that the steady-state $\gamma_i$'s (and hence $\operatorname{tr}(C_k)$) are bounded:
$$0 < \mu < \frac{2}{3 \operatorname{tr}(R)}$$
Where $\operatorname{tr}(R) = \sum_i \lambda_i = E\!\left[\|x_k\|^2\right]$ is the input vector energy.
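As a quick check that this suffices (my own verification, using the coupled-mode condition $\sum_i \frac{\mu\lambda_i}{2 - \mu\lambda_i} < 1$ that is commonly quoted for a bounded steady-state variance): if $0 < \mu < \frac{2}{3\operatorname{tr}(R)}$, then $\mu\lambda_i < \frac{2}{3}$ and $2 - \mu\lambda_i > \frac{4}{3}$, so
$$\sum_i \frac{\mu\lambda_i}{2 - \mu\lambda_i} < \frac{3}{4}\, \mu \sum_i \lambda_i = \frac{3}{4}\, \mu \operatorname{tr}(R) < \frac{1}{2} < 1 .$$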
With this choice of $\mu$ we have:
convergence in mean
bounded steady-state variance
This implies that $E\!\left[\|v_k\|^2\right]$ is bounded for all $k$, and hence, by the Chebyshev argument above, that $v_k$ is bounded with probability arbitrarily close to 1. In other words, the LMS algorithm is stable about the optimum weight vector $w^o$.
Learning curve
Recall that $e_k = e^o_k - x_k^T v_k$ and $E\!\left[e^o_k x_k\right] = 0$. These imply
$$\xi_k \equiv E\!\left[e_k^2\right] = \xi_{\min} + E\!\left[v_k^T x_k x_k^T v_k\right] = \xi_{\min} + \operatorname{tr}\!\left(R\, E\!\left[v_k v_k^T\right]\right),$$
where $\xi_{\min} = E\!\left[\left(e^o_k\right)^2\right]$. Once the bias $E[v_k]$ has decayed, the excess term is $\operatorname{tr}(R\, C_k)$, so the MSE is
$$\xi_k \approx \xi_{\min} + \sum_i \lambda_i\, \gamma_{k,i},$$
Where $\lambda_i$ and $\gamma_{k,i}$ are the eigenvalues of $R$ and the diagonal elements of the transformed covariance, as above. So the limiting MSE is
$$\xi_\infty = \lim_{k \to \infty} \xi_k = \xi_{\min} + \sum_i \frac{\mu\, \xi_{\min}\, \lambda_i}{2 - \mu\lambda_i} .$$
Since $\mu > 0$ was required for convergence, $\xi_\infty > \xi_{\min}$, so that we see noisy adaptation leads to an MSE larger than the optimal $\xi_{\min}$.
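A small simulation makes this excess MSE visible (an illustrative sketch; the 2-tap white-input setup, the noise level xi_min, and mu are assumptions chosen so that $R = I$ and every eigenvalue is 1):

```python
import numpy as np

# Empirical learning curve vs. the theoretical limiting MSE xi_infinity.
rng = np.random.default_rng(2)
n_taps, n_iter, n_runs, mu = 2, 3000, 300, 0.05
w_opt = np.array([1.0, -0.5])          # assumed optimal weights
xi_min = 0.01                          # variance of the optimal error e_k^o

mse = np.zeros(n_iter)
for _ in range(n_runs):
    w = np.zeros(n_taps)
    for k in range(n_iter):
        x_k = rng.standard_normal(n_taps)              # white input => R = I
        d_k = w_opt @ x_k + np.sqrt(xi_min) * rng.standard_normal()
        e_k = d_k - w @ x_k
        mse[k] += e_k ** 2                             # accumulate e_k^2 across runs
        w = w + mu * e_k * x_k                         # LMS update
mse /= n_runs                                          # learning curve estimate of xi_k

lam = np.ones(n_taps)                                  # eigenvalues of R = I
xi_inf = xi_min + np.sum(mu * xi_min * lam / (2 - mu * lam))
print("empirical steady-state MSE:", round(mse[-500:].mean(), 5))
print("theoretical xi_infinity   :", round(xi_inf, 5))
```

Both numbers should land a few percent above xi_min, which is exactly the excess quantified by the misadjustment below.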
To quantify the increase in the MSE, define the so-called misadjustment:
$$M = \frac{\xi_\infty - \xi_{\min}}{\xi_{\min}} .$$
We would of course like to keep $M$ as small as possible.
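For instance (an illustrative number, not taken from the notes): if the steady-state MSE sits 10% above the optimum, $\xi_\infty = 1.1\, \xi_{\min}$, then
$$M = \frac{1.1\,\xi_{\min} - \xi_{\min}}{\xi_{\min}} = 0.1,$$
i.e., a 10% misadjustment.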
Learning speed and misadjustment trade-off
Fast adaptation and quick convergence require that we take steps as large as possible. In other words, learning speed is proportional to $\mu$; a larger $\mu$ means faster convergence. How does $\mu$ affect the misadjustment?
To guarantee convergence/stability we require
$$0 < \mu < \frac{2}{3 \operatorname{tr}(R)} .$$
Let's assume that in fact $\mu \ll \frac{2}{3 \operatorname{tr}(R)}$, so that there is no problem with convergence. This condition implies $\mu \ll \frac{2}{3 \lambda_i}$, or $\mu \lambda_i \ll 1$, for every $i$. From here we see that
$$\gamma_i = \frac{\mu\, \xi_{\min}}{2 - \mu\lambda_i} \approx \frac{\mu\, \xi_{\min}}{2} .$$
This gives a misadjustment of approximately
$$M = \frac{\xi_\infty - \xi_{\min}}{\xi_{\min}} = \frac{\sum_i \lambda_i \gamma_i}{\xi_{\min}} \approx \frac{\mu}{2} \sum_i \lambda_i = \frac{\mu}{2} \operatorname{tr}(R) .$$
This shows that a larger step size $\mu$ leads to a larger misadjustment. Since we still have convergence in mean, this essentially means that with a larger step size we "converge" faster but have a larger variance (rattling) about $w^o$.
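To put rough numbers on this trade-off (my own back-of-the-envelope sketch; the eigenvalue spread of $R$ is an assumption, and it uses the approximate relations above, namely a slowest-mode time constant of about $1/(\mu \lambda_{\min})$ iterations from the modal decay $(1-\mu\lambda_i)^k$ and $M \approx \frac{\mu}{2}\operatorname{tr}(R)$):

```python
import numpy as np

# Speed vs. misadjustment for two step sizes, using the approximations above.
lam = np.array([0.5, 1.0, 2.0, 4.0])       # assumed eigenvalues of R
for mu in (0.005, 0.05):
    tau_slow = 1.0 / (mu * lam.min())      # iterations for the slowest mode to adapt
    M = 0.5 * mu * lam.sum()               # misadjustment ~ (mu/2) tr(R)
    print(f"mu = {mu:<6}  slowest time constant ~ {tau_slow:6.0f} iterations,"
          f"  misadjustment ~ {100 * M:.1f}%")
```

The tenfold larger step size adapts roughly ten times faster but pays roughly ten times the misadjustment, which is the rattling about $w^o$ described above.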