The mean-squared error for any estimate of a nonrandom parameter has a lower bound, the Cramér-Rao bound (Cramér (1946), pp. 474-477), which defines the ultimate accuracy of any estimation procedure. This lower bound, as shown later, is intimately related to the maximum likelihood estimator.
We seek a "bound" on the mean-squared error matrix
M defined to be
M=((()-)(()-)T)=(T)
A matrix is "lower bounded" by a second matrix if the difference
between the two is a non-negative definite matrix. Define thecolumn matrix
x to be
x=(()--b()ddln(p(r)))
where $\mathbf{b}(\boldsymbol{\theta})$ denotes the column matrix of estimator biases. To derive the Cramér-Rao bound, evaluate $E[\mathbf{x}\mathbf{x}^T]$:
$$E\!\left[\mathbf{x}\mathbf{x}^T\right] = \begin{pmatrix} \mathbf{M} - \mathbf{b}\mathbf{b}^T & \mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}} \\ \left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right)^T & \mathbf{F} \end{pmatrix}$$
where $\frac{d\mathbf{b}}{d\boldsymbol{\theta}}$ represents the matrix of partial derivatives of the bias, whose $(i,j)$ entry is $\frac{\partial b_i(\boldsymbol{\theta})}{\partial \theta_j}$, and the matrix $\mathbf{F}$ is the Fisher information matrix
$$\mathbf{F} = E\!\left[\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta}) \left(\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta})\right)^T\right]$$
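For example, suppose the observations comprise $L$ statistically independent Gaussian random variables, each having unknown mean $\theta$ and known variance $\sigma^2$. Then $\ln p(\mathbf{r};\theta) = -\frac{1}{2\sigma^2}\sum_{l=1}^{L}(r_l - \theta)^2 + \text{constant}$ and
$$F = E\!\left[\left(\frac{1}{\sigma^2}\sum_{l=1}^{L}(r_l - \theta)\right)^2\right] = \frac{L}{\sigma^2}$$
The Fisher information grows linearly with the number of observations, so the bound derived below shrinks as more data are gathered.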
Note that the Fisher information matrix can alternatively be expressed as
$$\mathbf{F} = -E\!\left[\nabla_{\boldsymbol{\theta}}\nabla_{\boldsymbol{\theta}}^T \ln p(\mathbf{r};\boldsymbol{\theta})\right]$$
The notation $\nabla_{\boldsymbol{\theta}}\nabla_{\boldsymbol{\theta}}^T$ means the matrix of all second partials of the quantity it operates on (the gradient of the gradient); this matrix is known as the Hessian. Demonstrating the equivalence of these two forms for the Fisher information is quite easy. Because $\int p(\mathbf{r};\boldsymbol{\theta})\, d\mathbf{r} = 1$ for all choices of the parameter vector, the gradient of this expression equals zero. Furthermore, $\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta}) = \frac{\nabla_{\boldsymbol{\theta}} p(\mathbf{r};\boldsymbol{\theta})}{p(\mathbf{r};\boldsymbol{\theta})}$. Combining these results yields
$$\int \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta})\, p(\mathbf{r};\boldsymbol{\theta})\, d\mathbf{r} = \mathbf{0}$$
Evaluating the gradient of this quantity (using the chain rule) also yields zero:
$$\int \left[\nabla_{\boldsymbol{\theta}}\nabla_{\boldsymbol{\theta}}^T \ln p(\mathbf{r};\boldsymbol{\theta})\right] p(\mathbf{r};\boldsymbol{\theta}) + \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta}) \left(\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta})\right)^T p(\mathbf{r};\boldsymbol{\theta})\, d\mathbf{r} = \mathbf{0}$$
or
$$E\!\left[\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta}) \left(\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{r};\boldsymbol{\theta})\right)^T\right] = -E\!\left[\nabla_{\boldsymbol{\theta}}\nabla_{\boldsymbol{\theta}}^T \ln p(\mathbf{r};\boldsymbol{\theta})\right]$$
Calculating the expected value of the Hessian is somewhat easier than finding the expected value of the outer product of the gradient with itself. In the scalar case, we have
$$E\!\left[\left(\frac{d}{d\theta} \ln p(r;\theta)\right)^2\right] = -E\!\left[\frac{d^2}{d\theta^2} \ln p(r;\theta)\right]$$
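This identity is easy to check numerically. The following sketch, assuming for illustration a single Gaussian observation $r \sim \mathcal{N}(\theta, \sigma^2)$ with known variance (a model chosen here only as an example), estimates both sides by Monte Carlo:

```python
import numpy as np

# Monte Carlo check of the scalar identity
#   E[(d/d theta ln p(r; theta))^2] = -E[d^2/d theta^2 ln p(r; theta)]
# assuming a single Gaussian observation r ~ N(theta, sigma^2) with known
# variance; the model and parameter values are illustrative choices.
rng = np.random.default_rng(0)
theta, sigma = 1.0, 2.0
r = rng.normal(theta, sigma, size=1_000_000)

score = (r - theta) / sigma**2   # d/d theta of ln p(r; theta) for this model
hessian = -1.0 / sigma**2        # second derivative; a constant for this model

print(np.mean(score**2))         # outer-product form: approx 1/sigma^2 = 0.25
print(-hessian)                  # Hessian form: exactly 1/sigma^2 = 0.25
```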
Returning to the derivation, the matrix $E[\mathbf{x}\mathbf{x}^T]$ is non-negative definite because it is a correlation matrix. Thus, for any column matrix $\boldsymbol{\alpha}$, the quadratic form $\boldsymbol{\alpha}^T E[\mathbf{x}\mathbf{x}^T] \boldsymbol{\alpha}$ is non-negative. Choose a form for $\boldsymbol{\alpha}$ that simplifies the quadratic form. A convenient choice is
$$\boldsymbol{\alpha} = \begin{pmatrix} \boldsymbol{\beta} \\ -\mathbf{F}^{-1}\left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right)^T \boldsymbol{\beta} \end{pmatrix}$$
where $\boldsymbol{\beta}$ is an arbitrary column matrix. The quadratic form becomes in this case
$$\boldsymbol{\alpha}^T E\!\left[\mathbf{x}\mathbf{x}^T\right] \boldsymbol{\alpha} = \boldsymbol{\beta}^T \left[\mathbf{M} - \mathbf{b}\mathbf{b}^T - \left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right) \mathbf{F}^{-1} \left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right)^T\right] \boldsymbol{\beta}$$
As this quadratic form must be non-negative, the matrix expression enclosed in brackets must be non-negative definite. We thus obtain the well-known Cramér-Rao bound on the mean-squared error matrix:
$$E\!\left[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\right] \ge \mathbf{b}(\boldsymbol{\theta})\mathbf{b}(\boldsymbol{\theta})^T + \left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right) \mathbf{F}^{-1} \left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right)^T$$
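The matrix bound can be evaluated numerically. In the sketch below, the helper `crb_bound` and all numerical values are hypothetical choices made for illustration; it assembles the bounding matrix from a Fisher matrix, a bias, and its Jacobian, then verifies the "lower bounded" relation by checking that the eigenvalues of the difference are non-negative:

```python
import numpy as np

def crb_bound(F, b, db):
    """Bounding matrix b b^T + (I + db) F^(-1) (I + db)^T for a biased
    estimator with bias b, bias Jacobian db, and Fisher matrix F.
    (A hypothetical helper written for this illustration.)"""
    I = np.eye(F.shape[0])
    return np.outer(b, b) + (I + db) @ np.linalg.inv(F) @ (I + db).T

# Hypothetical two-parameter example; all numbers are illustrative.
F = np.array([[4.0, 1.0],
              [1.0, 2.0]])          # Fisher information matrix
b = np.array([0.1, -0.05])          # estimator bias b(theta)
db = np.array([[0.0, 0.02],
               [0.01, 0.0]])        # bias Jacobian, entries db_i/d theta_j

bound = crb_bound(F, b, db)
M = bound + 0.05 * np.eye(2)        # an error matrix that satisfies the bound
# "M is lower bounded by bound" means M - bound is non-negative definite,
# equivalently all of its eigenvalues are non-negative.
print(np.linalg.eigvalsh(M - bound))  # [0.05, 0.05]: both non-negative
```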
This form for the Cramér-Rao bound does not mean that each term in the matrix of squared errors is greater than the corresponding term in the bounding matrix. As stated earlier, this expression means that the difference between these matrices is non-negative definite. For a matrix to be non-negative definite, each term on the main diagonal must be non-negative. The elements of the main diagonal of $E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T]$ are the mean-squared errors of the estimates of the individual parameters. Thus, for each parameter, the mean-squared estimation error can be no smaller than
$$E\!\left[\left(\hat{\theta}_i(\mathbf{r}) - \theta_i\right)^2\right] \ge b_i(\boldsymbol{\theta})^2 + \left[\left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right) \mathbf{F}^{-1} \left(\mathbf{I} + \dfrac{d\mathbf{b}}{d\boldsymbol{\theta}}\right)^T\right]_{ii}$$
This bound simplifies greatly if the estimator is unbiased ($\mathbf{b} = \mathbf{0}$). In this case, the Cramér-Rao bound becomes
$$E\!\left[\left(\hat{\theta}_i(\mathbf{r}) - \theta_i\right)^2\right] \ge \left[\mathbf{F}^{-1}\right]_{ii}$$
Thus, the mean-squared error for each parameter in a multiple-parameter, unbiased-estimator problem can be no smaller than the corresponding diagonal term in the inverse of the Fisher information matrix. In such problems, the error characteristics of any one parameter's estimate become intertwined with those of the other parameters in a complicated way. Any estimator satisfying the Cramér-Rao bound with equality is said to be efficient.
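To make efficiency concrete, consider once more $L$ independent Gaussian observations with unknown mean $\theta$ and known variance $\sigma^2$. The sample mean is unbiased and its mean-squared error equals $\sigma^2/L = F^{-1}$, so it satisfies the bound with equality. A short simulation sketch (model and numbers chosen only for illustration) confirms this:

```python
import numpy as np

# Efficiency illustration, assuming L i.i.d. observations r_l ~ N(theta,
# sigma^2) with known variance: F = L/sigma^2, so the unbiased Cramér-Rao
# bound is sigma^2/L. The sample mean is unbiased and attains this bound.
# All model choices and numbers are illustrative.
rng = np.random.default_rng(1)
theta, sigma, L, trials = 3.0, 2.0, 50, 200_000

r = rng.normal(theta, sigma, size=(trials, L))
theta_hat = r.mean(axis=1)               # sample-mean estimate, one per trial
mse = np.mean((theta_hat - theta) ** 2)  # empirical mean-squared error

crb = sigma**2 / L                       # inverse Fisher information, 1/F
print(mse, crb)                          # nearly equal: the sample mean is efficient
```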