Example: let $x_1, \ldots, x_n \sim \mathcal{N}(\theta, \sigma^2)$ iid. The sample mean $\hat{\theta} = \frac{1}{n}\sum_{i=1}^n x_i$ is the MVUB and MLE estimator. Now suppose that we have prior knowledge that $-m \le \theta \le m$. We might incorporate this by forming a new estimator
\[
\tilde{\theta} =
\begin{cases}
-m, & \hat{\theta} < -m, \\
\hat{\theta}, & -m \le \hat{\theta} \le m, \\
m, & \hat{\theta} > m.
\end{cases}
\]
This is called a truncated sample mean estimator of $\theta$. Is $\tilde{\theta}$ a better estimator of $\theta$ than the sample mean $\hat{\theta}$?
Let $p_{\hat{\theta}}$ denote the density of $\hat{\theta}$. Since $\hat{\theta}$ is the average of $n$ iid Gaussian variables, $\hat{\theta} \sim \mathcal{N}(\theta, \sigma^2/n)$. The density of $\tilde{\theta}$ is given by
\[
p_{\tilde{\theta}}(t) = \Pr(\hat{\theta} \le -m)\,\delta(t+m) + p_{\hat{\theta}}(t)\,\mathbf{1}_{\{-m < t < m\}} + \Pr(\hat{\theta} \ge m)\,\delta(t-m),
\]
a truncated Gaussian with point masses at $\pm m$. Now consider the MSE of the sample mean:
\[
\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \frac{\sigma^2}{n}.
\]
Note that $\tilde{\theta}$ is biased ($E[\tilde{\theta}] \ne \theta$ in general). However, since $-m \le \theta \le m$, truncation can only move $\hat{\theta}$ closer to $\theta$, so $|\tilde{\theta} - \theta| \le |\hat{\theta} - \theta|$ for every realization and hence $\mathrm{MSE}(\tilde{\theta}) \le \mathrm{MSE}(\hat{\theta})$. Although $\hat{\theta}$ is MVUB, $\tilde{\theta}$ is better in the MSE sense.
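As a quick numerical check (a sketch; the values of $\theta$, $m$, $\sigma$, and $n$ below are illustrative assumptions), a Monte Carlo comparison of the two estimators' MSE:

    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, n, m = 0.8, 1.0, 5, 1.0   # assumed values with |theta| <= m
    trials = 100_000

    x = rng.normal(theta, sigma, size=(trials, n))
    theta_hat = x.mean(axis=1)                 # sample mean (MVUB, MLE)
    theta_tilde = np.clip(theta_hat, -m, m)    # truncated sample mean

    mse_hat = np.mean((theta_hat - theta) ** 2)      # approx. sigma^2 / n
    mse_tilde = np.mean((theta_tilde - theta) ** 2)  # no larger, smaller whenever clipping occurs
    print(mse_hat, mse_tilde)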
Prior information of this sort is aptly described by regarding $\theta$ as a random variable with a uniform prior distribution $\theta \sim \mathcal{U}(-m, m)$, which implies that we know $-m \le \theta \le m$, but that otherwise $\theta$ is arbitrary.
The Bayesian approach to statistical modeling
A prior distribution allows us to incorporate prior information regarding the unknown parameter: probable values of the parameter are supported by the prior. Basically, the prior reflects what we believe "Nature" will probably throw at us.
Elements of Bayesian analysis
(a) joint distribution
\[
p(x, \theta) = p(x \mid \theta)\, p(\theta)
\]
(b) marginal distributions
\[
p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta, \qquad p(\theta) = \int p(x, \theta)\, dx,
\]
where $p(\theta)$ is a prior.
(c) posterior distribution
\[
p(\theta \mid x) = \frac{p(x, \theta)}{p(x)} = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}
\]
Example: suppose $x \sim \mathrm{Binomial}(n, \theta)$, so that
\[
p(x \mid \theta) = \binom{n}{x}\, \theta^{x} (1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n,
\]
which is the Binomial likelihood, and take
\[
p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}, \qquad 0 \le \theta \le 1,
\]
which is the Beta prior distribution with parameters $\alpha, \beta > 0$.
Joint density:
\[
p(x, \theta) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{x+\alpha-1} (1-\theta)^{n-x+\beta-1}
\]
Marginal density:
\[
p(x) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \frac{\Gamma(\alpha+x)\,\Gamma(\beta+n-x)}{\Gamma(\alpha+\beta+n)}
\]
Posterior density:
\[
p(\theta \mid x) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+x)\,\Gamma(\beta+n-x)}\, \theta^{\alpha+x-1} (1-\theta)^{\beta+n-x-1},
\]
where $p(\theta \mid x)$ is the Beta density with parameters $\alpha + x$ and $\beta + n - x$.
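A minimal numerical sketch of this conjugate update (the prior parameters and the observed counts below are illustrative assumptions):

    from scipy.stats import beta

    alpha, beta_p = 2.0, 2.0     # assumed Beta prior parameters
    n, x = 20, 7                 # assumed number of trials and observed successes

    # Beta prior + Binomial likelihood -> Beta posterior (conjugacy)
    post_a, post_b = alpha + x, beta_p + (n - x)

    post_mean = post_a / (post_a + post_b)             # E[theta | x]
    lo, hi = beta.ppf([0.025, 0.975], post_a, post_b)  # 95% credible interval
    print(post_mean, (lo, hi))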
Selecting an informative prior
Clearly, the most important objective is to choose the prior $p(\theta)$ that best reflects the prior knowledge available to us. In general, however, our prior knowledge is imprecise, and any number of prior densities may aptly capture this information. Moreover, the optimal estimator usually cannot be obtained in closed form. Therefore, it is sometimes desirable to choose a prior density that models the prior knowledge and is nicely matched in functional form to the likelihood $p(x \mid \theta)$, so that the optimal estimator (and the posterior density) can be expressed in a simple fashion.
Choosing a prior
1. Informative priors: design/choose priors that are compatible with prior knowledge of the unknown parameters.
2. Non-informative priors: attempt to remove subjectiveness from Bayesian procedures; designs are often based on invariance arguments.
Example: suppose we want to estimate the variance $\sigma^2$ of a process, incorporating a prior that is amplitude-scale invariant (so that we are invariant to arbitrary amplitude rescaling of the data). The prior
\[
p(\sigma^2) \propto \frac{1}{\sigma^2}
\]
satisfies this condition; it is non-informative since it is invariant to amplitude scale.
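To see the invariance explicitly (a short sketch, with an assumed rescaling factor $a > 0$): if the data are rescaled as $x \mapsto a x$, the variance becomes $\eta = a^2 \sigma^2$, and the change-of-variables formula gives the induced prior
\[
p(\eta) = p(\sigma^2)\left|\frac{d\sigma^2}{d\eta}\right| \propto \frac{1}{\eta/a^2}\cdot\frac{1}{a^2} = \frac{1}{\eta},
\]
so the prior has exactly the same form in the rescaled units.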
Conjugate priors
Idea: given $p(x \mid \theta)$, choose $p(\theta)$ so that the posterior $p(\theta \mid x)$ has a simple functional form.
Specifically, choose $p(\theta) \in \mathcal{F}$, where $\mathcal{F}$ is a family of densities (e.g., the Gaussian family), so that the posterior density also belongs to that family. Such a $p(\theta)$ is called a conjugate prior: $p(\theta)$ is a conjugate prior for $p(x \mid \theta)$ if $p(\theta \mid x) \in \mathcal{F}$.
Example: $x_1, \ldots, x_n \sim \mathcal{N}(\theta, \sigma^2)$ iid. Rather than modeling $\theta \sim \mathcal{U}(-m, m)$ (which did not yield a closed-form estimator), consider the Gaussian prior
\[
\theta \sim \mathcal{N}(0, \sigma_\theta^2).
\]
With mean zero and standard deviation $\sigma_\theta$ chosen comparable to $m$, this Gaussian prior also reflects the prior knowledge that it is unlikely for $|\theta| > m$. The Gaussian prior is also conjugate to the Gaussian likelihood
\[
p(x \mid \theta) = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \theta)^2\Big),
\]
so that the resulting posterior density is also a simple Gaussian, as shown next.
First note that the likelihood factors as
\[
p(x \mid \theta) = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \bar{x})^2\Big)\exp\!\Big(-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2\Big),
\]
where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$. Therefore
\[
p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{\int p(x \mid \theta)\,p(\theta)\,d\theta}
= \frac{\exp\!\big(-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2\big)\exp\!\big(-\frac{\theta^2}{2\sigma_\theta^2}\big)}{\int \exp\!\big(-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2\big)\exp\!\big(-\frac{\theta^2}{2\sigma_\theta^2}\big)\,d\theta},
\]
where all factors that do not depend on $\theta$ cancel. Now let
\[
\bar{\sigma}^2 = \Big(\frac{n}{\sigma^2} + \frac{1}{\sigma_\theta^2}\Big)^{-1},
\qquad
\bar{\theta} = \bar{\sigma}^2\,\frac{n}{\sigma^2}\,\bar{x}.
\]
Then by "completing the square" we have
\[
\frac{n}{\sigma^2}(\bar{x} - \theta)^2 + \frac{\theta^2}{\sigma_\theta^2} = \frac{(\theta - \bar{\theta})^2}{\bar{\sigma}^2} + c,
\]
where $c$ does not depend on $\theta$. Hence,
\[
p(\theta \mid x) \propto \exp\!\Big(-\frac{(\theta - \bar{\theta})^2}{2\bar{\sigma}^2}\Big),
\]
where $\exp\!\big(-\frac{(\theta - \bar{\theta})^2}{2\bar{\sigma}^2}\big)$ is the "unnormalized" Gaussian density and the proportionality constant is independent of $\theta$. This implies that
\[
\theta \mid x \sim \mathcal{N}(\bar{\theta}, \bar{\sigma}^2),
\]
where
\[
\bar{\theta} = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2/n}\,\bar{x},
\qquad
\bar{\sigma}^2 = \frac{\sigma_\theta^2\,\sigma^2/n}{\sigma_\theta^2 + \sigma^2/n}.
\]
Now the posterior mean is
\[
E[\theta \mid x] = \bar{\theta} = \alpha\,\bar{x},
\]
where
\[
\alpha = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2/n}.
\]
Interpretation
When there is little data, $n$ is small, $\sigma^2/n \gg \sigma_\theta^2$, so $\alpha \approx 0$ and $E[\theta \mid x] \approx 0$ (the prior mean).
When there is a lot of data, $n$ is large, $\sigma^2/n \ll \sigma_\theta^2$, so $\alpha \approx 1$ and $E[\theta \mid x] \approx \bar{x}$ (the sample mean).
Interplay between data and prior knowledge
Small $n$ favors the prior.
Large $n$ favors the data.
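A small sketch of this interplay (the prior scale, noise level, and true parameter below are assumed purely for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    sigma, sigma_theta = 1.0, 0.5    # assumed noise and prior standard deviations
    theta_true = 0.4                 # assumed true parameter

    for n in (2, 10, 1000):
        x = rng.normal(theta_true, sigma, size=n)
        x_bar = x.mean()
        alpha = sigma_theta**2 / (sigma_theta**2 + sigma**2 / n)   # weight on the data
        post_mean = alpha * x_bar                                  # E[theta | x]
        post_var = (sigma_theta**2 * sigma**2 / n) / (sigma_theta**2 + sigma**2 / n)
        # small n: post_mean stays near the prior mean 0; large n: post_mean approaches x_bar
        print(n, round(x_bar, 3), round(post_mean, 3), round(post_var, 4))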
The multivariate Gaussian model
The multivariate Gaussian model is the most important Bayesian tool in signal processing. It leads directly to the celebrated Wiener and Kalman filters. Assume that we are dealing with random vectors $X$ and $Y$. We will regard $X$ as a signal vector that is to be estimated from an observation vector $Y$; $X$ plays the same role as $\theta$ did in earlier discussions. We will assume that $X$ is $p \times 1$ and $Y$ is $N \times 1$. Furthermore, assume that $X$ and $Y$ are jointly Gaussian distributed with
\[
E[X] = 0, \quad E[Y] = 0, \quad R_{XX} = E[XX^T], \quad R_{YY} = E[YY^T], \quad R_{XY} = E[XY^T], \quad R_{YX} = E[YX^T] = R_{XY}^T.
\]
For example, $Y = X + W$ with noise $W \sim \mathcal{N}(0, \sigma^2 I)$, which is independent of $X$. In that case
\[
R_{YY} = R_{XX} + \sigma^2 I, \qquad R_{XY} = R_{XX}, \qquad R_{YX} = R_{XX}.
\]
From our Bayesian perspective, we are interested in the posterior
\[
p(x \mid y) = \frac{p(x, y)}{p(y)}.
\]
In this formula we are faced with the joint covariance matrix
\[
R = \begin{bmatrix} R_{XX} & R_{XY} \\ R_{YX} & R_{YY} \end{bmatrix}
\]
and its inverse. The inverse of this covariance matrix can be written as
\[
\begin{bmatrix} R_{XX} & R_{XY} \\ R_{YX} & R_{YY} \end{bmatrix}^{-1}
= \begin{bmatrix} I & 0 \\ -R_{YY}^{-1}R_{YX} & I \end{bmatrix}
\begin{bmatrix} Q^{-1} & 0 \\ 0 & R_{YY}^{-1} \end{bmatrix}
\begin{bmatrix} I & -R_{XY}R_{YY}^{-1} \\ 0 & I \end{bmatrix},
\]
where $Q = R_{XX} - R_{XY}R_{YY}^{-1}R_{YX}$. (Verify this formula by applying the right-hand side above to
\[
\begin{bmatrix} R_{XX} & R_{XY} \\ R_{YX} & R_{YY} \end{bmatrix}
\]
to get the identity matrix.)
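A numerical sanity check of this block-inverse formula (a sketch; the dimensions and the random covariance construction below are assumptions made for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    p, N = 2, 4                                  # assumed dimensions of X and Y

    # random symmetric positive-definite joint covariance of [X; Y]
    A = rng.normal(size=(p + N, p + N))
    R = A @ A.T + (p + N) * np.eye(p + N)
    Rxx, Rxy = R[:p, :p], R[:p, p:]
    Ryx, Ryy = R[p:, :p], R[p:, p:]

    Q = Rxx - Rxy @ np.linalg.solve(Ryy, Ryx)    # Schur complement
    Ryy_inv = np.linalg.inv(Ryy)

    # the three factors of the claimed inverse
    L = np.block([[np.eye(p), np.zeros((p, N))], [-Ryy_inv @ Ryx, np.eye(N)]])
    M = np.block([[np.linalg.inv(Q), np.zeros((p, N))], [np.zeros((N, p)), Ryy_inv]])
    U = np.block([[np.eye(p), -Rxy @ Ryy_inv], [np.zeros((N, p)), np.eye(N)]])

    R_inv = L @ M @ U
    print(np.allclose(R_inv @ R, np.eye(p + N)))  # True: the product is the identity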