This module introduces the maximum likelihood estimator. We show how the MLE implements the likelihood principle. Methods for computing the MLE are covered. Properties of the MLE are discussed including asymptotic efficiency and invariance under reparameterization.
The maximum likelihood estimator (MLE) is an alternative to the minimum variance unbiased estimator (MVUE). For many estimation problems, the MVUE does not exist. Moreover, when it does exist, there is no systematic procedure for finding it. In contrast, the MLE does not necessarily satisfy any optimality criterion, but it can almost always be computed, either through exact formulas or numerical techniques. For this reason, the MLE is one of the most common estimation procedures used in practice.
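As a minimal sketch of these two routes (assuming NumPy/SciPy and an i.i.d. exponential sample, both chosen here purely for illustration), a closed-form MLE and a generic numerical maximization of the log-likelihood give essentially the same answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)   # i.i.d. Exponential sample (true rate = 0.5)

# Route 1: exact formula. For the exponential density p(x; lam) = lam * exp(-lam * x),
# setting the derivative of the log-likelihood to zero gives lam_hat = 1 / mean(x).
lam_closed_form = 1.0 / x.mean()

# Route 2: numerical maximization of the log-likelihood (minimize its negative).
def neg_log_likelihood(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
lam_numerical = res.x

print(lam_closed_form, lam_numerical)   # the two estimates agree closely
```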
The MLE is an important
type of estimator for the following reasons:
- The MLE implements the likelihood principle.
- MLEs are often simple and easy to compute.
- MLEs have asymptotic optimality properties
(consistency and efficiency).
- MLEs are invariant under reparameterization.
- If an efficient estimator exists, it is the MLE.
- In signal detection with unknown parameters
(composite hypothesis testing), MLEs are used in implementing the generalized likelihood ratio test (GLRT).
This module will discuss these properties in detail, with examples.
The likelihood principle
Suppose the data $x$ is distributed according to the density or mass function $p(x \mid \theta)$. The likelihood function for $\theta$ is defined by
$$ l(\theta \mid x) = p(x \mid \theta). $$
At first glance, the likelihood function is nothing new - it is simply a way of rewriting the pdf/pmf of $x$. The difference between the likelihood and the pdf or pmf is what is held fixed and what is allowed to vary. When we talk about the likelihood, we view the observation $x$ as being fixed, and the parameter $\theta$ as freely varying.
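To make this distinction concrete, here is a small sketch (the binomial setup and NumPy are our own illustrative choices, not part of the original text): the same expression acts as a pmf when $\theta$ is fixed and the outcome varies, and as a likelihood when the observation is fixed and $\theta$ varies.

```python
import numpy as np
from math import comb

# Illustrative observation: k successes in n Bernoulli(theta) trials.
n, k = 10, 7

def binom_expr(kk, theta):
    # The same algebraic expression; its interpretation depends on what is held fixed.
    return comb(n, kk) * theta**kk * (1 - theta)**(n - kk)

# Viewed as a pmf: theta fixed at 0.5, a probability assigned to every possible outcome kk.
pmf = [binom_expr(kk, 0.5) for kk in range(n + 1)]
print(sum(pmf))                               # sums to 1 over outcomes, as a pmf must

# Viewed as a likelihood: the observation k = 7 fixed, theta allowed to vary over (0, 1).
thetas = np.linspace(0.01, 0.99, 99)
likelihood = [binom_expr(k, th) for th in thetas]
print(thetas[int(np.argmax(likelihood))])     # maximized near theta = k/n = 0.7
```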
It is tempting to view the likelihood function $l(\theta \mid x)$ as a probability density for $\theta$, and to think of it as the conditional density of $\theta$ given $x$. This approach to parameter estimation is called fiducial inference, and is not accepted by most statisticians. One potential problem, for example, is that in many cases $l(\theta \mid x)$ is not integrable ($\int l(\theta \mid x)\, d\theta \rightarrow \infty$) and thus cannot be normalized. A more fundamental problem is that $\theta$ is viewed as a fixed quantity, as opposed to random. Thus, it doesn't make sense to talk about its density. For the likelihood to be properly thought of as a density, a Bayesian approach is required.
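As a quick numerical check of the normalization issue (reusing the illustrative binomial setup above and SciPy's quadrature, both our own choices), the likelihood integrated over $\theta$ need not equal 1, so it cannot be read directly as a density for $\theta$:

```python
from math import comb
from scipy.integrate import quad

n, k = 10, 7   # same illustrative observation as before

def likelihood(theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

area, _ = quad(likelihood, 0.0, 1.0)
print(area)   # about 1/(n + 1) = 0.0909..., not 1
```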
The likelihood principle effectively states that all information we have about the unknown parameter $\theta$ is contained in the likelihood function.
Likelihood principle
The information brought by an observation $x$ about $\theta$ is entirely contained in the likelihood function $l(\theta \mid x)$. Moreover, if $x_1$ and $x_2$ are two observations depending on the same parameter $\theta$, such that there exists a constant $c$ satisfying
$$ l_1(\theta \mid x_1) = c\, l_2(\theta \mid x_2) $$
for every $\theta$, then they bring the same information about $\theta$ and must lead to identical estimators.
In the statement of the likelihood principle, it is not assumed that the two observations $x_1$ and $x_2$ are generated according to the same model, as long as both models are parameterized by $\theta$.
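A classic illustration of this point (our own worked example, not part of the original text): 7 successes observed in 10 trials under a binomial model, versus 7 successes observed while sampling until the 3rd failure under a negative binomial model. The two likelihoods differ only by a constant factor, so by the likelihood principle they carry the same information about $\theta$, and indeed both are maximized at the same value.

```python
import numpy as np
from math import comb

thetas = np.linspace(0.01, 0.99, 99)

# Model 1: binomial -- the number of trials (10) was fixed in advance; 7 successes observed.
l1 = comb(10, 7) * thetas**7 * (1 - thetas)**3

# Model 2: negative binomial -- sampling continued until the 3rd failure; 7 successes observed.
l2 = comb(9, 7) * thetas**7 * (1 - thetas)**3

print(np.allclose(l1 / l2, comb(10, 7) / comb(9, 7)))   # ratio is the same constant c for every theta
print(thetas[np.argmax(l1)], thetas[np.argmax(l2)])     # both maximized at theta = 0.7
```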