The pdf of the joint distribution shown in (1) is known as the likelihood function. If the sample were not drawn independently, the pdf of the joint distribution could not be written in such a simple form because the covariances among the members of the sample would not equal zero. The logarithm of this function (or, as it is referred to, the log of the likelihood function) is given by the sum
$$\ln L(\theta)=\sum_{i=1}^{n}\ln f(x_i;\theta).$$
The maximum likelihood method chooses as estimators of the unknown parameters of the distribution the values that maximize the likelihood function. Because the logarithm is a monotonically increasing function, maximizing the log of the likelihood function is equivalent to maximizing the likelihood function itself. The following example illustrates how to derive ML estimators.
The ML estimators of the population mean and population variance.
Assume that $x\sim N(\mu,\sigma^2)$. Consider a sample of size $n$ drawn independently from this distribution. The likelihood function is the product of the pdf of each observation, or:
$$L(\mu,\sigma^2)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}.$$
Thus, the log of the likelihood function of this sample is
$$\ln L=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.$$
In the ML method we want to find the estimators of the mean and variance, $\hat{\mu}$ and $\hat{\sigma}^2$, that maximize the log of the likelihood function. Substituting the parameter estimates into the log of the likelihood function gives our problem as:
$$\max_{\hat{\mu},\hat{\sigma}^2}\;\ln L=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\hat{\sigma}^2-\frac{1}{2\hat{\sigma}^2}\sum_{i=1}^{n}(x_i-\hat{\mu})^2.$$
Setting the derivatives of the log of the likelihood function with respect to $\hat{\mu}$ and $\hat{\sigma}^2$ equal to 0 gives:
$$\frac{\partial\ln L}{\partial\hat{\mu}}=\frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}(x_i-\hat{\mu})=0$$
and
$$\frac{\partial\ln L}{\partial\hat{\sigma}^2}=-\frac{n}{2\hat{\sigma}^2}+\frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}(x_i-\hat{\mu})^2=0.$$
Solving these two equations simultaneously (the first yields $\hat{\mu}$ directly; substituting it into the second yields $\hat{\sigma}^2$) gives:
$$\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}x_i=\bar{x}\quad\text{and}\quad\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$
Notice that the estimator of the population mean is equal to the sample mean, the same result you found in your introductory statistics course. However, the unbiased estimator of the population variance used in that course is
$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$
Thus, one of the common "problems" with ML estimators is that they are often biased estimators of a population parameter. On the other hand, under very general conditions ML estimators are consistent, are asymptotically efficient, and have an asymptotically normal distribution (these are desirable large-sample characteristics of potential estimators and are discussed in advanced statistics courses).
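To make the bias concrete, here is a minimal simulation sketch, assuming Python with NumPy (the sample size, number of replications, and population values are illustrative choices, not from this module). It draws many samples and compares the average of the ML variance estimator with the average of the divide-by-$(n-1)$ estimator:

```python
import numpy as np

# Sketch: compare the ML (divide-by-n) variance estimator with the
# unbiased (divide-by-(n-1)) estimator across many repeated samples.
rng = np.random.default_rng(42)
n, reps = 10, 100_000
mu, sigma2 = 0.0, 4.0  # assumed population mean and variance

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)

print("mean of ML estimator (biased):", (ss / n).mean())        # near (n-1)/n * 4 = 3.6
print("mean of unbiased estimator:   ", (ss / (n - 1)).mean())  # near 4.0
```

The ML estimator is biased downward by the factor $(n-1)/n$, but the bias vanishes as $n$ grows, which is the consistency property mentioned above.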
Application of the ML method to regressions
The discussion above illustrates the basics of the ML method: you form the log of the likelihood function and then find the values of the parameter estimates that maximize this function. In most cases the maximization will not yield answers in closed form; that is, you cannot find a neat algebraic formula as we did for the population mean. However, you can use computer programs to search for the values of the parameter estimates that maximize this function. Thus, in advanced regression models you often will treat the ML method as a “black box” and not concern yourself with the estimation details. However, I illustrate one more example of the ML technique.
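To give a flavor of what such a search looks like, here is a minimal sketch assuming Python with NumPy and SciPy (the data, starting values, and parameterization are my own illustrative choices). It hands the negative of the normal log-likelihood to a general-purpose optimizer and recovers the same estimates as the closed-form formulas:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)  # hypothetical sample

def neg_log_likelihood(params, data):
    """Negative normal log-likelihood; minimizing it maximizes the likelihood."""
    mu, log_sigma = params               # optimize log(sigma) so sigma stays positive
    sigma2 = np.exp(2.0 * log_sigma)
    n = data.size
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) \
        + ((data - mu) ** 2).sum() / (2.0 * sigma2)

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat = result.x[0]
sigma2_hat = np.exp(2.0 * result.x[1])

# The numerical optimum agrees with the closed-form ML estimators:
print(mu_hat, x.mean())                          # sample mean
print(sigma2_hat, ((x - x.mean()) ** 2).mean())  # divide-by-n variance
```

Statistical packages do essentially this, with more care taken over starting values, convergence criteria, and standard errors.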
The ML estimators for a simple regression.
Assume that we want to estimate the population parameters $\beta$ and $\sigma^2$ for the regression model
$$y_i=\beta x_i+\varepsilon_i,$$
where we assume that
- $\varepsilon_i\sim N(0,\sigma^2)$,
- $E(\varepsilon_i\varepsilon_j)=0$ for $i\neq j$,
- $\bar{x}=\bar{y}=0$ (this assumption allows us to ignore the estimation of the intercept term), and
- $x$ is a non-stochastic variable.
The assumption of a normally distributed error term implies that $\varepsilon_i=y_i-\beta x_i\sim N(0,\sigma^2)$. Thus, the pdf of the error term is
$$f(\varepsilon_i)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(y_i-\beta x_i)^2}{2\sigma^2}},$$
and, thus, the likelihood function is:
$$L(\beta,\sigma^2)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(y_i-\beta x_i)^2}{2\sigma^2}},$$
and the log of the likelihood function is
$$\ln L=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\beta x_i)^2.$$
We find the estimators $\hat{\beta}$ and $\hat{\sigma}^2$ in the same manner as we did for the sample mean and variance. Differentiating the log of the likelihood function and setting these first derivatives equal to 0 gives the following two first-order conditions:
$$\frac{\partial\ln L}{\partial\hat{\beta}}=\frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}x_i\left(y_i-\hat{\beta}x_i\right)=0$$
and
$$\frac{\partial\ln L}{\partial\hat{\sigma}^2}=-\frac{n}{2\hat{\sigma}^2}+\frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}\left(y_i-\hat{\beta}x_i\right)^2=0.$$
Thus, the ML estimators are:
$$\hat{\beta}=\frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2}\quad\text{and}\quad\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{\beta}x_i\right)^2.$$
Notice that in this simple case the ML estimator of $\beta$ is the same as the OLS estimator of $\beta$. Also, notice that the ML estimator of $\sigma^2$ is biased; the (unbiased) OLS estimator of $\sigma^2$ is
$$s^2=\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i-\hat{\beta}x_i\right)^2.$$
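These closed-form results are easy to verify numerically. Here is a minimal sketch, again assuming Python with NumPy and using hypothetical simulated data (the true parameter values are illustrative); the variables are demeaned so the $\bar{x}=\bar{y}=0$ assumption holds:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(scale=0.8, size=n)  # assumed true beta = 1.5, sigma = 0.8
x, y = x - x.mean(), y - y.mean()            # deviation form, per the assumptions

beta_hat = (x * y).sum() / (x ** 2).sum()    # ML estimator, identical to OLS
resid = y - beta_hat * x
sigma2_ml = (resid ** 2).sum() / n           # biased ML estimator
s2 = (resid ** 2).sum() / (n - 2)            # unbiased OLS estimator

print(beta_hat, sigma2_ml, s2)
```

For small $n$ the gap between the two variance estimates is noticeable; it shrinks as $n$ grows.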
You can use the examples in this module as the basis of your understanding of the ML method. When you see that the ML method is used in a computer program, you can be fairly certain that the program uses one of the many optimizing subroutines to find the maximum of the log of the likelihood function. You can consult the help files of the computer program to see what underlying distribution is used to set up the log of the likelihood function. A concept related to the maximum likelihood estimation method worth exploring is the likelihood ratio test (see the module by Don Johnson entitled The Likelihood Ratio Test for an introduction to this key statistical test).
Exercises
Consider the following functions. For each of them, (1) prove that the function is a pdf, (2) calculate the mean and variance of the distribution, and (3) find the maximum likelihood estimator of the parameter $\theta$. Sketch a graph of each distribution for a representative value of $\theta$.
- … where … and …
- … where … and …