The thick red line shows the regression that would result from using OLS to estimate either of the two structural equations. As illustrated, the OLS estimate of the slope is biased, so we need an estimation technique other than OLS.
Estimation
As noted earlier, the basic problem created by endogeneity is that the endogenous explanatory variable is correlated with the error term. The most logical approach is to replace this variable with one that is uncorrelated with the error term but highly correlated with the endogenous variable. Consider the value of the price predicted by the
reduced form equation (5):
where
is the OLS estimate of
and
Clearly,
is correlated with
It also is true that the covariance between
and
goes to zero as the sample size increases. Thus, we can use (8) to construct a variable that will produce a consistent estimator of
It is this conclusion that underlies the strategy of both two-stage least squares (TSLS) and instrumental variable (IV) estimators.
Two-stage least squares
The easiest way to understand two-stage least squares is to think of the estimation process as being in the following two steps (although the computer programs calculate the estimators in one step):
Stage 1: obtain OLS predictions for each endogenous variable on the right-hand side of the equation to be estimated, using all of the exogenous variables in the system as the explanatory variables.
Stage 2: estimate the parameters of the equation by OLS, replacing each endogenous variable on the right-hand side with its predictions from Stage 1.
For obvious reasons, the TSLS method works best when the full model is specified or when you know and can measure all of the exogenous variables in the system.
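The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular software package: the demand and supply system, its parameter values, the variable names (`income`, `cost`) and the helper functions `ols` and `two_stage_least_squares` are all hypothetical and chosen so that the true price coefficient in the demand equation is known.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients for y = X b + e (X must already contain a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, endog, exog_included, exog_all):
    """Two-stage least squares for a single endogenous regressor.

    exog_included: exogenous variables appearing in the equation of interest.
    exog_all: all exogenous variables in the system (including exog_included).
    The coefficient on the endogenous variable is returned first.
    """
    # Stage 1: predict the endogenous variable from all exogenous variables.
    endog_hat = exog_all @ ols(exog_all, endog)
    # Stage 2: OLS with the endogenous variable replaced by its prediction.
    X2 = np.column_stack([endog_hat, exog_included])
    return ols(X2, y)


# Hypothetical system (assumed for illustration only):
#   demand: q = a0 + a1*p + a2*income + u
#   supply: q = b0 + b1*p + b2*cost   + v
rng = np.random.default_rng(0)
n = 20_000
income = rng.normal(size=n)
cost = rng.normal(size=n)
u = rng.normal(size=n)
v = rng.normal(size=n)
a0, a1, a2 = 10.0, -1.0, 0.5
b0, b1, b2 = 2.0, 1.0, -0.8

# Equilibrium price solves demand = supply, so price depends on u and v.
price = (a0 - b0 + a2 * income - b2 * cost + u - v) / (b1 - a1)
quantity = a0 + a1 * price + a2 * income + u

const = np.ones(n)
ols_coef = ols(np.column_stack([price, const, income]), quantity)
tsls_coef = two_stage_least_squares(
    quantity,
    price,
    exog_included=np.column_stack([const, income]),
    exog_all=np.column_stack([const, income, cost]),
)

print("true price coefficient:", a1)
print("OLS estimate:          ", ols_coef[0])
print("TSLS estimate:         ", tsls_coef[0])
```

In this simulated system the OLS estimate of the price coefficient stays away from the true value even in a large sample, while the TSLS estimate settles near it. Note that running the second stage by hand gives the right coefficients but not the right standard errors, which is one reason to rely on a dedicated routine in applied work.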
Instrumental variables (IV)
While instrumental variable (IV) estimators are appropriate in a large number of situations, they are most commonly used (1) in the presence of endogenous explanatory variables and (2) when an explanatory variable is measured with error (the errors-in-variables problem). Since I have already described the endogeneity problem, I now turn to a brief discussion of errors-in-variables.
Consider the following simple model:
In this model the researcher observes
but not the desired
because of some random measurement error. Using OLS to estimate (9) with the observable
instead of the correct
is equivalent to estimating:
The important thing to note in estimating (10) using OLS is that the explanatory variable,
, is correlated with the error term,
As was the case with the endogeneity problem, the OLS estimate of
is biased. Murray (2006) summarizes the situation as follows:
In both examples, ordinary least squares estimation is biased because an explanatory variable in the regression is correlated with the error term in the regression. Such a correlation can result from an endogenous explanator, a mismeasured explanator, an omitted explanator, or a lagged dependent variable among the explanators. I call all such explanators “troublesome.” Instrumental variable estimation can consistently estimate coefficients when ordinary least squares cannot—that is, the instrumental variable estimate of the coefficient will almost certainly be very close to the coefficient’s true value if the sample is sufficiently large—despite troublesome explanators. [Murray (2006a): 112]
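A small simulation can make the errors-in-variables case concrete. The sketch below is an assumption-laden illustration rather than anything taken from the text: the parameter values, the variable names, and in particular the choice of instrument (a second, independently mismeasured reading of the same variable, called `z` here) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta0, beta1 = 1.0, 2.0  # assumed true parameters

x_true = rng.normal(size=n)                  # the desired, unobserved regressor
y = beta0 + beta1 * x_true + rng.normal(size=n)
x_obs = x_true + rng.normal(size=n)          # observed regressor with random error
z = x_true + rng.normal(size=n)              # hypothetical instrument: a second
                                             # reading with independent error


def slope_ols(x, y):
    """OLS slope in a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def slope_iv(z, x, y):
    """Simple IV slope: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


b_ols = slope_ols(x_obs, y)
b_iv = slope_iv(z, x_obs, y)

print("true slope:", beta1)
print("OLS slope using the mismeasured regressor:", b_ols)
print("IV slope using the second reading:        ", b_iv)
```

With equal variances for the true regressor and the measurement error, as assumed here, the OLS slope is pulled roughly halfway toward zero, whereas the IV slope stays close to the true value. The instrument works in this setup because `z` is correlated with the true regressor but, by construction, not with the measurement error in `x_obs` or with the equation error.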