An alternative approach to the sample selectivity problem is to use a maximum likelihood estimator. Heckman (1974) originally suggested estimating the parameters of the model by maximizing the average log likelihood function, which is built from the probability density function of the bivariate normal distribution. Fortunately, Stata offers a single command for calculating either the two-step or the maximum likelihood estimators.
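For reference, the average log likelihood can be written in closed form once the bivariate normal density has been integrated over the unobserved selection error. The sketch below uses notation that is assumed here rather than taken from the text: $y_i = x_i\beta + \varepsilon_i$ is the outcome equation, $y_i$ is observed ($D_i = 1$) only when $z_i\gamma + u_i > 0$, the errors $\varepsilon_i$ and $u_i$ are bivariate normal with standard deviations $\sigma$ and 1 and correlation $\rho$, and $\Phi$ is the standard normal distribution function.

```latex
\frac{1}{n}\ln L
  = \frac{1}{n}\sum_{i=1}^{n}\left\{
      D_i\left[
        \ln\Phi\!\left(\frac{z_i\gamma + \rho\,(y_i - x_i\beta)/\sigma}{\sqrt{1-\rho^{2}}}\right)
        - \frac{1}{2}\left(\frac{y_i - x_i\beta}{\sigma}\right)^{2}
        - \ln\!\left(\sqrt{2\pi}\,\sigma\right)
      \right]
      + (1 - D_i)\,\ln\Phi\!\left(-z_i\gamma\right)
    \right\}
```

Observations with an observed outcome contribute the joint likelihood of the outcome and of being selected, while the remaining observations contribute only the probability of not being selected. The expression may differ in form from the one Heckman (1974) wrote down.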
Estimating either version of the Heckman sample selectivity model is straightforward in Stata. The command is:
. heckman depvar [varlist], select(varlist_s) [twostep]
or
. heckman depvar [varlist], select(depvar_s = varlist_s) [twostep]
The syntax for maximum-likelihood estimates is:
. heckman depvar [varlist] [weight] [if exp] [in range], select([depvar_s =] varlist_s [, offset(varname) noconstant]) [robust cluster(varname) score(newvarlist|stub*) nshazard(newvarname) mills(newvarname) offset(varname) noconstant constraints(numlist) first noskip level(#) iterate(0) nolog maximize_options]
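The syntax templates above can be made concrete with a short do-file sketch. It assumes the womenwk data set that this section loads further on. The indicator inlf is a hypothetical variable created only for this illustration, and the choice of regressors simply mirrors the example used in this section.

```stata
* Load the womenwk data set used later in this section
use http://www.stata-press.com/data/imeus/womenwk, clear

* Hypothetical indicator (not part of the data set): 1 if a wage is observed
generate byte inlf = !missing(wage)

* Two-step estimates; selection is inferred from missing values of wage
heckman wage education age, select(married children education age) twostep

* The same two-step model with the selection indicator named explicitly
heckman wage education age, select(inlf = married children education age) twostep

* Maximum likelihood (the default): show the first-step probit, hide the iteration log
heckman wage education age, select(married children education age) first nolog
```

The first two heckman calls should describe the same model, since the first form treats an observation as selected exactly when wage is not missing.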
After heckman, the predict command has these options, among others:
xb, the default, calculates the linear predictions from the underlying regression equation.
ycond calculates the expected value of the dependent variable conditional on the dependent variable being observed/selected; E(y | y observed).
yexpected calculates the expected value of the dependent variable (y*), where that value is taken to be 0 when it is expected to be unobserved; y* = P(y observed) * E(y | y observed). The assumption of 0 is valid for many cases where nonselection implies non-participation (e.g., unobserved wage levels, insurance claims from those who are uninsured, etc.) but may be inappropriate for some problems (e.g., unobserved disease incidence).
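The three predictions described above can be generated side by side, as in the sketch below. It assumes the womenwk data set loaded later in this section. The new variable names (y_xb, y_cond, y_exp, p_sel) are hypothetical, and psel, the predicted probability of selection, is an additional predict option that is not listed above.

```stata
use http://www.stata-press.com/data/imeus/womenwk, clear
quietly heckman wage education age, select(married children education age)

predict double y_xb                 // xb is the default: linear prediction
predict double y_cond, ycond        // E(y | y observed)
predict double y_exp, yexpected     // P(y observed) * E(y | y observed)
predict double p_sel, psel          // P(y observed)

* Compare the three predictions and the selection probability
summarize y_xb y_cond y_exp p_sel
```

Because yexpected weights the conditional expectation by a probability, its values lie between 0 and the corresponding ycond prediction, and the product p_sel * y_cond should reproduce y_exp.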
Examples of these two commands are:
. heckman wage educ age, select(married children educ age)
. predict yhat
Together, these two commands compute maximum likelihood estimates of a wage equation, in which wage is a function of education and age, and of a selection equation that uses marital status, number of children, education level, and age to explain which individuals participate in the labor force. The predict command then stores the linear prediction from the wage equation in yhat. The help file in Stata provides additional information on the structure of the heckman command and is well worth printing out if you are dealing with a sample selectivity bias problem.
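To see how much the choice of estimator matters for this specification, the maximum likelihood and two-step results can be stored and tabulated together. This is a minimal sketch that assumes the womenwk data set loaded later in this section; the stored-estimate names hk_ml and hk_2s are hypothetical.

```stata
use http://www.stata-press.com/data/imeus/womenwk, clear

* Maximum likelihood (the default)
quietly heckman wage education age, select(married children education age)
estimates store hk_ml

* Heckman's two-step estimator
quietly heckman wage education age, select(married children education age) twostep
estimates store hk_2s

* Coefficients and standard errors side by side
estimates table hk_ml hk_2s, b(%9.4f) se(%9.4f)
```

The two columns do not list identical parameters, because each estimator reports its own ancillary terms, so some rows of the table are filled for only one of the models.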
We will illustrate various issues of selection bias using the data set available from the Stata site. Retrieve the data set by entering:
. use http://www.stata-press.com/data/imeus/womenwk, clear
This data set has 2,000 observations on 15 variables. We can use the describe command (. describe) to get a brief description of the data set:
obs: 2,000
vars: 15 (9 Nov 2004 20:23)
size: 142,000 (86.5% of memory free)
Variable Name | Storage Type | Display Format | Value Label | Variable Label |
c1 | double | %10.0g | ||
c2 | double | %10.0g | ||
u | double | %10.0g | ||
v | double | %10.0g | ||
country | float | %9.0g | ||
age | int | %8.0g | ||
education | int | %8.0g | ||
married | byte | %8.0g | ||
children | int | %8.0g | ||
select | float | %9.0g | ||
wageful | float | %9.0g | ||
wage | float | %9.0g | ||
lw | float | %9.0g | ||
work | float | %9.0g | ||
lwf | float | %9.0g |
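Beyond describe, a few summary commands give a first look at the selection problem in these data. The sketch below uses only variables from the listing above that also appear in the example model; which variables to summarize is an illustrative choice.

```stata
use http://www.stata-press.com/data/imeus/womenwk, clear

* Variable names, storage types, and display formats
describe

* Means and ranges of the outcome and the explanatory variables
summarize wage education age married children

* Number of women for whom no wage is observed
count if missing(wage)
```

The count of missing wages shows how many observations enter only the selection equation.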