The command for estimating the random-effects model is:
. xtreg depvar [varlist], re
If the comma and the option that follows it (re or fe) are omitted,
Stata estimates the random-effects model by default.
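As a minimal sketch of how these variants look in practice, the commands below assume a panel that has already been loaded, with hypothetical variable names (idcode for the unit, year for the time period, ln_wage as the dependent variable, age and tenure as regressors). The names are placeholders for illustration, not part of the syntax.

```stata
* Sketch only: idcode, year, ln_wage, age, and tenure are assumed variable names.
* Declare the panel structure: unit identifier first, then the time variable.
xtset idcode year

* Random effects, requested explicitly.
xtreg ln_wage age tenure, re

* No option after the comma: Stata again fits the random-effects model.
xtreg ln_wage age tenure

* Fixed effects must be requested explicitly.
xtreg ln_wage age tenure, fe
```

The first two xtreg calls should report the same estimates, since re is the default.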
Understanding Stata output
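To understand the Stata output we need to return to the algebra of the model. Assume that we are fitting a model of the following form:
$$y_{it} = \alpha + x_{it}\beta + \nu_i + \varepsilon_{it} \tag{13}$$
where $\nu_i$ is the unit-specific effect and $\varepsilon_{it}$ is the idiosyncratic error. We can sum (13) over $t$ (holding the individual unit constant) and divide by $T$ to get:
$$\bar{y}_i = \alpha + \bar{x}_i\beta + \nu_i + \bar{\varepsilon}_i \tag{14}$$
where
$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$$
and
$$\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$$
with $\bar{\varepsilon}_i$ defined analogously. Thus, (14) uses the mean values for each cross-sectional unit. We can subtract (14) from (13) to get:
$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \tag{15}$$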
Equations (13), (14), and (15) are the basis of Stata's estimates of the parameters of the model. In particular, the command xtreg, fe uses OLS to estimate (15); this is known as the fixed-effects estimator (or the within estimator). The command xtreg, be uses OLS to estimate (14) and is known as the between estimator. The command xtreg, re (the random-effects estimator) is a weighted average of the between and within estimators, where the weight is a function of the variances of the unit-specific effect $\nu_i$ and the idiosyncratic error $\varepsilon_{it}$ ($\sigma_\nu^2$ and $\sigma_\varepsilon^2$, respectively).
See Cameron and Trivedi (2005: 705) for a detailed discussion of the random-effects estimator.
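To make the within estimator concrete, the sketch below demeans the data by hand and runs OLS on the result, then compares it with xtreg, fe. It assumes Stata's example dataset nlswork (loaded with webuse) and a single regressor, tenure. The variable names insample, ybar, xbar, y_dm, and x_dm are hypothetical helpers introduced only for this illustration.

```stata
* Sketch only: assumes the nlswork example data; helper names are hypothetical.
webuse nlswork, clear
xtset idcode year

* Fixed-effects (within) estimator as Stata computes it.
xtreg ln_wage tenure, fe

* Mark the observations used above so the unit means match the estimation sample.
generate byte insample = e(sample)

* Unit-level means of the dependent variable and the regressor.
bysort idcode: egen double ybar = mean(ln_wage) if insample
bysort idcode: egen double xbar = mean(tenure) if insample

* Deviations from unit means: the within transformation.
generate double y_dm = ln_wage - ybar
generate double x_dm = tenure - xbar

* OLS on the demeaned data, with no constant.
regress y_dm x_dm, noconstant
```

The coefficient on x_dm should match the tenure coefficient from xtreg, fe. The standard errors will differ, because regress does not adjust the degrees of freedom for the unit means that were removed.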
In general, you will not make use of the between estimator. However, these three equations do lie at the basis of the goodness-of-fit measures that Stata reports. In particular, Stata output reports three “R-squareds”: the overall $R^2$, the between $R^2$, and the within $R^2$. (R-squared is in quotes here because these R-squareds do not have all the properties of OLS R-squareds.) Each of the three is derived from one of the three equations: the overall $R^2$ uses (13); the between $R^2$ uses (14); and the within $R^2$ uses (15).
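The three measures appear in the header of the xtreg output, and they can also be retrieved afterwards from the stored results. The sketch below assumes the nlswork example data and an arbitrary pair of regressors chosen only for illustration.

```stata
* Sketch only: assumes the nlswork example data; the regressors are illustrative.
webuse nlswork, clear
xtset idcode year

xtreg ln_wage age tenure, fe

* xtreg stores the three R-squareds in e().
display "within  R-sq = " e(r2_w)
display "between R-sq = " e(r2_b)
display "overall R-sq = " e(r2_o)
```

The same three stored results are available after xtreg, re, which makes it straightforward to compare fit across estimators.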
A panel data analysis using Stata
In this example we follow the example offered in the Stata manual and use a large data set from the National Longitudinal Survey: 28,534 observations of wage data on women who were between 14 and 26 years of age in 1968. The women were surveyed in each of the 21 years between 1968 and 1988 except for the six years 1974, 1976, 1979, 1981, 1984, and 1986. The study focuses on the determinants of wage levels, as measured by the natural logarithm of real wages.
Figure 1 shows the commands used to put the data into Stata. The first command (set memory 5m) increases the size of the memory that the program uses; I did this because of the large sample size. The use command accesses the data from the Stata web site. The describe command calls up a description of the variables. Figure 2 presents a summary of the data using the command summarize.
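The loading steps described above might look roughly like the following. This is a sketch, not the exact contents of Figure 1: it assumes the data are fetched with webuse nlswork (a shortcut for loading Stata's example datasets) instead of a full use URL, and it leaves out set memory, which Stata 12 and later no longer require.

```stata
* Sketch only: webuse is assumed here in place of the original use command.
webuse nlswork, clear

* Variable names, storage types, and labels.
describe

* Means, standard deviations, and ranges for every variable.
summarize

* Declare the panel structure and inspect the participation pattern.
xtset idcode year
xtdescribe
```

xtdescribe is an optional extra step; it reports how many units are in the panel and in which survey years they appear.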
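There are several transformations of the variables that we will need. In particular, we want to include the squares of several of the variables in our regression: age (age), work experience (ttl_exp), and job tenure (tenure). The reason we want to use the square of these variables is that we have reason to believe that wages have a non-linear relationship with these variables. For instance, consider the number of years a worker has been on the job, tenure. Theory suggests that wages increase over a worker’s work-life at a decreasing rate. Thus, if the equation we are estimating is
$$y = \beta_0 + \beta_1\,\mathit{tenure} + \beta_2\,\mathit{tenure}^2 + \dots$$
what we expect is that:
$$\frac{\partial y}{\partial\,\mathit{tenure}} = \beta_1 + 2\beta_2\,\mathit{tenure} > 0$$
and
$$\frac{\partial^2 y}{\partial\,\mathit{tenure}^2} = 2\beta_2 < 0$$
The only way that this last equation can be true is if $\beta_2 < 0$. Moreover, if this is true, the first derivative implies that $\beta_1 > 0$, since $\beta_1 > -2\beta_2\,\mathit{tenure} \ge 0$. Also, notice that we can determine the number of years in a job when wages reach a peak; $y$ reaches a maximum at the tenure where $\partial y / \partial\,\mathit{tenure} = 0$, or when
$$\mathit{tenure}^{*} = -\frac{\beta_1}{2\beta_2}$$
The fact that $\partial^2 y / \partial\,\mathit{tenure}^2 = 2\beta_2 < 0$ guarantees that this point is indeed a maximum.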
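A rough sketch of how the squared terms could be created and the implied peak computed is given below. It assumes the nlswork example data; the names age2, ttl_exp2, tenure2, and peak_tenure are hypothetical choices for this illustration, and the regression includes only the tenure terms to keep the example short.

```stata
* Sketch only: assumes the nlswork example data; new variable names are hypothetical.
webuse nlswork, clear
xtset idcode year

* Squares of age, work experience, and job tenure.
generate age2     = age^2
generate ttl_exp2 = ttl_exp^2
generate tenure2  = tenure^2

* Quadratic in tenure, estimated with the fixed-effects estimator.
xtreg ln_wage tenure tenure2, fe

* Tenure at which predicted log wages peak: -b1 / (2*b2).
nlcom (peak_tenure: -_b[tenure] / (2*_b[tenure2]))
```

nlcom reports the point estimate together with a delta-method standard error, so the peak comes with a confidence interval and not just a single number. The result is only a maximum if the estimated coefficient on tenure2 is negative.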