1 Introduction
Panel data refers to data with repeated time-series observations (T) for
a large number (N) of cross-sectional units (e.g., individuals, households, or
firms). An important advantage of using these data is that they allow re-
searchers to control for unobservable heterogeneity, that is, systematic differ-
ences across cross-sectional units (e.g., individuals, households, firms, coun-
tries). Error-components models have been widely used to control for these
individual differences. These models assume that stochastic error terms have
two components: an unobservable time-invariant individual effect, which captures
the unobservable individual heterogeneity, and the usual random noise.
The most popular estimation methods for panel data models are the within and
the generalized least squares (GLS) estimators. For panel data with large
N and small T, the appropriate choice of estimator depends on whether or not
regressors are correlated with the unobservable individual effect. An important
advantage of using the within estimator (least squares on data transformed into
deviations from individual means) is that it is consistent even if regressors are
correlated with the individual effect. However, a serious defect of the estimator
is its inability to estimate the impact of time-invariant regressors.1 The GLS
estimator is often used in the literature as a remedy for this problem, but it
is not without its own defect: the consistency of the GLS estimator crucially
depends on the strong assumption that no regressor is correlated with the effect (the random
effects assumption). Use of the estimator thus requires a statistical test that
can empirically validate this strong assumption. The Hausman (1978) statistic is
commonly used for this purpose (e.g., Hausman and Taylor, 1981; Cornwell and
Rupert, 1988; Baltagi and Khanti-Akom, 1990).
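The trade-off described above can be illustrated with a small simulation. The following is only a minimal sketch under assumed settings, not the estimation procedure used in this paper: the data-generating process, the sample sizes, and the helper names `within_estimator` and `pooled_ols` are all hypothetical, and pooled OLS stands in as a simple estimator that ignores the individual effect. The regressor is built to be correlated with the effect, so the within estimate should stay close to the true coefficient while pooled OLS should not; the last line shows why the within transformation cannot identify the coefficient on a time-invariant regressor.

```python
import numpy as np


def within_estimator(y, X):
    """Least squares on deviations from individual means.

    y has shape (N, T); X has shape (N, T, K). Returns K slope estimates.
    """
    y_dm = y - y.mean(axis=1, keepdims=True)   # remove individual means of y
    X_dm = X - X.mean(axis=1, keepdims=True)   # remove individual means of X
    K = X.shape[2]
    beta, *_ = np.linalg.lstsq(X_dm.reshape(-1, K), y_dm.reshape(-1), rcond=None)
    return beta


def pooled_ols(y, X):
    """OLS with an intercept on the stacked data, ignoring the individual effect."""
    K = X.shape[2]
    design = np.column_stack([np.ones(y.size), X.reshape(-1, K)])
    coef, *_ = np.linalg.lstsq(design, y.reshape(-1), rcond=None)
    return coef[1:]                            # drop the intercept


# Assumed design: large N, small T, true slope equal to 1.
rng = np.random.default_rng(0)
N, T, beta_true = 2000, 5, 1.0
u = rng.normal(size=(N, 1))                    # unobservable individual effect
x = u + rng.normal(size=(N, T))                # regressor correlated with the effect
y = beta_true * x + u + rng.normal(size=(N, T))

X = x[:, :, None]                              # shape (N, T, 1)
beta_within = within_estimator(y, X)[0]
beta_pooled = pooled_ols(y, X)[0]

# A time-invariant regressor is wiped out by the within transformation.
z = rng.normal(size=(N, 1)) * np.ones((1, T))
z_demeaned = z - z.mean(axis=1, keepdims=True)

print("within:", beta_within, "pooled OLS:", beta_pooled)
print("max |demeaned z|:", np.abs(z_demeaned).max())
```

In this assumed design the probability limit of the pooled OLS slope is 1.5 rather than 1, because the covariance between the regressor and the effect equals half the variance of the regressor.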
In this paper, we study the asymptotic properties of the within and GLS
estimators and the Hausman statistic for a general error-components model with
large numbers of both cross-section and time-series observations. The GLS
estimator has been known to be asymptotically equivalent to the within estimator
for the cases with infinite N and T (see, for example, Hsiao, Chapter 3, 1986;
Matyas and Sevestre, Chapter 4, 1992; and Baltagi, Chapter 2, 1995). This
asymptotic equivalence result has been obtained using a naive sequential limit
method (T → ∞ followed by N → ∞) and some strong assumptions such as
fixed regressors. This result naturally raises a couple of questions regarding the
asymptotic properties of the Hausman test. Firstly, the Hausman statistic could
be viewed as a distance measure between the within and GLS estimators. Then,
does the equivalence result indicate that the Hausman statistic should have a
degenerate or nonstandard asymptotic distribution under the random effects
assumption? Secondly, does the equivalence result also imply that the Haus-
1 Estimating the effect of a time-invariant variable on a dependent variable
is an important task in a broad range of empirical research. Examples include
labor studies of the effects of schooling or gender on individual workers' earnings, and
macroeconomic studies of the effect of a country's geographic location (e.g., whether
the country is located in Europe or Asia) on its economic growth. The within estimator is
inappropriate for such studies.