Stata Technical Bulletin
To compute Small and Hsiao’s test, the sample is divided into two random subsamples of approximately equal size. The
unrestricted MNLM is estimated on both subsamples. The weighted average of the coefficients from the two samples is defined
as follows:
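$$\hat{\beta}^{S_1S_2} = \left(\frac{1}{\sqrt{2}}\right)\hat{\beta}^{S_1} + \left[1 - \left(\frac{1}{\sqrt{2}}\right)\right]\hat{\beta}^{S_2}$$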
where $\hat{\beta}^{S_1}$ is a vector of estimates from the unrestricted model on the first subsample and $\hat{\beta}^{S_2}$ is its counterpart for the second
subsample. Next, a restricted sample is created from the second subsample by eliminating all cases with a chosen value of the
dependent variable. The MNLM is estimated using the restricted sample, yielding the estimates $\hat{\beta}^{S_2}_r$ and the log likelihood $L(\hat{\beta}^{S_2}_r)$.
The Small-Hsiao statistic is the difference:
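$$SH = -2\left[L(\hat{\beta}^{S_1S_2}) - L(\hat{\beta}^{S_2}_r)\right]$$
where $L(\hat{\beta}^{S_1S_2})$ is the log likelihood of the restricted model on the restricted sample, evaluated at the weighted average estimates (omitting the coefficients of the excluded category).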
SH is asymptotically distributed as chi-squared with K + 1 degrees of freedom, where K is the number of
independent variables.
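The arithmetic of the Small-Hsiao test can be sketched in Python. This is a minimal illustration, not the implementation from this article: it assumes the unrestricted fits on both subsamples and the restricted fit have already been obtained elsewhere (for example with Stata's mlogit), and it represents each model's coefficients as a dictionary mapping a non-base outcome category to [intercept, b_1, ..., b_K], a hypothetical layout. The helper names weighted_average, mnl_loglik, chi2_sf, and small_hsiao are also hypothetical. The chi-squared tail probability uses closed forms for integer degrees of freedom to avoid a SciPy dependency.

```python
import math

# Weight on the first subsample, as in the weighted-average formula: 1/sqrt(2).
W = 1.0 / math.sqrt(2.0)


def weighted_average(b1, b2):
    """Combine coefficients from subsamples S1 and S2.

    b1, b2: dict mapping each non-base outcome category to a list of
    coefficients [intercept, b_1, ..., b_K] (hypothetical layout).
    """
    return {j: [W * c1 + (1.0 - W) * c2 for c1, c2 in zip(b1[j], b2[j])]
            for j in b1}


def mnl_loglik(beta, X, y, base):
    """Multinomial logit log likelihood; the base category has all-zero coefficients.

    beta: dict category -> [intercept, b_1, ..., b_K] for non-base categories.
    X: list of covariate rows (no constant column); y: list of outcomes.
    """
    ll = 0.0
    for row, outcome in zip(X, y):
        # Linear predictor for every category in the model.
        eta = {base: 0.0}
        for j, b in beta.items():
            eta[j] = b[0] + sum(bk * xk for bk, xk in zip(b[1:], row))
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(eta.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in eta.values()))
        ll += eta[outcome] - log_denom
    return ll


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def small_hsiao(ll_weighted, ll_restricted, k):
    """SH = -2[L(b^{S1S2}) - L(b_r^{S2})], referred to chi-squared with K + 1 df."""
    sh = -2.0 * (ll_weighted - ll_restricted)
    df = k + 1
    return sh, df, chi2_sf(sh, df)
```

In use, one would drop the excluded category from the result of weighted_average, evaluate mnl_loglik on the restricted second subsample to get $L(\hat{\beta}^{S_1S_2})$, and pass that, together with the maximized log likelihood of the restricted model, to small_hsiao. Because $\hat{\beta}^{S_2}_r$ maximizes the likelihood on that same restricted sample, SH should be nonnegative up to numerical error. In this sketch the excluded category cannot be the base category; excluding it requires re-estimating with a different base.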
For both the Hausman test and the Small-Hsiao test, multiple tests of IIA are possible. Assuming that an MNLM with J outcome categories is estimated
with base category Base, J − 1 tests can be computed by excluding each of the remaining categories in turn to form the restricted
model. By changing the base category, a test can also be computed that excludes Base. Note that results differ depending on
which base category was used to estimate the model.
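The bookkeeping for these multiple tests can be sketched as follows, assuming the outcome categories are given as a list. The helper iia_tests is hypothetical: it lists the J − 1 tests available with a given base and one further test that excludes the base itself after re-estimating with another base (here, arbitrarily, the first remaining category).

```python
def iia_tests(categories, base):
    """Enumerate IIA tests as (base category, excluded category) pairs.

    With base category `base`, each of the other J - 1 categories can be
    excluded in turn. A test excluding `base` itself needs a model
    re-estimated with another base; the first other category is used
    here (an arbitrary, hypothetical choice).
    """
    others = [c for c in categories if c != base]
    tests = [(base, c) for c in others]
    tests.append((others[0], base))
    return tests
```

For categories ["A", "B", "C"] with base "A", this gives the pairs ("A", "B"), ("A", "C"), and ("B", "A"). Since results depend on the base category, the statistics from different pairs need not agree.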
Acknowledgments
For information on related programs and future updates to this program, please check www.indiana.edu/~jsl650/post.htm
References
Hausman, J. A. and D. McFadden. 1984. Specification tests for the multinomial logit model. Econometrica 52: 1219-1240.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S. and J. Freese. 2000. sg152: Listing and interpreting transformed coefficients from certain regression models. Stata Technical Bulletin 57:
27-34.
McFadden, D., W. Tye, and K. Train. 1976. An application of diagnostic tests for the independence from irrelevant alternatives property of the
multinomial logit model. Transportation Research Board Record 637: 39-45.
Rogers, W. H. 1995. sqv10: Expanded multinomial comparisons. Stata Technical Bulletin 23: 26-28. Reprinted in Stata Technical Bulletin Reprints,
vol. 4, pp. 181-183.
Small, K. A. and C. Hsiao. 1985. Multinomial logit specification tests. International Economic Review 26: 619-627.
sg156 Mean score method for missing covariate data in logistic regression models
Marie Reilly, Epidemiology & Public Health, University College Cork, Ireland
Agus Salim, Department of Statistics, University College Cork, Ireland
Abstract: This article introduces and illustrates the command meanscor, which implements the mean score method of Reilly and
Pepe (1995) for incorporating incomplete cases into logistic regression analysis through a weighted regression model.
Keywords: missing data, mean score method, logistic regression.
Background
Missing data is a common problem in statistical analysis. Perhaps the most popular approach when confronted with missing
data is to exclude the incomplete cases and analyze the complete cases using standard methods.
While valid under certain assumptions regarding the missingness mechanism, this approach loses precision because the
incomplete observations are ignored. The mean score method of Reilly and Pepe (1995) allows us to incorporate the incomplete cases
into logistic regression analysis through a weighted regression model. When data are missing completely at random, this yields a gain
in efficiency over the analysis of complete cases only. More importantly, the method is applicable to a wide range of patterns
of missingness known as MAR (missing at random), where missingness may depend on the completely observed variables but
not on the unobserved value of the incompletely observed variable(s).
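The weighting idea behind the mean score method can be sketched in Python under simplifying assumptions: a binary outcome y, one discrete covariate z that is always observed, and one covariate x that may be missing (None). Strata are the distinct (y, z) combinations. Replacing each incomplete case's score by the average score of the complete cases in its stratum is equivalent to giving each complete case the weight n_s / n_s^c, the number of cases in its stratum divided by the number of complete cases in it; a weighted logistic regression of y on x is then fitted by Newton-Raphson. The names mean_score_weights and weighted_logit are hypothetical, and this is not the meanscor code. In particular, the standard errors of such a naive weighted fit are not the mean score standard errors, which need an additional variance component that is not shown here.

```python
import math
from collections import Counter


def mean_score_weights(records):
    """Return (y, z, x, weight) for each complete case.

    records: list of (y, z, x) with z discrete and always observed and
    x set to None when missing. A complete case in stratum s = (y, z)
    gets weight n_s / n_s_complete. Incomplete cases in a stratum with
    no complete cases cannot be represented and are silently lost.
    """
    n_all = Counter((y, z) for y, z, _ in records)
    n_complete = Counter((y, z) for y, z, x in records if x is not None)
    return [(y, z, x, n_all[(y, z)] / n_complete[(y, z)])
            for y, z, x in records if x is not None]


def weighted_logit(xs, ys, ws, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by weighted Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - p)        # weighted score contribution
            v = w * p * (1.0 - p)  # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step b <- b + H^{-1} g with the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

When every stratum contains at least one complete case, the weights sum to the total sample size, so the complete cases stand in for the whole sample. The strata here are defined by y and the always-observed z, which is what lets the method handle missingness that depends on completely observed variables (MAR).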
Syntax
meanscor depvar [indepvars] [if exp] [in range] [, first(varlist) second(varlist) odd(#) ]