Stata Technical Bulletin
sg57 An immediate command for two-way tables
Nicholas J. Cox, University of Durham, UK, FAX (011)-44-91-374-2456, [email protected]
The syntax for the tab2i command is
tab2i #11 #12 [...] \ #21 #22 [...] [\ ...] [, replace]
where #11, #12, etc., are zeros or positive integers showing the frequencies in a two-way table, and backslashes separate rows
of the table. There must be at least two rows and at least two columns in the table.
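To make the input convention concrete, the following is a minimal Python sketch (not part of tab2i itself) of how a backslash-separated list of frequencies could be read into rows and checked against the constraints just stated. The function name parse_table is hypothetical, and the requirement that all rows have the same length is an assumption implied by the table being two-way.

```python
def parse_table(spec):
    """Parse a string such as '10 20 \\ 30 40' into a list of integer rows."""
    # Backslashes separate rows; whitespace separates cells within a row.
    rows = [[int(tok) for tok in row.split()] for row in spec.split("\\")]
    if len(rows) < 2:
        raise ValueError("table needs at least two rows")
    ncols = len(rows[0])
    if ncols < 2:
        raise ValueError("table needs at least two columns")
    # Assumption: a two-way table is rectangular.
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows must have the same number of cells")
    if any(value < 0 for row in rows for value in row):
        raise ValueError("frequencies must be zero or positive")
    return rows

# Hypothetical counts, not data from the article.
table = parse_table(r"10 20 \ 30 40")
print(table)
```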
Option
replace indicates that the variables listed by the command are to be left as the current data in place of whatever data were
there. These variables are row and column indices, observed and expected frequencies, and Pearson and adjusted residuals.
Explanation
A chi-squared test for association of the row and column variables in a two-way table of frequencies is featured in most first
courses in statistics. In Stata, this test is provided by the immediate command tabi or by the command tabulate. However,
neither produces output of expected (fitted, predicted) frequencies or of residuals. Most data analysts wish to glance at least
briefly at such results.
tab2i is an alternative to tabi that does produce this output. In a two-way table of frequencies, the observed frequency in
row $i$ and column $j$ of the table, $y_{ij}$, is compared with the expected frequency $\hat{y}_{ij}$. Under the null hypothesis of independence,
the expected frequencies are calculated from row totals $y_{i+}$, column totals $y_{+j}$, and the table total $y_{++}$ by
$$\hat{y}_{ij} = \frac{y_{i+}\, y_{+j}}{y_{++}}$$
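As an illustration of the expected-frequency calculation, here is a short Python sketch (an independent illustration, not the tab2i code). The function name expected_frequencies and the 2 x 2 counts are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical counts, not data from the article.
observed = [[10, 20], [30, 40]]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

With row totals 30 and 70, column totals 40 and 60, and a table total of 100, the first cell has expected frequency 30 x 40 / 100 = 12.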
The chi-squared statistic is then
$$X^2 = \sum_{i} \sum_{j} \frac{(y_{ij} - \hat{y}_{ij})^2}{\hat{y}_{ij}}$$
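The statistic can be sketched in Python as a sum of cell contributions. This is an illustrative sketch under the same definitions as above, not the tab2i implementation; the function name pearson_chi2 and the counts are hypothetical. The degrees of freedom, (rows - 1) x (columns - 1), are the standard value for this test and are returned alongside the statistic.

```python
def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for the test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / table_total
            # Each cell contributes (observed - expected)^2 / expected.
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts, not data from the article.
statistic, df = pearson_chi2([[10, 20], [30, 40]])
print(round(statistic, 4), df)
```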
The residuals produced by tab2i come in two flavors. First, Pearson residuals (also called standardized or chi-residuals)
are the (appropriately signed) square roots of each cell’s contribution to the Pearson chi-squared statistic. The Pearson residuals
are thus
$$\frac{y_{ij} - \hat{y}_{ij}}{\sqrt{\hat{y}_{ij}}}$$
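A Python sketch of the Pearson residuals follows, again as an illustration rather than the code behind tab2i. The function name pearson_residuals and the counts are hypothetical. Squaring the residuals and summing them recovers the chi-squared statistic, which is a convenient check on the calculation.

```python
import math

def pearson_residuals(table):
    """Signed square roots of each cell's contribution to the chi-squared statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / table_total
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals

# Hypothetical counts, not data from the article.
residuals = pearson_residuals([[10, 20], [30, 40]])
chi2_from_residuals = sum(r ** 2 for row in residuals for r in row)
for row in residuals:
    print([round(r, 4) for r in row])
```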
Under the null hypothesis, the Pearson residuals approximately follow Gaussian (normal) distributions with mean 0 and variance
less than 1. Consequently, one rough rule of thumb is to look especially carefully at any residual greater than 2 in magnitude.
Second, adjusted residuals are Pearson residuals divided by an estimate of their standard error
$$\sqrt{\left(1 - \frac{y_{i+}}{y_{++}}\right)\left(1 - \frac{y_{+j}}{y_{++}}\right)}$$
so that they are distributed more like Gaussians with mean 0 and variance 1.
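The adjustment can be sketched in Python as follows, assuming the standard error is the square root of (1 - row share) x (1 - column share), where the shares are the row and column totals divided by the table total. The function name adjusted_residuals and the counts are hypothetical, and this is not the tab2i source.

```python
import math

def adjusted_residuals(table):
    """Pearson residuals divided by their estimated standard errors."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / table_total
            pearson = (observed - expected) / math.sqrt(expected)
            std_error = math.sqrt((1 - row_totals[i] / table_total)
                                  * (1 - col_totals[j] / table_total))
            out.append(pearson / std_error)
        result.append(out)
    return result

# Hypothetical counts, not data from the article.
adjusted = adjusted_residuals([[10, 20], [30, 40]])
for row in adjusted:
    print([round(a, 4) for a in row])
```

In a 2 x 2 table the four adjusted residuals have the same magnitude, equal to the square root of the chi-squared statistic, so they carry no more information than the test itself; they become more useful in larger tables.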
Example
Jacqueline Tivers (1985) interviewed 400 women with young children in the London Borough of Merton in September
1977. In one analysis, she looked at the cross-tabulation of the age at which women finished full-time education and whether
they used a library regularly. The table of frequencies did not come with a chi-squared statistic or residuals.