16   Stata Technical Bulletin   STB-22
   Repair |
   Record |
     1978 |      Freq.     Percent        Cum.
----------+-----------------------------------
        1 |          2        2.90        2.90
        2 |          8       11.59       14.49
        3 |         30       43.48       57.97
        4 |         18       26.09       84.06
        5 |         11       15.94      100.00
----------+-----------------------------------
    Total |         69      100.00
. regress price r2 r3 r4 r5 length displ weight mpg
  Source |       SS       df       MS                 Number of obs =      69
---------+------------------------------              F(  8,    60) =    6.47
   Model |  267197833      8  33399729.2              Prob > F      =  0.0000
Residual |  309599125     60  5159985.42              R-square      =  0.4632
---------+------------------------------              Adj R-square  =  0.3917
   Total |  576796959     68  8482308.22              Root MSE      =  2271.6

   price |      Coef.   Std. Err.       t     P>|t|      [95% Conf. Interval]
---------+-------------------------------------------------------------------
      r2 |   907.3499   1817.764      0.499   0.619     -2728.719    4543.419
      r3 |   1105.359   1668.122      0.663   0.510     -2231.381    4442.099
      r4 |   2147.658   1702.115      1.262   0.212      -1257.08    5552.395
      r5 |   3816.672    1787.51      2.135   0.037      241.1194    7392.226
  length |  -117.3064   40.65207     -2.886   0.005     -198.6226   -35.99012
   displ |   8.447532   8.423298      1.003   0.320     -8.401571    25.29664
  weight |   4.089227   1.597143      2.560   0.013      .8944658    7.283989
     mpg |  -129.2005   84.52707     -1.529   0.132     -298.2799    39.87876
   _cons |   15158.53   6179.409      2.453   0.017      2797.871    27519.19
According to these estimates, cars with fair repair records cost an average of $907 more than cars with poor repair records.
The gap increases with each improvement in repair record. Cars with excellent repair records cost an average of $3,817 more
than cars with poor repair records.
The question may now arise: Which pairs of groups (categories of rep78) can we legitimately claim are different from each
other; which of these differences are unlikely to have arisen by chance? The answer hinges on what we view as “legitimate”.
The aggressive investigator might argue that groups 1 and 5 are different on the strength of the t statistic for the coefficient
on r5 (t = 2.135, with a p value of .037). The cautious investigator (or journal editor), however, would counter that many
comparisons of the different groups could have been made. Perhaps this test was selected for focus solely because it happens
to show a “significant” difference. And when multiple comparisons are made, the probability under the null of finding, say, a
t statistic as large as 2.135 is greater than .037. But how much greater is it—that is, what is the correct p value for this t statistic
when multiple comparisons are made?
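To get a rough sense of how much greater, suppose for a moment that the 10 pairwise tests were independent (they are not exactly independent here, so this is only a back-of-the-envelope sketch in Python): the chance that at least one of 10 such tests produces a p value as small as .037 is noticeably larger than .037.

```python
# Back-of-the-envelope check: chance of at least one of 10 tests
# yielding p <= .037, ASSUMING the tests are independent (they are
# not exactly independent here, so this is only an approximation).
from math import comb

k = 5                     # categories of rep78
pairs = comb(k, 2)        # number of pairwise comparisons among 5 groups
p_single = 0.037          # nominal p value for the r5 contrast

p_family = 1 - (1 - p_single) ** pairs
print(pairs, round(p_family, 3))   # → 10 0.314
```

Even under the (too-optimistic) independence assumption, the familywise chance is about .31, an order of magnitude larger than the nominal .037.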
There are many philosophical views on this problem. I examine the mechanics of one view—traditional adjustment for
multiple comparisons—in the context of regression-like models. (See [5s] oneway for a discussion of this approach in an ANOVA
context.) This view provides methods for making each test more conservative when there are multiple comparisons, so the overall
probability of making a Type I error for any pairwise comparison (declaring a difference significant when it is merely due to
chance) remains less than a predetermined value, such as 5 percent. We discuss three widely used approaches: the Bonferroni,
Sidak, and Scheffe tests.
The Bonferroni test is the simplest to implement. In this method, the cautious investigator would note that 10 pairs of
groups could have been compared and treat a reported p value of 0.037 as if it were 10 × 0.037 = 0.37. It would take a t value
of 2.9146 to be “significant” at the 5 percent level according to this logic. Using the Bonferroni rule, the contrast r5 vs. r1 just
misses attaining significance.
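The Bonferroni arithmetic above is easy to verify; a minimal Python sketch (reproducing the 2.9146 critical value itself would require a t inverse CDF such as scipy.stats.t.ppf, so only the p-value adjustment is computed here):

```python
# Bonferroni adjustment: multiply each nominal p value by the
# number of comparisons, capping the result at 1.
from math import comb

pairs = comb(5, 2)            # 10 pairs of rep78 categories
p_nominal = 0.037             # p value for the r5 vs. r1 contrast
p_bonferroni = min(1.0, pairs * p_nominal)
print(round(p_bonferroni, 2))   # → 0.37
```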
The Sidak test is almost identical to the Bonferroni, unless the number of comparison groups is quite large. In our example,
the relevant critical value is about the same; a t statistic must be at least 2.9063 to be significant. The Scheffe test is even
more conservative, requiring a t statistic of 3.178. The Scheffe procedure is designed to hold for any linear combination of the
categories, not just for contrasts (comparisons of any two categories).
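The Sidak per-comparison level behind that critical value can be computed directly (a Python illustration; turning these levels into the critical t values 2.9063 and 2.9146, or the Scheffe quantile 3.178, requires t and F inverse CDFs such as scipy.stats, so those are left as comments):

```python
# Sidak: pick a per-comparison level a* so the familywise level is
# 0.05 under independence: 1 - (1 - a*)**pairs = 0.05.
pairs = 10                       # pairwise comparisons among 5 groups
alpha_family = 0.05

alpha_sidak = 1 - (1 - alpha_family) ** (1 / pairs)
alpha_bonf = alpha_family / pairs   # Bonferroni analogue, for comparison
print(round(alpha_sidak, 6), round(alpha_bonf, 6))   # → 0.005116 0.005
# Sidak's per-test level (~0.005116) is slightly larger than
# Bonferroni's 0.005, which is why its critical t (2.9063) is
# slightly smaller than Bonferroni's (2.9146).
# Scheffe's critical value is sqrt((k-1) * F(k-1, 60; 0.05)) = 3.178
# for k = 5 groups -- computing it needs an F inverse CDF,
# e.g. scipy.stats.f.ppf.
```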
There is another consideration in a regression model that doesn’t arise in the one-way ANOVA context. In the ANOVA, the
means are guaranteed to be independent, since they come from independent samples. In the regression, adjusting for the other
covariates induces correlation between the estimated category means. The Scheffe method is a conservative answer that applies equally well in
the regression and ANOVA models. The Bonferroni and Sidak methods can become non-conservative in a regression context.