label variable hybcor "hybrid 90% c.i. correct?"
label values hybcor correct
*
* studentized (percentile-t) 90% confidence interval
generate stulo=orb-t95*bootSE
generate stuhi=orb-t05*bootSE
generate stuwide=stuhi-stulo
label variable stuwide "width percentile-t interval"
generate stucor=0
replace stucor=1 if stulo<3 & 3<stuhi
label variable stucor "student 90% c.i. correct?"
label values stucor correct
*
label data "n=80 bootstrap/Monte Carlo"
label variable orb "Y=3X + Xe,e~chi2-1,X~chi2"
save boot4.dta, replace
end
example4.ado creates a dataset, boot4.dta, containing information on the width and inclusion rates of four types of “90%
confidence” intervals: standard t-table, bootstrap percentile, hybrid bootstrap percentile (equation [3]), and bootstrap percentile-t
intervals (equation [5]). Here are results based on 400 Monte Carlo samples:
Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+------------------------------------------------------
 iterate |     400       200.5    115.6143          1        400
     orb |     400    2.999104    .5820095   2.119488   5.551427
    orSE |     400      .17784    .0804133   .0574202   .4905333
   bootb |     400    3.003376    .5399735   2.153254    5.26923
  bootSE |     400    .4232904    .2956025   .0889773   1.708986
     p05 |     400    2.402226    .2288134   2.019951   3.410539
     p95 |     400    3.748616    1.005218   2.370434   7.698514
     t05 |     400   -6.176697    4.773216  -33.64367  -1.241619
     t95 |     400    3.466492    1.069656   1.383403   7.744884
  stanlo |     400      2.7035    .4828733   1.942544   4.943048
  stanhi |     400    3.294708    .6929325   2.242965   6.159807
stanwide |     400    .5912081    .2677761   .1912093   1.633476
 stancor |     400       .3475    .4767725          0          1
 perwide |     400    1.346389    .9071145   .2817495   5.096023
  percor |     400       .7625    .4260841          0          1
   hyblo |     400    2.249593    .4123866   .7103348   3.764181
   hybhi |     400    3.595982    1.019064    2.19773   8.523996
 hybwide |     400    1.346389    .9071145   .2817495   5.096023
  hybcor |     400        .615    .4872047          0          1
   stulo |     400    1.359049    1.242871  -5.784867   3.046353
   stuhi |     400    6.822549    7.106885   2.264905   53.62768
 stuwide |     400      5.4635    8.198419   .3112769   57.97469
  stucor |     400       .9025    .2970089          0          1
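This summary simply describes the saved results; a minimal sketch of how to reproduce it, assuming boot4.dta was created by example4.ado as listed above:

. use boot4.dta, clear
. summarize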
Means of stancor, percor, hybcor, and stucor indicate the proportion of “90% confidence” intervals that actually
contained β = 3. Of course the standard t-table interval fails completely: only about 35% of these “90%” intervals contain the
true parameter. The narrow intervals dictated by this method drastically understate actual sampling variation (Figure 5). Neither
bootstrap percentile approach succeeds either: the percentile intervals achieve only about 76% coverage and the hybrid intervals about 61%. (Theoretically the hybrid-percentile
method should work better than percentiles, but in experiments it often seems not to.) But the studentized or percentile-t method
seemingly works: 90% of its “90% confidence” intervals contain 3.
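With 400 Monte Carlo samples, each coverage estimate itself carries a binomial standard error of roughly the square root of p(1-p)/400. This check is not part of example4.ado, but a minimal sketch in Stata is:

. display sqrt(.9*.1/400)
.015

The observed .9025 thus lies well within one such standard error of the nominal .90, whereas .3475, .7625, and .615 fall many standard errors below it.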
The percentile-t method succeeds by constructing much wider confidence intervals, which more accurately reflect true
sampling variation. The median width of percentile-t intervals is 2.66, compared with only .59 for standard t-table intervals. The
mean percentile-t interval width (5.46) reflects the pull of occasional extremely wide intervals, as seen in Figure 6.
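Median widths do not appear in the summarize output above, which reports means; a minimal sketch of one way to obtain them, assuming boot4.dta is in memory, uses summarize's detail option, which reports each variable's percentiles (including the 50th, the median):

. summarize stanwide perwide hybwide stuwide, detail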
In Hamilton (1992), I report on another OLS experiment using a somewhat less pathological regression model. There too,
bootstrap percentile-t methods achieved nominal coverage rates (over 1,000 Monte Carlo samples) when other methods did not.
That discussion includes a closer look at how studentization behaves in the presence of outliers. Bootstrap confidence intervals
based on robust estimators and standard errors (for example, see Hamilton 1991b) might achieve equally good coverage with
narrower intervals—one of many bootstrap/Monte Carlo experiments worth trying.
Warning: even with just 100 Monte Carlo samples and 2,000 bootstrap resamplings of each, example4.ado requires hours
of computing time and over three megabytes of disk space. Scaled-down experiments can still convey a feel for such work and help explore promising new ideas. For example, change the two “while %_mcit<101 {” statements to “while %_mcit<51 {” and change “while %_bsample<2001 {” to “while %_bsample<101 {”. Full-scale efforts might easily require four million
iterations per model/sample size (2,000 bootstrap resamplings for each of 2,000 Monte Carlo samples), tying up a desktop