save boot1.dta, replace
* Final steps read the log file "bootdat1.log", label
* variables, and save dataset boot1.dta containing means from
* 1,000 bootstrap resamplings of the original source.dta data.
end
When debugging or modifying ado-files, type program drop _all between run attempts, so that Stata will forget any previous
buggy versions.
Applied to New York population density, example1.ado generates the bootstrap distribution graphed in Figure 1. When the
data contain outliers, bootstrapping often produces odd-looking sampling distributions—a thought-provoking antidote to routine
normality assumptions. Bootstrapping here also yields a somewhat lower standard error estimate, but no evidence of bias:
                                   mean    standard error
    original sample                9661         3474
    bootstrap (data resampling)    9662         3395
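These summaries come straight from the saved bootstrap dataset. A minimal sketch, assuming boot1.dta stores the 1,000
resampled means in a variable named mean; the variable's mean and standard deviation are the bootstrap center and standard
error reported above:

    . use boot1.dta, clear
    . summarize mean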
What else can we conclude from the Figure 1 results? Several authors have recommended using bootstrap percentiles
directly as confidence-interval bounds. For example, one might form a “90% confidence” interval from the bootstrap 5th and
95th percentiles. Unfortunately, this often works poorly.
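For concreteness, a minimal sketch of the naive interval, again assuming the bootstrap means sit in variable mean of
boot1.dta; _pctile leaves the requested percentiles in r(r1) and r(r2):

    . use boot1.dta, clear
    . _pctile mean, p(5 95)
    . display "naive 90% interval: " r(r1) " to " r(r2)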
Peter Hall (1988) observes that if the sampling distribution is asymmetrical (like Figure 1), using 5th and 95th percentiles
as low and high confidence-interval endpoints is “backwards.” For example, 90% of sample b values fall between the 5th (b.05)
and 95th (b.95) percentiles of b’s sampling distribution:

    b.05 < b < b.95                                        [1a]
Writing [1a] as a distance above and below the true parameter β:

    β + (b.05 - β) < b < β + (b.95 - β)                    [1b]
Confidence intervals rearrange this inequality to isolate β:
    b - (b.95 - β) < β < b - (b.05 - β)                    [2]
This suggests a better (“hybrid”) bootstrap-percentile confidence interval formula:
    b - (b*.95 - b) < β < b - (b*.05 - b)                  [3]

where b is the original sample statistic, and b*.95 and b*.05 represent the bootstrap 95th and 5th percentiles.
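The hybrid interval [3] requires only the original-sample statistic in addition to the two bootstrap percentiles. A
hypothetical sketch, assuming as before that boot1.dta holds the bootstrap means in variable mean, with the original-sample
mean (9661) standing in for b:

    . use boot1.dta, clear
    . scalar b = 9661
    . _pctile mean, p(5 95)
    . display "hybrid 90% interval: " b - (r(r2) - b) " to " b - (r(r1) - b)

Note that the bootstrap 95th percentile now sets the lower endpoint and the 5th percentile the upper one, reversing the
naive interval.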
Monte Carlo research finds that with or without this asymmetry correction, bootstrap-percentile confidence intervals often
achieve less than nominal coverage. Strategies for improvement include accelerated bias correction (BCa) and percentile-t
methods. The simpler of the two, percentile-t, first obtains studentized values:
    t* = (b* - b)/SEb                                      [4]
then uses bootstrap percentiles of t* to form confidence intervals, for example:
    b - t*.95 SEb < β < b - t*.05 SEb                      [5]
The standard error of b, SEb, might be estimated from either the original sample or (better) from the bootstrap standard
deviation.
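A hypothetical sketch of the percentile-t interval [5], again assuming boot1.dta holds the bootstrap means in variable mean,
and plugging in the original-sample estimates quoted earlier (b = 9661, SEb = 3474):

    . use boot1.dta, clear
    . scalar b = 9661
    . scalar SEb = 3474
    . generate tstar = (mean - b)/SEb
    . _pctile tstar, p(5 95)
    . display "percentile-t 90% interval: " b - r(r2)*SEb " to " b - r(r1)*SEb

Estimating SEb from the bootstrap standard deviation instead would substitute 3395 for 3474.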
Bootstrapping Regression
In many instances, bootstrapping a mean (as in example1.ado) has no advantage over inferences based on the Central
Limit Theorem. Bootstrapping helps more with multivariable methods like regression, where the classic inferential procedures
depend on a longer list of often-false assumptions. Some bootstrapping methods implicitly make similar assumptions, while
others abandon them—obtaining quite different results.
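For example, residual resampling holds the X values and the fitted model fixed, much as the classical procedures do, whereas
data resampling draws whole observations with replacement so that both y and X vary. A hypothetical sketch of data
resampling for a regression of y on x, using the bootstrap prefix available in later Stata releases (y and x are placeholder
variable names):

    . bootstrap _b, reps(1000): regress y x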