Stata Technical Bulletin
But Does It Work?
The bootstrap’s growing popularity derives partly from hope; its actual performance sometimes disappoints. Monte Carlo
simulation provides one way to evaluate bootstrapping objectively. The simulation generates samples according to a known
(user-designed) model; we then apply bootstrapping to discover (for example) how often bootstrap-based confidence intervals
actually contain the model parameters. example4.ado does this, embedding data resampling within a Monte Carlo simulation.
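The logic of such a coverage check can be sketched briefly. The Python below is an illustration only and is not part of example4.ado; the names coverage_rate, draw_sample and make_interval are hypothetical. It draws repeated samples from a known model, builds an interval from each, and reports the fraction of intervals that contain the true parameter. The toy model (the mean of 30 standard normal draws, with a known-variance 90% interval) is an assumption chosen so that the nominal coverage is easy to check.

```python
import random

def coverage_rate(draw_sample, make_interval, true_value, reps, seed=33333):
    """Fraction of simulated samples whose interval contains true_value.

    draw_sample(rng) returns one sample from the known model;
    make_interval(sample, rng) returns a (low, high) pair.
    Both callables are hypothetical and supplied by the user.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = draw_sample(rng)
        low, high = make_interval(sample, rng)
        if low <= true_value <= high:
            hits += 1
    return hits / reps

# Toy model (an assumption for illustration): 30 standard normal draws.
def draw_normal(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(30)]

# Nominal 90% interval for the mean, treating sigma = 1 as known.
def z_interval(sample, rng):
    mean = sum(sample) / len(sample)
    half = 1.645 / len(sample) ** 0.5
    return mean - half, mean + half

rate = coverage_rate(draw_normal, z_interval, 0.0, reps=2000)
print(f"empirical coverage of the nominal 90% interval: {rate:.3f}")
```

When the interval procedure's assumptions hold, as in this toy case, the empirical rate should sit near the nominal 90%. A rate well below that signals that the procedure is too optimistic for the model at hand.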
At the heart of example4.ado is a misspecified regression model. The usual standard errors and tests assume:
Y = β₀ + β₁X + e [6]
with X fixed in repeated samples, and errors (e) normally, independently, and identically distributed (normal i.i.d.). But this
Monte Carlo simulation generates data according to the model:
Y = 0 + 3X + Xe [7]
with e distributed as χ²(1) − 1. (Note that this has a mean of 0 and a variance of 2.) X values, drawn from a χ²(1) distribution,
vary randomly. In Figure 3, 5,000 data points illustrate the problematic nature of model [7]: it challenges analysis with leverage,
outliers, skewed errors and heteroscedasticity. A Monte Carlo experiment drawing 10,000 random n=80 samples according to [7],
and analyzing them by ordinary least squares (OLS) reveals a nasty-looking sampling distribution (Figure 4). As expected, OLS
estimates are unbiased: the mean slope over 10,000 random samples (b = 2.99988) is indistinguishable from β₁ = 3. Otherwise,
model [7] demolishes the usual OLS assumptions, and also those of residual resampling. Can data resampling still produce valid
inferences?
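A small simulation makes the behavior of model [7] concrete. The Python sketch below is an illustration, not a translation of the Stata listing; the function names draw_model7 and ols_slope are hypothetical. It uses 2,000 replications rather than 10,000 to keep run time short, and although it reuses the seed value 33333 its random numbers will not match Stata's. Conditional on X, the error term Xe has variance 2X², which is the source of the heteroscedasticity.

```python
import random

def draw_model7(n, rng):
    """One sample of size n from Y = 0 + 3X + Xe,
    with X ~ chi-square(1) and e ~ chi-square(1) - 1."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0) ** 2         # chi-square(1) as a squared standard normal
        e = rng.gauss(0.0, 1.0) ** 2 - 1.0   # mean 0, variance 2
        xs.append(x)
        ys.append(3.0 * x + x * e)
    return xs, ys

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(33333)
slopes = [ols_slope(*draw_model7(80, rng)) for _ in range(2000)]
mean_slope = sum(slopes) / len(slopes)
print(f"mean OLS slope over {len(slopes)} samples: {mean_slope:.3f}")
print(f"smallest and largest slope: {min(slopes):.3f}, {max(slopes):.3f}")
```

The mean slope should land close to 3, in line with the unbiasedness noted above, while the range of individual slopes gives a rough sense of how dispersed the sampling distribution is.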
example4.ado explores this question. As listed here it calls for 100 Monte Carlo samples of size n=80, with B=2,000 bootstrap
iterations per sample. (Results reported later represent 400 Monte Carlo samples, however.) For each Monte Carlo sample,
it obtains “90% confidence” intervals based on standard t-table procedures and three bootstrap methods: using 5th and 95th
percentiles; Hall’s “hybrid” percentile-reversal method (equation [3]); and the studentized or percentile-t method (equation [5]).
Finally, it calculates the width of each interval and checks whether the interval actually contains the parameter β₁ = 3.
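The three bootstrap intervals can be sketched as follows. Equations [3] and [5] are not reproduced in this section, so the code assumes the common textbook forms: the hybrid interval reflects the bootstrap percentiles around the original slope, and the percentile-t interval uses percentiles of t* = (b* − b)/SE*. The Python is an illustration rather than a translation of example4.ado. All function names are hypothetical, B is set to 500 instead of 2,000 for speed, and the percentile rule (linear interpolation) is one convention among several.

```python
import random

def ols(xs, ys):
    """Slope and conventional standard error from simple OLS."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se = (rss / (n - 2) / sxx) ** 0.5
    return b, se

def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def bootstrap_intervals(xs, ys, B, rng):
    """Nominal 90% intervals for the slope from B data-resampling replications."""
    b, se = ols(xs, ys)
    n = len(xs)
    bstar, tstar = [], []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # resample whole (X, Y) cases
        bb, bse = ols([xs[i] for i in idx], [ys[i] for i in idx])
        bstar.append(bb)
        tstar.append((bb - b) / bse)                # studentized replicate
    b05, b95 = percentile(bstar, 5), percentile(bstar, 95)
    t05, t95 = percentile(tstar, 5), percentile(tstar, 95)
    return {
        "percentile": (b05, b95),
        "hybrid": (2 * b - b95, 2 * b - b05),
        "percentile_t": (b - t95 * se, b - t05 * se),
    }

# One sample of n=80 from model [7], then its three intervals.
rng = random.Random(33333)
xs = [rng.gauss(0.0, 1.0) ** 2 for _ in range(80)]
ys = [3.0 * x + x * (rng.gauss(0.0, 1.0) ** 2 - 1.0) for x in xs]
intervals = bootstrap_intervals(xs, ys, B=500, rng=rng)
for name, (low, high) in intervals.items():
    print(name, round(low, 3), round(high, 3),
          "width", round(high - low, 3), "covers 3:", low <= 3.0 <= high)
```

Under these definitions the percentile and hybrid intervals always have the same width, since the hybrid is the percentile interval reflected about b. Only the percentile-t interval can differ in width, because it rescales by each replicate's own standard error.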
program define example4
* Monte Carlo simulation of bootstrap confidence intervals for a misspecified
* (heteroscedastic, nonnormal errors) regression model. Generates 100 Monte
* Carlo samples, and resamples each of them 2,000 times.
drop _all
set more 1
set maxobs 2100
set seed 33333
capture erase example4.log
macro define _mcit=1
while %_mcit<101 {
    quietly drop _all
    quietly set obs 80
    quietly generate X=(invnorm(uniform()))^2
    quietly generate Y=3*X+X*((invnorm(uniform())^2)-1)
    * Previous two lines define the true model.
    quietly regress Y X
    macro define _orb=_b[X]
    macro define _orSE=%_orb/sqrt(_result(6))
    * Perform the original-sample regression, storing slope
    * as _orb and standard error _orSE.
    quietly generate XX=.
    quietly generate YY=.
    quietly generate randnum=.
    macro define _bsample=1
    capture erase bstemp.log
    log using bstemp.log
    log off
    while %_bsample<2001 {
        * Begin bootstrap iterations, indexed by _bsample.
        quietly replace randnum=int(_N*uniform())+1
        quietly replace XX=X[randnum]
        quietly replace YY=Y[randnum]
        quietly regress YY XX
        * Data resampling, not assuming i.i.d. errors.
        log on
        display %_orb
        display %_orSE
        display _b[XX]