Stata Technical Bulletin
STB-4
program define example3
* residual resampling regression bootstrap
* assumes variables "Y" and "X" in "source.dta"
*
set more 1
drop _all
set maxobs 2000
* If source.dta contains > 2,000 cases, set maxobs higher.
quietly use source.dta
quietly drop if Y==. | X==.
quietly regress Y X
capture predict Yhat
capture predict e, resid
quietly replace e=e/sqrt(1-((_result(3)+1)/_result(1)))
* Previous two commands obtain full-sample regression
* residuals, and "fatten" them, dividing by:
*      sqrt(1 - K/_N)
* where K is # of model parameters and _N is sample size.
macro define _coefX=_b[X]
quietly save, replace
capture erase bootdat3.log
log using bootdat3.log
log off
set seed 1111
macro define _bsample 1
while %_bsample<1001 {
    quietly use source.dta, clear
    quietly generate ee=e[int(_N*uniform())+1]
    quietly generate YY=Yhat+ee
    quietly regress YY X
    * We resample residuals only, then generate bootstrap
    * Y values (called YY) by adding bootstrap residuals (ee)
    * to predicted values from the original-sample
    * regression (Yhat). Finally, regress these bootstrap
    * YY values on original-sample X.
    * _result(6) is the F statistic from regress; with one
    * predictor F = t^2, so |b|/sqrt(F) is the standard error of b.
    macro define _bSE=abs(_b[X])/sqrt(_result(6))
    log on
    display %_bsample
    display _b[_cons]
    display _b[X]
    display %_bSE
    display (_b[X]-%_coefX)/%_bSE
    display
    log off
    macro define _bsample=%_bsample+1
}
log close
drop _all
infile bsample bcons bcoefX bSE StucoefX using bootdat3.log
label variable bsample "bootstrap sample number"
label variable bcons "sample Y-intercept, b0"
label variable bcoefX "sample coefficient on X, b1"
label variable bSE "sample standard error of b1"
label variable StucoefX "studentized coefficient on X"
label data "regression boot/residual resampling"
save boot3.dta, replace
end
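For readers who want to trace the algorithm outside Stata, here is a minimal Python sketch of the same residual-resampling steps. The names `ols` and `residual_bootstrap` are hypothetical, K = 2 (intercept plus slope) is assumed, and Python's `random` module will not reproduce Stata's `uniform()` draws even with the same seed 1111. The fattening step rests on the fact that least-squares residuals satisfy E(sum of e^2) = (N - K)σ², so their mean square understates σ² by the factor 1 - K/N; dividing each residual by sqrt(1 - K/N) removes that bias before resampling.

```python
import random
import statistics


def ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, standard error of b1)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = (rss / (n - 2) / sxx) ** 0.5  # residual df = N - K, with K = 2
    return b0, b1, se_b1


def residual_bootstrap(x, y, reps=1000, seed=1111):
    """Residual-resampling bootstrap for a one-predictor regression.

    Returns (sample number, b0*, b1*, SE(b1*), studentized b1*) tuples,
    the five values example3 writes to its log for each bootstrap sample.
    """
    rng = random.Random(seed)
    n, k = len(x), 2  # K = intercept + slope
    b0, b1, _ = ols(x, y)
    yhat = [b0 + b1 * xi for xi in x]
    # "Fatten" the full-sample residuals by dividing by sqrt(1 - K/N).
    scale = (1 - k / n) ** 0.5
    e = [(yi - fi) / scale for yi, fi in zip(y, yhat)]
    results = []
    for b in range(1, reps + 1):
        # Resample residuals with replacement; X stays fixed.
        yy = [fi + rng.choice(e) for fi in yhat]
        bb0, bb1, bse = ols(x, yy)
        # Studentize around the original-sample slope; undefined if the fit is exact.
        stu = (bb1 - b1) / bse if bse > 0 else float("nan")
        results.append((b, bb0, bb1, bse, stu))
    return results
```

For instance, `residual_bootstrap(x, y)` applied to the cleaned Y and X columns returns 1,000 tuples; the standard deviation of their third elements estimates the slope's bootstrap standard error, just as summarizing `bcoefX` in boot3.dta would.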
To summarize our results in the regression of New York air pollution (Y) on population density (X):
                               | slope        | standard error
original sample                | 5.67·10^-6   | 7.13·10^-7
bootstrap: data resampling     | 6.24·10^-6   | 21.0·10^-7
bootstrap: residual resampling | 5.66·10^-6   | 7.89·10^-7
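Dividing the bootstrap figures by the original-sample figures in the same column (the shared power of ten cancels) gives the ratios behind the comparisons that follow:

```latex
\frac{7.89}{7.13} \approx 1.11, \qquad
\frac{21.0}{7.13} \approx 2.95, \qquad
\frac{6.24}{5.67} \approx 1.10
```

The first two compare standard errors (residual and data resampling, respectively); the third shows the data-resampling slope sitting about 10% to the right of the original-sample slope.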
Because residual resampling, like the original-sample regression, assumes fixed X and i.i.d. errors, its results resemble those of the original-sample regression (though with a standard error about 10% higher). In contrast, data resampling yields a standard error almost three times the original-sample estimate, and a radically nonnormal sampling distribution (skewness = 3.6, kurtosis = 18.3) centered to the right of the original-sample regression slope. The differences in sampling distributions seen in Figure 2 dramatize how crucial the fixed-X and i.i.d.-errors assumptions are.
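The contrast can be explored on simulated data. The sketch below is a hypothetical illustration, not the author's code: it generates data whose error spread grows with X (violating the i.i.d.-errors assumption), then compares data (case) resampling, which redraws (X, Y) pairs so that X varies, with residual resampling, which holds X fixed. The simulated model, sample size, and seed are arbitrary assumptions, and whatever numbers it prints are not the New York results. Skewness and kurtosis use population-moment formulas with the non-excess convention (normal = 3), the convention Stata's `summarize, detail` reports.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x, with an intercept in the model."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def moments(v):
    """Skewness and non-excess kurtosis (normal: 0 and 3) from population moments."""
    m = statistics.fmean(v)
    m2, m3, m4 = (statistics.fmean([(a - m) ** p for a in v]) for p in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def case_bootstrap(x, y, reps, rng):
    """Data resampling: redraw (x, y) pairs together, so X varies across samples."""
    n, out = len(x), []
    while len(out) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # all X identical: slope undefined, so redraw
            continue
        out.append(slope(xs, [y[i] for i in idx]))
    return out


def residual_bootstrap(x, y, reps, rng):
    """Residual resampling: X fixed, fattened residuals treated as i.i.d."""
    n = len(x)
    b1 = slope(x, y)
    b0 = statistics.fmean(y) - b1 * statistics.fmean(x)
    yhat = [b0 + b1 * xi for xi in x]
    scale = (1 - 2 / n) ** 0.5  # sqrt(1 - K/N) with K = 2
    e = [(yi - fi) / scale for yi, fi in zip(y, yhat)]
    return [slope(x, [fi + rng.choice(e) for fi in yhat]) for _ in range(reps)]


if __name__ == "__main__":
    rng = random.Random(1111)
    # Hypothetical data: error standard deviation grows with X**2.
    x = [rng.uniform(0, 10) for _ in range(50)]
    y = [1 + 0.5 * xi + rng.gauss(0, 0.1 * xi ** 2) for xi in x]
    for name, boot in (("data", case_bootstrap), ("residual", residual_bootstrap)):
        draws = boot(x, y, 1000, rng)
        skew, kurt = moments(draws)
        print(f"{name:8s} SE={statistics.stdev(draws):.4f} skew={skew:.2f} kurt={kurt:.2f}")
```

Because residual resampling attaches every residual to every X value, it averages away the link between X and error spread; data resampling keeps each error with its own X, which is why the two methods can disagree sharply when the i.i.d.-errors assumption fails.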