Stata Technical Bulletin


STB-4


program define example3
* residual resampling regression bootstrap
* assumes variables "Y" and "X" in "source.dta"
*
set more 1
drop _all
set maxobs 2000

* If source.dta contains > 2,000 cases, set maxobs higher.
quietly use source.dta
quietly drop if Y==. | X==.
quietly regress Y X
capture predict Yhat
capture predict e, resid
quietly replace e=e/sqrt(1-((_result(3)+1)/_result(1)))
* Previous two commands obtain full-sample regression
* residuals, and "fatten" them, dividing by:
*          sqrt(1 - K/_N)
* where K is # of model parameters and _N is sample size.
macro define _coefX=_b[X]
quietly save, replace
capture erase bootdat3.log
log using bootdat3.log
log off
set seed 1111

macro define _bsample 1
while %_bsample<1001 {
    quietly use source.dta, clear
    quietly generate ee=e[int(_N*uniform())+1]
    quietly generate YY=Yhat+ee
    quietly regress YY X
    * We resample residuals only, then generate bootstrap
    * Y values (called YY) by adding bootstrap residuals (ee)
    * to predicted values from the original-sample
    * regression (Yhat). Finally, regress these bootstrap
    * YY values on original-sample X.
    macro define _bSE=_b[X]/sqrt(_result(6))
    log on
    display %_bsample
    display _b[_cons]
    display _b[X]
    display %_bSE
    display (_b[X]-%_coefX)/%_bSE
    display
    log off
    macro define _bsample=%_bsample+1
}
log close
drop _all
infile bsample bcons bcoefX bSE StucoefX using bootdat3.log
label variable bsample "bootstrap sample number"
label variable bcons "sample Y-intercept, b0"
label variable bcoefX "sample coefficient on X, b1"
label variable bSE "sample standard error of b1"
label variable StucoefX "studentized coefficient on X"
label data "regression boot/residual resampling"
save boot3.dta, replace
end
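
For readers who want to trace the logic outside Stata, the following is a minimal Python sketch of the same residual-resampling scheme. It is an illustration only, not the program above: the function names `ols` and `residual_bootstrap` are hypothetical, the data are synthetic, and K is taken to be 2 (intercept and slope), matching the one-predictor regression in the listing.

```python
import math
import random
import statistics


def ols(x, y):
    """Least-squares fit of y on a single x; returns (intercept, slope, se_slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope


def residual_bootstrap(x, y, reps=1000, seed=1111):
    """Residual-resampling bootstrap; X is held fixed across bootstrap samples."""
    rng = random.Random(seed)
    n = len(x)
    k = 2  # assumption: intercept plus one slope
    b0, b1, _ = ols(x, y)
    yhat = [b0 + b1 * xi for xi in x]
    # "Fatten" the full-sample residuals by dividing by sqrt(1 - K/N).
    scale = math.sqrt(1 - k / n)
    e = [(yi - yh) / scale for yi, yh in zip(y, yhat)]
    rows = []
    for rep in range(reps):
        # Draw residuals with replacement and add them to the fitted values.
        ee = [e[rng.randrange(n)] for _ in range(n)]
        yy = [yh + r for yh, r in zip(yhat, ee)]
        c0, c1, se = ols(x, yy)
        # Studentized slope: bootstrap slope centered on the original slope.
        rows.append((c0, c1, se, (c1 - b1) / se))
    return b1, rows


# Synthetic data, for illustration only (not the New York data).
demo_rng = random.Random(0)
x_demo = [demo_rng.uniform(0, 10) for _ in range(50)]
y_demo = [2.0 + 0.5 * xi + demo_rng.gauss(0, 1) for xi in x_demo]
b1_demo, rows_demo = residual_bootstrap(x_demo, y_demo, reps=200)
print("original slope:", b1_demo)
print("bootstrap SD of slope:", statistics.stdev(r[1] for r in rows_demo))
```

Each tuple in the returned list holds the intercept, slope, standard error, and studentized slope for one bootstrap sample, the same quantities the Stata program writes to bootdat3.log (without the sample number).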

To summarize our results in the regression of New York air pollution (Y) on population density (X):

                                     slope           standard error

original sample                      5.67 x 10^-6    7.13 x 10^-7
bootstrap: data resampling           6.24 x 10^-6    21.0 x 10^-7
bootstrap: residual resampling       5.66 x 10^-6    7.89 x 10^-7

Because residual resampling and the original-sample regression both assume fixed X and i.i.d. errors, their results are
similar, although the bootstrap standard error is about 10% higher. In contrast, data resampling obtains a standard error
almost three times the original-sample estimate, and a radically nonnormal distribution (skewness=3.6, kurtosis=18.3)
centered to the right of the original-sample regression slope. The differences in sampling distributions seen in Figure 2
dramatize how crucial the fixed-X and i.i.d. errors assumptions are.
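
The "about 10% higher" and "almost three times" comparisons can be checked against the standard errors in the summary. The short Python sketch below does that arithmetic. The variable names are hypothetical, and the three standard errors are assumed to share a common power of ten, which cancels in the ratios.

```python
# Standard errors from the summary, with the common power of ten dropped
# because it cancels in the ratios.
se_original = 7.13
se_data_resampling = 21.0
se_residual_resampling = 7.89

ratio_resid = se_residual_resampling / se_original
ratio_data = se_data_resampling / se_original

print(f"residual resampling / original: {ratio_resid:.2f}")  # 1.11
print(f"data resampling / original:     {ratio_data:.2f}")   # 2.95
```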


