


Stata Technical Bulletin



The bootstrapping method of example1.ado, data resampling, generalizes to resampling entire cases. In two-variable
regression, this means we resample (X,Y) pairs, as in example2.ado.²

program define example2

* data-resampling regression bootstrap
*
* assumes variables "Y" and "X" in "source.dta"
*
set more 1
drop _all
set maxobs 2000
* If source.dta contains > 2,000 cases, set maxobs higher.
quietly use source.dta
quietly drop if Y==. | X==.
save, replace
quietly regress Y X
macro define _coefX=_b[X]

* _coefX equals the original-sample regression coefficient on X
capture erase bootdat2.log
log using bootdat2.log
log off
set seed 1111

macro define _bsample 1
while %_bsample<1001 {
* For confidence intervals or tests, we need 2000 or more
* bootstrap samples.
quietly use source.dta, clear
* randnum is a random integer from 1 to _N, so each new case
* is an original case drawn with replacement.
generate randnum=int(_N*uniform())+1
quietly generate YY=Y[randnum]
quietly generate XX=X[randnum]
* The last three commands randomly resample (X,Y) pairs
* from the data.
quietly regress YY XX
* _result(6) is the regression F; with one predictor F = t^2,
* so |b|/sqrt(F) is the standard error of the bootstrap slope.
macro define _bSE=abs(_b[XX])/sqrt(_result(6))
log on
display %_bsample
display _b[_cons]
display _b[XX]
display %_bSE
display (_b[XX]-%_coefX)/%_bSE
* Calculated either way, this command obtains a
* studentized coefficient:
* (bootstrap coef. - original coef.)/SE of bootstrap coef.
display
log off
macro define _bsample=%_bsample+1
}
log close
drop _all

infile bsample bcons bcoefX bSE StucoefX using bootdat2.log
label variable bsample "bootstrap sample number"
label variable bcons "sample Y-intercept, b0"
label variable bcoefX "sample coefficient on X, b1"
label variable bSE "sample standard error of b1"
label variable StucoefX "studentized coefficient on X"
label data "regression boot/data resampling"
save boot2.dta, replace
end
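The five values logged for each bootstrap sample are enough for the usual bootstrap summaries. As a sketch using standard formulas (the notation is ours, not necessarily the article's), let B be the number of bootstrap samples, $b_1$ and $\mathrm{SE}(b_1)$ the original-sample slope and its standard error, and $b^{*}_{1r}$, $\mathrm{SE}^{*}_{r}$ and $t^{*}_{r}$ the values of bcoefX, bSE and StucoefX in sample r:

$$
\widehat{\mathrm{SE}}_{\mathrm{boot}}(b_1) = \sqrt{\frac{1}{B-1}\sum_{r=1}^{B}\left(b^{*}_{1r} - \bar{b}^{*}_{1}\right)^{2}},
\qquad
\bar{b}^{*}_{1} = \frac{1}{B}\sum_{r=1}^{B} b^{*}_{1r}
$$

$$
t^{*}_{r} = \frac{b^{*}_{1r} - b_1}{\mathrm{SE}^{*}_{r}}
$$

$$
\left[\; b_1 - t^{*}_{(0.975)}\,\mathrm{SE}(b_1),\;\; b_1 - t^{*}_{(0.025)}\,\mathrm{SE}(b_1) \;\right]
$$

The first line is simply the standard deviation of bcoefX across samples. The last is a bootstrap-t (percentile-t) 95% interval, where $t^{*}_{(q)}$ denotes the q quantile of StucoefX; estimating those tail quantiles stably is why the program's comment asks for 2000 or more samples.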

Figure 2 shows two distributions obtained by bootstrapping the regression of New York air pollution on population
density. Data resampling (top of Figure 2) does not make the usual regression assumptions of fixed X and independent,
identically distributed (i.i.d.) errors. Consequently, it often yields larger standard error estimates and skewed, multimodal sampling
distributions. If the usual assumptions are false, we are right to abandon them, and bootstrapping may provide better guidance.
If the assumptions are true, on the other hand, data resampling is too pessimistic.

Because it scrambles the case sequence, data resampling is also inappropriate for time or spatial series. A bootstrap
time series might contain 1969 three times and 1976 not at all, for instance.

Residual resampling, an alternative regression bootstrap approach, retains the fixed-X and i.i.d.-errors assumptions. Residuals
from the original-sample regression, divided by $\sqrt{1 - K/N}$ (where K is the number of estimated coefficients, including the
intercept, and N is the sample size), are resampled and added to original-sample predicted values $\hat{Y}$ to generate bootstrap
$Y^{*}$ values, which then are regressed on original-sample X values. The rescaling compensates for ordinary least squares residuals
being, on average, smaller than the true errors. example3.ado illustrates, using the same two-variable
model as example2.ado. Results appear at the bottom of Figure 2. Comments explain features new since example2.ado.
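Written out, one round of residual resampling looks like the following. This is a sketch of the standard scheme described in the paragraph above (notation ours): fit the original regression, rescale its residuals, then rebuild $Y^{*}$ from the fitted values.

$$
\hat{Y}_i = b_0 + b_1 X_i, \qquad e_i = Y_i - \hat{Y}_i, \qquad \tilde{e}_i = \frac{e_i}{\sqrt{1 - K/N}}, \qquad i = 1, \dots, N
$$

$$
\tilde{e}^{*}_{1}, \dots, \tilde{e}^{*}_{N} \ \text{drawn with replacement from} \ \{\tilde{e}_1, \dots, \tilde{e}_N\}
$$

$$
Y^{*}_{i} = \hat{Y}_i + \tilde{e}^{*}_{i}, \qquad \text{then regress } Y^{*} \text{ on the original } X_i
$$

In the two-variable model K = 2. The factor follows from $E\left[\sum_i e_i^2\right] = (N-K)\sigma^2$: the mean squared residual estimates $\sigma^2 (N-K)/N$, so dividing each residual by $\sqrt{(N-K)/N} = \sqrt{1-K/N}$ restores variance $\sigma^2$. Because each $\hat{Y}_i$ stays paired with its own $X_i$, this scheme keeps the fixed-X assumption that data resampling drops.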


