Stata Technical Bulletin

The bootstrapping method of example1.ado, data resampling, generalizes to resampling entire cases. In two-variable
regression, this means we resample (X,Y) pairs as in example2.ado.²

program define example2
* data-resampling regression bootstrap
* assumes variables "Y" and "X" in "source.dta"
*
set more 1
drop _all
set maxobs 2000
* If source.dta contains > 2,000 cases, set maxobs higher.
quietly use source.dta
quietly drop if Y==. | X==.
save, replace
quietly regress Y X
macro define _coefX=_b[X]
* _coefX equals the original-sample regression coefficient on X
capture erase bootdat2.log
log using bootdat2.log
log off
set seed 1111

macro define _bsample 1
while %_bsample<1001 {
* For confidence intervals or tests, we need 2000 or more
* bootstrap samples.
    quietly use source.dta, clear
    generate randnum=int(_N*uniform())+1
    quietly generate YY=Y[randnum]
    quietly generate XX=X[randnum]
    quietly regress YY XX
* The last three commands randomly resample (X,Y) pairs
* from the data.
    macro define _bSE=_b[XX]/sqrt(_result(6))
    log on
    display %_bsample
    display _b[_cons]
    display _b[XX]
    display %_bSE
    display (_b[XX]-%_coefX)/%_bSE
* Calculated either way, this command obtains a
* studentized coefficient:
* (bootstrap coef. - original coef.)/SE of bootstrap coef.
    display
    log off
    macro define _bsample=%_bsample+1
}
log close
drop _all

infile bsample bcons bcoefX bSE StucoefX using bootdat2.log
label variable bsample "bootstrap sample number"
label variable bcons "sample Y-intercept, b0"
label variable bcoefX "sample coefficient on X, b1"
label variable bSE "sample standard error of b1"
label variable StucoefX "studentized coefficient on X"
label data "regression boot/data resampling"
save boot2.dta, replace
end
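For readers without the Stata release this listing was written for, the same data-resampling loop can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a translation of example2.ado: the function names `ols` and `case_bootstrap` are hypothetical, the slope standard error is computed directly from the residual sum of squares, and the guard that redraws samples with fewer than three distinct cases is a convenience of the sketch. It assumes noisy data with at least three distinct X values.

```python
import math
import random


def ols(x, y):
    """Two-variable least squares: return (intercept, slope, slope_se)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope_se


def case_bootstrap(x, y, reps=1000, seed=1111):
    """Resample (X, Y) pairs with replacement and refit each time.

    Returns one tuple per bootstrap sample:
    (intercept, slope, slope_se, studentized slope).
    """
    rng = random.Random(seed)
    n = len(x)
    coef_x = ols(x, y)[1]  # original-sample coefficient on X
    rows = []
    while len(rows) < reps:
        # One random case index per observation, drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        if len(set(idx)) < 3:
            continue  # sketch-only guard: too few distinct cases to fit
        xx = [x[i] for i in idx]
        yy = [y[i] for i in idx]
        b0, b1, se = ols(xx, yy)
        # Studentized coefficient: (bootstrap coef. - original coef.) / SE
        rows.append((b0, b1, se, (b1 - coef_x) / se))
    return rows
```

Because X and Y are indexed by the same random case number, each bootstrap sample keeps every (X,Y) pair intact, which is the point of resampling entire cases.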

Figure 2 shows two distributions obtained by bootstrapping the regression of New York air pollution on population
density. Data resampling (at top in Figure 2) does not make the usual regression assumptions of fixed X and independent,
identically distributed (i.i.d.) errors. Consequently it often yields larger standard error estimates and skewed, multimodal sampling
distributions. If the usual assumptions are false, we are right to abandon them, and bootstrapping may provide better guidance.
If the assumptions are true, on the other hand, data resampling is too pessimistic.

Since it scrambles the case sequence, data resampling is also inappropriate with time or spatial series. We could get bootstrap
time series in which 1969 appears three times, and 1976 not at all, for instance.
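The scrambling is easy to see in a small Python sketch. The span of years and the function name `resample_years` are hypothetical, chosen only for illustration. In any one bootstrap sample of n cases, a given case is left out with probability (1 - 1/n)^n, which approaches about 0.37 as n grows, so missing and repeated years are the norm rather than the exception.

```python
import random
from collections import Counter


def resample_years(years, seed=1111):
    """Draw len(years) years with replacement; report gaps and repeats."""
    rng = random.Random(seed)
    draw = [rng.choice(years) for _ in years]
    counts = Counter(draw)
    missing = [yr for yr in years if yr not in counts]
    repeated = {yr: c for yr, c in counts.items() if c > 1}
    return draw, missing, repeated


if __name__ == "__main__":
    years = list(range(1965, 1980))  # hypothetical annual series
    draw, missing, repeated = resample_years(years)
    print("bootstrap sequence:", draw)
    print("years never drawn: ", missing)
    print("years drawn twice or more:", repeated)
```

The drawn sequence is neither ordered nor complete, so any lag structure or trend in the original series is lost.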

Residual resampling, an alternative regression bootstrap approach, retains the fixed-X and i.i.d.-errors assumptions. Residuals
from the original-sample regression, divided by $\sqrt{1 - K/N}$ (K being the number of estimated coefficients and N the number
of cases), are resampled and added to the original-sample fitted values (Ŷ) to generate bootstrap Y* values, which are then
regressed on the original-sample X values. example3.ado illustrates, using the same two-variable model as example2.ado. Results
appear at bottom in Figure 2. Comments explain features new since example2.ado.
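The residual-resampling steps described above can be sketched in Python with the standard library alone. This is a minimal illustration, not example3.ado: the names `fit_line` and `residual_bootstrap` are hypothetical, K defaults to 2 (intercept and slope) for the two-variable model, and the sketch assumes the usual form of the method in which resampled residuals are added to the fitted values from the original regression.

```python
import math
import random


def fit_line(x, y):
    """Two-variable least squares: return (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_bootstrap(x, y, reps=1000, seed=1111, k=2):
    """Return bootstrap slopes from residual resampling with X held fixed."""
    rng = random.Random(seed)
    n = len(x)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    # Inflate residuals by 1/sqrt(1 - K/N) before resampling.
    scale = math.sqrt(1 - k / n)
    resid = [(yi - fi) / scale for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(reps):
        # Y* = fitted value + a residual drawn at random with replacement.
        y_star = [fi + rng.choice(resid) for fi in fitted]
        # Regress Y* on the original, unchanged X values.
        slopes.append(fit_line(x, y_star)[1])
    return slopes
```

Unlike data resampling, every bootstrap sample here uses the original X values in their original order; only the error terms are reshuffled.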
