Stata Technical Bulletin


STB-4


[Figures 1 and 2: plots not reproduced. Recoverable title: "Plot of Delta X^2 versus PRED"; axis variables: deltax, pred.]



ssi2 Bootstrap programming

Lawrence C. Hamilton, Dept. of Sociology, University of New Hampshire

Bootstrapping refers to a process of repeatedly sampling (with replacement) from the data at hand. Instead of trusting theory
to tell us about the sampling distribution of an estimator b, we approximate that distribution empirically. Drawing B bootstrap
samples of size n (from an original sample also of size n) yields B new estimates, each denoted b*. The bootstrap distribution
of b* forms a basis for standard errors or confidence intervals (Efron and Tibshirani, 1986; for an introduction see Stine in Fox
and Long, 1990). This empirical approach seems most attractive in situations where the estimator is theoretically intractable, or
where the usual theory rests on untenable assumptions.
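
As a rough illustration of the resampling procedure just described, the following Python sketch draws B samples of size n with replacement and collects the estimates b*. It is not one of the Stata programs this article develops. The function names, the toy data, the choice of the median as the estimator, and the simple percentile interval are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_estimates(data, estimator, B=1000, seed=12345):
    """Return B bootstrap estimates b* of `estimator` applied to `data`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    estimates = []
    for _ in range(B):
        # One bootstrap sample: n draws with replacement from the original sample.
        resample = rng.choices(data, k=n)
        estimates.append(estimator(resample))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval read off the sorted bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(alpha * last)], ordered[round((1.0 - alpha) * last)]


# Hypothetical sample, invented for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]

b_star = bootstrap_estimates(sample, statistics.median, B=2000)
boot_se = statistics.stdev(b_star)  # spread of the b* values serves as a standard error
ci_low, ci_high = percentile_interval(b_star)

print(f"sample median = {statistics.median(sample):.2f}, bootstrap SE = {boot_se:.3f}")
print(f"95% percentile interval: [{ci_low:.2f}, {ci_high:.2f}]")
```

Any function of a list can be passed as `estimator`, which is what makes the approach attractive when no convenient formula for the standard error exists. The percentile interval shown is only one of several ways to build a confidence interval from the b* values.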

Bootstrapping requires fewer assumptions but more computing than classical methods. The January 1991 Stata News (pp. 6-7)
described two general bootstrap programs, bootsamp.ado and boot.ado. Help files document these programs, which provide
a relatively easy way to start bootstrapping. Even with these ready-made programs, however, users must do some programming
themselves and know exactly what they want. This can be tricky: bootstrapping is fraught with nonobvious choices and with
"obvious" solutions that don't work. Researchers have the best chance of successful bootstrapping when they can write programs
to fit specific analytical needs. Toward this goal I reinvent the wheel below, showing the construction of several simple bootstrap
programs. Rather than being general-purpose routines like boot.ado or bootsamp.ado, these four examples are problem-specific
but illustrate a general, readily modified approach.

The first three examples expect to find raw data in a file named source.dta, with variables called X and Y. For illustration
I employ data from Zupan, 1973, on the population density (X) and air pollution levels (Y) in 21 New York counties:1

            county           X       Y
  1.      New York     61703.7    .388
  2.         Kings    38260.87    .213
  3.         Bronx    33690.48    .295
  4.        Queens    17110.09    .307
  5.        Hudson    13377.78    .209
  6.         Essex    7382.813    .142
  7.       Passaic    2284.946    .054
  8.         Union    5116.505    .161
  9.        Nassau    4660.403     .15
 10.   Westchester    1921.839    .072
 11.      Richmond    4034.483    .059
 12.        Bergen    3648.069    .112
 13.     Middlesex    1697.444    .076
 14.     Fairfield    1583.113    .065
 15.     New Haven    1149.426    .053
 16.       Suffolk    1329.114    .072
 17.      Rockland     905.028    .052
 18.      Monmouth    781.9706   .0325
 19.      Somerset    527.6873    .029
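
With paired variables such as X and Y, one common choice is to resample whole cases, so that each county's density stays attached to its own pollution level. The Python sketch below illustrates that idea on the first seven rows of the listing. It is an illustration only, not the article's Stata code: the least-squares slope of Y on X as the statistic, the helper names, the seed, and the decision to redraw degenerate resamples are all assumptions.

```python
import random

# First seven (X, Y) rows of the listing: population density, air pollution level.
pairs = [
    (61703.7, 0.388),   # New York
    (38260.87, 0.213),  # Kings
    (33690.48, 0.295),  # Bronx
    (17110.09, 0.307),  # Queens
    (13377.78, 0.209),  # Hudson
    (7382.813, 0.142),  # Essex
    (2284.946, 0.054),  # Passaic
]


def ols_slope(rows):
    """Least-squares slope of Y on X; None when X takes a single value."""
    if len({x for x, _ in rows}) < 2:
        return None
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    return sxy / sxx


def resample_cases(rows, rng):
    """Draw len(rows) whole cases with replacement, keeping each (X, Y) pair intact."""
    return [rng.choice(rows) for _ in rows]


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
slopes = []
while len(slopes) < 500:
    slope = ols_slope(resample_cases(pairs, rng))
    if slope is not None:  # redraw the rare resample in which every X is identical
        slopes.append(slope)

print(f"slope in the seven listed rows: {ols_slope(pairs):.3e}")
print(f"bootstrap slopes range from {min(slopes):.3e} to {max(slopes):.3e}")
```

Resampling X and Y independently of each other would break the pairing and describe a different population, which is one example of an "obvious" shortcut that does not do what is wanted.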


