Stata Technical Bulletin
20. Morris 683.7607 .0316
21. Mercer 1228.07 .049
A Simple Bootstrap
example1.ado performs data resampling, the simplest kind of bootstrap. From an original sample with n cases, we draw
bootstrap samples (also of size n) by random sampling with replacement. This is accomplished by letting Stata's random-number
function uniform() choose the observation numbers (explicit subscripts) of cases included in each bootstrap sample. As written,
example1.ado executes B=1,000 iterations, adequate for standard-error estimation but probably too few for confidence intervals.
The number of iterations, variable names, and other features can easily be changed or generalized. Comment lines (beginning with
*) briefly explain what the program is doing.
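The resampling step described above can also be sketched on its own in current Stata syntax. This is a minimal illustration, not the example1.ado listing. It assumes a recent Stata release in which runiform(), rnormal(), forvalues, and postfile are available, none of which appear in example1.ado. The 50 simulated X values are hypothetical stand-ins for source.dta.

```stata
* Sketch only: current Stata syntax, not the example1.ado listing.
clear
set seed 1111
set obs 50                        // hypothetical sample size, n = 50
generate X = rnormal(10, 2)       // hypothetical data standing in for source.dta

tempname sim
tempfile results
postfile `sim' bsample bmean using `results'

forvalues b = 1/1000 {
    * int(_N*runiform())+1 is a random observation number from 1 through _N,
    * so XX is a size-n sample drawn from X with replacement.
    quietly generate XX = X[int(_N*runiform()) + 1]
    quietly summarize XX
    post `sim' (`b') (r(mean))    // record sample number and bootstrap mean
    drop XX
}
postclose `sim'

use `results', clear
summarize bmean                   // Std. Dev. of bmean estimates the standard error
```

Posting each mean to a results file plays the same role as the log file in example1.ado: it collects one bootstrap mean per iteration for later analysis. The original listing follows.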
program define example1
* The first line tells Stata we are going to define a program
* named "example1". This program bootstraps the mean of a
* variable named "X", from a dataset called "source.dta".
* To apply example1.ado to your own data:
*
* . use <yourfile.dta>
* . rename <yourvar> X
* . keep if X~=.
* . save source, replace
*
set more 1
* Tells Stata to wait only 1 second before scrolling a full
* screen. Default: waits for keyboard input before scrolling.
drop _all
capture erase bootdat1.log
set maxobs 2000
* For confidence intervals or other applications using
* bootstrap-distribution tail percentiles, at least B=2,000
* bootstrap iterations are needed. Simpler purposes, including
* standard error estimation, require substantially fewer
* iterations.
* If source.dta contains > 2,000 cases, set maxobs higher.
log using bootdat1.log
log off
* Log file bootdat1.log will record bootstrap results.
set seed 1111
* Sets the random-generator seed. We can repeat the random
* sequence later by using the same seed, or avoid repeating it
* by choosing a different seed (any large odd number).
macro define _bsample 1
* _bsample counts the number of bootstrap samples.
* _bsample is the name of this macro; %_bsample refers to
* the macro's current contents:
while %_bsample<1001 {
  quietly use source.dta, clear
  quietly drop if X==.
  quietly generate XX=X[int(_N*uniform())+1]
  * Variable XX holds randomly resampled X values. The
  * expression int(_N*uniform())+1 generates random integers
  * from 1 through _N (sample size).
  quietly summarize XX
  log on
  display %_bsample
  display _result(3)
  display
  log off
  * For each bootstrap sample, the log file contains the
  * sample number and mean of XX.
  macro define _bsample=%_bsample+1
}
* Curly brackets enclose the "while %_bsample<1001" loop.
log close
drop _all
infile bsample bmean using bootdat1.log
label variable bsample "bootstrap sample number"
label variable bmean "sample mean of X"
label data "bootstrap mean"