Stata Technical Bulletin
This method suffers from two disadvantages. First, it requires that the results of the individual simulations be displayed on the
screen which is, at best, inelegant. Second, the simulation cannot itself be logged, meaning that those of us who keep notebooks
of printed logs backing up important results are prevented from doing so.
An alternative programming approach does not have those problems and is therefore widely used in the ado-files we at
Stata Corp. write. It might be called the append method because the approach amounts to adding observations, one at a time, to
a data set being maintained on disk:
create a temporary data set
repeat {
    draw a sample
    make a calculation
    use the temporary data set
    append the calculated result(s) to the end of the data
    resave the temporary data set
}
erase the temporary data set
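As a concrete illustration, the loop above might be coded as follows. This is a minimal sketch, not code from an actual ado-file: the variable name mean, the sample size of 25, the draw from a standard normal, and modern function names such as rnormal() and r(mean) are all illustrative assumptions.

```stata
tempfile results
* start an empty results file holding one variable, mean (hypothetical name)
drop _all
set obs 1
gen mean = .
drop in 1
save `results'
quietly forvalues i = 1/1000 {
    * draw a sample
    drop _all
    set obs 25
    gen x = rnormal()
    * make a calculation
    summarize x
    local m = r(mean)
    * use the temporary data set, append the result, resave
    use `results', clear
    set obs `=_N + 1'
    replace mean = `m' in l
    save `results', replace
}
use `results', clear
```

Because `results' is a tempfile, Stata erases it automatically when the program ends, which takes the place of the explicit erase in the outline.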
This approach is used in Stata’s boot and bsqreg commands; see [5s] boot and [5s] qreg. While not suffering from the
disadvantages of the display-and-infile method, it has its own disadvantage—it is slow.
There is a third way simulations could be programmed in Stata. It could be called the buffered-append method because, while it is basically the append method, results are not added to the data one at a time; instead, they are temporarily buffered in memory and then, periodically, the buffers are used to update the data on disk:
create a temporary data set
repeat {
    draw a sample
    make a calculation
    save the results in memory somewhere
    when memory is full {
        use the temporary data set
        append the buffered results to the data
        resave the temporary data set
    }
}
use the temporary data set
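A hand-rolled version of this buffering can be sketched using a local macro as the in-memory buffer. Everything here is illustrative: the buffer size of 100, the variable name mean, the sample design, and modern function names such as rnormal() are assumptions made for the sketch.

```stata
tempfile results
* start an empty results file (hypothetical variable name: mean)
drop _all
set obs 1
gen mean = .
drop in 1
save `results'
local buffer ""
local nbuf 0
quietly forvalues i = 1/1000 {
    * draw a sample and make a calculation
    drop _all
    set obs 25
    gen x = rnormal()
    summarize x
    * save the result in memory; the local macro serves as the buffer
    local buffer "`buffer' `=r(mean)'"
    local ++nbuf
    * when the buffer is full (or we are done), flush it to disk
    if `nbuf' == 100 | `i' == 1000 {
        drop _all
        set obs `nbuf'
        gen mean = .
        local j 0
        foreach m of local buffer {
            local ++j
            replace mean = `m' in `j'
        }
        append using `results'
        save `results', replace
        local buffer ""
        local nbuf 0
    }
}
use `results', clear
```

With a buffer of 100, the costly use-append-resave cycle runs 10 times instead of 1,000.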
This method has the potential to be faster because the costly use and resave occur less often. The post commands do this. In
outline, their use is
postfile ... using ...
repeat {
    draw a sample
    make a calculation
    post ...
}
postclose
use the data set
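Filling in the outline gives something like the following. This is a sketch only: the handle and file names, the sample design, and the use of modern syntax such as rnormal() and r(mean) are assumptions for illustration.

```stata
tempname sim
tempfile results
* declare the variable(s) to be posted and the file to hold them
postfile `sim' mean using `results', replace
quietly forvalues i = 1/1000 {
    * draw a sample
    drop _all
    set obs 25
    gen x = rnormal()
    * make a calculation
    summarize x
    * post the result; buffering and periodic writes are handled for us
    post `sim' (r(mean))
}
postclose `sim'
use `results', clear
```

Current Stata documents an every(#) option on postfile that controls how often the buffered results are written to disk, making the buffering explicit in the outline tunable.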
Example
Let us consider the coverage of the 95%, t-based confidence interval for the mean applied to log-normal populations. To
explain, the central limit theorem assures us that, asymptotically, distributions of means are normally distributed regardless of the
underlying distribution of the population. In finite samples, less can be said, but if the underlying population follows a normal
distribution and if one uses estimates of the mean and standard deviation, the standardized mean will follow a t distribution with n - 1
degrees of freedom. (Note that as n → ∞, the t distribution approaches the normal, so the finite-sample result is consistent with the central
limit theorem.)
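For concreteness, the 95% t-based interval just described can be computed by hand from the stored results of summarize. The variable name x is hypothetical, and the modern function name invttail() is assumed:

```stata
* sketch: 95% t-based confidence interval for the mean of x
quietly summarize x
local t  = invttail(r(N) - 1, 0.025)
local lb = r(mean) - `t' * r(sd) / sqrt(r(N))
local ub = r(mean) + `t' * r(sd) / sqrt(r(N))
display "95% CI: [`lb', `ub']"
```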
In real life, people often apply confidence intervals calculated on the basis of t distributions to means calculated on data that
are far from normal. Do they, on average, nevertheless generate correct predictions? That is, a 95% confidence interval should