


Stata Technical Bulletin



This method suffers from two disadvantages. First, it requires that the results of the individual simulations be displayed on the
screen which is, at best, inelegant. Second, the simulation cannot itself be logged, meaning that those of us who keep notebooks
of printed logs backing up important results are prevented from doing so.

An alternative programming approach does not have those problems and is therefore widely used in the ado-files we at
Stata Corp. write. It might be called the append method because the approach amounts to adding observations, one at a time, to
a data set being maintained on disk:

create a temporary data set
repeat {
        draw a sample
        make a calculation
        use the temporary data set
        append the calculated result(s) to the end of the data
        resave the temporary data set
}
erase the temporary data set

This approach is used in Stata’s boot and bsqreg commands; see [5s] boot and [5s] qreg. While not suffering from the
disadvantages of the display-and-infile method, it has its own disadvantage—it is slow.
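The append method can be sketched in Stata itself. This is only an illustration, not code from boot or bsqreg: the sample size, the statistic collected, and the number of replications are invented for the example, and modern macro-expansion syntax is used.

```stata
* Append method: accumulate one result per replication on disk.
tempfile results

* Start with an empty data set holding only the result variable.
drop _all
generate mean = .
save `results'

quietly forvalues i = 1/100 {
        * draw a sample
        drop _all
        set obs 25
        generate x = invnorm(uniform())

        * make a calculation
        summarize x
        local m = r(mean)

        * use, append, and resave the temporary data set
        use `results', clear
        set obs `=_N + 1'
        replace mean = `m' in l
        save `results', replace
}
use `results', clear
```

The use/save pair inside the loop is exactly the per-replication disk traffic that makes this method slow.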

There is a third way simulations could be programmed in Stata. It could be called the buffered-append method because,
while it is basically the append method, rather than adding observations one at a time to the data, results are temporarily buffered
in memory and then, periodically, the buffers are used to update the data:

create a temporary data set
repeat {
        draw a sample
        make a calculation
        save the results in memory somewhere
        when memory is full {
                use the temporary data set
                append the buffered results to the data
                resave the temporary data set
        }
}
use the temporary data set

This method has the potential to be faster because the costly use and resave occur less often. The post commands do this. In
outline, their use is

postfile ... using ...
repeat {
        draw a sample
        make a calculation
        post ...
}
postclose
use the data set
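The outline above might be filled in as follows. The variable name, sample size, and number of replications are illustrative, and the syntax shown is that of the post commands as currently documented:

```stata
* Buffered-append method via postfile/post/postclose.
tempname sim
tempfile results
postfile `sim' mean using `results'

quietly forvalues i = 1/100 {
        * draw a sample
        drop _all
        set obs 25
        generate x = invnorm(uniform())

        * make a calculation and post the result;
        * post buffers it and writes to disk only periodically
        summarize x
        post `sim' (r(mean))
}
postclose `sim'
use `results', clear
```

Because post buffers results in memory rather than rewriting the file on every replication, the loop avoids the per-replication use and resave of the plain append method.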

Example

Let us consider the coverage of the 95%, t-based confidence interval for the mean applied to log-normal populations. To
explain, the central limit theorem assures us that, asymptotically, distributions of means are normally distributed regardless of the
underlying distribution of the population. In finite samples, less can be said, but if the underlying population follows a normal
distribution and if one uses estimates of the mean and standard deviation, the mean will follow a t distribution with n − 1
degrees of freedom. (Note that as n → ∞, the t distribution approaches the normal, so the finite-sample result is consistent with
the central limit theorem.)
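For reference, the t-based 95% confidence interval in question is

```latex
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```

where x-bar and s are the sample mean and standard deviation of a sample of size n.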

In real life, people often apply confidence intervals calculated on the basis of t distributions to means calculated on data that
are far from normal. Do they, on average, nevertheless generate correct predictions? That is, a 95% confidence interval should


