
Stata Technical Bulletin


We now have 10,000 means and variances from independent 100-observation log-normal data sets. On a 25MHz 486, this took
about 14 minutes.

Our log-normal population was based on z = e^u with u ~ N(0,1), so the true mean of z is e^(1/2) ≈ 1.6487213. Let x̄_j
and s²_j represent the calculated mean and variance of the jth sample. Then the 95% confidence bounds that would be calculated
by a standard t test are x̄_j ± t.95·sqrt(s²_j/100). Making these calculations, we can mark each sample as rejecting or not rejecting
that the mean is e^(1/2):

. gen se = sqrt(var/100)
. gen lower = mean - invt(100-1, .95)*se
. gen upper = mean + invt(100-1, .95)*se
. gen accept = lower<exp(1/2) & exp(1/2)<upper
. count if accept
  9198
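The whole experiment can also be sketched outside Stata. The following Python re-implementation (an illustrative sketch, not the article's code; it assumes numpy and scipy are available) draws 10,000 samples of 100 log-normal observations and computes the coverage of the nominal 95% t intervals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 100, 10_000
true_mean = np.exp(0.5)                      # mean of z = exp(u), u ~ N(0,1)

# reps samples of n log-normal observations
z = rng.lognormal(mean=0.0, sigma=1.0, size=(reps, n))
xbar = z.mean(axis=1)
se = z.std(axis=1, ddof=1) / np.sqrt(n)      # sqrt(s^2 / n)
tcrit = stats.t.ppf(0.975, df=n - 1)         # two-sided 95% critical value

cover = (xbar - tcrit * se < true_mean) & (true_mean < xbar + tcrit * se)
print(cover.mean())                          # roughly .92, well below the nominal .95
```

The exact count varies with the seed, but the coverage lands near 92% rather than 95%, matching the Stata result.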

Thus, the coverage of our 95% test is only 92%—the confidence intervals are too narrow. We performed this experiment “only”
10,000 times, so we should verify that the observed 92% differs from 95% due to more than chance:

. cii 10000 9198

                                                     -- Binomial Exact --
    Variable |      Obs         Mean    Std. Err.    [95% Conf. Interval]
    ---------+----------------------------------------------------------
             |    10000        .9198     .002716      .9142983    .9250475

A 95% confidence interval for the coverage is .914 to .925. (Moreover, given a probability of .95, the chance of observing
9198 or fewer successes in 10,000 trials is virtually 0, as you can verify for yourself by typing ‘bitest 10000 9198 .95’.)

So, if the standard t test performs poorly, what about the central-limit-theorem result? Rather than using t.95, what if we
use z.95? The result will be worse: t intervals are wider than normal intervals, and we have already determined that the intervals
are too narrow. It will not, however, make much difference, since t.95 ≈ 1.98 for 99 degrees of freedom whereas z.95 ≈ 1.96.
For the record:

. drop lower upper accept
. gen lower = mean - 1.96*se
. gen upper = mean + 1.96*se
. gen accept = lower<exp(1/2) & exp(1/2)<upper
. count if accept
  9169
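The two critical values can be verified directly (again using scipy, as an illustration outside the article's Stata session):

```python
from scipy import stats

t95 = stats.t.ppf(0.975, df=99)    # two-sided 95% critical value, 99 d.f.
z95 = stats.norm.ppf(0.975)        # standard-normal equivalent
print(round(t95, 3), round(z95, 3))   # 1.984 1.96
```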

Performance

As I find myself running simulations more and more these days, I went to the effort of timing the display-and-infile, append,
and buffered-append (post) alternatives. The good news is that buffered-append is substantially faster than the append method.
The bad news is that display-and-infile is still the fastest way to run simulations in Stata:

                display and infile      append      buffered append
  replications       (seconds)         (seconds)       (seconds)
  -----------------------------------------------------------------
       100              5.22            11.81            8.57
       500             25.71            73.77           40.76
      1000             51.13           155.99           81.62

The timings above were performed on a 25MHz 486 running Intercooled Stata under DOS.

postfile also provides an every() option, which controls how often buffers are flushed. The documentation above
recommends that you never specify this option. Using the same simulation with 500 replications, I performed timings
for different values of every():

  every()    time (sec.)       every()    time (sec.)
  ----------------------       ----------------------
     2          64.87             32         40.81
     4          49.49             64         41.30
     8          43.55            128         43.77
    16          41.96            200         46.41

Between every(16) and every(64) the function is virtually flat.
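The shape of this curve is what one would expect from any buffered writer: too-frequent flushes pay per-write overhead, while very large buffers gain little once the overhead is amortized. A minimal sketch of the idea (a hypothetical buffered-append class, not Stata's actual implementation):

```python
import csv, io

class PostBuffer:
    """Accumulate posted rows and write them out every `every` posts,
    mimicking postfile's buffered-append behaviour."""
    def __init__(self, fileobj, every=16):
        self.writer = csv.writer(fileobj)
        self.every = every
        self.rows = []
        self.flushes = 0

    def post(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.every:   # buffer full: write it out
            self.flush()

    def flush(self):
        if self.rows:
            self.writer.writerows(self.rows)
            self.rows.clear()
            self.flushes += 1

buf = io.StringIO()
pb = PostBuffer(buf, every=16)
for j in range(500):                # 500 replications, as in the timings
    pb.post([j, j * 2])
pb.flush()                          # write any partially filled buffer
print(pb.flushes)                   # 32: 31 full buffers plus 1 partial
```

Raising every() trades memory for fewer, larger writes, which is why the timings flatten out once the per-flush overhead becomes negligible.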
