
Stata Technical Bulletin


We now have 10,000 means and variances from independent 100-observation log-normal data sets. On a 25MHz 486, this took
about 14 minutes.

Our log-normal population was based on z = e^u with u ~ N(0,1), so the true mean of z is e^(1/2) ≈ 1.6487213. Let x̄_j
and s²_j represent the calculated mean and variance of the jth sample. Then the 95% confidence bounds that would be calculated
by a standard t test are x̄_j ± t.95·sqrt(s²_j/100). Making these calculations, we can mark each sample as rejecting or not rejecting
that the mean is e^(1/2):

. gen se = sqrt(var/100)
. gen lower = mean - invt(100-1, .95)*se
. gen upper = mean + invt(100-1, .95)*se
. gen accept = lower<exp(1/2) & exp(1/2)<upper
. count if accept
  9198
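The whole experiment can also be sketched outside Stata. The following Python re-implementation (an illustrative sketch, not the article's code; it assumes numpy and scipy are available) draws 10,000 samples of 100 log-normal observations and computes the coverage of the nominal 95% t intervals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 100, 10_000
true_mean = np.exp(0.5)                      # mean of z = exp(u), u ~ N(0,1)

# reps samples of n log-normal observations
z = rng.lognormal(mean=0.0, sigma=1.0, size=(reps, n))
xbar = z.mean(axis=1)
se = z.std(axis=1, ddof=1) / np.sqrt(n)      # sqrt(s^2 / n)
tcrit = stats.t.ppf(0.975, df=n - 1)         # two-sided 95% critical value

cover = (xbar - tcrit * se < true_mean) & (true_mean < xbar + tcrit * se)
print(cover.mean())                          # roughly .92, well below the nominal .95
```

The exact count varies with the seed, but the coverage lands near 92% rather than 95%, matching the Stata result.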

Thus, the coverage of our 95% test is only 92%—the confidence intervals are too narrow. We performed this experiment “only”
10,000 times, so we should verify that the observed 92% differs from 95% due to more than chance:

. cii 10000 9198

                                                     -- Binomial Exact --
    Variable |      Obs         Mean    Std. Err.    [95% Conf. Interval]
    ---------+----------------------------------------------------------
             |    10000        .9198     .002716      .9142983    .9250475

A 95% confidence interval for the coverage is .914 to .925. (Moreover, given a probability of .95, the chance of observing
9198 or fewer successes in 10,000 trials is virtually 0, as you can verify for yourself by typing ‘bitest 10000 9198 .95’.)

So, if the standard t test performs poorly, what about the central-limit-theorem result? Rather than using t.95, what if we
use z.95? The result will be worse: t intervals are wider than normal intervals, and we have already determined that the intervals
are too narrow. It will not, however, make much difference, since t.95 ≈ 1.98 for 99 degrees of freedom whereas z.95 ≈ 1.96.
For the record:

. drop lower upper accept
. gen lower = mean - 1.96*se
. gen upper = mean + 1.96*se
. gen accept = lower<exp(1/2) & exp(1/2)<upper
. count if accept
  9169
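The two critical values can be verified directly (again using scipy, as an illustration outside the article's Stata session):

```python
from scipy import stats

t95 = stats.t.ppf(0.975, df=99)    # two-sided 95% critical value, 99 d.f.
z95 = stats.norm.ppf(0.975)        # standard-normal equivalent
print(round(t95, 3), round(z95, 3))   # 1.984 1.96
```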

Performance

As I find myself running simulations more and more these days, I went to the effort of timing the display-and-infile, append,
and buffered-append (post) alternatives. The good news is that buffered-append is substantially faster than the append method.
The bad news is that display-and-infile is still the fastest way to run simulations in Stata:

                display and infile      append      buffered append
  replications       (seconds)         (seconds)       (seconds)
  -----------------------------------------------------------------
       100              5.22            11.81            8.57
       500             25.71            73.77           40.76
      1000             51.13           155.99           81.62

The timings above were performed on a 25MHz 486 running Intercooled Stata under DOS.

postfile also provides an every() option, which controls how often buffers are flushed. The documentation above
recommends that you never specify this option. Using the same simulation with 500 replications, I performed timings
for different values of every():

  every()    time (sec.)       every()    time (sec.)
  ----------------------       ----------------------
     2          64.87             32         40.81
     4          49.49             64         41.30
     8          43.55            128         43.77
    16          41.96            200         46.41

Between every(16) and every(64) the function is virtually flat.
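The shape of this curve is what one would expect from any buffered writer: too-frequent flushes pay per-write overhead, while very large buffers gain little once the overhead is amortized. A minimal sketch of the idea (a hypothetical buffered-append class, not Stata's actual implementation):

```python
import csv, io

class PostBuffer:
    """Accumulate posted rows and write them out every `every` posts,
    mimicking postfile's buffered-append behaviour."""
    def __init__(self, fileobj, every=16):
        self.writer = csv.writer(fileobj)
        self.every = every
        self.rows = []
        self.flushes = 0

    def post(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.every:   # buffer full: write it out
            self.flush()

    def flush(self):
        if self.rows:
            self.writer.writerows(self.rows)
            self.rows.clear()
            self.flushes += 1

buf = io.StringIO()
pb = PostBuffer(buf, every=16)
for j in range(500):                # 500 replications, as in the timings
    pb.post([j, j * 2])
pb.flush()                          # write any partially filled buffer
print(pb.flushes)                   # 32: 31 full buffers plus 1 partial
```

Raising every() trades memory for fewer, larger writes, which is why the timings flatten out once the per-flush overhead becomes negligible.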
