20
Stata Technical Bulletin
STB-20
include the true mean 95% of the time. If the calculation results in an interval that is too wide, however, that too-wide interval
will include the mean more than 95% of the time. If it is too narrow, that interval will include the mean less than 95% of the
time.
Thus, we could take some distribution—we will use the log normal—and draw samples from it. We could calculate the
mean and perform the classic t test, recording whether the true mean (which we know) lies in the interval. If we do this enough
times, we can answer the question, at least with respect to the log-normal distribution. (A variable $z$ is log-normally distributed
if $z = e^{u}$ and $u$ is normally distributed. If $u$ has mean $\mu$ and variance $\sigma^2$, then $z$ has median, not mean, $e^{\mu}$; the mean of $z$ is
$e^{\mu} e^{\sigma^2/2}$.)
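The parenthetical claim about the median and the mean can be checked in two lines. What follows is a brief sketch, assuming only that $u$ is normal with mean $\mu$ and variance $\sigma^2$ and using the standard moment-generating function of the normal distribution, which the text above does not state:

```latex
\begin{align*}
E[z] &= E\!\left[e^{u}\right] = e^{\mu + \sigma^{2}/2} = e^{\mu}\, e^{\sigma^{2}/2}, \\
\operatorname{median}(z) &= e^{\operatorname{median}(u)} = e^{\mu}
  \quad \text{(because } e^{u} \text{ is increasing in } u\text{)}.
\end{align*}
```

For a standard normal $u$ ($\mu = 0$, $\sigma = 1$), this gives a median of 1 and a mean of $e^{1/2} \approx 1.6487$, so the mean lies well above the median.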
Let us begin by constructing a data set of means and variances for 100-observation samples of a log-normal distribution:
program define Insim
        version 3.1
        postfile mean var using results, replace
        quietly {
                local i = 1
                while `i' <= 10000 {
                        drop _all
                        set obs 100
                        gen z = exp(invnorm(uniform()))
                        summarize z
                        post _result(3) _result(4)
                        local i = `i' + 1
                }
        }
        postclos
end
The heart of this program is the three lines in the middle, the first two of which are
set obs 100
gen z = exp(invnorm(uniform()))
and correspond to drawing our sample. The third line
summarize z
calculates our results. summarize, in addition to displaying summary statistics (which we suppress with the quietly { }
surrounding the code), stores the sample mean in _result(3) and variance in _result(4). The new post command allows us
to save those results. Prior to using post, however, the postfile must be declared. This we did at the outset of our program,
declaring that we would be saving two results which we would call mean and var (for mean and variance) in a data set called
results.dta. Then, when we are all done, we must inform post with the postclos command.
The rest of the program was merely concerned with performing the experiment 10,000 times:
        local i = 1
        while `i' <= 10000 {
                ...
                local i = `i' + 1
        }
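For readers who wish to repeat the experiment outside Stata, the same loop can be sketched in Python. This is a re-expression under stated assumptions, not the bulletin's code: the function name simulate, the seed value, and the in-memory list that stands in for results.dta are all ours.

```python
import math
import random


def simulate(reps=10_000, n=100, seed=12345):
    """Return one (mean, variance) pair per simulated sample.

    Each sample holds n draws of z = exp(u) with u standard normal,
    mirroring gen z = exp(invnorm(uniform())) in the Stata program.
    """
    rng = random.Random(seed)  # arbitrary seed, fixed only for repeatability
    results = []               # stands in for the posted results file
    for _ in range(reps):
        z = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)]
        mean = sum(z) / n
        # n - 1 divisor, the same convention summarize uses for the variance
        var = sum((x - mean) ** 2 for x in z) / (n - 1)
        results.append((mean, var))
    return results


if __name__ == "__main__":
    results = simulate()
    means = [m for m, _ in results]
    print("replications:", len(results))
    print("average of sample means:", sum(means) / len(means))
```

The list of pairs plays the part of the two posted variables, mean and var. The random-number generator differs from Stata's, so the simulated values will not match the bulletin's digit for digit.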
The results of running our program are
. Insim
. describe
Contains data from results.dta
  Obs: 10000 (max= 19997)
 Vars:     2 (max=    99)
Width:     8 (max=   200)
  1. mean      float  %9.0g
  2. var       float  %9.0g
Sorted by:
. summarize
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    mean |   10000    1.648349   .2165937   1.022719   4.280587
     var |   10000    4.720659   6.208903   .6215334   450.1076
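These summary figures can be compared with what theory predicts for a log-normal variable built from a standard normal. The short Python calculation below is a sketch of that comparison. It assumes the standard log-normal variance formula, $(e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$, which the text does not state, and the variable names are ours.

```python
import math

mu, sigma, n = 0.0, 1.0, 100  # u ~ N(0, 1); 100 observations per sample

# Mean of z = exp(u), as given in the text: exp(mu) * exp(sigma^2 / 2)
true_mean = math.exp(mu + sigma**2 / 2)

# Variance of z (standard log-normal result, assumed here)
true_var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)

# Standard deviation of the mean of n independent draws
sd_of_sample_mean = math.sqrt(true_var / n)

print(f"true mean of z            {true_mean:.4f}")
print(f"true variance of z        {true_var:.4f}")
print(f"std. dev. of sample mean  {sd_of_sample_mean:.4f}")
```

The three values come to roughly 1.6487, 4.6708, and 0.2161. They sit close to the simulated 1.648349, 4.720659, and .2165937 in the Mean and Std. Dev. columns above.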