34
Stata Technical Bulletin
STB-58
probability of only 90% under the Conover method, compared to 94% under the cendif method. The two methods show little
or no difference, either in geometric mean confidence interval width or in coverage probability, when the variances are equal
and the Conover assumption is therefore true. From the results so far, I would recommend the cendif method as an improved
version of the Conover method, offering insurance against the possibility that the Conover assumption is wildly wrong, at little
or no price in performance if the Conover assumption is right. However, I am planning to carry out further simulations on the
two methods and to report the results in due course.
Example 1
In the auto data, we compare weights of American and foreign cars. We use cid and cendif to estimate the median
difference:
. cid weight,by(foreign) median unpaired
Rank-based confidence interval for difference in medians by foreign
Variable |    Obs   Estimate      K     [95% Conf. Interval]
---------+-------------------------------------------------------------
  weight |     74       1095    406        720       1350
. cendif weight,by(foreign)
Y-variable: weight (Weight (lbs.))
Grouped by: foreign (Car type)
Group numbers:
   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00
Transformation: Fisher's z
95% confidence interval(s) for percentile difference(s)
between values of weight in first and second groups:
    Percent   Pctl-Dif   Minimum   Maximum
r1       50       1095       750      1330
We note that the median difference in weight is 1,095 pounds according to both cid and cendif. However, the confidence
limits given by cendif are 750 and 1,330 pounds, whereas the confidence limits given by cid are 720 and 1,350 pounds. This
is because foreign cars are fewer in number and less variable in weight than American cars, and cid assumes equal variances,
whereas cendif allows for unequal variances. If we carry out equal-variance and unequal-variance t tests (not shown), we find
a similar difference in the width of the confidence intervals for the mean difference.
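The t tests mentioned above are not shown in this article. As a minimal sketch, they could be run with Stata's ttest command as below, written in do-file form without the dot prompt. The sysuse line is an assumption: it is the way to load the auto data in later Stata releases, and the data may need to be loaded differently in other versions. No output is reproduced here.

```stata
* Load the auto data (assumption: sysuse is available in this Stata release)
sysuse auto, clear

* Equal-variance (pooled) two-sample t test of weight by car type
ttest weight, by(foreign)

* Unequal-variance two-sample t test of weight by car type
ttest weight, by(foreign) unequal
```

Comparing the two confidence intervals for the mean difference gives the parametric counterpart of the cid versus cendif comparison.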
cendif can also calculate confidence intervals for percentiles other than medians. These contain information about the
degree of overlap between the two populations. Here, we estimate the 25th, 50th, and 75th percentile differences, using the
centile option.
. cendif weight,by(foreign) ce(25 50 75)
Y-variable: weight (Weight (lbs.))
Grouped by: foreign (Car type)
Group numbers:
   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00
Transformation: Fisher's z
95% confidence interval(s) for percentile difference(s)
between values of weight in first and second groups:
    Percent   Pctl-Dif   Minimum   Maximum
r1       25        485       100       810
r2       50       1095       750      1330
r3       75       1555      1320      1790
If we want to estimate percentile ratios of weight, rather than percentile differences, then we simply take logs and use the
eform option.
. gene logwt=log(weight)
. cendif logwt,by(foreign) ce(25 50 75) eform