Stata Technical Bulletin
Example
Examples of using centile to estimate the 5th, 50th and 95th centiles of the variable price (car price in dollars) in the
Stata example file auto.dta are given below.
. format price %8.2f
. summarize price, detail

                             Price
-------------------------------------------------------------
      Percentiles     Smallest
 1%     3291.00        3291.00
 5%     3748.00        3299.00
10%     3895.00        3667.00       Obs                  74
25%     4195.00        3748.00       Sum of Wgt.          74

50%     5006.50                      Mean            6165.26
                       Largest       Std. Dev.       2949.50
75%     6342.00       13466.00
90%    11385.00       13594.00       Variance     8699525.97
95%    13466.00       14500.00       Skewness           1.65
99%    15906.00       15906.00       Kurtosis           4.82
. centile price, centile(5 50 95)
                                                  -- Binom. Interp. --
Variable |     Obs  Percent     Centile       [95% Conf. Interval]
---------+-------------------------------------------------------------
   price |      74      5       3727.75       3291.23       3914.16
         |             50       5006.50       4593.57       5717.90
         |             95      13498.00      11061.53      15865.30
. centile price, c(5 50 95) cci
                                                  -- Binomial Exact --
Variable |     Obs  Percent     Centile       [95% Conf. Interval]
---------+-------------------------------------------------------------
   price |      74      5       3727.75       3291.00       3955.00
         |             50       5006.50       4589.00       5719.00
         |             95      13498.00      10372.00      15906.00
Notice that summarize and centile use different formulas for interpolation of centiles (see Formulæ), so their results differ
somewhat: summarize reports 3748.00 and 13466.00 for the 5th and 95th centiles, whereas centile reports 3727.75 and 13498.00.
Also, with the cci option in force the confidence limits are defined to fall exactly on sample values and are slightly wider than
those with the default (nocci) option.
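The difference between the two interpolation rules can be checked by hand from the order statistics shown in the summarize
output. The Python sketch below is illustrative only. It assumes that centile's default estimate interpolates linearly at
position (n+1)p/100 and that summarize takes the order statistic at position np/100 rounded up (averaging two neighbours when
np/100 is an integer). Both rules are assumptions here (the Formulæ section is the authority), and the function names are
hypothetical. Only the four smallest and four largest prices are used, so only the 5th and 95th centiles are computed.

```python
import math

N = 74  # number of observations of price

# Order statistics x(1)..x(4) and x(71)..x(74), read from the
# "Smallest" and "Largest" columns of the summarize output.
KNOWN = {1: 3291.0, 2: 3299.0, 3: 3667.0, 4: 3748.0,
         71: 13466.0, 72: 13594.0, 73: 14500.0, 74: 15906.0}


def centile_rule(p, n=N, x=KNOWN):
    """Assumed centile rule: interpolate at position (n + 1) * p / 100."""
    pos = (n + 1) * p / 100
    r = math.floor(pos)
    frac = pos - r
    if r < 1:          # position falls below the smallest value
        return x[1]
    if r >= n:         # position falls at or above the largest value
        return x[n]
    return x[r] + frac * (x[r + 1] - x[r])


def summarize_rule(p, n=N, x=KNOWN):
    """Assumed summarize rule: position n * p / 100, rounded up."""
    pos = n * p / 100
    if pos == int(pos):            # integer position: average neighbours
        i = min(max(int(pos), 1), n - 1)
        return (x[i] + x[i + 1]) / 2
    return x[math.ceil(pos)]


for p in (5, 95):
    print(p, centile_rule(p), summarize_rule(p))
```

Under these assumptions the sketch gives 3727.75 and 13498.00 for the centile rule and 3748.00 and 13466.00 for the summarize
rule, in agreement with the two sets of output above.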
. centile price, c(5 50 95) normal
                                  -- Normal, based on observed centiles --
Variable |     Obs  Percent     Centile       [95% Conf. Interval]
---------+-------------------------------------------------------------
   price |      74      5       3727.75       3211.19       4244.31
         |             50       5006.50       4096.68       5916.32
         |             95      13498.00       5426.81      21569.19
. centile price, c(5 50 95) meansd
                                -- Normal, based on mean and std. dev. --
Variable |     Obs  Percent     Centile       [95% Conf. Interval]
---------+-------------------------------------------------------------
   price |      74      5       1313.77        278.93       2348.61
         |             50       6165.26       5493.24       6837.27
         |             95      11016.75       9981.90      12051.59
. sktest price
               Skewness/Kurtosis tests for Normality
                                             ------- joint -------
Variable |  Pr(Skewness)   Pr(Kurtosis)   adj chi-sq(2)   Pr(chi-sq)
---------+----------------------------------------------------------
   price |      0.000          0.013           21.77        0.0000
The normal and meansd examples above both assume that price is normally distributed. With the normal option, the centile
estimates are by definition the same as before. The confidence intervals for the 5th and 50th centiles are similar to the previous
ones, but the interval for the 95th centile is very different. The results using the meansd option also differ greatly from both
previous sets of estimates. The sktest test of skewness and kurtosis (see [5s] sktest) reveals that price is definitely not normally
distributed, so the normality assumption is unreasonable and the normal and meansd options are not appropriate for these data. We
rely instead on the results from the default choice, which does not assume normality. If the data are normally distributed,
however, the precision of the estimated centiles and their confidence intervals will be ordered (best) meansd > normal > [default]
(worst). The normal option is useful when we really do want empirical centiles (i.e., centiles based on sample order statistics
rather than on the mean and s.d.) but are willing to assume normality.
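The meansd point estimates can be reproduced from the mean and standard deviation in the summarize output. The Python sketch
below assumes that the estimate is simply the mean plus z times the s.d., where z is the standard normal quantile for the requested
centile. That formula and the helper name are assumptions made for illustration. Because the inputs are the rounded values 6165.26
and 2949.50, the results agree with the table only to about a cent.

```python
from statistics import NormalDist

MEAN = 6165.26       # mean of price, from the summarize output
SD = 2949.50         # std. dev. of price, from the summarize output
MIN_PRICE = 3291.00  # smallest observed price


def meansd_centile(p, mean=MEAN, sd=SD):
    """Centile p of a normal distribution with the given mean and s.d."""
    z = NormalDist().inv_cdf(p / 100)  # standard normal quantile
    return mean + z * sd


for p in (5, 50, 95):
    print(p, round(meansd_centile(p), 2))

# Under normality the fitted 5th centile lies below every observed price.
print(meansd_centile(5) < MIN_PRICE)
```

The fitted 5th centile, about 1313.77, is far below the smallest observed price of 3291.00, which illustrates why the meansd
results are not credible for data as skewed as these.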