14
Stata Technical Bulletin
STB-8
Formulae
Default case. I basically use the method of Mood and Graybill (1963, 408). Let $x_1 \le x_2 \le \cdots \le x_n$ be a sample of
size $n$ arranged in ascending order. Denote the estimated $q$th centile of the $x$'s as $c_q$. We require that $0 < q < 100$. Let
$R = (n + 1)q/100$ have integer part $r$ and fractional part $f$, that is, $r = \mathrm{int}(R)$ and $f = R - r$. (If $R$ is itself an integer, then
$r = R$ and $f = 0$.) Note that $0 \le r \le n$. For convenience, define $x_0 = x_1$ and $x_{n+1} = x_n$. Then $C_q$ is estimated by
$$c_q = x_r + f \times (x_{r+1} - x_r),$$
that is, $c_q$ is a weighted average of $x_r$ and $x_{r+1}$. Loosely speaking, a (conservative) $p$% confidence interval for $C_q$ involves
finding the observations ranked $t$ and $u$ which correspond respectively to the $\alpha = (100 - p)/200$ and $1 - \alpha$ quantiles of a
binomial distribution with parameters $n$ and $q/100$, i.e., $B(n, q/100)$. More precisely, define the $i$th value ($i = 0, \ldots, n$) of
the cumulative binomial distribution function to be $F_i = P(X \le i)$, where $X$ has distribution $B(n, q/100)$. For convenience,
let $F_{-1} = 0$ and $F_{n+1} = 1$. Then $t$ is found such that $F_t \le \alpha$ and $F_{t+1} > \alpha$, and $u$ is found such that $1 - F_u \le \alpha$ and
$1 - F_{u-1} > \alpha$.
With the cci option in force, the (conservative) confidence interval is $(x_{t+1}, x_{u+1})$ and its actual coverage is $F_u - F_t$.
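As an illustration only, the point estimate and the conservative interval described above might be sketched in Python as follows. The function names (binomial_cdf, centile_estimate, conservative_ci) are hypothetical and are not part of centile; the sketch assumes a non-empty numeric sample, 0 < q < 100 and 0 < p < 100.

```python
from math import comb, floor


def binomial_cdf(n, prob):
    """Return [F_0, ..., F_n], where F_i = P(X <= i) for X ~ B(n, prob)."""
    cdf, total = [], 0.0
    for i in range(n + 1):
        total += comb(n, i) * prob**i * (1 - prob) ** (n - i)
        cdf.append(total)
    cdf[-1] = 1.0  # guard against floating-point drift in the final sum
    return cdf


def centile_estimate(xs, q):
    """Point estimate c_q = x_r + f * (x_{r+1} - x_r), with R = (n + 1) q / 100."""
    x = sorted(xs)
    n = len(x)
    padded = [x[0]] + x + [x[-1]]  # padded[i] is x_i, with x_0 = x_1 and x_{n+1} = x_n
    R = (n + 1) * q / 100
    r = floor(R)
    f = R - r
    return padded[r] + f * (padded[r + 1] - padded[r])


def conservative_ci(xs, q, p=95):
    """Return (x_{t+1}, x_{u+1}, F_u - F_t) for the q-th centile."""
    x = sorted(xs)
    n = len(x)
    padded = [x[0]] + x + [x[-1]]
    alpha = (100 - p) / 200
    F = binomial_cdf(n, q / 100)

    def Fi(i):
        # Conventions from the text: F_{-1} = 0 and F_{n+1} = 1
        return 0.0 if i < 0 else (1.0 if i > n else F[i])

    # t is the largest rank with F_t <= alpha (t = -1 if even F_0 exceeds alpha)
    t = max(i for i in range(-1, n + 1) if Fi(i) <= alpha)
    # u is the smallest rank with 1 - F_u <= alpha
    u = min(i for i in range(0, n + 1) if 1 - Fi(i) <= alpha)
    return padded[t + 1], padded[u + 1], Fi(u) - Fi(t)
```

Because the $F_i$ are nondecreasing in $i$, taking the largest rank with $F_t \le \alpha$ is equivalent to the pair of conditions $F_t \le \alpha$ and $F_{t+1} > \alpha$, and similarly for $u$.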
The default case uses linear interpolation on the $F_i$ as follows. Let
$$g = (\alpha - F_t)/(F_{t+1} - F_t),$$
$$h = [\alpha - (1 - F_u)]/[(1 - F_{u-1}) - (1 - F_u)] = (\alpha - 1 + F_u)/(F_u - F_{u-1}).$$
Then the interpolated lower and upper confidence limits $(c_{qL}, c_{qU})$ for $C_q$ are
$$c_{qL} = x_{t+1} + g \times (x_{t+2} - x_{t+1}),$$
$$c_{qU} = x_{u+1} - h \times (x_{u+1} - x_u).$$
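A minimal sketch of the interpolation step, assuming the ranks $t$ and $u$ and the values $F_0, \ldots, F_n$ have already been obtained. The name interpolated_limits and its argument list are chosen for this illustration only; the denominator of $h$ is taken as $F_u - F_{u-1}$, which is positive by the definition of $u$.

```python
def interpolated_limits(x_sorted, F, t, u, alpha):
    """Interpolated limits (c_qL, c_qU) from ranks t, u and CDF values F[0..n].

    x_sorted holds x_1..x_n in ascending order; F[i] is F_i = P(X <= i).
    """
    n = len(x_sorted)
    # padded[i] is x_i, with the conventions x_0 = x_1 and x_{n+1} = x_n
    padded = [x_sorted[0]] + list(x_sorted) + [x_sorted[-1]]

    def Fi(i):
        # Conventions from the text: F_{-1} = 0 and F_{n+1} = 1
        return 0.0 if i < 0 else (1.0 if i > n else F[i])

    g = (alpha - Fi(t)) / (Fi(t + 1) - Fi(t))
    # (1 - F_{u-1}) - (1 - F_u) simplifies to F_u - F_{u-1}
    h = (alpha - (1 - Fi(u))) / (Fi(u) - Fi(u - 1))

    lower = padded[t + 1] + g * (padded[t + 2] - padded[t + 1])
    upper = padded[u + 1] - h * (padded[u + 1] - padded[u])
    return lower, upper
```

Applied to the 13-observation example that follows, with unrounded $F_i$, this sketch gives limits of about 11.97 and 69.90. The hand calculation there first rounds the $F_i$ to four decimals, so it differs slightly in the last digit.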
For example, suppose we want a 95% confidence interval for the median of a sample of size 13. So $n = 13$, $q = 50$, $p = 95$,
$\alpha = .025$, $R = 14 \times 50/100 = 7$, $f = 0$. The median is therefore the 7th observation. Some example data $x_i$ and the values of
$F_i$ are as follows:
| $i$ | $F_i$ | $1 - F_i$ | $x_i$ | $i$ | $F_i$ | $1 - F_i$ | $x_i$ |
|-----|--------|--------|-----|-----|--------|--------|-----|
| 0 | 0.0001 | 0.9999 | - | 7 | 0.7095 | 0.2905 | 33 |
| 1 | 0.0017 | 0.9983 | 5 | 8 | 0.8666 | 0.1334 | 37 |
| 2 | 0.0112 | 0.9888 | 7 | 9 | 0.9539 | 0.0461 | 45 |
| 3 | 0.0461 | 0.9539 | 10 | 10 | 0.9888 | 0.0112 | 59 |
| 4 | 0.1334 | 0.8666 | 15 | 11 | 0.9983 | 0.0017 | 77 |
| 5 | 0.2905 | 0.7095 | 23 | 12 | 0.9999 | 0.0001 | 104 |
| 6 | 0.5000 | 0.5000 | 28 | 13 | 1.0000 | 0.0000 | 211 |
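The $F_i$ and $1 - F_i$ columns can be checked independently. The sketch below uses exact rational arithmetic so that only the final four-decimal display involves rounding; binomial_cdf_exact is a hypothetical helper, and it assumes that $q$ is given as an integer percentage.

```python
from fractions import Fraction
from math import comb


def binomial_cdf_exact(n, q):
    """Return rows (i, F_i, 1 - F_i) for X ~ B(n, q/100), as exact fractions."""
    prob = Fraction(q, 100)  # assumes q is an integer percentage
    total = Fraction(0)
    rows = []
    for i in range(n + 1):
        total += comb(n, i) * prob**i * (1 - prob) ** (n - i)
        rows.append((i, total, 1 - total))
    return rows


# n = 13 and q = 50, as in the example; print to four decimals
for i, F, S in binomial_cdf_exact(13, 50):
    print(f"{i:2d}  {float(F):.4f}  {float(S):.4f}")
```

For $q = 50$ each $F_i$ is a sum of binomial coefficients divided by $2^{13} = 8192$; for instance, $F_2 = (1 + 13 + 78)/8192 = 92/8192$.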
The median is $x_7 = 33$. Also, $F_2 \le .025$ and $F_3 > .025$ so $t = 2$; $1 - F_{10} \le .025$ and $1 - F_9 > .025$ so $u = 10$. The
conservative confidence interval is therefore
$$(c_{50L}, c_{50U}) = (x_3, x_{11}) = (10, 77),$$
with actual coverage $F_{10} - F_2 = .9888 - .0112 = .9776$ (97.8% confidence). For the interpolation calculation, we have
$$g = (.025 - .0112)/(.0461 - .0112) = .395,$$
$$h = (.025 - 1 + .9888)/(.9888 - .9539) = .395.$$
So
$$c_{50L} = x_3 + .395 \times (x_4 - x_3) = 10 + .395 \times 5 = 11.98,$$
$$c_{50U} = x_{11} - .395 \times (x_{11} - x_{10}) = 77 - .395 \times 18 = 69.89.$$
normal case. The value of $c_q$ is as above. Its s.e. is given by the formula
$$s_q = \sqrt{q(100 - q)} \,/\, [100\, n\, Z(c_q; \bar{x}, s)]$$