Computing optimal sampling designs for two-stage studies





Stata Technical Bulletin


STB-58


(fweights or pweights) denoted by $w_{ij}$ and cluster sequence numbers denoted by $c_{ij}$. cendif follows the usual Stata practice
of assuming an fweight to stand for multiple observations with the same values for all other variables. The clusters may be
nested within the two groups or contain observations from each of the two groups, but the percentile differences will only apply
to observations from distinct clusters. If clusters are present, then the confidence intervals will be calculated assuming that the
sample was generated by sampling clusters independently from a population of clusters, rather than by sampling $N$ observations
independently from the total population of observations or by sampling $N_1$ and $N_2$ observations from Populations A and B,
respectively. (By default, all the $w_{ij}$ will be ones, and the $c_{ij}$ will be in sequence from 1 to $N$. The difference between these
three alternatives will not matter.) We will denote by $M$ the number of distinct values of a difference, $Y_{1j} - Y_{2k}$, observed
between $Y$ values in the two samples belonging to different clusters. The difference values themselves will be denoted by
$t_1, \ldots, t_M$. For each $h$ from 1 to $M$, we define the sum of product weights of differences equal to $t_h$ as

$$W_h = \sum_{j,k:\; Y_{1j} - Y_{2k} = t_h} \delta(c_{1j}, c_{2k})\, w_{1j} w_{2k} \tag{3}$$

where $\delta(a,b)$ is 0 if $a = b$ and 1 if $a \neq b$. Given a value of $\theta$ expressed in units of $Y$, we can define $Y^*_{ij}(\theta)$ to be $Y_{ij}$ if $i = 1$
and $Y_{ij} + \theta$ if $i = 2$. The sample Somers' D of $Y^*(\theta)$ with respect to $X$ is defined as

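$$D^*(\theta) = D[Y^*(\theta)|X] = \frac{\sum_{j=1}^{N_1} \sum_{k=1}^{N_2} \delta(c_{1j}, c_{2k})\, w_{1j} w_{2k}\, \mathrm{sign}(Y_{1j} - Y_{2k} - \theta)}{\sum_{j=1}^{N_1} \sum_{k=1}^{N_2} \delta(c_{1j}, c_{2k})\, w_{1j} w_{2k}} = \frac{\sum_{h:\, t_h > \theta} W_h - \sum_{h:\, t_h < \theta} W_h}{\sum_{h=1}^{M} W_h} \tag{4}$$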

where $D[\cdot|\cdot]$ denotes the sample Somers' D, defined by the methods of Newson (2000). Clearly, given a sample, $D^*(\theta)$ is a
nonincreasing function of $\theta$. (Note that only between-cluster differences are included.) Figure 1 shows $D^*(\theta)$ as a function of
$\theta$ for differences between trunk capacities of American and foreign cars (expressed in cubic feet) in the auto data. The squares
represent the values $D^*(t_h)$ for the observed differences $t_h$. Note that $D^*(\theta)$ is discontinuous at the observed differences, and
constant in each open interval between two successive observed differences.
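To make equations (3) and (4) concrete, the following is a minimal Python sketch (not the cendif implementation, which is written in Stata). The function names `difference_weights` and `d_star` and the toy data are hypothetical and chosen only for illustration. It assumes the default case described above when weights or clusters are omitted: unit weights and every observation in its own cluster.

```python
from collections import defaultdict


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def difference_weights(y1, y2, w1=None, w2=None, c1=None, c2=None):
    """Return a dict {t_h: W_h} over between-cluster differences, as in (3).

    Differences are grouped by exact equality, so this sketch is only
    reliable for values that are exactly representable (e.g. integers).
    """
    n1, n2 = len(y1), len(y2)
    # Assumed defaults: unit weights, clusters numbered 1..N (all distinct).
    w1 = [1.0] * n1 if w1 is None else w1
    w2 = [1.0] * n2 if w2 is None else w2
    c1 = list(range(1, n1 + 1)) if c1 is None else c1
    c2 = list(range(n1 + 1, n1 + n2 + 1)) if c2 is None else c2

    weights = defaultdict(float)
    for j in range(n1):
        for k in range(n2):
            if c1[j] != c2[k]:  # delta(c_1j, c_2k) = 1 only across clusters
                weights[y1[j] - y2[k]] += w1[j] * w2[k]
    return dict(weights)


def d_star(theta, weights):
    """Evaluate D*(theta) from the W_h using the right-hand side of (4)."""
    total = sum(weights.values())
    # sum_{t_h > theta} W_h - sum_{t_h < theta} W_h, written with sign().
    numerator = sum(w * sign(t - theta) for t, w in weights.items())
    return numerator / total


# Toy data (hypothetical): group 1 and group 2 outcomes.
W = difference_weights([1, 3, 5], [2, 4])
steps = [d_star(theta / 2, W) for theta in range(-10, 11)]
```

On this toy data the list `steps` never increases as $\theta$ grows, and it changes value only at the observed differences, which is the step-function behaviour described above. The double loop costs $N_1 N_2$ operations; that is acceptable for a sketch but is a design shortcut, not a statement about how cendif computes the same quantities.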

Figure 1. $D^*(\theta)$ plotted against the difference $\theta$ in trunk space between American and foreign cars


We aim to include $\theta$ in a confidence interval for a $q$th percentile difference if, and only if, the sample $D^*(\theta)$ is compatible
with a population $D[Y^*(\theta)|X]$ equal to $1 - 2q$. The methods of Newson (2000), used by the program somersd, typically
use a transformation $\zeta(\cdot)$, which, for present purposes, may either be the identity, the arcsine or Fisher's $z$ (the hyperbolic
arctangent). The transformed sample statistic $\hat{\zeta}(\theta) = \zeta[D^*(\theta)]$ is assumed to be normally distributed around the population
parameter $\zeta\{D[Y^*(\theta)|X]\}$. In the present application, we assume that if $D[Y^*(\theta)|X] = 1 - 2q$, then the quantity

$$\frac{\hat{\zeta}(\theta) - \zeta(1 - 2q)}{\mathrm{SE}[\hat{\zeta}(\theta)]} \tag{5}$$


has a standard Normal distribution, where $\mathrm{SE}[\hat{\zeta}(\theta)]$ is the sampling standard deviation (or standard error) of $\zeta[D^*(\theta)]$. If we
knew the value of $\mathrm{SE}[\hat{\zeta}(\theta)]$, then a $100(1 - \alpha)\%$ confidence interval for a $q$th percentile difference might be the interval of
values of $\theta$ for which

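$$\zeta^{-1}\left\{ \zeta(1 - 2q) - z_\alpha\, \mathrm{SE}[\hat{\zeta}(\theta)] \right\} \le D^*(\theta) \le \zeta^{-1}\left\{ \zeta(1 - 2q) + z_\alpha\, \mathrm{SE}[\hat{\zeta}(\theta)] \right\} \tag{6}$$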
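The limits in (6) can be illustrated with a short Python sketch. Several things here are assumptions rather than part of the text: the standard error is treated as a single known number (in practice it is not known and depends on $\theta$), $z_\alpha$ is taken to be the two-sided standard Normal critical value (the $100(1 - \alpha/2)$th percentile), and the names `TRANSFORMS`, `d_star_limits` and `theta_is_compatible` are hypothetical. The transformed limits are also clamped so that the back-transformed values stay in $[-1, 1]$, which is a choice made for this sketch only.

```python
import math
from statistics import NormalDist


def _clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


# Each entry is (zeta, zeta_inverse). Keys are illustrative labels only.
TRANSFORMS = {
    "identity": (lambda d: d, lambda z: _clamp(z, -1.0, 1.0)),
    "arcsine": (
        math.asin,
        lambda z: math.sin(_clamp(z, -math.pi / 2, math.pi / 2)),
    ),
    "fisher_z": (math.atanh, math.tanh),  # hyperbolic arctangent
}


def d_star_limits(q, se, alpha=0.05, transform="fisher_z"):
    """Return the lower and upper limits for D*(theta) in inequality (6).

    q must lie strictly between 0 and 1 for the Fisher's z transformation,
    because atanh is undefined at -1 and 1.
    """
    zeta, zeta_inv = TRANSFORMS[transform]
    # Assumed definition: two-sided critical value of the standard Normal.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    centre = zeta(1 - 2 * q)
    return zeta_inv(centre - z_alpha * se), zeta_inv(centre + z_alpha * se)


def theta_is_compatible(d_star_value, lower, upper):
    """True if a sample D*(theta) satisfies inequality (6)."""
    return lower <= d_star_value <= upper


# Median difference (q = 0.5) with a hypothetical standard error of 0.1.
lower, upper = d_star_limits(q=0.5, se=0.1)
```

For the median, $1 - 2q = 0$, so the limits are symmetric about zero under all three transformations. A value of $\theta$ would be kept in the interval when its sample $D^*(\theta)$ passes `theta_is_compatible(d, lower, upper)`.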



