Computing optimal sampling designs for two-stage studies




Stata Technical Bulletin


STB-58


Notice that each of the optfixn, optbud, and optprec commands has an option coding, which can be used if one is
sure of the order in which the vector of first-stage sample sizes or prevalences should be entered. With this option, coding
is called automatically from inside the optimal sampling command. Since this creates variables named
grp_yz and grp_z, an error message will be generated if variables with these names already exist.

Options

first(varlist) specifies the first-stage covariates.

n1(vecname) specifies the vector of first-stage sample sizes for each stratum.

prev(vecname) specifies the vector of prevalences for each stratum.

n2(#) specifies the second-stage sample sizes (used only with optfixn).

b(#) specifies the available budget (used only with optbud).

c1(#) specifies the cost per observation at the first stage (used with optbud and optprec).

c2(#) specifies the cost per observation at the second stage (used with optbud and optprec).

var(#) specifies the position in the logistic regression model of the covariate whose variance is to be minimized (that is,
optimized). For example, in the simple model $Y = b_0 + b_1 X_1 + b_2 X_2$, if we want to minimize the variance of the
estimate for $X_1$, then var = 2.

prec(#) specifies the desired precision, that is, the variance (used only with optprec).

coding(#) is a logical flag; the default of 0 (that is, false) means that prior to calling optfixn, optbud, or optprec one must
have run the coding command.

Example 1

The following example is from CASS (Coronary Artery Surgery Study) and appears in Reilly (1996). This study collected
data on the operative mortality and various risk factors for 8,096 subjects. Let us suppose that at the first stage we have only
mortality status Y and sex Z as specified in the table below, and that it has been agreed to record the age for a subsample of
1,000 subjects in order to estimate the sex-adjusted odds ratio for age. The example is fictitious as we do have all the covariates
on all subjects, but for illustrative purposes we ignore this information (that is, set values to missing). In order to compute
optimal sample sizes, we require pilot data in all of the strata of the table, and so we "sampled" (that is, reset the missing
values to the actual age values for) 25 randomly selected observations from each stratum. The resulting dataset of 100
observations is available as pilotcas accompanying this insert.

                        male      female
            Y           Z = 0     Z = 1

alive       Y = 0       6,666     1,228
deceased    Y = 1         144        58
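The pilot-sampling step described above (25 subjects drawn at random from each of the four strata) can be sketched as follows. This is an illustrative Python sketch, not the authors' Stata code: the function name draw_pilot, the seed, and the use of integer positions as subject identifiers are all assumptions made for the example; only the stratum counts come from the table.

```python
import random

# First-stage counts from the table above, keyed by (mort, sex).
first_stage = {(0, 0): 6666, (0, 1): 1228, (1, 0): 144, (1, 1): 58}


def draw_pilot(counts, per_stratum=25, seed=58):
    """Pick `per_stratum` subject positions at random, without replacement, in each stratum.

    Hypothetical helper: subjects are identified only by their position
    0..n-1 within a stratum, and the seed is arbitrary.
    """
    rng = random.Random(seed)
    return {
        stratum: sorted(rng.sample(range(n), per_stratum))
        for stratum, n in counts.items()
    }


pilot = draw_pilot(first_stage)
print({stratum: len(ids) for stratum, ids in pilot.items()})
print(sum(len(ids) for ids in pilot.values()))  # 100 pilot observations in total
```

Sampling the same number from every stratum guarantees pilot data in the small "deceased" cells, which simple random sampling of 100 subjects from all 8,096 would not.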

We start by computing the optimal allocation for a second-stage sample of 1,000.

. use pilotcas

. coding mort sex

      grp_yz     mort      sex    grp_z     nobs

           1        0        0        1       25
           2        0        1        2       25
           3        1        0        1       25
           4        1        1        2       25

for functions requiring first stage sample sizes/prevalences
enter these in the order of grp_yz

The output of the coding command tells us that the vector of first-stage sample sizes must be entered in the order specified in
the following table.

First element     grp_yz = 1    first-stage sample size for living (mort = 0) males (sex = 0)
Second element    grp_yz = 2    first-stage sample size for living females
Third element     grp_yz = 3    first-stage sample size for deceased (mort = 1) males
Fourth element    grp_yz = 4    first-stage sample size for deceased females
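As a check on the ordering, the vector of first-stage sample sizes can be assembled from the CASS table as in the following sketch. It is written in Python purely for illustration and is not part of the Stata commands described here. It assumes that the grp_yz numbering corresponds to sorting the strata by mort and then by sex, which agrees with the coding output above, and it assumes that a stratum's prevalence means its share of the first-stage sample.

```python
# Stratum counts from the first-stage table, keyed by (mort, sex),
# deliberately listed out of order.
table = {(1, 1): 58, (0, 1): 1228, (1, 0): 144, (0, 0): 6666}

# Sorting by (mort, sex) reproduces the grp_yz numbering shown by coding.
strata = sorted(table)
grp_yz = {stratum: i + 1 for i, stratum in enumerate(strata)}

# First-stage sample sizes in grp_yz order.
n1 = [table[stratum] for stratum in strata]
total = sum(n1)

# Assumption: prevalence = stratum share of the first-stage sample.
prev = [n / total for n in n1]

print(n1)     # [6666, 1228, 144, 58]
print(total)  # 8096
```

Keying the counts by (mort, sex) and sorting once, rather than typing the vector by hand, keeps the sample sizes and the prevalences in the same order.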


