Computing optimal sampling designs for two-stage studies



Stata Technical Bulletin

39


We enter the vector of first-stage sample sizes as follows (note the transform operator is essential).

. matrix fstsamp=(6666, 1228, 144, 58)'

Assuming the objective is to fit the logistic regression model

logiti = βo + Asexi + ⅛agc,

and to minimize the variance of age, then we can obtain the optimal second-stage sample sizes.

. optfixn mort sex age, first(sex) nl(fstsamp) n2(1000) var(3)

the second stage sample sizes

group(mor
t sex)

-+---

I
I

Freq.

1

I

25

2

I

25

3

I

25

4

I
.+---

25

please check the sample sizes !

grp.yz

mort

sex

grP-z

nl

n2-pilot

1

0

0

1

6666

25

2

0

1

2

1228

25

3

1

0

1

144

25

4

1

1

2

58

25

the optimal sampling fraction(sample size) for grp_yz 1 = .089 (596)

the optimal sampling fraction(sample size) for grp_yz 2 = .164 (202)

the optimal sampling fraction(sample size) for grp_yz 3=1 (144)

the optimal sampling fraction(sample size) for grp_yz 4=1 (58)

the minimum variance for age : .00008027

Total second stage sample size =1000

Note that these results tell us that to minimize the variance of age, we need to sample all the available cases, 8.9% of controls
in stratum 1 and 16.4% of controls in stratum 2.

Example 2

This second example also uses the CASS data. Let us suppose that we wish to set up a two-stage study where at the first stage
we will collect only the patient’s operative mortality, sex, and weight, while at the second stage we will collect the following
variables only for a subset of the study subjects.

1. Age of patients when they underwent bypass surgery.

2. The angina status of the patients when they underwent bypass surgery.

3. CHF score, that is, congestive heart failure score.

4. LVEDBP, that is, left ventricular end diastolic blood pressure.

5. Urgency of the surgery (1 for urgent, 0 for nonurgent).

Let us suppose that we have a budget of £10,000 available, that the cost of collecting data on one subject is £2 at the first
stage and
£ 15 at the second stage, and that we would like to minimize the variance of LVEDBP in the logistic regression model

logiti = A + + sex, + ,⅛weight7 + Aagei + Aanginai + AclA + Alvedbpi + Asurgeryi

As before, we need to sample a few pilot second-stage observations from each stratum defined by the different levels of mortality
У and first stage covariates (sex and weight). Since first-stage covariates must be categorical, we first created a three-category
weight variable wtcat as 1 for weight < 60, 2 for 60 ≤ weight < 70, and 3 for weight ≥ 70. The first-stage sample sizes



More intriguing information

1. The Veblen-Gerschenkron Effect of FDI in Mezzogiorno and East Germany
2. From Aurora Borealis to Carpathians. Searching the Road to Regional and Rural Development
3. Dementia Care Mapping and Patient-Centred Care in Australian residential homes: An economic evaluation of the CARE Study, CHERE Working Paper 2008/4
4. What should educational research do, and how should it do it? A response to “Will a clinical approach make educational research more relevant to practice” by Jacquelien Bulterman-Bos
5. The name is absent
6. Understanding the (relative) fall and rise of construction wages
7. Using Surveys Effectively: What are Impact Surveys?
8. Comparative study of hatching rates of African catfish (Clarias gariepinus Burchell 1822) eggs on different substrates
9. The name is absent
10. Can genetic algorithms explain experimental anomalies? An application to common property resources
11. The name is absent
12. Micro-strategies of Contextualization Cross-national Transfer of Socially Responsible Investment
13. WP 48 - Population ageing in the Netherlands: Demographic and financial arguments for a balanced approach
14. Industrial Cores and Peripheries in Brazil
15. The name is absent
16. Globalization and the benefits of trade
17. The name is absent
18. Epistemology and conceptual resources for the development of learning technologies
19. The name is absent
20. Non-causality in Bivariate Binary Panel Data