Update to a program for saving a model fit as a dataset



Stata Technical Bulletin

39


We enter the vector of first-stage sample sizes as follows (note the transform operator is essential).

. matrix fstsamp=(6666, 1228, 144, 58)'

Assuming the objective is to fit the logistic regression model

logiti = βo + Asexi + ⅛agc,

and to minimize the variance of age, then we can obtain the optimal second-stage sample sizes.

. optfixn mort sex age, first(sex) nl(fstsamp) n2(1000) var(3)

the second stage sample sizes

group(mor
t sex)

-+---

I
I

Freq.

1

I

25

2

I

25

3

I

25

4

I
.+---

25

please check the sample sizes !

grp.yz

mort

sex

grP-z

nl

n2-pilot

1

0

0

1

6666

25

2

0

1

2

1228

25

3

1

0

1

144

25

4

1

1

2

58

25

the optimal sampling fraction(sample size) for grp_yz 1 = .089 (596)

the optimal sampling fraction(sample size) for grp_yz 2 = .164 (202)

the optimal sampling fraction(sample size) for grp_yz 3=1 (144)

the optimal sampling fraction(sample size) for grp_yz 4=1 (58)

the minimum variance for age : .00008027

Total second stage sample size =1000

Note that these results tell us that to minimize the variance of age, we need to sample all the available cases, 8.9% of controls
in stratum 1 and 16.4% of controls in stratum 2.

Example 2

This second example also uses the CASS data. Let us suppose that we wish to set up a two-stage study where at the first stage
we will collect only the patient’s operative mortality, sex, and weight, while at the second stage we will collect the following
variables only for a subset of the study subjects.

1. Age of patients when they underwent bypass surgery.

2. The angina status of the patients when they underwent bypass surgery.

3. CHF score, that is, congestive heart failure score.

4. LVEDBP, that is, left ventricular end diastolic blood pressure.

5. Urgency of the surgery (1 for urgent, 0 for nonurgent).

Let us suppose that we have a budget of £10,000 available, that the cost of collecting data on one subject is £2 at the first
stage and
£ 15 at the second stage, and that we would like to minimize the variance of LVEDBP in the logistic regression model

logiti = A + + sex, + ,⅛weight7 + Aagei + Aanginai + AclA + Alvedbpi + Asurgeryi

As before, we need to sample a few pilot second-stage observations from each stratum defined by the different levels of mortality
У and first stage covariates (sex and weight). Since first-stage covariates must be categorical, we first created a three-category
weight variable wtcat as 1 for weight < 60, 2 for 60 ≤ weight < 70, and 3 for weight ≥ 70. The first-stage sample sizes



More intriguing information

1. Federal Tax-Transfer Policy and Intergovernmental Pre-Commitment
2. The name is absent
3. PEER-REVIEWED FINAL EDITED VERSION OF ARTICLE PRIOR TO PUBLICATION
4. The name is absent
5. The Composition of Government Spending and the Real Exchange Rate
6. POWER LAW SIGNATURE IN INDONESIAN LEGISLATIVE ELECTION 1999-2004
7. AN ECONOMIC EVALUATION OF COTTON AND PEANUT RESEARCH IN SOUTHEASTERN UNITED STATES
8. PERFORMANCE PREMISES FOR HUMAN RESOURCES FROM PUBLIC HEALTH ORGANIZATIONS IN ROMANIA
9. Perceived Market Risks and Strategic Risk Management of Food Manufactures: Empirical Results from the German Brewing Industry
10. The name is absent