Update to a program for saving a model fit as a dataset



Stata Technical Bulletin



We enter the vector of first-stage sample sizes as follows (note that the transpose operator is essential).

. matrix fstsamp=(6666, 1228, 144, 58)'
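The role of the trailing apostrophe (Stata's matrix transpose operator) can be checked with a short sketch. The names rowvec and fstsamp2 are hypothetical and used only for illustration; the values are the four first-stage sample sizes entered above.

```stata
* Without the transpose, the comma-separated list is a 1 x 4 row vector
matrix rowvec = (6666, 1228, 144, 58)
display rowsof(rowvec) " x " colsof(rowvec)

* With the transpose operator ('), it becomes a 4 x 1 column vector
matrix fstsamp = (6666, 1228, 144, 58)'
display rowsof(fstsamp) " x " colsof(fstsamp)

* Equivalent form: the backslash separates rows, so no transpose is needed
* (fstsamp2 is a hypothetical name used only for this comparison)
matrix fstsamp2 = (6666 \ 1228 \ 144 \ 58)
display mreldif(fstsamp, fstsamp2)
```

Either form should give the same column vector, with one row per stratum.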

Assuming the objective is to fit the logistic regression model

$$\operatorname{logit}(p_i) = \beta_0 + \beta_1\,\mathrm{sex}_i + \beta_2\,\mathrm{age}_i,$$

and to minimize the variance of the age coefficient, we can obtain the optimal second-stage sample sizes as follows.

. optfixn mort sex age, first(sex) n1(fstsamp) n2(1000) var(3)

the second stage sample sizes

 group(mort |
       sex) |      Freq.
------------+-----------
          1 |         25
          2 |         25
          3 |         25
          4 |         25
------------+-----------

please check the sample sizes !

  grp_yz   mort   sex   grp_z     n1   n2_pilot
       1      0     0       1   6666         25
       2      0     1       2   1228         25
       3      1     0       1    144         25
       4      1     1       2     58         25

the optimal sampling fraction(sample size) for grp_yz 1 = .089 (596)

the optimal sampling fraction(sample size) for grp_yz 2 = .164 (202)

the optimal sampling fraction(sample size) for grp_yz 3 = 1 (144)

the optimal sampling fraction(sample size) for grp_yz 4 = 1 (58)

the minimum variance for age : .00008027

Total second stage sample size = 1000

Note that these results tell us that, to minimize the variance of the age coefficient, we need to sample all the available cases
(144 + 58 = 202 subjects), 8.9% of the controls in stratum 1, and 16.4% of the controls in stratum 2.
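As a quick arithmetic check on the reported output, the sampling fractions can be recomputed as second-stage size divided by first-stage size. This is only a sketch: the matrix names n1 and n2 are hypothetical, and the numbers are copied by hand from the output above rather than taken from optfixn's saved results.

```stata
* First-stage sizes and optimal second-stage sizes, copied from the output
* (n1 and n2 are hypothetical matrix names used only for this check)
matrix n1 = (6666 \ 1228 \ 144 \ 58)
matrix n2 = (596 \ 202 \ 144 \ 58)

* Sampling fraction in each mort-by-sex stratum: n2 / n1
forvalues g = 1/4 {
    display "grp_yz `g': " %5.3f n2[`g',1]/n1[`g',1]
}

* The second-stage sizes should add up to the requested n2(1000)
display n2[1,1] + n2[2,1] + n2[3,1] + n2[4,1]
```

To three decimals, the recomputed fractions should agree with the values printed by the command.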

Example 2

This second example also uses the CASS data. Let us suppose that we wish to set up a two-stage study where at the first stage
we will collect only the patient’s operative mortality, sex, and weight, while at the second stage we will collect the following
variables only for a subset of the study subjects.

1. Age of patients when they underwent bypass surgery.

2. The angina status of the patients when they underwent bypass surgery.

3. CHF score, that is, congestive heart failure score.

4. LVEDBP, that is, left ventricular end diastolic blood pressure.

5. Urgency of the surgery (1 for urgent, 0 for nonurgent).

Let us suppose that we have a budget of £10,000 available, that the cost of collecting data on one subject is £2 at the first
stage and £15 at the second stage, and that we would like to minimize the variance of the LVEDBP coefficient in the logistic
regression model

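$$\operatorname{logit}(p_i) = \beta_0 + \beta_1\,\mathrm{sex}_i + \beta_2\,\mathrm{weight}_i + \beta_3\,\mathrm{age}_i + \beta_4\,\mathrm{angina}_i + \beta_5\,\mathrm{chf}_i + \beta_6\,\mathrm{lvedbp}_i + \beta_7\,\mathrm{surgery}_i$$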
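The budget described above can be written as a simple constraint on the two sample sizes. The sketch below assumes that the £15 second-stage cost is paid in addition to the £2 first-stage cost for each subject carried forward; the symbols $n$ (first-stage subjects) and $n_2$ (second-stage subjects) are introduced here only for illustration.

```latex
% n   : number of first-stage subjects (hypothetical symbol)
% n_2 : number of second-stage subjects (hypothetical symbol)
\[
  2\,n + 15\,n_2 \le 10{,}000
  \quad\Longrightarrow\quad
  n_2 \le \frac{10{,}000 - 2\,n}{15}.
\]
```

As a purely illustrative figure, a first stage of $n = 2{,}000$ subjects would cost £4,000 and leave £6,000, enough for at most 400 second-stage subjects under this assumption.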

As before, we need to sample a few pilot second-stage observations from each stratum defined by the different levels of mortality
Y and the first-stage covariates (sex and weight). Since first-stage covariates must be categorical, we first created a three-category
weight variable wtcat as 1 for weight < 60, 2 for 60 ≤ weight < 70, and 3 for weight ≥ 70. The first-stage sample sizes


