Computing optimal sampling designs for two-stage studies

Stata Technical Bulletin

We can compare this to the logistic regression analysis using only the complete observations:

. keep if x~=.

. logit y x, or

Logit estimates                                   Number of obs   =        500
                                                  LR chi2(1)      =      92.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -299.15925                       Pseudo R2       =     0.1345

         |  odd-ratio   Std. Err.        z    P>|z|  [95% Conf.   Interval]
---------+-----------------------------------------------------------------
       x |   2.771684    .3326964    8.493    0.000    2.190638    3.506847

Note that the mean score estimate above had a smaller standard error, reflecting the additional information used in the analysis.
Also, since z is only a surrogate for x, it is not used in the complete case analysis.
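The columns of the complete-case output are tied together by simple arithmetic, which makes the table easy to sanity-check. The sketch below is an illustration added here, not part of the original analysis. It assumes the usual convention for odds-ratio output: the reported standard error is the delta-method value OR × SE(log OR), while the z statistic and confidence limits are computed on the log scale. The variable names are hypothetical, and Python is used only as a calculator.

```python
import math

# Values copied from the complete-case logit output for x.
odds_ratio = 2.771684
se_odds_ratio = 0.3326964

# Assumed convention: SE(OR) = OR * SE(log OR) (delta method),
# so dividing by the odds ratio recovers the log-scale standard error.
log_or = math.log(odds_ratio)
se_log_or = se_odds_ratio / odds_ratio

# Wald z statistic and 95% limits, computed on the log scale
# and then exponentiated back to the odds-ratio scale.
z = log_or / se_log_or
z_crit = 1.959964  # 97.5th percentile of the standard normal
ci_lower = math.exp(log_or - z_crit * se_log_or)
ci_upper = math.exp(log_or + z_crit * se_log_or)

print(f"z = {z:.3f}")
print(f"95% CI = [{ci_lower:.4f}, {ci_upper:.4f}]")
```

Under that assumption the computed z statistic and interval should agree with the reported row for x up to rounding. The same check can be applied to any row of the odds-ratio tables in this section.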

Next, we consider a real example: an application of the mean score method to a case-control study of the association
between ectopic pregnancy and sexually transmitted diseases; see Reilly and Pepe (1995) for a full description of the data.

. use ectopic

. meanscor y gonn-chlam, first(gonn-sexptn) second(chlam)

meanscore estimates

         |  odd-ratio   Std. Err.        z    P>|z|  [95% Conf.   Interval]
---------+-----------------------------------------------------------------
    cons |   .4543184    .0987123   -3.631    0.000    .2967666    .6955137
    gonn |   .9495978    .2856096   -0.172    0.863    .5266531    1.712201
   contr |   .0943838    .0176643  -12.612    0.000    .0654021    .1362082
  sexptn |   2.099286    .4938943    3.152    0.002    1.323766    3.329139
   chlam |   2.471606    .7808384    2.864    0.004    1.330653    4.590858

For comparison, an analysis of complete cases only gives

. keep if chlam ~=.

. logit y gonn-chlam, or

Logit estimates                                   Number of obs   =        327
                                                  LR chi2(4)      =     104.24
                                                  Prob > chi2     =     0.0000
Log likelihood = -169.54627                       Pseudo R2       =     0.2351

         |  odd-ratio   Std. Err.        z    P>|z|  [95% Conf.   Interval]
---------+-----------------------------------------------------------------
    gonn |   .7445515    .3132037   -0.701    0.483    .3264582    1.698095
   contr |   .1098308    .0303352   -7.997    0.000     .063918    .1887231
  sexptn |    1.93898    .7101447    1.808    0.071     .945853     3.97487
   chlam |    2.47682    .7576623    2.965    0.003    1.359912    4.511054
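Standard errors of odds ratios depend on the size of the odds ratio itself, so the two analyses are easier to compare on the log-odds scale. The sketch below is an added illustration, not part of the original article. It divides each reported standard error by its odds ratio (again assuming the delta-method convention SE(OR) = OR × SE(log OR)) and takes the ratio of mean score to complete-case standard errors for each covariate. The dictionary and function names are hypothetical, and the numbers are copied from the two tables above.

```python
# (odds ratio, standard error) pairs copied from the two tables above.
mean_score = {
    "gonn": (0.9495978, 0.2856096),
    "contr": (0.0943838, 0.0176643),
    "sexptn": (2.099286, 0.4938943),
    "chlam": (2.471606, 0.7808384),
}
complete_case = {
    "gonn": (0.7445515, 0.3132037),
    "contr": (0.1098308, 0.0303352),
    "sexptn": (1.93898, 0.7101447),
    "chlam": (2.47682, 0.7576623),
}


def se_log_odds(or_and_se):
    """Log-scale standard error, assuming SE(OR) = OR * SE(log OR)."""
    odds_ratio, se = or_and_se
    return se / odds_ratio


# A ratio below 1 means the mean score analysis is more precise.
ratios = {}
for name in mean_score:
    ratios[name] = se_log_odds(mean_score[name]) / se_log_odds(complete_case[name])
    print(f"{name:>7}: SE ratio (mean score / complete case) = {ratios[name]:.2f}")
```

On these figures the ratio is well below 1 for gonn, contr, and sexptn, the covariates listed in first(), and close to 1 for chlam, the covariate listed in second().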

References

Reilly, M. 1996. Optimal sampling strategies for two-stage studies. American Journal of Epidemiology 143: 92-100.

Reilly, M. and M. S. Pepe. 1995. A mean score method for missing and auxiliary covariate data in regression models. Biometrika 82: 299-314.

sg157 Predicted values calculated from linear or logistic regression models

Joanne M. Garrett, University of North Carolina, [email protected]

Abstract: The program predcalc for easily calculating predicted values and confidence intervals from linear or logistic regression
model estimates for specified values of the X variables is introduced and illustrated.

Keywords: regression models, predicted values.

Syntax

predcalc yvar, xvar(xvarlist) [ level(#) model linear ]
