Computing optimal sampling designs for two-stage studies



Stata Technical Bulletin



We can compare this to the logistic regression analysis using only the complete observations:

. keep if x~=.

. logit y x, or

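Logit estimates                                   Number of obs   =        500
                                                  LR chi2(1)      =      92.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -299.15925                       Pseudo R2       =     0.1345

------------------------------------------------------------------------------
       y | Odds Ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   2.771684   .3326964      8.493   0.000       2.190638    3.506847
------------------------------------------------------------------------------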

Note that the mean score estimate above had a smaller standard error, reflecting the additional information used in the analysis. Also, since z is a surrogate for x, it is not used in the complete-case analysis.
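
The z statistic and confidence limits in an odds-ratio table are computed on the log-odds scale, so that is the natural scale for comparing standard errors. As a rough check, and assuming the usual delta-method relation se(log OR) = se(OR)/OR, the complete-case row for x can be rebuilt from its reported odds ratio and standard error. The scalar names below are illustrative only, and invnormal() is the name in current Stata releases (older releases called it invnorm()).

```stata
* Values copied from the complete-case logit output for x
scalar or_x = 2.771684
scalar se_x = .3326964

* Assumed delta-method relation: se(log OR) = se(OR)/OR
scalar b    = ln(or_x)
scalar se_b = se_x/or_x

* z statistic and 95% confidence limits, back-transformed to the odds-ratio scale
display "z         = " b/se_b
display "lower 95% = " exp(b - invnormal(0.975)*se_b)
display "upper 95% = " exp(b + invnormal(0.975)*se_b)
```

Up to rounding, the three displayed values should agree with the z and confidence interval columns of the table.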

Next, we consider a real example: an application of the mean score method to a case-control study of the association between ectopic pregnancy and sexually transmitted diseases. See Reilly and Pepe (1995) for a full description of the data.

. use ectopic

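. meanscor y gonn-chlam, first(gonn-sexptn) second(chlam)

meanscore estimates

------------------------------------------------------------------------------
         |  odd-ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
    cons |   .4543184   .0987123     -3.631   0.000       .2967666    .6955137
    gonn |   .9495978   .2856096     -0.172   0.863       .5266531    1.712201
   contr |   .0943838   .0176643    -12.612   0.000       .0654021    .1362082
  sexptn |   2.099286   .4938943      3.152   0.002       1.323766    3.329139
   chlam |   2.471606   .7808384      2.864   0.004       1.330653    4.590858
------------------------------------------------------------------------------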

For comparison, an analysis of complete cases only gives

. keep if chlam ~=.

. logit y gonn-chlam, or

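Logit estimates                                   Number of obs   =        327
                                                  LR chi2(4)      =     104.24
                                                  Prob > chi2     =     0.0000
Log likelihood = -169.54627                       Pseudo R2       =     0.2351

------------------------------------------------------------------------------
       y | Odds Ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
    gonn |   .7445515   .3132037     -0.701   0.483       .3264582    1.698095
   contr |   .1098308   .0303352     -7.997   0.000        .063918    .1887231
  sexptn |    1.93898   .7101447      1.808   0.071        .945853     3.97487
   chlam |    2.47682   .7576623      2.965   0.003       1.359912    4.511054
------------------------------------------------------------------------------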
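
Because the two analyses give different odds-ratio estimates, their standard errors are easier to compare on the log-odds scale. The sketch below does this for the ectopic pregnancy example, again assuming the delta-method relation se(log OR) = se(OR)/OR. All numbers are copied from the mean score and complete-case tables above.

```stata
* se(log OR) = se(OR)/OR, the delta-method relation assumed here
* First value on each line: mean score analysis; second: complete cases only
display "gonn    mean score: " .2856096/.9495978 "   complete case: " .3132037/.7445515
display "contr   mean score: " .0176643/.0943838 "   complete case: " .0303352/.1098308
display "sexptn  mean score: " .4938943/2.099286 "   complete case: " .7101447/1.93898
display "chlam   mean score: " .7808384/2.471606 "   complete case: " .7576623/2.47682
```

On this scale the reduction in standard error appears mainly for the first-stage covariates (gonn, contr, sexptn). The value for chlam, which is observed only at the second stage, changes little.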

References

Reilly, M. 1996. Optimal sampling strategies for two-stage studies. American Journal of Epidemiology 143: 92-100.

Reilly, M. and M. S. Pepe. 1995. A mean score method for missing and auxiliary covariate data in regression models. Biometrika 82: 299-314.

sg157 Predicted values calculated from linear or logistic regression models

Joanne M. Garrett, University of North Carolina

Abstract: The program predcalc for easily calculating predicted values and confidence intervals from linear or logistic regression model estimates for specified values of the X variables is introduced and illustrated.

Keywords: regression models, predicted values.

Syntax

predcalc yvar, xvar(xvar_list) [ level(#) model linear ]


