Computing optimal sampling designs for two-stage studies

Stata Technical Bulletin

confidence interval around the predicted value based on the standard error of the prediction. Each variable in the model is set
to the desired value using the xvar option.

. predcalc sbp, xvar(age=60 cat=1 exer=0 chl=260)

Model: Linear Regression

Outcome: Systolic blood pressure — sbp

X Values: age=60 cat=1 exer=0 chl=260

Num. Obs: 609

Predicted Value and 95% CI for sbp:

184.35 ( 179.31, 189.38)

The predicted value for systolic blood pressure is 184.35 with a 95% confidence interval of 179.3 to 189.4. Had we not run
the model previously, predcalc would still work. The command first looks for stored estimates, and if they are not found, the
appropriate model is run. The model is not shown unless requested with the model option. In either case, it is a good idea to
check the “X Values” list to make sure that the predicted estimate is based on the model and variables expected, since the model
will contain only the X’s listed in the xvar option.
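The calculation behind a prediction of this kind can be sketched outside Stata. The Python fragment below is a minimal illustration, not predcalc's own code: it forms the linear combination of the coefficients at the chosen X values and builds an interval from the standard error of that combination. The function name prediction_ci, the coefficients, the covariance matrix, and the 1.96 critical value are all assumptions made for illustration; they are not the estimates from the blood pressure model.

```python
import math

def prediction_ci(x, beta, cov, crit=1.96):
    """Return (prediction, lower, upper) for the linear combination x'beta.

    x    : covariate values, with a leading 1 for the intercept
    beta : coefficient estimates, in the same order as x
    cov  : covariance matrix of beta, given as a list of rows
    crit : critical value (1.96 is the normal approximation for 95%)
    """
    pred = sum(xi * bi for xi, bi in zip(x, beta))
    # Variance of the prediction is the quadratic form x' V x.
    var = sum(x[i] * cov[i][j] * x[j]
              for i in range(len(x)) for j in range(len(x)))
    se = math.sqrt(var)
    return pred, pred - crit * se, pred + crit * se

# Hypothetical two-term model (intercept and age); NOT the article's estimates.
beta = [100.0, 0.5]
cov = [[4.0, -0.05],
       [-0.05, 0.001]]
pred, lo, hi = prediction_ci([1.0, 60.0], beta, cov)
print(f"{pred:.2f} ({lo:.2f}, {hi:.2f})")
```

Changing the X values changes both the prediction and the width of the interval, because the standard error depends on where in the covariate space the prediction is made.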

Example 2

Next we will use the same model but change some of the values for the X variables. This time, we will request the
predicted systolic blood pressure for a 40 year old with low catecholamine (cat = 0), who exercises regularly (exer = 1), and
has a cholesterol level of 200.

. predcalc sbp, xvar(age=40 cat=0 exer=1 chl=200)

Model: Linear Regression

Outcome: Systolic blood pressure — sbp

X Values: age=40 cat=0 exer=1 chl=200

Num. Obs: 609

Predicted Value and 95% CI for sbp:

131.67 ( 128.42, 134.91)

This predicted value for systolic blood pressure (131.67) is considerably lower than the value in the previous example (184.35),
which was for an individual with stronger risk factors for hypertension.

Example 3

Rather than using systolic blood pressure as the outcome, we will look at the dichotomous variable chd for coronary heart
disease (1 for yes, 0 for no). We can use logistic regression, but instead of running the model first, we can use predcalc.
Suppose we want to know the probability of coronary heart disease for a person with strong risk factors: 60 years old, smokes,
does not exercise, and has a cholesterol value of 260. Because chd is binary, a logistic regression model is assumed and run.
The model option prints a copy of the model. (Remember that model is optional and is not needed to run the model; it only displays
the regression table of estimates used to solve the equation.)

. predcalc chd, xvar(age=60 smk=1 exer=0 chl=260) model

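Logit estimates                                   Number of obs   =        609
                                                  LR chi2(4)      =      30.55
                                                  Prob > chi2     =     0.0000
Log likelihood = -204.00576                       Pseudo R2       =     0.0697

------------------------------------------------------------------------------
     chd | Odds Ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   1.046986   .0143708      3.345   0.001      1.019195    1.075534
     smk |   2.408027   .7311962      2.894   0.004      1.327989    4.366448
    exer |    .532516   .1453497     -2.309   0.021      .3118876    .9092162
     chl |   1.007934   .0031807      2.504   0.012       1.00172    1.014188
------------------------------------------------------------------------------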

Model: Logistic Regression

Outcome: Coronary heart disease — chd

X Values: age=60 smk=1 exer=0 chl=260

Num. Obs: 609

Predicted Value and 95% CI for chd:

0.3177 (0.2141, 0.4432)

The probability of developing coronary heart disease for someone with these attributes is 0.32 with 95% confidence interval
from 0.21 to 0.44.
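The reported interval is not symmetric around 0.3177 on the probability scale (about 0.10 below and 0.13 above). The Python sketch below checks whether it becomes symmetric on the log-odds scale, which is what one would expect if the interval were built around the linear predictor and then back-transformed. That reading is an assumption, as is the 1.96 critical value; the text above does not state how predcalc forms the interval. Only the three numbers from the output are used.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))

# Values reported in the predcalc output for Example 3.
p_hat, p_lo, p_hi = 0.3177, 0.2141, 0.4432

# Distances from the estimate to each limit on the log-odds scale.
z_hat, z_lo, z_hi = logit(p_hat), logit(p_lo), logit(p_hi)
lower_gap = z_hat - z_lo
upper_gap = z_hi - z_hat

# Implied standard error of the linear predictor (assumes crit = 1.96).
se_logit = (z_hi - z_lo) / (2 * 1.96)

# Rebuild the probability interval from the log-odds estimate and implied SE.
rebuilt = (inv_logit(z_hat - 1.96 * se_logit),
           inv_logit(z_hat + 1.96 * se_logit))

print(round(lower_gap, 3), round(upper_gap, 3), round(se_logit, 3))
print(tuple(round(v, 3) for v in rebuilt))
```

The two gaps agree to about three decimal places, and the rebuilt limits match the reported ones after rounding, so the output is at least consistent with an interval computed on the log-odds scale.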
