Stata Technical Bulletin
29
confidence interval around the predicted value based on the standard error of the prediction. Each variable in the model is set
to the desired value using the xvar option.
. predcalc sbp, xvar(age=60 cat=1 exer=0 chl=260)
Model: Linear Regression
Outcome: Systolic blood pressure — sbp
X Values: age=60 cat=1 exer=0 chl=260
Num. Obs: 609
Predicted Value and 95% CI for sbp:
184.35 ( 179.31, 189.38)
The predicted value for systolic blood pressure is 184.35 with a 95% confidence interval of 179.3 to 189.4. Had we not run
the model previously, predcalc would still work. The command first looks for stored estimates, and if they are not found, the
appropriate model is run. The model is not shown unless requested with the model option. In either case, it is a good idea to
check the “X Values” list to make sure that the predicted estimate is based on the model and variables expected, since the model
will contain only the X’s listed in the xvar option.
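The same prediction can be cross-checked by hand with official Stata commands. The following is a minimal sketch, assuming the article's dataset is in memory and that the model is an ordinary linear regression of sbp on the four X variables; the text does not say that predcalc works this way internally, so treat it as an independent check rather than a description of the command.

```stata
* Fit the linear model that Example 1 is based on
* (variable names are taken from the text; the dataset is assumed to be loaded)
regress sbp age cat exer chl

* Linear combination of the coefficients at age=60, cat=1, exer=0, chl=260.
* lincom reports the point estimate, its standard error, and a 95% CI.
lincom _cons + 60*age + 1*cat + 0*exer + 260*chl
```

If the model matches, the estimate and interval reported by lincom should agree with the predcalc output above up to rounding.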
Example 2
Next we will use the same model but change some of the values for the X variables. This time, we will request the
predicted systolic blood pressure for a 40 year old with low catecholamine (cat = 0), who exercises regularly (exer = 1), and
has a cholesterol level of 200.
. predcalc sbp, xvar(age=40 cat=0 exer=1 chl=200)
Model: Linear Regression
Outcome: Systolic blood pressure — sbp
X Values: age=40 cat=0 exer=1 chl=200
Num. Obs: 609
Predicted Value and 95% CI for sbp:
131.67 ( 128.42, 134.91)
This predicted value for systolic blood pressure (131.67) is considerably lower than the 184.35 predicted in the previous example
for an individual with stronger risk factors for hypertension.
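The gap between the two profiles can also be estimated directly, with its own confidence interval. This is a hedged sketch using the official lincom command, not a feature of predcalc described in the text; it assumes the same dataset and linear model as in Examples 1 and 2.

```stata
* Refit the linear model (assumes the article's dataset is in memory)
regress sbp age cat exer chl

* Example 1 profile minus Example 2 profile:
*   age 60-40 = 20, cat 1-0 = 1, exer 0-1 = -1, chl 260-200 = 60
* The constant cancels in the difference, so _cons is omitted.
lincom 20*age + 1*cat - 1*exer + 60*chl
```

Because the model is linear, the point estimate should equal the difference between the two reported predictions (184.35 - 131.67 = 52.68) up to rounding.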
Example 3
Rather than using systolic blood pressure as the outcome, we will look at the dichotomous variable chd for coronary heart
disease (1 for yes, 0 for no). We can use logistic regression, but instead of running the model first, we can use predcalc.
Suppose we want to know the probability of coronary heart disease for a person with strong risk factors: 60 years old, smokes,
does not exercise, and has a cholesterol value of 260. Because chd is binary, a logistic regression model is assumed and run.
The model option prints a copy of the model. (Remember, model is optional and is not needed to run the model; it only displays
the regression table of estimates used to compute the prediction.)
. predcalc chd, xvar(age=60 smk=1 exer=0 chl=260) model
Logit estimates                                   Number of obs   =        609
                                                  chi2(4)         =      30.55
                                                  Prob > chi2     =
Log likelihood = -204.00576                       Pseudo R2       =
------------------------------------------------------------------------------
     chd | Odds Ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   1.046986   .0143708      3.345   0.001      1.019195    1.075534
     smk |   2.408027   .7311962      2.894   0.004      1.327989    4.366448
    exer |    .532516   .1453497     -2.309   0.021      .3118876    .9092162
     chl |   1.007934   .0031807      2.504   0.012       1.00172    1.014188
------------------------------------------------------------------------------
Model: Logistic Regression
Outcome: Coronary heart disease — chd
X Values: age=60 smk=1 exer=0 chl=260
Num. Obs: 609
Predicted Value and 95% CI for chd:
0.3177 (0.2141, 0.4432)
The probability of developing coronary heart disease for someone with these attributes is 0.32, with a 95% confidence interval
from 0.21 to 0.44.
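A predicted probability of this kind can be reproduced by hand from the logistic model. The sketch below assumes the article's dataset is in memory and builds the interval on the log-odds scale before transforming it to probabilities. The text does not state how predcalc constructs its interval, so this method is an assumption. It is at least consistent with the reported bounds, which are roughly symmetric around the estimate on the log-odds scale: logit(0.2141) is about -1.30, logit(0.4432) is about -0.23, and their midpoint is close to logit(0.3177), about -0.76.

```stata
* Fit the logistic model on the coefficient (log-odds) scale
logit chd age smk exer chl

* Linear predictor (log odds) at age=60, smk=1, exer=0, chl=260
lincom _cons + 60*age + 1*smk + 0*exer + 260*chl

* Save the estimate and its standard error, which lincom leaves in r()
scalar xb = r(estimate)
scalar se = r(se)

* Back-transform the estimate and a normal-based 95% interval to probabilities
display "Pr(chd) = " invlogit(xb)
display "95% CI  = " invlogit(xb - invnormal(0.975)*se) ", " invlogit(xb + invnormal(0.975)*se)
```

Building the interval on the log-odds scale keeps both bounds between 0 and 1, which a symmetric interval on the probability scale does not guarantee.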