28
Stata Technical Bulletin
STB-58
Description
predcalc (“prediction calculator”) is an easy method of calculating predicted values and confidence intervals from linear or
logistic regression model estimates for specified values of the X variables. If no model has been fit previously (using regress
or logistic), predcalc will set up and run the model based on the Y variable yvar (continuous or binary) and the specified
X’s.
Options
xvar(xvarlist) lists the X variables and their values to use to solve the model equation. The model is based only on the X’s
listed in this option. For example, xvar(age=40 gender=1 chl=250) specifies that the equation for a model including
the variables age, gender, and chl (cholesterol) will be solved using the values of 40 years old, male, and a cholesterol
level of 250.
level(#) specifies the confidence level, in percent, for confidence intervals for predicted values. The default is level(95) or
as set by set level.
model displays the regression table. This option is not needed to estimate the model. It is simply for display purposes.
linear causes a model with a binary outcome to be fit using linear, rather than logistic, regression. This option is rarely used.
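To make concrete what "solving the model equation" for specified X values means, here is a minimal Python sketch. It is not predcalc's code. The helper names (parse_xvar, linear_predictor, inverse_logit) and the coefficients are hypothetical, chosen only for illustration. For a linear model the result is the linear predictor itself; for a logistic model the usual way to express the same quantity as a probability is the inverse logit, which is assumed here.

```python
import math


def parse_xvar(spec):
    """Turn an xvar()-style string such as 'age=40 gender=1 chl=250' into a dict."""
    values = {}
    for item in spec.split():
        name, value = item.split("=")
        values[name] = float(value)
    return values


def linear_predictor(coefs, intercept, xvalues):
    """Intercept plus the sum of coefficient * value over the listed X variables."""
    return intercept + sum(coefs[name] * x for name, x in xvalues.items())


def inverse_logit(xb):
    """Map a linear predictor from a logistic model to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical coefficients for illustration only (not estimated from any data).
coefs = {"age": 0.05, "gender": 0.5, "chl": 0.004}
intercept = -6.0

x = parse_xvar("age=40 gender=1 chl=250")
xb = linear_predictor(coefs, intercept, x)  # linear-model prediction
p = inverse_logit(xb)                       # logistic-model probability

print(f"linear predictor = {xb:.3f}, probability = {p:.4f}")
```

Only the variables named in the specification string enter the sum, which mirrors the statement that the model is based only on the X’s listed in xvar().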
Example 1
All the examples come from a cohort study of coronary heart disease (Cassel 1971). The data consist of 609 men who are
followed for seven years to see what variables are risk factors for an elevated systolic blood pressure sbp (measured in mmHg)
or for coronary heart disease chd (1 for yes, 0 for no). Some of the X variables are serum catecholamine level cat (1 for high,
0 for low), smoking status smk (1 for current smoker, 0 for nonsmoker), regular exercise exer (1 for yes, 0 for no), the men’s
age age in years, and cholesterol level chl in mg per 100 ml.
. use chd
. describe
Contains data from chd.dta
  obs:           609                          Evans County Data
 vars:             7                          2 Mar 2000 20:45
 size:         7,917 (99.1% of memory free)
-------------------------------------------------------------------------------
   1. sbp       int    %8.0g                  Systolic blood pressure
   2. chd       byte   %8.0g       hilo       Coronary heart disease
   3. cat       byte   %8.0g       hilo       Serum catecholamine level
   4. smk       byte   %8.0g       yesno      Smoking status
   5. exer      byte   %8.0g       yesno      Regular exercise
   6. age       byte   %8.0g                  Age in years
   7. chl       int    %8.0g                  Serum cholesterol
-------------------------------------------------------------------------------
After fitting a linear regression model we find that older age, high catecholamine, no exercise, and higher cholesterol are
significant predictors of higher systolic blood pressure.
. reg sbp age cat exer chl

  Source |       SS       df       MS                  Number of obs =     609
---------+------------------------------               F(  4,   604) =   85.91
   Model |  166731.045     4  41682.7613               Prob > F      =  0.0000
Residual |   293038.86   604  485.163675               R-squared     =  0.3626
---------+------------------------------               Adj R-squared =  0.3584
   Total |  459769.905   608  756.200501               Root MSE      =  22.026

------------------------------------------------------------------------------
     sbp |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   .2724588   .1054554      2.584   0.010       .0653549    .4795627
     cat |   33.99951   2.622221     12.966   0.000       28.84973    39.14929
    exer |  -8.783763   2.173233     -4.042   0.000      -13.05177   -4.515753
     chl |   .0741253   .0227488      3.258   0.001        .029449    .1188016
   _cons |   114.7267   7.441563     15.417   0.000       100.1123    129.3412
------------------------------------------------------------------------------
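As a hand check of how the coefficients in the regression table above turn into a predicted value, the following Python sketch evaluates the fitted equation for one covariate profile: a 60-year-old with high catecholamine, no regular exercise, and a cholesterol level of 260. It is plain arithmetic on the printed coefficients, not a reproduction of predcalc or its output, and the variable names in the code are illustrative.

```python
# Coefficients copied from the regression table above (reg sbp age cat exer chl).
coef = {"age": 0.2724588, "cat": 33.99951, "exer": -8.783763, "chl": 0.0741253}
cons = 114.7267  # the _cons row

# Covariate profile: age 60, cat = 1 (high), exer = 0 (no), chl = 260.
profile = {"age": 60, "cat": 1, "exer": 0, "chl": 260}

# Predicted value = intercept + sum of coefficient * covariate value.
predicted_sbp = cons + sum(coef[name] * value for name, value in profile.items())

print(round(predicted_sbp, 2))  # 184.35 (mmHg)
```

The point estimate needs only the coefficient column. A confidence interval for the prediction also needs the estimated covariance matrix of the coefficients, which the table does not display, so it is not attempted here.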
Given this model, suppose we would like to see the predicted systolic blood pressure for a 60-year-old with high catecholamine
(cat = 1), no regular exercise (exer = 0), and a cholesterol level of 260. Because sbp is continuous, predcalc defaults to
linear regression and solves the equation using the elements of β multiplied by the specified X values. Also, it calculates a