Update to a program for saving a model fit as a dataset



28                                   Stata Technical Bulletin                                   STB-58


Description

predcalc (“prediction calculator”) is an easy method of calculating predicted values and confidence intervals from linear or
logistic regression model estimates for specified values of the X variables. If no model has been fit previously (using regress
or logistic), predcalc will set up and run the model based on the Y variable yvar (continuous or binary) and the specified X’s.

Options

xvar(xvarlist) lists the X variables and their values to use to solve the model equation. The model is based only on the X’s
listed in this option. For example, xvar(age=40 gender=1 chl=250) specifies that the equation for a model including
the variables age, gender, and chl (cholesterol) will be solved using the values of 40 years old, male, and a cholesterol
level of 250.

level(#) specifies the confidence level, in percent, for confidence intervals for predicted values. The default is level(95) or
as set by set level.

model displays the regression table. This option is not needed to estimate the model. It is simply for display purposes.

linear causes a model with a binary outcome to be fit using linear rather than logistic regression. This option is rarely used.
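
To make “solving the model equation” concrete, here is a minimal sketch in Python (not Stata, and not the predcalc source). It parses an xvar()-style list and evaluates the equation either on the linear scale or, for a logistic model, as a probability via the inverse logit. The helper names parse_xvar and solve_equation and all coefficient values are hypothetical, chosen only so the arithmetic is easy to follow; that predcalc reports logistic predictions on the probability scale is an assumption here.

```python
import math

def parse_xvar(spec):
    """Turn an xvar()-style string such as 'age=40 gender=1 chl=250'
    into a dict mapping variable name to numeric value."""
    values = {}
    for token in spec.split():
        name, _, value = token.partition("=")
        values[name] = float(value)
    return values

def solve_equation(coefs, intercept, xvalues, logistic=False):
    """Evaluate intercept + sum(b_j * x_j); if logistic=True, map the
    linear predictor to a probability with the inverse logit."""
    xb = intercept + sum(coefs[name] * v for name, v in xvalues.items())
    if logistic:
        return 1.0 / (1.0 + math.exp(-xb))
    return xb

# HYPOTHETICAL coefficients, for illustration only (not from any fitted model).
coefs = {"age": 0.05, "gender": 0.5, "chl": 0.01}
intercept = -5.0

x = parse_xvar("age=40 gender=1 chl=250")
xb = solve_equation(coefs, intercept, x)                # linear scale
p = solve_equation(coefs, intercept, x, logistic=True)  # probability scale
print(xb, p)
```

With these made-up numbers the linear predictor is -5 + 2 + 0.5 + 2.5 = 0, so the logistic version returns a probability of one half. The same X values therefore give answers on very different scales depending on which model is used.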

Example 1

All the examples come from a cohort study of coronary heart disease (Cassel 1971). The data consist of 609 men who are
followed for seven years to see what variables are risk factors for an elevated systolic blood pressure sbp (measured in mmHg)
or for coronary heart disease chd (1 for yes, 0 for no). Some of the X variables are serum catecholamine level cat (1 for high,
0 for low), smoking status smk (1 for current smoker, 0 for nonsmoker), regular exercise exer (1 for yes, 0 for no), the men’s
age age in years, and cholesterol level chl in mg per 100 ml.

. use chd

. describe

Contains data from chd.dta
  obs:           609                          Evans County Data
 vars:             7                          2 Mar 2000 20:45
 size:         7,917 (99.1% of memory free)
-------------------------------------------------------------------------------
   1. sbp       int    %8.0g                  Systolic blood pressure
   2. chd       byte   %8.0g       hilo       Coronary heart disease
   3. cat       byte   %8.0g       hilo       Serum catecholamine level
   4. smk       byte   %8.0g       yesno      Smoking status
   5. exer      byte   %8.0g       yesno      Regular exercise
   6. age       byte   %8.0g                  Age in years
   7. chl       int    %8.0g                  Serum cholesterol
-------------------------------------------------------------------------------

After fitting a linear regression model we find that older age, high catecholamine, no exercise, and higher cholesterol are
significant predictors of higher systolic blood pressure.

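. reg sbp age cat exer chl

  Source |       SS       df       MS                  Number of obs =     609
---------+------------------------------               F(  4,   604) =   85.91
   Model |  166731.045     4  41682.7613               Prob > F      =  0.0000
Residual |   293038.86   604  485.163675               R-squared     =  0.3626
---------+------------------------------               Adj R-squared =  0.3584
   Total |  459769.905   608  756.200501               Root MSE      =  22.026

------------------------------------------------------------------------------
     sbp |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   .2724588   .1054554      2.584   0.010       .0653549    .4795627
     cat |   33.99951   2.622221     12.966   0.000       28.84973    39.14929
    exer |  -8.783763   2.173233     -4.042   0.000      -13.05177   -4.515753
     chl |   .0741253   .0227488      3.258   0.001        .029449    .1188016
   _cons |   114.7267   7.441563     15.417   0.000       100.1123    129.3412
------------------------------------------------------------------------------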
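
As a hand check on what the fitted equation implies, the coefficients in the regression table above can be combined directly. The minimal Python sketch below (not Stata, and not part of predcalc) evaluates the equation at the covariate pattern discussed next: age 60, cat = 1, exer = 0, and chl = 260. The function name predict_sbp is hypothetical, and the coefficients are the rounded values as printed, so the result may differ in the last digits from what Stata computes at full precision. Only the point prediction is computed; a confidence interval would also need the coefficient covariance matrix, which the table does not report.

```python
# Coefficients copied from the regression table above (rounded as printed).
coef = {
    "_cons": 114.7267,
    "age": 0.2724588,
    "cat": 33.99951,
    "exer": -8.783763,
    "chl": 0.0741253,
}

def predict_sbp(x, b=coef):
    """Linear prediction: intercept plus the sum of coefficient * X value."""
    return b["_cons"] + sum(b[name] * value for name, value in x.items())

# Covariate pattern: 60 years old, high catecholamine, no exercise, chl of 260.
x = {"age": 60, "cat": 1, "exer": 0, "chl": 260}
pred = predict_sbp(x)
print(round(pred, 2))  # 184.35
```

The arithmetic is 114.7267 + 0.2724588(60) + 33.99951(1) - 8.783763(0) + 0.0741253(260), or about 184.35 mmHg.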

Given this model, suppose we would like to see the predicted systolic blood pressure for a 60-year-old with high catecholamine
(cat = 1), no regular exercise (exer = 0), and a cholesterol level of 260. Because sbp is continuous, predcalc defaults to
linear regression and solves the equation using the elements of β multiplied by the specified X values. Also, it calculates a


