Stata Technical Bulletin
STB-20
3. Estimate the model on the remaining k - 1 indicator variables.
It is this procedure that xi automates.
Using xi: Overview
xi provides a convenient way to include dummy or indicator variables when estimating a model (say with regress,
logistic, etc.). For instance, assume the categorical variable agegrp contains 1 for ages 20-24, 2 for ages 25-39, 3 for ages
40-44, etc. Typing
. xi: logistic outcome weight i.agegrp bp
estimates a logistic regression of outcome on weight, dummies for each agegrp category, and bp. That is, xi searches out
and expands terms starting with “i.” but leaves the other variables alone. xi will expand both numeric and string categorical
variables, so if you had a string variable race containing “white,” “black,” and “other,” typing
. xi: logistic outcome weight bp i.agegrp i.race
would include indicator variables for the race group as well.
The i. indicator variables xi expands may appear anywhere in the varlist, so
. xi: logistic outcome i.agegrp weight i.race bp
would estimate the same model.
You can also create interactions of categorical variables; typing
. xi: logistic outcome weight bp i.agegrp*i.race
estimates a model including indicator variables for all agegrp and race combinations.
You can interact dummy variables with continuous variables:
. xi: logistic outcome bp i.agegrp*weight i.race
And, of course, you can include multiple interactions:
. xi: logistic outcome bp i.agegrp*weight i.agegrp*i.race
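To make the categorical-by-continuous case concrete, the following is a rough by-hand sketch of what a term such as i.agegrp*weight stands for. It assumes that agegrp takes exactly four values, that the smallest value is the omitted category, and that the interaction brings in the main effects along with the products. The names ag1-ag4 and ag2wt-ag4wt are hypothetical and are not the names xi itself would choose.

```stata
* Sketch only: assumes outcome, bp, weight, and agegrp are in memory
* and that agegrp takes four distinct values.
* Build one dummy per agegrp value: ag1, ag2, ag3, ag4 (hypothetical names).
tabulate agegrp, generate(ag)

* Interact each non-omitted dummy with the continuous variable weight.
generate ag2wt = ag2*weight
generate ag3wt = ag3*weight
generate ag4wt = ag4*weight

* Main effects plus interactions; ag1 (smallest value) is left out.
logistic outcome bp weight ag2 ag3 ag4 ag2wt ag3wt ag4wt

* Remove the hand-made variables afterwards.
drop ag1-ag4 ag2wt-ag4wt
```

With xi, none of these intermediate variables has to be created or dropped by hand.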
We will now back up and consider each of xi’s features in detail.
Indicator variables for simple effects
When you type ‘i.varname’, xi internally tabulates varname (which may be a string or a numeric variable) and creates
indicator (dummy) variables for each observed value, omitting the indicator for the smallest value. For instance, say agegrp
takes on the values 1, 2, 3, and 4. Typing
. xi: logistic outcome i.agegrp
creates indicator variables named Iagegr_2, Iagegr_3, and Iagegr_4. (xi chooses the names and tries to make them readable;
xi guarantees that the names are unique.) The expanded logistic model then is
. logistic outcome Iagegr_2 Iagegr_3 Iagegr_4
Afterwards, you can drop the new variables xi leaves behind by typing ‘drop I*’ (note capitalization).
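For comparison, the same model can be set up by hand. This is a minimal sketch, assuming agegrp takes exactly the values 1, 2, 3, and 4. The stub ag is a hypothetical choice, so the resulting names (ag1-ag4) differ from the ones xi assigns.

```stata
* Sketch only: assumes outcome and agegrp are in memory and that
* agegrp takes the values 1, 2, 3, and 4.
* Create one indicator per observed value: ag1, ag2, ag3, ag4.
tabulate agegrp, generate(ag)

* Omit ag1, the indicator for the smallest value, to identify the model.
logistic outcome ag2 ag3 ag4

* Clean up the hand-made indicators.
drop ag1-ag4
```

The xi prefix replaces all three steps with a single command line.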
xi provides the following features when you type ‘i.varname’:
1. varname may be string or numeric.
2. Dummy variables are created automatically.
3. By default, the dummy-variable set is identified by dropping the dummy corresponding to the smallest value of the variable
(how to specify otherwise is discussed below).
4. The new dummy variables are left in your data set. You can drop them by typing ‘drop I*’, but you do not have to:
each time you use the xi prefix or command, any dummies it generated previously are dropped and
new ones are created.
5. The new dummy variables have variable labels so you can determine to what they correspond by typing ‘describe’ or
‘describe I*’.
6. xi may be used with any Stata command (not just logistic).
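As an illustration of items 4 through 6, here is a short sketch that uses xi with regress instead of logistic. Treating bp as the dependent variable is an assumption made only for this example. The I* pattern relies on the naming described in this insert; other releases of xi may name the variables differently.

```stata
* Sketch only: assumes bp, weight, and agegrp are in memory.
* xi works as a prefix to other estimation commands, here regress.
xi: regress bp weight i.agegrp

* Inspect the generated dummies and their variable labels.
describe I*

* Optional: remove the dummies that xi left behind.
drop I*
```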