10
Stata Technical Bulletin
STB-33
Unless the nograph option is used, a plot will automatically be displayed. By default, the graph options used include
ylabel, xlabel, yline(5) (so one can see the 5th and 95th percentiles), xline(median), and c(l).
As implemented, the command can only be used for one variable at a time. However, a new variable, foldx, is left in the
dataset (and is quietly dropped when the command is reused). To compare the plots of two or more measures of the same thing,
rename foldx to something meaningful and rerun the command; the two mountains can then be plotted against what
each is measuring (be sure to sort on the x variable before graphing).
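The rename-and-rerun workflow might look like the following sketch. The command name mountain is an assumption (the text above does not name the command), a and b are hypothetical variables holding the two measures, and the final line uses current twoway syntax rather than the graph command of the time.

```stata
* "mountain" is an assumed command name; a and b are hypothetical variables
mountain a, nograph
rename foldx fold_a

mountain b, nograph
rename foldx fold_b

* overlay the two mountains, each sorted on its own x variable
twoway (line fold_a a, sort) (line fold_b b, sort)
```

Renaming before the second run keeps the first curve from being dropped when the command is reused.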
Note that a variant of egen rank is used with this command (and supplied on the disk as _grank2.ado) that does not
give the average rank to tied values since this would give a misleading plot in many cases. Instead unique ranks are given to
all values even if tied. Tied values can be seen in the plot because they are joined by absolutely vertical lines as long as they
do not cross the median; if they cross the median, then they are joined by absolutely horizontal lines.
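The effect of the tie handling can be sketched directly in Stata. The lines below only illustrate the idea and are not the code in _grank2.ado: they assume a hypothetical variable x with no missing values, assign unique ranks by sort order, convert the ranks to percentiles, and fold at 50 as described by Monti (1995). The names r_avg, r_uniq, pct, and fold are made up for the example.

```stata
* average ranks: tied values share one rank (standard egen rank behavior)
egen r_avg = rank(x)

* unique ranks: sort, then number the observations 1 to _N
sort x
generate long r_uniq = _n

* percentile of each observation, folded at the median
generate pct  = 100 * r_uniq / _N
generate fold = cond(pct > 50, 100 - pct, pct)
```

With average ranks, a group of tied values would collapse to a single percentile; with unique ranks, each tied value keeps its own percentile at the same x value.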
Reference
Monti, K. L. 1995. Folded empirical distribution function curves—mountain plots. The American Statistician 49: 342-345.
sg59 Index of ordinal variation and Neyman-Barton GOF
Richard Goldstein, Qualitas, Inc., [email protected]
What do you do when you have a variable with ordered categories? While there are numerous answers to this question when
one has covariates, or other variables, there are few good answers in the univariate situation. This insert presents a measure,
called the index of ordinal variation (iov), and a test of statistical significance, of the amount of variation in an ordered variable. The
closely related index of ordinal consensus (ioc) is also presented. An associated program, nbgof, used in testing the significance of
the iov, is also presented.
The syntax of iov is
iov varname [if exp] [in range] [, rows(#) actual ]
The program provides a measure of variability (and its complement) for ordinal variables. The complement measures lack of
variability. Each variable can either have the same, fixed, number of categories, set by the user, or, by using the option actual,
you can use the actually existing number of categories. If you don’t use either option, the default number is 5. These options
allow for the situation when the variable as defined has x categories, but the particular sample at issue does not use all the
categories.
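As a usage sketch, the three ways of setting the number of categories could be typed as follows. The variable names q1 and q2 are hypothetical (a 5-point and a 7-point item, respectively); only the command name and its options come from the syntax above.

```stata
* q1: hypothetical 5-point item, so the default of 5 categories applies
iov q1

* q2: hypothetical 7-point item, so state the possible number of categories
iov q2, rows(7)

* use only the categories that actually occur in this sample
iov q2, actual
```

The second and third calls can give different results whenever the sample does not use all seven categories.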
The iov is 0 (and ioc is 1) when all values fall into one category; the iov is 1 (and the ioc is 0) when extreme polarization
is present. The p-value for a goodness-of-fit test (where the uniform distribution is the null hypothesis; see nbgof) is also
presented. The Berry-Mielke (1994) article gives an algorithm for an exact test, and they also make FORTRAN code for this test
available. The test that I have implemented here is not exact.
Note that the program expects data in the form of individual observations; if data are frequency weighted, they should be
expanded prior to using this program.
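One way to do the expansion is sketched below, assuming a hypothetical dataset with one row per category of rating and the count held in a variable named freq. Both names are illustrative.

```stata
* rows with a count below 1 or a missing count would otherwise be kept once
drop if freq < 1 | missing(freq)

* replace each row with freq copies of itself
expand freq

* the data are now individual observations
iov rating
```

The drop step matters because expand retains an observation once when the expression is less than 1 or missing, which would add a spurious observation to an empty category.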
Two options are allowed: rows(#) and actual. If you use neither, the program assumes that every variable called should
be treated as though it has five categories. If you use both options, only the actual option will be used.
The default value for rows is 5, chosen simply because the most usual use for this in my own work is with 5-point Likert
scales. Note that if your variable has other than 5 possible values you should definitely use this option as these calculations will
be wrong if you have the wrong number of categories.
The use of actual tells the program to use the actually existing number of categories. Each user must decide whether to
use the possible number of categories or the actual number in every case, but in my experience it is the possible number that
is usually, but not always, of interest. Note further that using this program with the possible number of rows given eases use on
new datasets that are based on the same data collection form.
If you use the actual option, then the output tells you how many rows there are for each variable; if you use no option,
or use the rows option, then this information is not supplied.
Note that the originators of this index prefer a randomization test; the test here (see below) is offered as an approximation.