Update to a program for saving a model fit as a dataset



Stata Technical Bulletin

31


Options

by(groupvar) is not optional. It specifies the name of the grouping variable. This variable must have exactly two possible values.
The lower value indicates Group A, and the higher value indicates Group B.

centile(numlist) specifies a list of percentile differences to be reported and defaults to centile(50) (median only). Specifying
centile(25 50 75) will produce the 25th, 50th, and 75th percentile differences.

level(#) specifies the confidence level, in percent, for confidence intervals. The default is level(95) or as set by set level.

eform specifies that exponentiated percentile differences are to be given. This option is used if depvar is the log of a
positive-valued variable. In this case, confidence intervals are calculated for percentile ratios between values of the original
positive variable, instead of for percentile differences.
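
The link between log differences and ratios can be checked with a minimal Python sketch. It is an illustration only, not part of cendif, and the data values are hypothetical. Because the exponential function is increasing, any percentile of the pairwise log differences exponentiates to the same percentile of the pairwise ratios.

```python
import math

# Hypothetical positive-valued measurements, for illustration only.
group_a = [12.0, 14.0, 20.0]
group_b = [8.0, 10.0, 16.0]

# Differences between log values, one per (Group A, Group B) pair.
log_diffs = [math.log(a) - math.log(b) for a in group_a for b in group_b]

# Ratios between the original values, in the same pair order.
ratios = [a / b for a in group_a for b in group_b]

# Exponentiating each log difference recovers the corresponding ratio.
back_transformed = [math.exp(d) for d in log_diffs]
```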

cluster(varname) specifies the variable which defines sampling clusters. If cluster is defined, then the percentiles are
calculated using the between-cluster Somers’ D, and the confidence intervals are calculated assuming that the data are a
sample of clusters from a population of clusters, rather than a sample of observations from a population of observations.

tdist specifies that the standardized Somers’ D estimates are assumed to be sampled from a t distribution with n - 1 degrees
of freedom, where n is the number of clusters, or the number of observations if cluster is not specified.

transf(transformation_name) specifies that the Somers’ D estimates are to be transformed, defining a standard error for the
transformed population value from which the confidence limits for the percentile differences are calculated. z (the default)
specifies Fisher’s z (the hyperbolic arctangent), asin specifies Daniels’ arcsine, and iden specifies identity or untransformed.
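
The three transformations can be written out in a short Python sketch. This is an illustration, not the cendif code: the dictionary layout, the function name transformed_estimate, and the numerical inputs are hypothetical, and the use of the delta method (standard error multiplied by the derivative of the transformation) is an assumption about how the standard error of the transformed value is defined.

```python
import math

def identity(d):
    return d

# name: (transformation, inverse transformation, derivative of the transformation)
TRANSFORMS = {
    "z": (math.atanh, math.tanh, lambda d: 1.0 / (1.0 - d * d)),
    "asin": (math.asin, math.sin, lambda d: 1.0 / math.sqrt(1.0 - d * d)),
    "iden": (identity, identity, lambda d: 1.0),
}

def transformed_estimate(d_hat, se_d, name="z"):
    """Return the transformed estimate and its delta-method standard error.

    Assumes -1 < d_hat < 1; the z and asin derivatives are undefined at +/-1.
    """
    forward, _inverse, derivative = TRANSFORMS[name]
    return forward(d_hat), se_d * derivative(d_hat)

# Hypothetical Somers' D estimate of 0.5 with standard error 0.1.
zeta, se_zeta = transformed_estimate(0.5, 0.1, "z")
```

With iden the estimate and standard error are returned unchanged, so the choice of transformation only matters through the shape of the resulting confidence limits.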

saving(filename[,replace]) specifies a dataset, to be created, whose observations correspond to the observed values of
differences between a value of depvar in Group A and a value of depvar in Group B. replace instructs Stata to replace
any existing dataset of the same name. The saved dataset can then be reused if cendif is called later, with using, to save
the large amounts of processing time used to calculate the set of observed differences. The saving option and the using
utility are provided mainly for programmers to use, at their own risk.

nohold indicates that any existing estimation results are to be overwritten with a new set of estimation results, for the use of
programmers. By default, any existing estimation results are restored after execution of cendif.

Remarks

cendif calls somersd (see Newson 2000), which has been updated to accept long variable lists. (It was previously
limited to eight variables.)

Methods and formulas

Suppose that a population contains two disjoint subpopulations A and B, and a random variable Y is defined for individuals
from both subpopulations. For 0 < q < 1, a 100qth percentile difference in Y between Populations A and B is defined as a
value θ satisfying

$$D[Y^{*}(\theta) \mid X] = 1 - 2q \tag{1}$$

where X is a binary variable equal to 1 for Population A and 0 for Population B, Y*(θ) is defined as Y if X = 1 and Y + θ
if X = 0, and D[· | ·] denotes Somers’ D (Somers 1962, Newson 2000). Somers’ D is defined as

$$D[V \mid W] = \frac{E\left[\operatorname{sign}(V_1 - V_2)\,\operatorname{sign}(W_1 - W_2)\right]}{E\left[\operatorname{sign}(W_1 - W_2)^2\right]} \tag{2}$$


where (W1, V1) and (W2, V2) are bivariate data points sampled independently from the same population, and E[·] denotes
expectation. In the case of (1), where W = X and V = Y*(θ), Somers’ D is the difference between two conditional probabilities.
Given an individual sampled from Population A and an individual sampled from Population B, these are the probability that the
individual from Population A has the higher Y* value and the probability that the individual from Population B has the higher
Y* value. Somers’ D is therefore the parameter equal to zero under the null hypothesis tested by the “nonparametric” Wilcoxon
rank-sum test on Y*(θ). In the case where q = 0.5 (and therefore 1 - 2q = 0), a 100qth percentile difference is known as a
median percentile difference and is zero under the null hypothesis tested by a Wilcoxon rank-sum test on Y.
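
Definition (2) has a direct sample analogue, sketched below in Python as an illustration. The sketch is unweighted and unclustered, replaces the expectations by sums over all ordered pairs of distinct observations, and uses hypothetical data; the function names are not taken from somersd or cendif.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def somers_d(v, w):
    """Sample Somers' D of v with respect to w over all ordered pairs.

    Raises ZeroDivisionError if every value of w is the same.
    """
    numerator = 0
    denominator = 0
    for i in range(len(v)):
        for j in range(len(v)):
            if i == j:
                continue
            sw = sign(w[i] - w[j])
            numerator += sign(v[i] - v[j]) * sw
            denominator += sw * sw
    return numerator / denominator

# Hypothetical data: x is 1 for Population A and 0 for Population B.
y = [3, 5, 7, 1, 4, 6]
x = [1, 1, 1, 0, 0, 0]
d = somers_d(y, x)
```

Of the nine (A, B) pairs in this example, the A value is higher in six and the B value is higher in three, so d equals 6/9 - 3/9 = 1/3, the difference between the two conditional probabilities described above.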

Note that a value of θ satisfying (1) is not always unique. If Y has a discrete distribution, then there may be no solution or
a wide interval of solutions. However, the method used here is intended to produce a confidence interval containing any given
θ satisfying (1), with a probability at least equal to the confidence level, if such a θ exists.
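
Both the link between (1) and the pairwise differences and the possible absence of a solution can be seen in a small Python sketch. It is an unweighted, unclustered illustration with hypothetical data, not the cendif algorithm. Since Y*(θ) adds θ to the Population B values, the A value is higher exactly when the difference between the A value and the B value exceeds θ.

```python
from statistics import median

def d_star(theta, y_a, y_b):
    """Sample Somers' D of Y*(theta) with respect to X, from all (A, B) pairs."""
    diffs = [a - b for a in y_a for b in y_b]
    higher_a = sum(d > theta for d in diffs)  # pairs where the A value is higher
    higher_b = sum(d < theta for d in diffs)  # pairs where the B value is higher
    return (higher_a - higher_b) / len(diffs)

# Hypothetical samples from Populations A and B.
y_a = [10, 20, 40]
y_b = [1, 3, 8]

# The nine observed differences, sorted: 2, 7, 9, 12, 17, 19, 32, 37, 39.
diffs = sorted(a - b for a in y_a for b in y_b)
theta_median = median(diffs)

# At the median of the differences, the sample version of (1) holds with q = 0.5.
d_at_median = d_star(theta_median, y_a, y_b)
```

Here d_at_median is zero, matching 1 - 2q = 0. For q = 0.25 the target value is 0.5, but every value of d_star in this sample is a whole number divided by nine, so no θ satisfies the sample version of (1) exactly.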

We will assume that there are N1 observations sampled from Population A and N2 observations sampled from Population B,
giving a total of N1 + N2 = N observations. These observations will be identified by double subscripts, so that Y_ij is the Y
value for the jth observation sampled from the ith population (where i = 1 for Population A and i = 2 for Population B). The
corresponding X values (ones and zeros) will be denoted X_ij. The observations will be assumed to have importance weights


