Update to a program for saving a model fit as a dataset



Stata Technical Bulletin



Options

by(groupvar) is not optional. It specifies the name of the grouping variable. This variable must have exactly two possible values.
The lower value indicates Group A, and the higher value indicates Group B.

centile(numlist) specifies a list of percentile differences to be reported and defaults to centile(50) (median only). Specifying
centile(25 50 75) will produce the 25th, 50th, and 75th percentile differences.

level(#) specifies the confidence level, in percent, for confidence intervals. The default is level(95) or as set by set level.

eform specifies that exponentiated percentile differences are to be given. This option is used if depvar is the log of a
positive-valued variable. In this case, confidence intervals are calculated for percentile ratios between values of the original
positive variable, instead of for percentile differences.
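
As a minimal illustration of why exponentiation turns differences into ratios, here is a short sketch in plain Python rather than Stata. The helper name `eform` and all numbers are hypothetical and chosen only for illustration. A difference between two logs is the log of a ratio, so exponentiating an estimate and its confidence limits on the log scale gives a ratio and its limits on the original scale.

```python
import math


def eform(estimate, lower, upper):
    """Exponentiate a log-scale percentile difference and its confidence limits."""
    return math.exp(estimate), math.exp(lower), math.exp(upper)


# Hypothetical log-scale results (not output from cendif).
log_diff = math.log(2.0)   # median difference in log(y) between Groups A and B
log_lower = math.log(1.5)  # lower confidence limit on the log scale
log_upper = math.log(3.0)  # upper confidence limit on the log scale

ratio, ratio_lower, ratio_upper = eform(log_diff, log_lower, log_upper)
print(ratio, ratio_lower, ratio_upper)
```

Because the exponential function is increasing, the ordering of the limits is preserved after transformation.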

cluster(varname) specifies the variable which defines sampling clusters. If cluster is defined, then the percentiles are
calculated using the between-cluster Somers’ D, and the confidence intervals are calculated assuming that the data are a
sample of clusters from a population of clusters, rather than a sample of observations from a population of observations.

tdist specifies that the standardized Somers’ D estimates are assumed to be sampled from a t distribution with n - 1 degrees
of freedom, where n is the number of clusters, or the number of observations if cluster is not specified.

transf(transformation_name) specifies that the Somers’ D estimates are to be transformed, defining a standard error for the
transformed population value from which the confidence limits for the percentile differences are calculated. z (the default)
specifies Fisher’s z (the hyperbolic arctangent), asin specifies Daniels’ arcsine, and iden specifies identity or untransformed.
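
The effect of the three transformations can be sketched in Python as follows. This is not cendif's code. It assumes that the standard error on the transformed scale comes from the delta method and that limits are based on a normal quantile. Both are assumptions made for this illustration, as are the function name `somers_d_ci` and the input values. The sketch gives limits for Somers’ D only. The further step from these to limits for percentile differences is not shown.

```python
import math
from statistics import NormalDist


def somers_d_ci(d, se, transf="z", level=95.0):
    """Confidence limits for Somers' D via a transformed scale (requires |d| < 1).

    Assumes a delta-method standard error and a normal critical value.
    """
    crit = NormalDist().inv_cdf(0.5 + level / 200.0)
    if transf == "z":  # Fisher's z: hyperbolic arctangent
        est, se_t, back = math.atanh(d), se / (1.0 - d * d), math.tanh
    elif transf == "asin":  # Daniels' arcsine
        est, se_t = math.asin(d), se / math.sqrt(1.0 - d * d)
        half_pi = math.pi / 2.0
        # Clamp to the range of arcsine before back-transforming.
        back = lambda t: math.sin(max(-half_pi, min(half_pi, t)))
    elif transf == "iden":  # identity, untransformed
        est, se_t, back = d, se, (lambda t: t)
    else:
        raise ValueError("transf must be 'z', 'asin', or 'iden'")
    return back(est - crit * se_t), back(est + crit * se_t)


# Hypothetical estimate and standard error, for illustration only.
for name in ("z", "asin", "iden"):
    print(name, somers_d_ci(0.95, 0.1, transf=name))
```

With an estimate close to 1, the untransformed limits can exceed 1, whereas the limits from z and asin stay within the range of Somers’ D.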

saving(filename[, replace]) specifies a dataset, to be created, whose observations correspond to the observed values of
differences between a value of depvar in Group A and a value of depvar in Group B. replace instructs Stata to replace
any existing dataset of the same name. The saved dataset can then be reused if cendif is called later, with using, to save
the large amounts of processing time used to calculate the set of observed differences. The saving option and the using
utility are provided mainly for programmers to use, at their own risk.

nohold indicates that any existing estimation results are to be overwritten with a new set of estimation results, for the use of
programmers. By default, any existing estimation results are restored after execution of cendif.

Remarks

cendif calls somersd (see Newson 2000), which has been updated, in order to take long variable lists. (It was previously
limited to eight variables.)

Methods and formulas

Suppose that a population contains two disjoint subpopulations A and B, and a random variable $Y$ is defined for individuals
from both subpopulations. For $0 < q < 1$, a 100$q$th percentile difference in $Y$ between Populations A and B is defined as a
value $\theta$ satisfying

$$D[Y^*(\theta) \mid X] = 1 - 2q \tag{1}$$

where $X$ is a binary variable equal to 1 for Population A and 0 for Population B, $Y^*(\theta)$ is defined as $Y$ if $X = 1$ and
$Y + \theta$ if $X = 0$, and $D[\cdot \mid \cdot]$ denotes Somers’ D (Somers 1962, Newson 2000). Somers’ D is defined as

$$D[V \mid W] = \frac{E[\operatorname{sign}(V_1 - V_2)\,\operatorname{sign}(W_1 - W_2)]}{E[\operatorname{sign}(W_1 - W_2)^2]} \tag{2}$$


where $(W_1, V_1)$ and $(W_2, V_2)$ are bivariate data points sampled independently from the same population, and $E[\cdot]$ denotes
expectation. In the case of (1), where $W = X$ and $V = Y^*(\theta)$, Somers’ D is the difference between two conditional probabilities.
Given an individual sampled from Population A and an individual sampled from Population B, these are the probability that the
individual from Population A has the higher $Y^*$ value and the probability that the individual from Population B has the higher
$Y^*$ value. Somers’ D is therefore the parameter equal to zero under the null hypothesis tested by the “nonparametric” Wilcoxon
rank-sum test on $Y^*(\theta)$. In the case where $q = 0.5$ (and therefore $1 - 2q = 0$), a 100$q$th percentile difference is known as a
median percentile difference and is zero under the null hypothesis tested by a Wilcoxon rank-sum test on $Y$.
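
The definition of Somers’ D can be checked on a small sample with the following Python sketch, which replaces the expectations by averages over all ordered pairs of distinct observations. It is unweighted and unclustered, the function name `somers_d` and the data are made up for illustration, and it is not the algorithm used by somersd.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def somers_d(v, w):
    """Sample Somers' D of v with respect to w over ordered pairs of distinct points."""
    num = 0
    den = 0
    n = len(v)
    for i in range(n):
        for j in range(n):
            if i != j:
                sw = sign(w[i] - w[j])
                num += sign(v[i] - v[j]) * sw
                den += sw * sw
    return num / den


# Hypothetical data: x = 1 for Population A, x = 0 for Population B.
x = [1, 1, 1, 0, 0]
y = [5, 7, 9, 4, 8]

d_hat = somers_d(y, x)

# The same quantity as a difference between two proportions of A-B pairs.
pairs = [(a, b) for a, xa in zip(y, x) if xa == 1 for b, xb in zip(y, x) if xb == 0]
p_a_higher = sum(a > b for a, b in pairs) / len(pairs)
p_b_higher = sum(b > a for a, b in pairs) / len(pairs)

print(d_hat, p_a_higher - p_b_higher)
```

In this sample, four of the six A-B pairs have the higher value in Population A and two have the higher value in Population B, so both expressions equal 1/3.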

Note that a value of θ satisfying (1) is not always unique. If Y has a discrete distribution, then there may be no solution or
a wide interval of solutions. However, the method used here is intended to produce a confidence interval containing any given
θ satisfying (1), with a probability at least equal to the confidence level, if such a θ exists.
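
The non-uniqueness can be seen in a small unweighted sample. The Python sketch below evaluates the sample version of the left-hand side of (1) from the pairwise differences between Group A and Group B values. The function name `shifted_somers_d` and the data are hypothetical, and the sketch only evaluates the estimating equation. It does not compute confidence limits.

```python
def shifted_somers_d(y_a, y_b, theta):
    """Sample D[Y*(theta) | X]: share of A-B differences above theta minus share below."""
    diffs = [a - b for a in y_a for b in y_b]
    greater = sum(d > theta for d in diffs)
    less = sum(d < theta for d in diffs)
    return (greater - less) / len(diffs)


# Interval of solutions for q = 0.5: the differences are -3, -1, 1, 3,
# so every theta strictly between -1 and 1 gives a value of 0.
interval_case = [shifted_somers_d([5, 7], [4, 8], t) for t in (-0.5, 0.0, 0.9)]

# No solution for q = 0.5: the differences are 1, 1, 1, 3 (ties),
# so the value jumps from 1 to 0.25 to -0.5 without ever equalling 0.
no_solution_case = [shifted_somers_d([5, 5, 5, 7], [4], t) for t in (0.5, 1.0, 2.0)]

print(interval_case)
print(no_solution_case)
```

Because the sample function is a step function of θ, it may cross the target value 1 - 2q on a whole interval or jump over it.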

We will assume that there are $N_1$ observations sampled from Population A and $N_2$ observations sampled from Population B,
giving a total of $N_1 + N_2 = N$ observations. These observations will be identified by double subscripts, so that $Y_{ij}$ is the $Y$
value for the $j$th observation sampled from the $i$th population (where $i = 1$ for Population A and $i = 2$ for Population B). The
corresponding $X$ values (ones and zeros) will be denoted $X_{ij}$. The observations will be assumed to have importance weights


