12
Stata Technical Bulletin
STB-22
sg27 The overlapping coefficient and an “improved” rank-sum statistic
Richard Goldstein, Qualitas, Inc., EMAIL [email protected]
We all know there is a difference between statistical and substantive significance; the rejection of a null hypothesis does
not necessarily imply that something meaningful has been found. Yet statistical packages, including Stata, offer many tests of
statistical significance and few, if any, indicators of substantive significance. Here we offer two simple measures, the overlapping
coefficient and a transformation of the rank-sum statistic, that can help users assess the importance of statistically significant
findings.
The overlapping coefficient is a measure of the agreement between two distributions (or “the area under two probability
(density) functions simultaneously” (Bradley 1985, 546)). It is equal to one minus the dissimilarity coefficient, a measure widely
used by social scientists in the study of segregation.
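The relation to the dissimilarity coefficient can be written out explicitly. The sketch below assumes two densities, labeled $f$ and $g$ here purely for illustration (these symbols do not appear in the text), and uses the identity $\min(a,b) = \tfrac{1}{2}(a + b - |a - b|)$ together with the fact that each density integrates to one:

```latex
o \;=\; \int \min\{f(t),\, g(t)\}\,dt
  \;=\; \int \tfrac{1}{2}\bigl(f(t) + g(t) - |f(t) - g(t)|\bigr)\,dt
  \;=\; 1 \;-\; \tfrac{1}{2}\int |f(t) - g(t)|\,dt .
```

The last term, half the integrated absolute difference, is the dissimilarity coefficient in its continuous form; for grouped data the integral becomes a sum over categories.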
Consider the simplest case of two normally distributed variables, $x \sim N(\mu_x, \sigma^2)$ and $y \sim N(\mu_y, \sigma^2)$. When the variances
of the variables being compared are equal, as they are in this case, the overlapping coefficient, $o$, is
$$o = 2\,\Phi\!\left(-\frac{|\mu_x - \mu_y|}{2\sigma}\right)$$
where $\Phi(\cdot)$ is the standard normal distribution function. If $\mu_x = \mu_y$, the distributions of $x$ and $y$ agree completely and
$$o = 2\,\Phi(0) = 1$$
The overlapping coefficient is smaller the farther apart $\mu_x$ and $\mu_y$ are, and
$$o \to 0 \quad \text{as} \quad |\mu_x - \mu_y| \to \infty$$
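To get a feel for the scale of the equal-variance formula, it can be evaluated for a few separations between the means. The following is a minimal sketch in Python using only the standard library; it is not the overlap command described in this insert, and the function name `overlap_equal_variance` is a hypothetical label chosen for illustration.

```python
from statistics import NormalDist


def overlap_equal_variance(mu_x, mu_y, sigma):
    """Overlapping coefficient o = 2 * Phi(-|mu_x - mu_y| / (2 * sigma)).

    Assumes both variables are normal with the same standard deviation sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # NormalDist() with no arguments is the standard normal; cdf is Phi.
    return 2.0 * NormalDist().cdf(-abs(mu_x - mu_y) / (2.0 * sigma))


if __name__ == "__main__":
    # Separation between the means, expressed in units of sigma.
    for gap in (0.0, 0.5, 1.0, 2.0, 4.0):
        o = overlap_equal_variance(0.0, gap, 1.0)
        print(f"|mu_x - mu_y| = {gap:3.1f} sigma   o = {o:.4f}")
```

Only the ratio of the mean difference to sigma matters, so rescaling both variables by the same factor leaves the result unchanged.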
The overlapping coefficient can be generalized to the case where x and y have different variances (Inman and Bradley 1989).
Asymptotic arguments justify the use of the overlapping coefficient with a wide variety of non-normally distributed variables.
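For the unequal-variance case, one readily available implementation is the `overlap` method of `statistics.NormalDist` in Python's standard library (Python 3.8 or later), which returns the overlapping area of two normal densities. The sketch below compares it with a direct numerical integration of the pointwise minimum of the two densities. The helper `overlap_numeric`, the integration limits, and the example parameters are assumptions made for illustration; this is not the code behind the overlap command.

```python
from statistics import NormalDist


def overlap_numeric(f, g, lo, hi, steps=40_000):
    """Midpoint-rule approximation to the integral of min(f.pdf, g.pdf) on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += min(f.pdf(t), g.pdf(t))
    return h * total


# Hypothetical example: different means and different standard deviations.
x = NormalDist(mu=0.0, sigma=1.0)
y = NormalDist(mu=1.0, sigma=2.0)

closed_form = x.overlap(y)                      # standard-library closed form
numeric = overlap_numeric(x, y, -20.0, 20.0)    # brute-force check

if __name__ == "__main__":
    print(f"NormalDist.overlap : {closed_form:.6f}")
    print(f"numerical integral : {numeric:.6f}")
```

With equal standard deviations the same method reduces to the equal-variance formula given above, so a single call covers both cases.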
From the formula above, it is clear that the overlapping coefficient is a measure of the closeness of the location of two
distributions. The correlation of x and y has no influence on the overlapping coefficient. Uncorrelated variables with identical
means overlap perfectly. On the other hand, if y ≡ x + S, then o goes to zero for sufficiently large values of S.
The overlapping coefficient can be used, for instance, to help determine whether a statistically significant t statistic is
important in practical terms. It is also closely related to “the misclassification probability in the two population classification
problem” (Inman and Bradley 1989, 3868). Because I use the statistic only for these purposes, I do not provide measures of the
variance of the statistic.
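One way to make the link to classification concrete is the following standard result. It rests on an assumption not stated in the text: the two populations are equally likely a priori, and each observation is assigned to the population whose density is larger at that point. The densities are written $f$ and $g$ here for illustration only. Under that rule an observation is misclassified exactly where it falls under the smaller of the two densities, so

```latex
P(\text{misclassification})
  \;=\; \tfrac{1}{2}\int \min\{f(t),\, g(t)\}\,dt
  \;=\; \frac{o}{2},
\qquad\text{and, for equal-variance normals,}\qquad
\frac{o}{2} \;=\; \Phi\!\left(-\frac{|\mu_x - \mu_y|}{2\sigma}\right).
```

For example, an overlapping coefficient of 0.6 would correspond to a 30 percent error rate for this rule, which gives a practical reading of how well the two groups are separated.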
Formulas used in calculating the overlapping coefficient o come from Inman and Bradley (1989). The calculations are
different depending on whether the variances are equal. If you provide just the names of the variables, both measures are
presented. The “OVL is invariant when a suitable common transformation is made to both variables” (Inman and Bradley 1989,
3852).
As pointed out in Gastwirth (1975), there is a potential problem with the overlapping coefficient: if changes take place only
on one side of the point(s) of intersection of the two distributions, the overlapping coefficient will not reflect these. However,
for the purpose of helping decide whether a t statistic is meaningful, this is of little relevance (though it is very relevant for other
purposes).
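The limitation noted by Gastwirth is easiest to see with grouped data, where the overlap is the sum over cells of the smaller of the two proportions. The sketch below uses made-up proportions and a hypothetical helper, `discrete_overlap`; the two distributions cross between the second and third cells, and the first distribution is then changed only to the right of that crossing.

```python
def discrete_overlap(p, q):
    """Overlap of two discrete distributions on the same cells: sum of min(p_i, q_i)."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same cells")
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical cell proportions; p exceeds q in cells 1-2 and q exceeds p in cells 3-4.
q = [0.1, 0.1, 0.4, 0.4]
p_before = [0.5, 0.3, 0.1, 0.1]
# Mass is moved from cell 4 to cell 3 only, so p still lies below q in both cells.
p_after = [0.5, 0.3, 0.2, 0.0]

before = discrete_overlap(p_before, q)
after = discrete_overlap(p_after, q)

if __name__ == "__main__":
    print(f"overlap before the change: {before:.3f}")
    print(f"overlap after the change:  {after:.3f}")
```

Because the first distribution stays below the second in the cells that changed, the minimum in those cells is just the first distribution's own proportion, and the total is unaffected by how that mass is arranged.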
overlap displays the overlapping coefficient. There are two syntaxes:
overlap var1 var2 [if exp] [in range]
overlap var1 [if exp] [in range], by(var2)
In the first syntax, var1 and var2 are continuous variables. The second syntax calculates the overlapping coefficient for the
continuous variable var1 across the two groups defined by var2.
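The two calling forms can be mimicked outside Stata as follows. This is only a sketch under stated assumptions, not the code of the overlap command: it plugs sample means and standard deviations into the normal-theory formulas, uses a pooled standard deviation for the equal-variance measure, and relies on Python's `statistics.NormalDist.overlap` for the unequal-variance measure. The function names `overlap_two_vars` and `overlap_by_group` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def overlap_two_vars(x, y):
    """Return (equal-variance o, unequal-variance o) as plug-in estimates from two samples.

    Each sample needs at least two observations and a nonzero variance.
    """
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)          # sample variances (n - 1 divisor)
    # Assumption: pooled standard deviation for the equal-variance measure.
    pooled_sd = sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    o_equal = 2.0 * NormalDist().cdf(-abs(mx - my) / (2.0 * pooled_sd))
    o_unequal = NormalDist(mx, sqrt(vx)).overlap(NormalDist(my, sqrt(vy)))
    return o_equal, o_unequal


def overlap_by_group(values, groups):
    """Same measures for one variable split by a grouping variable with exactly two levels."""
    levels = sorted(set(groups))
    if len(levels) != 2:
        raise ValueError("the grouping variable must define exactly two groups")
    first = [v for v, g in zip(values, groups) if g == levels[0]]
    second = [v for v, g in zip(values, groups) if g == levels[1]]
    return overlap_two_vars(first, second)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [3.0, 4.0, 5.0, 6.0, 7.0]
    print(overlap_two_vars(a, b))                       # two-variable form
    print(overlap_by_group(a + b, [0] * 5 + [1] * 5))   # by-group form, same data
```

Both forms reduce to the same two-sample calculation; the by-group form only changes how the two samples are supplied.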