Stata Technical Bulletin


Consider “inverting” the model implied by the tests in the example above. In other words, instead of explaining the mean
fuel efficiency by noting whether a car is domestic or foreign, consider predicting whether a car is an import by measuring its
fuel efficiency. Since the dependent variable is qualitative (domestic/import), a logistic model is a natural framework for this
prediction exercise. It turns out that U/mn = 1 - ROC, where ROC is the area under the ROC curve for the logistic
model. Thus:

. logistic foreign mpg

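Logit Estimates                                         Number of obs =     74
                                                        chi2(1)       =  11.49
                                                        Prob > chi2   = 0.0007
Log Likelihood = -39.28864                              Pseudo R2     = 0.1276

------------------------------------------------------------------------------
 foreign | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     mpg |   1.173232   .0616972      3.038   0.002       1.058331   1.300608
------------------------------------------------------------------------------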

. lroc, nograph

Logistic estimates for foreign
Area under ROC curve = 0.7286
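
The identity can be checked directly from the rank sum. The following is a minimal sketch and not part of the original bulletin. It assumes a current Stata in which sysuse auto loads the automobile data, and it takes U to be the Mann-Whitney count for the first group (domestic cars). The scalar names n_dom, n_for, u_dom, and auc are hypothetical. The sketch relies on the results that ranksum and lroc leave in r().

```stata
* Sketch only: sysuse is the modern way to load the shipped auto data
sysuse auto, clear

* Rank-sum test of mpg by origin; group 1 is domestic (foreign == 0)
quietly ranksum mpg, by(foreign)
scalar n_dom = r(N_1)
scalar n_for = r(N_2)

* U for group 1 = its observed rank sum minus the smallest possible rank sum
scalar u_dom = r(sum_obs) - scalar(n_dom)*(scalar(n_dom) + 1)/2

* Area under the ROC curve from the logistic model
quietly logistic foreign mpg
quietly lroc, nograph
scalar auc = r(area)

* The two displayed quantities should agree
display "U/mn    = " %6.4f scalar(u_dom)/(scalar(n_dom)*scalar(n_for))
display "1 - ROC = " %6.4f 1 - scalar(auc)
```

Given the area of 0.7286 reported above, both values should be close to 0.2714.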

sg28 Multiple comparisons of categories after regression-like methods

William H. Rogers, Stata Corporation, FAX 409-696-4601

In a typical experiment or survey setting, we compare the responses of two or more groups. If we estimate a parametric
model, the covariance matrix of the parameters supplies us with standard error estimates for any individual parameter or contrast.
The theory of hypothesis testing provides ways of using these estimated standard errors to calculate tests of hypotheses about the
responses of different groups. For example, we might test whether crop yield is affected by the application of various fertilizers.
These tests are well known and are provided by virtually every statistical package. Ambiguities arise, however, when we make
multiple comparisons; that is, when we test more than one hypothesis about a model.
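
A short calculation suggests why the ambiguity matters. The sketch below is an illustration only: it assumes five groups, a 5% level for each test, and independence among the tests, which pairwise comparisons from a single model do not satisfy exactly.

```stata
* Number of pairwise comparisons among 5 groups
display comb(5, 2)

* Chance of at least one false rejection in 10 independent tests at the 5% level
display 1 - (1 - 0.05)^10

* Per-test level that keeps the overall rate at or below 5% (Bonferroni bound)
display 0.05/10
```

Under these assumptions, ten unadjusted tests carry roughly a 40% chance of at least one spurious finding, which is why the nominal level of each individual test cannot be taken at face value.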

We can illustrate this problem using the automobile data set provided with Stata. This data set contains a variable, rep78,
that records the repair record in 1978 of each car. rep78 is coded as ‘1’ for cars with poor repair records, as ‘2’ for cars with
fair repair records, and so on up to ‘5’ for cars with excellent repair records. For the sake of the example, we treat rep78 as a
categorical variable rather than as an ordinal variable.

An interesting question is whether the price of a car depends on its repair record. One way to answer this question is to
estimate a regression for price where the repair record is an explanatory variable. Since the repair record is a categorical variable,
we cannot enter it directly as a regressor. Instead, we use the tabulate command to create indicator or dummy variables, one for
each level of rep78. All but one of these dummies are entered in the price regression. (The set of all five dummies is collinear
with the constant term in the regression; either the constant or one of the dummies must be dropped.) In this parameterization,
the coefficient on each dummy variable estimates the difference in the average price between the indicated level of rep78 and
the level corresponding to the omitted dummy variable.

. use stataauto

(1978 Automobile Data)

. tabulate rep78, generate(r)
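
With the indicators in hand, the regression described above might be run as follows. This is a sketch under the assumption that tabulate named the indicators r1 through r5 and that r1 (poor repair record) is the omitted level. The bulletin's own output is not reproduced here.

```stata
* Price on four of the five repair-record indicators; r1 is the omitted level
regress price r2 r3 r4 r5

* Joint test that average price does not depend on repair record at all
test r2 r3 r4 r5

* One pairwise contrast: levels 2 and 3 have the same average price
test r2 = r3
```

Each coefficient compares one level with level 1, whereas test r2 = r3 compares two of the included levels. Running many such contrasts is what raises the multiple-comparison question.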


