Stata Technical Bulletin


The Sampling Statistician runs a very simple regression of earnings on gender and includes sampling weights to account for
the oversampling of blacks in the data. He reports that the ratio of female to male earnings is 70%.
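
As an illustration of the Sampling Statistician's calculation, here is a minimal Python sketch. The six observations, their
weights, and all function names are hypothetical; the numbers were chosen only so that the ratio comes out at 70%. With a
single 0/1 regressor, weighted least squares simply reproduces the weighted group means.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def wls_simple(x, y, w):
    """Weighted least squares of y on a constant and x; returns (intercept, slope)."""
    xbar = weighted_mean(x, w)
    ybar = weighted_mean(y, w)
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    sxx = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up data (an assumption for illustration, not taken from the article).
female = [0, 0, 0, 1, 1, 1]
earnings = [90.0, 100.0, 110.0, 60.0, 80.0, 70.0]
weight = [1.0, 2.0, 1.0, 1.0, 1.0, 2.0]  # inverse sampling probabilities

intercept, slope = wls_simple(female, earnings, weight)

# The intercept is the weighted male mean; intercept + slope is the weighted female mean.
ratio = (intercept + slope) / intercept
print("weighted male mean:  ", intercept)
print("weighted female mean:", intercept + slope)
print("female/male ratio:   ", ratio)
```

The regression adds nothing beyond a comparison of weighted means here, which is the sense in which the Sampling
Statistician's problem is the easier one.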

The following are worth noting: (1) Using unweighted regression, the Econometrician produced an incorrect answer when
sloppy—that is, when the model was wrong; (2) the Sampling Statistician’s problem was easier than that of the Econometrician
and he had no chance of producing the wrong answer; (3) the careful Econometrician, on the other hand, not only produced the
right answer, but produced an answer that contained more information than that produced by the Sampling Statistician.

Let us now compare the approaches of the Econometrician and the Sampling Statistician on the issue of weights. The
Econometrician can be proved wrong by the data; given a set of sampling probabilities, the Econometrician may find that they
are related to the residual and may also discover that there is no set of independent variables in his model to free the residual
of this “unexplainable” correlation. On the other hand, the Sampling Statistician may be confronted by a sensitivity analysis
showing that weights for which he has so carefully accounted do not matter, but in that case, he will merely argue that the
inference can only be made under the assumption that they do matter and add that we merely happened to be lucky this time.
The Sampling Statistician will argue that if the Econometrician wants to estimate behavioral models, that’s fine, but that is still
no reason for ignoring the weights. If the Econometrician wants to perform a sensitivity analysis ex post and finds that the
weights do not matter, that’s fine too. But if the Econometrician simply ignores the weights, that is not fine.

So far, we have not really distinguished between sampling weights and clustering. Mathematically, there are actually two
issues. Sampling weights have to do with the issue that two observations do not have the same probability of appearing in the
data. Clustering has to do with the issue that two observations may be somehow related in a way not otherwise described by
the variables in the data. To adjust standard errors for both, the estimator needs to know the sampling weights and the cluster
to which each observation belongs.
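
In symbols, one common form of such an estimator can be sketched as follows. The notation is an assumption introduced here
and is not taken from the Stata manual: w_i is the sampling weight (the inverse of the sampling probability), x_i the row
vector of regressors for observation i, \hat e_i the residual, and g indexes clusters. Finite-sample corrections are omitted.

```latex
\[
\hat\beta = (X'WX)^{-1} X'Wy, \qquad W = \mathrm{diag}(w_1, \dots, w_n)
\]
\[
u_g = \sum_{i \in g} w_i \, \hat e_i \, x_i', \qquad
\widehat{V}(\hat\beta) = (X'WX)^{-1} \Bigl( \sum_g u_g u_g' \Bigr) (X'WX)^{-1}
\]
```

The weights enter both the point estimate and the scores u_g; the cluster identifiers determine only which weighted scores
are summed before the outer product is taken.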

However, the Econometrician and the Sampling Statistician again have characteristically different approaches. The
Econometrician treats the clustering as if it were another element that needs to be modeled, and then proceeds as if the revised model
is correct. So he may introduce heterogeneity parameters and try to estimate them. “Variance components” models are one way
this is done. The Sampling Statistician wants his regression coefficients to reflect means or differences in means. He is more
interested in correcting the standard errors of the analysis he is already doing.

In attempting to estimate efficiently, the sloppy Econometrician may unwittingly downweight large clusters, since they
have less information per observation. From the Sampling Statistician’s point of view, this potentially introduces a bias. However,
the careful Econometrician gains additional information on relationships among clustered observations that may be useful in
understanding the phenomenon under study.
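
The phrase "less information per observation" can be made concrete under a simple assumption that is not in the text
itself: observations within a cluster are equally correlated, with intraclass correlation rho. A cluster of n observations
is then worth n / (1 + (n - 1) rho) independent observations for estimating a mean. A minimal sketch follows; the function
names and the value rho = 0.2 are hypothetical.

```python
def effective_size(n, rho):
    """Number of independent observations a cluster of size n is worth
    when every pair of observations in it has correlation rho."""
    return n / (1.0 + (n - 1) * rho)


def information_per_observation(n, rho):
    """Effective size divided by the actual cluster size."""
    return effective_size(n, rho) / n


rho = 0.2  # assumed intraclass correlation, for illustration only
sizes = [1, 2, 5, 20]
per_obs = [information_per_observation(n, rho) for n in sizes]

for n, info in zip(sizes, per_obs):
    print(n, round(effective_size(n, rho), 3), round(info, 3))
```

An estimator that aims for efficiency weights each cluster roughly by its effective size rather than its raw size, so the
observations in large clusters count for less. That is the downweighting to which the Sampling Statistician objects.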

How do you know where a given analysis fits, philosophically speaking? Econometricians sometimes use (or are forced by
data availability to use) reduced form models, in which case they should behave as if they were Sampling Statisticians. Sampling
Statisticians may sometimes use maximum-likelihood methods, but that does not make them Econometricians (logit analysis, for
example, can be a fancy way to compare proportions with adjustment). In short, if the analysis is anything less than an all-out
attempt at behavioral modeling, or if weighted analysis changes the results substantially, it needs to be considered from the
viewpoint of the Sampling Statistician.

This is where Huber’s method (implemented in Stata; see [5s] huber) is helpful. The huber commands take the philosophical
approach of the Sampling Statistician. With the Huber method, weighted or clustered problems can be estimated using regression,
logit, or probit estimation techniques. The calculations differ from the aweighted answers only in their standard errors.
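
The claim that only the standard errors change can be illustrated with a small Python sketch. It is written under stated
assumptions and is not the huber commands' actual computation: the data, the cluster assignment, and the function names are
hypothetical, the "conventional" variance rescales the weights to sum to the sample size, and the sandwich variance omits
any finite-sample adjustment.

```python
from math import sqrt


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def wls_two_ways(x, y, w, cluster):
    """WLS of y on a constant and x.

    Returns (beta, se_conventional, se_sandwich), each a list [constant, slope].
    """
    n = len(y)
    # Bread: inverse of X'WX for the design matrix [1, x].
    a00 = sum(w)
    a01 = sum(wi * xi for xi, wi in zip(x, w))
    a11 = sum(wi * xi * xi for xi, wi in zip(x, w))
    det = a00 * a11 - a01 * a01
    bread = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]

    # Point estimates: (X'WX)^-1 X'Wy. These are shared by both approaches.
    b0 = sum(wi * yi for yi, wi in zip(y, w))
    b1 = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    beta = [bread[0][0] * b0 + bread[0][1] * b1,
            bread[1][0] * b0 + bread[1][1] * b1]
    resid = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]

    # Conventional formula: s^2 (X'WX)^-1 with the weights rescaled to sum to n.
    scale = n / a00
    s2 = sum(scale * wi * ei ** 2 for wi, ei in zip(w, resid)) / (n - 2)
    se_conv = [sqrt(s2 * bread[k][k] / scale) for k in range(2)]

    # Sandwich: sum the weighted scores within each cluster, then take outer products.
    scores = {}
    for xi, wi, ei, g in zip(x, w, resid, cluster):
        u = scores.setdefault(g, [0.0, 0.0])
        u[0] += wi * ei
        u[1] += wi * ei * xi
    meat = [[sum(u[i] * u[j] for u in scores.values()) for j in range(2)]
            for i in range(2)]
    v = matmul2(matmul2(bread, meat), bread)
    se_sand = [sqrt(v[k][k]) for k in range(2)]
    return beta, se_conv, se_sand


# Made-up data (an assumption for illustration only).
x = [0, 0, 0, 1, 1, 1]
y = [90.0, 100.0, 110.0, 60.0, 80.0, 70.0]
w = [1.0, 2.0, 1.0, 1.0, 1.0, 2.0]
cluster = [1, 1, 2, 2, 3, 3]

beta, se_conv, se_sand = wls_two_ways(x, y, w, cluster)
print("coefficients:         ", beta)
print("conventional std. err.:", se_conv)
print("sandwich std. err.:    ", se_sand)
```

Both columns of standard errors belong to the same coefficients. Multiplying every weight by a constant changes neither
column, so the two differ only in the formula used for the variance.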

If you have sampling weights or clusters, even if you think you have the “right” model, Huber’s method is one way you can
check your assumption. If the answers are substantially different from those of your weighted analysis, you know you have a problem. If
your goal was to estimate a reduced form in any case, your problem is also solved. If your goal was to estimate a full behavioral
model, you now know it’s time to reconsider the functional form, the hypothesized variables, or selection effects.


