Stata Technical Bulletin


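where $\bar{x}$ and $s$ are the mean and s.d. of the $x_i$, and

$$Z(Y; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(Y-\mu)^2/(2\sigma^2)}$$

is the density function of a normally distributed variable $Y$ with mean $\mu$ and s.d. $\sigma$. The confidence interval for $C_q$ is

$$\left(c_q - z_{100(1-\alpha)}\, s_q,\; c_q + z_{100(1-\alpha)}\, s_q\right).$$

meansd case. The value of $c_q$ is $\bar{x} + z_q \times s$. Its s.e. is given by the formula

$$s_q^{*} = s\sqrt{1/n + z_q^2/(2n-2)}.$$

The confidence interval for $C_q$ is $(c_q - z_{100(1-\alpha)} \times s_q^{*},\; c_q + z_{100(1-\alpha)} \times s_q^{*})$.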
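
As a rough numerical check of the meansd formulas above, the sketch below computes the centile estimate, its s.e., and the interval in Python. The function name `meansd_centile`, its arguments, and the sample data are illustrative assumptions, not part of the original command. In particular, `upper_pct` stands for the standard normal percentile 100(1 - α) used in the interval, and the caller must set it to match the definition of α in use. The sketch also assumes that s is the sample s.d. with divisor n - 1.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def meansd_centile(x, q, upper_pct):
    """Centile estimate, s.e., and confidence interval under the meansd formulas.

    x         -- list of observations (assumed roughly normal)
    q         -- centile wanted, 0 < q < 100
    upper_pct -- the standard normal percentile 100(1 - alpha) used for the
                 interval (hypothetical argument; set it to match whatever
                 definition of alpha you are using)
    """
    n = len(x)
    xbar = mean(x)
    s = stdev(x)                      # assumption: sample s.d., divisor n - 1
    std_normal = NormalDist()         # mean 0, s.d. 1
    z_q = std_normal.inv_cdf(q / 100)
    z_crit = std_normal.inv_cdf(upper_pct / 100)

    c_q = xbar + z_q * s
    se = s * sqrt(1 / n + z_q ** 2 / (2 * n - 2))
    return c_q, se, (c_q - z_crit * se, c_q + z_crit * se)


# Illustrative data only
c_q, se, (lower, upper) = meansd_centile([1, 2, 3, 4, 5], q=50, upper_pct=97.5)
print(c_q, se, lower, upper)
```

For the median (q = 50), z_q is zero, so the s.e. reduces to s divided by the square root of n, the familiar s.e. of a mean.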


sg8 Probability weighting

William Rogers, CRC, FAX 310-393-7551

Stata 3.0 introduced what is, to many, a new kind of weight, the pweight or sampling weight, alongside
the better-known fweights (frequency weights) and aweights (analytic weights).

fweights are conceptually easy—you have data where each observation reflects one or more real observations. fweights
are most easily thought of as a data-compression scheme. An observation might record income, age, etc., and a weight, say
5, meaning that this observation really reflects 5 people with exactly the same income, age, etc. The results of estimating a
frequency-weighted regression are exactly the same as duplicating each observation so that it appears in the data as many times
as it should and then estimating the unweighted regression. There are really no statistics here; just data management.
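
The equivalence between frequency weighting and duplication can be checked numerically. The sketch below is a minimal pure-Python illustration, not Stata code: `fit_line` is a hypothetical helper that fits a one-regressor least-squares line, and the data are made up. It fits the line once on compressed data with integer weights and once on the expanded data, then prints both results.

```python
from math import sqrt


def fit_line(x, y, w=None):
    """Least-squares line y = a + b*x; w holds integer frequency weights."""
    if w is None:
        w = [1] * len(x)
    n = sum(w)                                   # number of real observations
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / n
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / n
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = sqrt(rss / (n - 2) / sxx)             # usual OLS s.e. of the slope
    return a, b, se_b


# Compressed data: each row stands for w identical people (made-up numbers)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 2.9, 4.2, 4.8]
w = [5, 1, 2, 3]

# Expanded data: every row repeated as many times as its weight says
x_long = [xi for xi, wi in zip(x, w) for _ in range(wi)]
y_long = [yi for yi, wi in zip(y, w) for _ in range(wi)]

weighted = fit_line(x, y, w)
expanded = fit_line(x_long, y_long)
print(weighted)
print(expanded)
```

The two printed tuples (intercept, slope, s.e. of the slope) agree up to floating-point rounding, because the weighted sums are the same sums the expanded data would produce.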

aweights do solve a real statistical problem. The data you are analyzing reflect averages. You do not know each individual's
income, age, etc.; you know only the average income in data grouped on age, etc. Weighting is important when analyzing such
data because the accuracy of the averages increases as the sample size over which the average was calculated increases. An
observation based on averages of 1,000 people is relatively more important than an observation in the same data based on
an average of 5 people. In a regression context, for instance, mispredicting the 1,000-person average is far more serious than
mispredicting, by the same amount, the 5-person average.
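
The reason group size matters can be written in one line, under the added assumption (not stated above) that the individuals behind each average are independent with a common variance $\sigma^2$. Here $n_g$ denotes the number of people averaged in group $g$, a symbol introduced only for this illustration:

```latex
\operatorname{Var}(\bar{y}_g) = \frac{\sigma^2}{n_g},
\qquad\text{so}\qquad
\frac{\operatorname{Var}(\bar{y}_{5})}{\operatorname{Var}(\bar{y}_{1000})}
  = \frac{\sigma^2/5}{\sigma^2/1000} = 200 .
```

Under that assumption the 5-person average is 200 times as variable as the 1,000-person average, which is why a weight proportional to $n_g$ is the natural choice.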

pweights solve another statistical problem. You have data in which each observation is an individual, not a group
average; it is merely that some individuals were more likely to appear in your data than others. An observation with a small
probability of appearing, and therefore a large pweight (the inverse of the sampling probability), is not in any sense
a more accurate measurement of, say, earnings than the earnings recorded in an observation more likely to appear in the
data. The adjustment made to standard errors is therefore in no way related to the adjustment made to standard errors in
the aweight case. What is related is the adjustment made to the mean parameter estimate: aweights and pweights adjust
means and regression coefficients in the same way. An observation with a high weight contributes more information on the mean
because, in the case of aweights, it is a more precise estimate and, in the case of pweights, it was less likely to be
sampled and is therefore reflective of a larger underlying population.
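
The contrast between the two weight types can be seen in the simplest case, a weighted mean. The sketch below is illustrative Python, not Stata's implementation. The function names and data are hypothetical, and the two s.e. formulas are one common textbook pairing chosen for illustration: a model-based weighted-least-squares variance for the aweight reading, and a robust (sandwich) variance for the pweight reading. Whether Stata 3.0 computed exactly these quantities is an assumption.

```python
from math import sqrt


def weighted_mean(x, w):
    """Point estimate shared by both readings of the weights."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def se_aweight(x, w):
    """Model-based s.e.: weights rescaled to sum to n, then the WLS variance."""
    n = len(x)
    m = weighted_mean(x, w)
    scale = n / sum(w)
    sigma2 = sum(scale * wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / (n - 1)
    return sqrt(sigma2 / n)


def se_pweight(x, w):
    """Robust (sandwich) s.e., treating w as inverse sampling probabilities."""
    n = len(x)
    m = weighted_mean(x, w)
    meat = sum((wi * (xi - m)) ** 2 for wi, xi in zip(w, x))
    return sqrt(n / (n - 1) * meat) / sum(w)


# Made-up earnings and weights, for illustration only
earnings = [10.0, 12.0, 15.0, 20.0, 30.0]
weights = [1.0, 1.0, 1.0, 4.0, 8.0]

print(weighted_mean(earnings, weights))   # same estimate under either reading
print(se_aweight(earnings, weights))      # s.e. if weights reflect precision
print(se_pweight(earnings, weights))      # s.e. if weights reflect sampling
```

With equal weights the two s.e. formulas coincide with the ordinary s.e. of a mean; they diverge only when the weights vary, which is the situation described above.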

pweighted data can arise both intentionally and unintentionally. One might intentionally oversample blacks relative to
whites (as is common in many social-science surveys) or the sick relative to the well (as is common in many epidemiological
studies). Alternatively, imagine a survey that is administered by mail and also imagine, as is typical, that certain types of
respondents are found, ex post, to have been more likely to respond than others. The group less likely to respond thus reflects
a larger underlying population, but the measurements on the individuals we do have are no more (or less) accurate than any of
our other measurements.

When one begins to consider how a sample is obtained, another issue arises, that of clustered sampling, an issue related to,
but conceptually different from, pweights. Let me first describe how a sample might come to be clustered and then consider
the statistical issues of such clustering.

Assume you are going to survey a population and that you will do this by sending interviewers into the field. It will be
more convenient (i.e., cheaper) if each interviewer can interview persons who are geographically close to each other, so you


