PROVIDE Project Technical Paper 2005:1
variance had the sample been a simple random one. Stratification typically reduces deff below
one, while clustering increases it above one. Deaton (1997:15) suggests that most surveys
have a deff greater than one, which indicates that “in survey design the practical convenience
and cost considerations of clustering usually predominate over the search for variance-
reduction”.
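As a rough illustration of the design effect, the sketch below computes deff as the ratio of the
variance under the actual design to the variance under simple random sampling. The function names
and all numbers are hypothetical. The clustering formula deff = 1 + (m - 1) * rho is Kish's
textbook approximation for equal-sized clusters; it is an assumption added here and is not taken
from this paper.

```python
def design_effect(var_design, var_srs):
    """Ratio of the variance under the actual design to the SRS variance."""
    if var_srs <= 0:
        raise ValueError("var_srs must be positive")
    return var_design / var_srs


def kish_cluster_deff(cluster_size, intra_cluster_corr):
    """Kish's approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * intra_cluster_corr


# Hypothetical numbers, for illustration only.
# A stratified design whose variance is 80% of the SRS variance.
deff_stratified = design_effect(var_design=0.8, var_srs=1.0)

# Clusters of 10 households with a modest intra-cluster correlation.
deff_clustered = kish_cluster_deff(cluster_size=10, intra_cluster_corr=0.05)

print(deff_stratified, deff_clustered)
```

With these made-up inputs the stratified design has a deff below one and the clustered design a
deff above one, which matches the pattern described in the text.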
February 2005
2.2.4. Unequal selection probabilities
Although surveys such as the IES 2000 are usually designed to be self-weighting, in practice the
probabilities of inclusion differ between observations. The possibilities of non-cooperation
and non-contact cannot be taken into account when a survey is designed. In some cases it also
costs more to sample certain households. In such instances, households that are costly to
interview may be excluded on purpose, which affects the probability of inclusion of those
observations. Since each sampled household represents a number of non-sampled households, the
weight of each observation must be adjusted to correct for the over- or under-representation of
certain types of households.
Deaton (1997:15) explains as follows:8
“The rule here is to weight according to the reciprocals of sampling probabilities
because households with low (high) probabilities of selection stand proxy for
large (small) numbers of households in the population.”
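The rule in this quotation can be sketched in a few lines of Python. This is a minimal
illustration only: the incomes and selection probabilities are hypothetical, and the function
names are not taken from any survey package.

```python
def inverse_probability_weights(selection_probs):
    """Weight each household by the reciprocal of its selection probability."""
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / p for p in selection_probs]


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical sample: the third household had a low chance of selection,
# so it stands proxy for many more households in the population.
incomes = [100.0, 200.0, 900.0]
probs = [0.5, 0.5, 0.1]

weights = inverse_probability_weights(probs)
unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)

print(weights, unweighted, weighted)
```

Because the household with the low selection probability receives the largest weight, the
weighted mean is pulled towards its income, whereas the unweighted mean treats all three
households as equally representative.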
Differences in probabilities of selection arise either by design (in the case of surveys
that were not designed to be self-weighting) or by accident (for example when households
refuse to cooperate). In the case of accidental differences in selection probabilities it is
necessary to add weights to the survey ex-post. However, as Deaton warns, it is very difficult
to find those factors or characteristics that sufficiently explain non-response. A good example
is the apparently low response rate for White households in the IES 2000. Whether race alone
explains this low response rate, or whether it is the result of a combination of factors such as
race, income and location, is impossible to say. The difficulty in explaining the source(s) of
over- or under-representation means that there is a real risk that the ex-post weighting
adjustments will sometimes be incorrect.
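One common form of ex-post adjustment is a weighting-class correction, sketched below under
stated assumptions. It is not the procedure used for the IES 2000: the adjustment classes, the
design weights and the response pattern are all hypothetical. Within each class, the weights of
responding households are scaled up by the inverse of the class response rate, so that
respondents also stand in for the non-respondents of their class.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(records):
    """Scale respondents' design weights by the inverse of the weighted
    response rate in their adjustment class.

    Each record is an (adjustment_class, design_weight, responded) tuple.
    Returns the adjusted weights of respondents only, in input order.
    """
    sampled = defaultdict(float)    # total design weight sampled per class
    responded = defaultdict(float)  # total design weight responding per class
    for cls, weight, did_respond in records:
        sampled[cls] += weight
        if did_respond:
            responded[cls] += weight

    adjusted = []
    for cls, weight, did_respond in records:
        if did_respond:
            adjusted.append(weight * sampled[cls] / responded[cls])
    return adjusted


# Hypothetical sample: class "A" has a 50% response rate, class "B" 100%.
sample = [
    ("A", 10.0, True),
    ("A", 10.0, True),
    ("A", 10.0, False),
    ("A", 10.0, False),
    ("B", 10.0, True),
    ("B", 10.0, True),
]

adjusted = nonresponse_adjusted_weights(sample)
print(adjusted, sum(adjusted))
```

The adjusted weights sum to the total design weight of the full sample. The correction is only as
good as the choice of classes: if non-response depends on a characteristic that the classes do
not capture, the adjusted weights remain biased, which is the risk described above.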
2.2.5. Weights in Stata
When the weight option is specified in a Stata command line, Stata attaches a weight to each
observation. This weight alters the ‘importance’ of each observation in the estimation of
the moments of a variable. The Stata reference manual (StataCorp, 2001) discusses four
types of weights that can be implemented in Stata:
8 See section 2.2.5 (inverse probability weights) for a discussion of the practical implementation in Stata.
© PROVIDE Project