Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1
(in the case of urban areas or hostels) or EA (presumably in the case of rural areas).7 This
basically means that all urban households or households living in hostels are first sorted by
magisterial district and then by their average household income. The household income data
come from the Census of 1996. Rural households are sorted by magisterial district and then
by EA. Ten households are then selected randomly from each of the stratified PSUs.

February 2005


The two-stage sampling process is designed so that each household has an equal chance of
selection into the final sample. If each cluster is selected randomly, with probability of
selection proportionate to the size of the cluster, and if the same number of households is
selected from each cluster, then the design is ‘self-weighting’, i.e. each household has the
same chance of being included in the final sample. The IES 2000 is an example of a
self-weighted sample design. The 3,000 clusters were randomly selected with probability
proportionate to their size, and 10 households were then selected from each cluster. In
theory, each of the 30,000 households in the sample therefore had an equal chance of being
included.
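
The self-weighting property can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of the IES 2000 documentation: the cluster sizes, the number of clusters drawn and the function name `inclusion_probabilities` are hypothetical and chosen only for illustration. It multiplies the first-stage probability (proportionate to cluster size) by the second-stage probability (a fixed take within the cluster) and shows that the cluster size cancels out.

```python
from fractions import Fraction


def inclusion_probabilities(cluster_sizes, n_clusters, take):
    """Overall household inclusion probability for each cluster.

    Stage 1: a cluster is drawn with probability proportionate to its size.
    Stage 2: a fixed number of households (`take`) is drawn within the cluster.
    """
    total = sum(cluster_sizes)
    probs = []
    for size in cluster_sizes:
        p_cluster = Fraction(n_clusters * size, total)  # stage 1 (PPS)
        p_household = Fraction(take, size)              # stage 2 (fixed take)
        probs.append(p_cluster * p_household)           # size cancels out
    return probs


# Hypothetical population of four clusters (households per cluster).
sizes = [80, 120, 200, 400]
probs = inclusion_probabilities(sizes, n_clusters=2, take=10)

# Every household has the same probability: n_clusters * take / total.
print(probs)
```

In this toy population every household has an inclusion probability of 2 × 10 / 800 = 1/40, whatever the size of its cluster. The same cancellation underlies the 3,000 clusters and 10 households per cluster described above.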

2.2.3. The ‘design effect’

When a sample is stratified, say, along rural-urban lines, two independent surveys are
essentially being conducted. This ensures that the final combined survey is representative of
households from both sectors in the population. The overall variance of an estimate, say
income, will then be the weighted sum of the variances of rural income and urban income. The
covariance or between-sector variance is zero because the two samples are independent.
However, if the overall sample were a single random survey, the covariance would come into
play. More importantly, if the means of rural and urban incomes, say, were very different,
the overall variability would be greater. The conclusion is that stratification enhances
‘precision’, where the term precision refers to the variability of an estimator (Deaton,
1997:14).
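
The gain from stratification can be illustrated numerically. The sketch below is only an illustration under stated assumptions: two strata with hypothetical population shares, means and variances, proportional allocation of the sample across strata, and a population large enough to ignore the finite population correction. The function name `variance_components` is likewise an assumption. It splits the population variance into a within-stratum and a between-stratum part, and compares the variance of the estimated mean under a stratified design with that under a single random sample of the same size.

```python
def variance_components(strata):
    """Split the population variance into within- and between-stratum parts.

    `strata` is a list of (population share, mean, variance) tuples.
    """
    overall_mean = sum(w * mu for w, mu, _ in strata)
    within = sum(w * s2 for w, _, s2 in strata)
    between = sum(w * (mu - overall_mean) ** 2 for w, mu, _ in strata)
    return within, between


# Hypothetical rural and urban strata: (share, mean income, income variance).
strata = [(0.5, 1000.0, 250000.0), (0.5, 4000.0, 1000000.0)]
n = 1000  # hypothetical total sample size

within, between = variance_components(strata)

# Stratified sample with proportional allocation: only within-stratum
# variance contributes to the variance of the estimated mean.
var_stratified = within / n

# Single random sample: the between-stratum component is added.
var_srs = (within + between) / n

print(var_stratified, var_srs)
```

The further apart the stratum means are, the larger the between-stratum component and the greater the gain in precision from stratifying.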

Clustering, on the other hand, reduces precision. This can be explained as follows.
Generally speaking, households within a cluster are more similar in terms of their
characteristics and behaviour than households in different clusters. Thus, by sampling several
households from the same cluster there is potentially less information content in the survey.
The precision of an estimate thus depends on the correlation between the observations within
a cluster. The sample design therefore affects precision, with stratification improving it
and clustering working against it.
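
The loss of precision from clustering can be sketched with a standard textbook result that is not stated in this paper: with equally sized clusters of m households and a common intra-cluster correlation rho, the variance of a sample mean is inflated by the factor 1 + (m - 1) × rho relative to a simple random sample of the same size. In the Python sketch below, the number of clusters and the take per cluster follow the IES 2000 figures quoted earlier, while the values of rho, the unit variance and the function names are hypothetical.

```python
def variance_inflation(take, rho):
    """Inflation of the variance of a mean due to clustering.

    Assumes equal cluster takes and a common intra-cluster correlation rho.
    """
    return 1 + (take - 1) * rho


def clustered_variance_of_mean(sigma2, n_clusters, take, rho):
    """Variance of the sample mean under the assumptions above."""
    n = n_clusters * take
    return (sigma2 / n) * variance_inflation(take, rho)


n_clusters, take = 3000, 10  # sample design figures quoted in the text
sigma2 = 1.0                 # hypothetical unit variance

for rho in (0.0, 0.1, 0.5):  # hypothetical intra-cluster correlations
    factor = variance_inflation(take, rho)
    effective_n = n_clusters * take / factor
    variance = clustered_variance_of_mean(sigma2, n_clusters, take, rho)
    print(rho, factor, round(effective_n), variance)
```

With rho equal to zero the clustered sample is as informative as a simple random sample of 30,000 households. As rho increases, the same 30,000 interviews carry the information of a progressively smaller independent sample.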

Kish (1965, cited in Deaton, 1997) came up with the concept of ‘design effect’, also known as
deff. It is defined as the ratio of the variance of an estimate to the ratio of the

7 Statistics South Africa is not entirely clear on this.

© PROVIDE Project


