Creating a 2000 IES-LFS Database in Stata

PROVIDE Project Technical Paper 2005:1
February 2005

(in the case of urban areas or hostels) or EA (presumably in the case of rural areas).⁷ This
basically means that all urban households or households living in hostels are first sorted by
magisterial district and then by their average household income. The household income data
come from the Census of 1996. Rural households are sorted by magisterial district and then
by EA. Ten households are then selected randomly from each of the stratified PSUs.

The two-stage sampling process is designed so that each household has an equal chance of
selection into the final sample. If each cluster is selected randomly, with probability of
selection proportionate to the size of the cluster, and if the same number of households is
selected from each cluster, then the design is ‘self-weighting’, i.e. each household has the
same chance of being included in the final sample. The IES 2000 is an example of a
self-weighting sample design. The 3,000 clusters were selected randomly with probability
proportionate to their size, while 10 households were selected from each cluster. In theory,
therefore, each of the 30,000 households in the sample had an equal chance of being
included.
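
The self-weighting property can be checked with a little arithmetic. The sketch below is illustrative only: the 3,000 clusters and 10 households per cluster come from the text, but the total number of households and the cluster sizes are hypothetical values chosen for the example, and the first-stage probability uses the usual simplification that a cluster's chance of selection is the number of clusters drawn times its share of all households.

```python
from fractions import Fraction

def inclusion_probability(cluster_size, total_households,
                          n_clusters, households_per_cluster):
    """Overall chance that one household ends up in a two-stage sample."""
    # Stage 1: cluster drawn with probability proportional to its size.
    p_cluster = Fraction(n_clusters * cluster_size, total_households)
    # Stage 2: a fixed number of households drawn within the chosen cluster.
    p_household = Fraction(households_per_cluster, cluster_size)
    return p_cluster * p_household

# Hypothetical values (not from the paper): population size and cluster sizes.
TOTAL_HOUSEHOLDS = 9_000_000
cluster_sizes = [80, 120, 200, 400]

probabilities = [
    inclusion_probability(size, TOTAL_HOUSEHOLDS,
                          n_clusters=3000, households_per_cluster=10)
    for size in cluster_sizes
]

for size, p in zip(cluster_sizes, probabilities):
    print(f"cluster of {size} households: inclusion probability {p}")
```

The cluster size cancels between the two stages, so under these assumptions every household has the same probability regardless of how large its cluster is.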

2.2.3. The ‘design effect’

When a sample is stratified, say, along rural-urban lines, two independent surveys are in
effect being conducted. This ensures that the final combined survey is representative of
households from both sectors in the population. The variance of an overall estimate, say
mean income, is then a weighted sum of the variances of the rural and urban estimates. The
covariance between the two sector estimates is zero because the two samples are
independent. If, however, the overall sample were a single random sample, the
between-sector variance would come into play: the more the mean rural and urban incomes
differ, the greater the overall variability. The conclusion is that stratification enhances
‘precision’, where the term precision refers to the variability of an estimator
(Deaton, 1997:14).
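
A small numerical sketch may make the gain from stratification concrete. It uses the standard textbook variance formulas for a stratified mean and for a simple random sample, ignoring finite population corrections. The population shares, means, variances and sample sizes are all hypothetical, chosen only for illustration, and the function names are not from the paper.

```python
def stratified_variance(shares, variances, sample_sizes):
    """Variance of a stratified mean: sum of W_h^2 * S_h^2 / n_h."""
    return sum(w ** 2 * s2 / n
               for w, s2, n in zip(shares, variances, sample_sizes))

def simple_random_variance(shares, means, variances, n):
    """Variance of the mean from one simple random sample of size n.

    The population variance splits into a within-stratum part and a
    between-stratum part that depends on the gap between stratum means.
    """
    overall_mean = sum(w * m for w, m in zip(shares, means))
    within = sum(w * s2 for w, s2 in zip(shares, variances))
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(shares, means))
    return (within + between) / n

# Hypothetical rural and urban figures (not from the paper).
shares = [0.4, 0.6]               # population shares: rural, urban
means = [1000.0, 3000.0]          # mean income in each sector
variances = [250_000.0, 1_000_000.0]
sample_sizes = [400, 600]         # proportional allocation of n = 1000

var_stratified = stratified_variance(shares, variances, sample_sizes)
var_simple = simple_random_variance(shares, means, variances, n=1000)

print("stratified:", var_stratified)
print("simple random:", var_simple)
```

With proportional allocation the stratified variance contains only the within-sector part, so the difference between the two figures is exactly the between-sector term divided by the sample size.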

Clustering, on the other hand, reduces precision. Households within a cluster are generally
more similar in their characteristics and behaviour than households in different clusters.
Sampling several households from the same cluster therefore potentially yields less
information than sampling the same number of households independently. The precision of an
estimate thus depends on the correlation between the observations within a cluster. The
sample design affects precision in both directions, with stratification improving it and
clustering working against it.
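
The loss of precision from clustering is often approximated by the factor 1 + (m - 1)ρ, where m is the number of households sampled per cluster and ρ is the intra-cluster correlation. This is a standard textbook approximation for equal-sized clusters and is not taken from this paper. In the sketch below, m = 10 and the sample of 30,000 households follow the text, while the value of ρ is purely hypothetical.

```python
def clustering_inflation(households_per_cluster, intra_cluster_corr):
    """Approximate variance inflation from clustering: 1 + (m - 1) * rho."""
    return 1 + (households_per_cluster - 1) * intra_cluster_corr

def effective_sample_size(sample_size, inflation):
    """Size of a simple random sample giving the same precision."""
    return sample_size / inflation

RHO = 0.05  # hypothetical intra-cluster correlation, for illustration only

inflation = clustering_inflation(households_per_cluster=10,
                                 intra_cluster_corr=RHO)
n_effective = effective_sample_size(30_000, inflation)

print("variance inflation factor:", round(inflation, 2))
print("effective sample size:", round(n_effective))
```

Even a modest correlation within clusters reduces the information in the sample noticeably, and with ρ = 0 the factor is 1 and nothing is lost.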

Kish (1965, cited in Deaton, 1997) introduced the concept of the ‘design effect’, also
known as deff. It is defined as the ratio of the variance of an estimate to the ratio of the

⁷ Statistics South Africa is not entirely clear on this.
