PROVIDE Project Technical Paper 2005:1
households in each selected cluster. The second stage then involves drawing households from
each sampled cluster to enter into the survey. Often this stage of the selection process is
informed by prior knowledge about households, which implies that stratification comes into
play. Clustering and stratification are discussed in more detail in the following section.
February 2005
2.2.2. Clustering and stratification
In the two-stage sample design, clusters are first selected randomly from a list of clusters
covering the entire population. Next, households are selected from each of the sampled
clusters. This generates a final sample in which households are not randomly distributed over
space, but are grouped geographically. The most important reason for clustering is the cost-
effectiveness of this approach. With clustering it also becomes more feasible to gather
village-level information on, for example, schools, clinics and (local) government services.
The Census of 1996 forms the basis for clustering in the IES 2000 sample. The 3,000 primary
sampling units (PSUs) in the IES 2000 are drawn randomly from the list of census
enumeration areas (EAs) (SSA, 2002a).
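The two-stage selection described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used for the IES 2000: the function name `two_stage_sample`, the toy sampling frame and the equal-probability draw at both stages are assumptions made for the example.

```python
import random


def two_stage_sample(clusters, n_clusters, n_households, seed=0):
    """Draw clusters at random, then draw households within each drawn cluster.

    `clusters` maps a cluster id to the list of household ids it contains.
    Equal-probability selection at both stages is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters from the full list of clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of households inside each chosen cluster.
    return {
        c: rng.sample(clusters[c], min(n_households, len(clusters[c])))
        for c in chosen
    }


# Hypothetical frame: 20 clusters of 50 households each (illustrative sizes).
frame = {
    f"EA{i:02d}": [f"EA{i:02d}-H{j:02d}" for j in range(50)] for i in range(20)
}
sample = two_stage_sample(frame, n_clusters=5, n_households=10)
```

The resulting households are concentrated in five clusters rather than spread over all twenty, which is the geographical grouping referred to above.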
Before households are drawn from the randomly selected clusters, it has to be decided whether
prior knowledge about households should be used to influence the selection process. Often
surveys are required to generate statistics for population sub-groups, e.g. by geographical
area, race or standard of living. Stratification is a method used to ensure that observations
from each of these groups are adequately represented in the final sample by “effectively
[converting] a sample from one population into a sample from many populations” (Deaton,
1997:13). Household income and expenditure surveys “nearly always” distinguish between
rural and urban areas, and sometimes further stratification by geographical region, race and
income group is added. Such stratification is also known as explicit stratification.
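Explicit stratification can be illustrated with a short Python sketch in which the population is first split into strata and a separate random sample is then drawn from each. The function name `stratified_sample`, the toy population and the fixed number of draws per stratum are assumptions made for the illustration; actual surveys may allocate sample sizes to strata in other ways.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Split `units` into strata and draw a random sample within each stratum.

    `stratum_of` returns the stratum label of a unit. Drawing the same number
    of units from every stratum is an assumption of this sketch.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Each stratum is treated as its own population and sampled separately.
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }


# Hypothetical population: 150 urban and 50 rural households.
units = [{"id": i, "area": "urban" if i % 4 else "rural"} for i in range(200)]
sample = stratified_sample(units, lambda u: u["area"], n_per_stratum=20)
```

Although rural households make up only a quarter of this toy population, the rural stratum is guaranteed 20 observations, which a single random draw from the pooled population would not ensure.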
Stratification can also be done implicitly by means of a systematic sampling process. A
list of households is ranked or sorted according to some household characteristic. A random
starting point is selected and thereafter every jth observation is selected into the sample, with
the value of j depending on the size of the clusters and the total number of households that
will eventually be included in the sample. If, for example, households are sorted according to
income, selection of every jth observation will ensure that the final sample will contain
observations from across the entire income spectrum. Such a survey is then said to be
implicitly stratified by income.
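The systematic selection described above can be sketched as follows. The function name `systematic_sample`, the toy income values and the choice of the interval j as the list length divided by the target sample size (rounded down) are assumptions made for the illustration.

```python
import random


def systematic_sample(households, key, n, seed=0):
    """Sort households by `key`, pick a random start, then take every jth one.

    The interval j = len(households) // n is an assumption of this sketch.
    """
    rng = random.Random(seed)
    ordered = sorted(households, key=key)
    j = len(ordered) // n  # sampling interval
    if j < 1:
        raise ValueError("sample size exceeds the number of households")
    start = rng.randrange(j)  # random starting point within the first interval
    return [ordered[start + k * j] for k in range(n)]


# Hypothetical list of 100 households with distinct incomes.
households = [{"id": i, "income": (i * 37) % 1000} for i in range(100)]
picked = systematic_sample(households, key=lambda h: h["income"], n=10)
```

Because the list is sorted by income and one household is taken from each consecutive block of j households, the ten selected households come from ten different parts of the income ranking, from the lowest incomes to the highest.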
The IES 2000 is explicitly stratified by the nine provinces and by location (urban and
rural) (SSA, 2002a), giving 18 explicit strata in total. Each PSU was also implicitly stratified
firstly by magisterial district or district council, and thereafter by average household income
© PROVIDE Project