PROVIDE Project Technical Paper 2005:1
data, including income and expenditure summary tables. This file is the largest of all the data
files and contains the bulk of the information collected for the IES 2000.
February 2005
2.1.2. LFS 2000:2
The LFS 2000:2 also comes with a metadata file explaining the sampling framework and a list
of the files that are contained in the LFS 2000:2 dataset. The sample design of the LFS 2000:2
is the same as that of the IES 2000. Data files include person.txt, worker.txt and house.txt.5
The file person.txt, like its namesake in the IES 2000, contains all the person-level information
of household members, while worker.txt contains employment data for all household members
of working age (15 to 65). Finally, house.txt contains general household variables. A fourth
data file, stratum_psu.txt, contains variables identifying the primary sampling units (PSUs)
and the strata used in the survey (see section 2.2). When the LFS 2000:2 is merged with the
IES 2000, only the data contained in person.txt and worker.txt are used.
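As a rough illustration of how the person-level and worker-level files relate, the sketch below joins worker records onto person records. It is only a sketch under stated assumptions: the column names (hhid, personno, age, employed), the comma delimiter and the sample rows are hypothetical and are not taken from the LFS 2000:2 files themselves.

```python
import csv
import io

# Hypothetical stand-ins for lfsperson.txt and lfsworker.txt. The real column
# names, delimiters and values are NOT documented here; these are assumptions.
PERSON_TXT = """hhid,personno,age
1001,1,42
1001,2,12
1002,1,70
"""
WORKER_TXT = """hhid,personno,employed
1001,1,1
"""


def read_rows(text):
    """Parse delimited text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_person_worker(persons, workers, keys=("hhid", "personno")):
    """Left-join worker records onto person records using the assumed keys."""
    lookup = {tuple(w[k] for k in keys): w for w in workers}
    merged = []
    for p in persons:
        match = lookup.get(tuple(p[k] for k in keys), {})
        row = dict(p)
        # Copy only the non-key worker fields; persons without a worker
        # record (e.g. outside working age) keep just their person fields.
        row.update({k: v for k, v in match.items() if k not in keys})
        row["has_worker_record"] = bool(match)
        merged.append(row)
    return merged


persons = read_rows(PERSON_TXT)
workers = read_rows(WORKER_TXT)
merged = merge_person_worker(persons, workers)
```

A left join keeps every household member, which matters because worker.txt covers only members of working age; members outside that range simply carry no employment fields.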
2.2. Sampling and weighting6
2.2.1. Survey design
The design of a survey has important implications for the way in which data analysis should
be undertaken. Often budgets and time constraints dictate the sampling and data collection
methods used, and ingenious ways have to be sought to reduce data collection costs without
jeopardising the quality and ‘representativity’ of the data. Ideally the sampling design should
match the type of survey being conducted. Deaton (1997:17) suggests that each different
application of a survey mandates a different survey design - “precision for one variable is
imprecision for another”. However, given budgetary constraints “it makes no sense to design
a survey for each”. The IES 2000, for example, was designed specifically for calculating
the CPI but has, understandably, become a general-purpose household survey with a range
of applications.
A typical household survey selects households randomly from a list of all households in
the population, known as the sampling frame. The sampling frame is often the most recent
Census. In the case of the IES 2000 and LFS 2000:2, the South African Population Census of
1996 was used as the sampling frame (SSA, 1998). A Census contains a list of all households and
household members. The most common way of choosing representative households from the
sampling frame is a two-stage selection process. At the first stage, clusters or groups of
households are selected randomly from the population. These clusters are often based on
existing geographical boundaries. Next, the census data are used to compile a list of all
5 To avoid confusion these files were renamed lfsperson.txt, lfsworker.txt and lfshouse.txt.
6 This section draws mainly on Deaton (1997) unless otherwise cited.
© PROVIDE Project