Creating a 2000 IES-LFS Database in Stata

PROVIDE Project Technical Paper 2005:1

February 2005

data, including income and expenditure summary tables. This file is the largest of all the data
files and contains the bulk of the information collected for the IES 2000.


2.1.2. LFS 2000:2

The LFS 2000:2 also comes with a metadata file that explains the sampling framework and lists
the files contained in the LFS 2000:2 dataset. The sample design of the LFS 2000:2
is the same as that of the IES 2000. Data files include person.txt, worker.txt and house.txt.5
The file person.txt, like its namesake in the IES 2000, contains all the person-level information
of household members, while worker.txt contains employment data of all household members
of working age (15 to 65). Finally, house.txt contains general household variables. A fourth
data file, stratum_psu.txt, contains variables identifying the primary sampling units (PSUs)
and the strata used in the survey (see section 2.2). When the LFS 2000:2 is merged with the
IES 2000, only the data contained in person.txt and worker.txt are used.
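
As a rough illustration of how these files relate to one another, the sketch below loads the renamed LFS files (see footnote 5), attaches the worker records and the PSU and stratum identifiers to the person-level file, and declares the survey design. It is a minimal sketch under several assumptions, not the procedure followed in this paper: the variable names hhid, personid, psu and stratum are hypothetical, the text files are assumed to be delimited, stratum_psu.txt is assumed to hold one row per household, and import delimited requires a more recent version of Stata than was available in 2005.

```stata
* Sketch only. hhid, personid, psu and stratum are hypothetical variable
* names; the actual names are listed in the LFS 2000:2 metadata file.
* import delimited assumes delimited text; fixed-width files would need infix.

* Employment data for household members of working age
import delimited using "lfsworker.txt", clear
isid hhid personid                      // key must identify each worker uniquely
tempfile worker
save `worker'

* PSU and stratum identifiers, assumed here to be stored one row per household
import delimited using "stratum_psu.txt", clear
isid hhid
tempfile design
save `design'

* Person-level file is the master: every household member is kept
import delimited using "lfsperson.txt", clear
isid hhid personid
merge 1:1 hhid personid using `worker', keep(master match) generate(m_worker)
tabulate m_worker                       // master-only rows have no worker record
merge m:1 hhid using `design', keep(master match) nogenerate

* Declare PSUs as clusters within strata
* (sampling weights are omitted because their variable name is not known here)
svyset psu, strata(stratum)
```

Using person.txt as the master file keeps household members outside the working-age range, who have no worker record, in the merged data.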

2.2. Sampling and weighting6

2.2.1. Survey design

The design of a survey has important implications for the way in which data analysis should
be undertaken. Budget and time constraints often dictate the sampling and data collection
methods used, and ingenious ways have to be sought to reduce data collection costs without
jeopardising the quality and ‘representativity’ of the data. Ideally the sampling design should
match the type of survey being conducted. Deaton (1997:17) suggests that each different
application of a survey mandates a different survey design: “precision for one variable is
imprecision for another”. However, given budgetary constraints, “it makes no sense to design
a survey for each”. The IES 2000, for example, was designed specifically for calculations of
the CPI but has, understandably, become a general-purpose household survey with a range
of applications.

A typical household survey selects households randomly from a list of all households in
the population, known as the sampling frame. The sampling frame is often the most recent
Census. In the case of the IES 2000 and LFS 2000:2 the South African Population Census of
1996 was used as the sampling frame (SSA, 1998). A Census contains a list of all households and
household members. The most common way of choosing representative households from the
sampling frame is a two-stage selection process. At the first stage clusters or groups of
households are selected randomly from the population. These clusters are often based on
existing geographical boundaries. Next, the census data are used to compile a list of all

5 To avoid confusion these files were renamed lfsperson.txt, lfsworker.txt and lfshouse.txt.

6 This section draws mainly on Deaton (1997) unless otherwise cited.


© PROVIDE Project


