PROVIDE Project Technical Paper 2005:1
February 2005
3.1. Reading in the data (readin.do)
The raw IES 2000 data is supplied by Statistics South Africa in a series of ASCII text files.
These fixed-width files are read into Stata using dictionary files specifying the location
(column number) and length of each variable as it appears in each row of the ASCII files. The
do-file readin.do calls up all the dictionary files. The IES 2000 ASCII files are converted to
Stata files and saved as person.dta, personwgt.dta, generalorig.dta, generalwgt.dta,
domworker.dta and homegrown.dta. 22, 23 These files are merged at a later stage to form
person- and household-level IES 2000 files. The LFS 2000:2 ASCII files are also converted to
Stata files and saved as lfsperson.dta, which is merged with the data from worker.txt to form
lfs2000_2p.dta, and lfshouse.dta, which is merged with the data from stratum_psu.txt to form
lfs2000_2h.dta. Finally, lfs2000_2p.dta and lfs2000_2h.dta are merged to form a file called
lfs2000_2.dta, which contains person- and household-level LFS 2000:2 data.
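As a minimal sketch of the fixed-width read described above, the following Stata fragment writes a tiny ASCII file and reads it back by column position. It uses an inline infix specification rather than a separate dictionary file, and the column positions, the values and the variable names personno and amount are invented for illustration; they are not the actual IES 2000 record layout.

```stata
* Hypothetical layout: columns 1-4 = hhid, 5-6 = personno, 7-10 = amount.
* Positions and values are illustrative, not the IES 2000 record layout.
clear
tempfile raw
tempname fh

* Write three fixed-width records to a temporary ASCII file
file open `fh' using "`raw'", write text replace
file write `fh' "1001011250" _n
file write `fh' "1001020300" _n
file write `fh' "1002010075" _n
file close `fh'

* Read each variable from its column range, as a dictionary file would specify
infix long hhid 1-4 byte personno 5-6 int amount 7-10 using "`raw'", clear
list
```

In a dictionary-based workflow the same column ranges would sit in a separate dictionary file, and the resulting data would then be saved as a .dta file.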
3.2. Forming a household-level IES 2000 dataset (ies2000h.do)
The main aim of do-file ies2000h.do is to create the household-level file ies2000h.dta. It
starts by merging general.dta with domworkerh.dta, homegrownh.dta and personh.dta. Four
do-files are called up within ies2000h.do in order to create or prepare these data files for
merging.
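The household-level merge can be sketched as follows, assuming each file holds exactly one observation per hhid. The sketch uses the merge 1:1 syntax of Stata 11 and later, which may differ from the syntax in the original do-files, and the variables domexp and hhsize and all values are hypothetical.

```stata
* Toy stand-in for a household-level file such as domworkerh.dta
* (domexp is a hypothetical variable name)
clear
input long hhid double domexp
1001 500
1003 120
end
tempfile domh
save "`domh'"

* Toy stand-in for the main household file (hhsize is hypothetical)
clear
input long hhid byte hhsize
1001 4
1002 2
1003 5
end

* One-to-one merge on the household identifier
merge 1:1 hhid using "`domh'"

* _merge == 1 flags households that appear only in the main file
tabulate _merge
list
```

Because merge 1:1 stops with an error if hhid is repeated in either file, it also acts as a check that the files really are at household level before they are combined.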
3.2.1. Domestic workers (domworker.do)
Unlike the other household-level data files, the original file domworker.dta does not
necessarily contain only one entry per household. If a household employs more than
one domestic worker, an additional entry with the same household identification number (variable
hhid) is added to the file. It is therefore necessary to create a household-level version of
this file in which each observation reports the total expense for all domestic workers
employed by the household. This avoids double counting when merging files. The following
command adds up domestic worker expenses for observations with the same hhid number.24
for var P*: by hhid, sort: egen Xh = sum(X)
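In recent Stata releases the for prefix appears to be no longer documented and the egen function sum() has been renamed total(), so an equivalent of the command above can be sketched with foreach. The toy data and the variable names P1 and P2 are hypothetical, and the final step that keeps one observation per household is an assumption about how the household-level file is completed, not something stated in the text.

```stata
* Toy data: household 1001 has two domestic-worker entries
* (P1 and P2 are hypothetical expenditure variables)
clear
input long hhid double P1 double P2
1001 300 20
1001 200 0
1002 150 10
end

* For every variable starting with P, create <name>h holding the household total.
* The varlist P* is expanded once, before the loop, so new *h variables are not re-visited.
foreach v of varlist P* {
    bysort hhid: egen double `v'h = total(`v')
}

* Assumption: retain a single observation per household
bysort hhid: keep if _n == 1
list hhid P1h P2h
```

An alternative under the same assumptions would be collapse (sum) P*, by(hhid), which sums and reduces to one observation per household in a single step but keeps the original variable names.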
22 To save computing time, this do-file can be skipped by placing an asterisk at the beginning of the command
line do readin.do, provided that the various *.dta files already exist in the relevant folder.
23 Originally only four ASCII files were supplied with the IES 2000 data. Two new files (personwgt.dta and
generalwgt.dta) were obtained from Ingrid Woolard (HSRC). These files contain newly released person-level
and household-level weights for the IES 2000. At present they are not yet ‘official’ and cannot be
used. Also note that general.txt is now read in and saved in Stata as generalorig.dta.
24 Note that P* refers to all the variables starting with P-, i.e. the expenditure variables in domworker.dta.
© PROVIDE Project