PROVIDE Project Technical Paper 2005:1
workers and consequently did not answer this section. There were also 40 observations in
domworkerh.dta for which no match could be found in general.dta.
February 2005
general & |
domworkerh | Freq. Percent Cum.
------------+-----------------------------------
1 | 24134 91.75 91.75
2 | 40 0.15 91.90
3 | 2131 8.10 100.00
------------+-----------------------------------
Total | 26305 100.00
While 5 of these 40 observations report zero expenditure, the remaining 35 observations
report expenditure ranging from R1,020 to R48,600, with an average of R10,195. The
tabulation of merge1b shows 38 observations in general.dta not found in homegrownh.dta.
One can again safely assume that these households did not partake in any home production
for home consumption. However, 4 observations were found in homegrownh.dta that were
not in general.dta. These households report zero expenditure on inputs, zero sales and very
low consumption of own produce and livestock (output appears below).
general & |
homegrownh | Freq. Percent Cum.
------------+-----------------------------------
1 | 38 0.14 0.14
2 | 4 0.02 0.16
3 | 26267 99.84 100.00
------------+-----------------------------------
Total | 26309 100.00
             hhid  v~inputs  v~prodsale  v~prodcons  v~livesale  v~livecons
 7353.  3.251e+12         0           0         248           0        3000
 7413.  4.061e+12         0           0           0           0           0
10924.  5.032e+12         0           0          45           0           0
11446.  5.072e+12         0           0          75           0           0
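The merge tabulations above follow one convention: code 1 marks observations found only in general.dta, code 2 marks observations found only in the file being merged in, and code 3 marks observations found in both. The following is a minimal Python sketch of that classification and of the frequency, percent and cumulative columns. It is an illustration only, not the project's do-file code; the function names and the toy household IDs are hypothetical, while the counts passed to `tabulate` in the last step are those reported for the homegrownh.dta merge.

```python
def merge_code(in_master, in_using):
    """Merge code: 1 = master file only, 2 = using file only, 3 = both."""
    if in_master and in_using:
        return 3
    return 1 if in_master else 2


def count_codes(master_ids, using_ids):
    """Count how many IDs fall under each merge code."""
    master, using = set(master_ids), set(using_ids)
    counts = {}
    for key in master | using:
        code = merge_code(key in master, key in using)
        counts[code] = counts.get(code, 0) + 1
    return counts


def tabulate(counts):
    """Return rows of (code, freq, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for code in sorted(counts):
        running += counts[code]
        rows.append((code, counts[code],
                     round(100 * counts[code] / total, 2),
                     round(100 * running / total, 2)))
    return rows


# Toy household IDs (hypothetical, not taken from the survey data).
toy_counts = count_codes([101, 102, 103], [102, 103, 999])

# Counts as reported for the merge of general.dta with homegrownh.dta.
paper_rows = tabulate({1: 38, 2: 4, 3: 26267})
for code, freq, pct, cum in paper_rows:
    print(f"{code} | {freq:>6} {pct:>7.2f} {cum:>7.2f}")
```

Separating the classification from the tabulation makes it easy to check that the percentages in a reported table are consistent with its frequencies.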
Finally, the merge between general.dta and personh.dta revealed that 46 observations
were found only in general.dta. In the previous merges this was not a problem, since
one could simply assume that the relevant expenditures were zero. Here it is more
problematic, because demographic information (race, gender, age, province) and employment
data are missing for these 46 observations, which renders them virtually unusable. Many
of these ‘mismatched’ observations are dropped from the sample at a later stage.
general & |
personh | Freq. Percent Cum.
------------+-----------------------------------
1 | 46 0.17 0.17
3 | 26263 99.83 100.00
------------+-----------------------------------
Total | 26309 100.00
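The different treatment of unmatched households can be sketched as follows: a missing expenditure module is read as zero expenditure, whereas a missing person record leaves the household without demographics, so it is flagged and dropped. This is a hedged Python illustration under assumed data structures (dictionaries keyed by household ID); the helper `attach`, the variable names `domwage` and `province`, and the toy records are all hypothetical and do not come from the project's do-files.

```python
def attach(master, module, fields, assume_zero):
    """Attach `fields` from `module` to every household in `master`.

    master, module: dicts mapping household ID -> dict of variables.
    Households absent from `module` receive 0 when assume_zero is True
    (expenditure modules) and None otherwise (e.g. demographics).
    Returns the list of household IDs with no match in `module`.
    """
    unmatched = []
    for hhid, record in master.items():
        source = module.get(hhid)
        if source is None:
            unmatched.append(hhid)
        for field in fields:
            if source is not None:
                record[field] = source[field]
            else:
                record[field] = 0 if assume_zero else None
    return unmatched


# Hypothetical toy data; IDs, variable names and values are illustrative only.
general = {1: {}, 2: {}, 3: {}}
domworker = {1: {"domwage": 4800}}
person = {1: {"province": "WC"}, 2: {"province": "GP"}}

no_domworker = attach(general, domworker, ["domwage"], assume_zero=True)
no_person = attach(general, person, ["province"], assume_zero=False)

# Households without demographic information are dropped from the sample.
usable = {hhid: rec for hhid, rec in general.items() if hhid not in no_person}
print(sorted(usable))
```

Returning the list of unmatched IDs, rather than silently filling values, keeps a record of which households were affected by each assumption.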
3.2.5. Cleaning the data (cleanup.do)
After merging the datasets, cleanup.do is run. As discussed in section 2.3, the IES 2000 dataset
is plagued by numerous data problems. The do-file cleanup.do aims to rectify some of the minor
ones, such as simple adding-up problems. It also checks for consistency in the reported
© PROVIDE Project