Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1
February 2005

workers and consequently did not answer this section. There were also 40 observations in
domworkerh.dta for which no match could be found in general.dta.


  general & |
 domworkerh |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      24134       91.75       91.75
          2 |         40        0.15       91.90
          3 |       2131        8.10      100.00
------------+-----------------------------------
      Total |      26305      100.00

While 5 of these 40 observations report zero expenditure, the remaining 35 observations
report expenditure ranging from R1,020 to R48,600, with an average of R10,195. The
tabulation of merge1b shows 38 observations in general.dta not found in homegrownh.dta.
One can again safely assume that these households did not partake in any home production
for home consumption. However, 4 observations were found in homegrownh.dta that were
not in general.dta. These households report zero expenditure on inputs, zero sales and very
low consumption of own produce and livestock (output appears below).

  general & |
 homegrownh |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         38        0.14        0.14
          2 |          4        0.02        0.16
          3 |      26267       99.84      100.00
------------+-----------------------------------
      Total |      26309      100.00

              hhid   v~inputs   v~prodsale   v~prodcons   v~livesale   v~livecons
 7353.   3.251e+12          0            0          248            0         3000
 7413.   4.061e+12          0            0            0            0            0
10924.   5.032e+12          0            0           45            0            0
11446.   5.072e+12          0            0           75            0            0
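The merge-and-inspect step described above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the project's actual do-file: the toy values and the variable names totexp, inputs and livecons are hypothetical, and the `merge 1:1 ..., generate()` syntax is assumed (it needs Stata 11 or later, whereas the original work may well have used the older merge syntax). Only the indicator name merge1b is taken from the text.

```stata
* Hypothetical toy data standing in for general.dta.
* hhid is stored as double: identifiers as large as 3.251e+12
* cannot be held exactly in a float.
clear
input double hhid totexp
1 1000
2 2500
3 400
end
tempfile general
save `general'

* Hypothetical toy data standing in for homegrownh.dta.
clear
input double hhid inputs livecons
2 50 0
3 0 120
4 0 3000
end
tempfile homegrownh
save `homegrownh'

* Merge on the household identifier and keep the match indicator.
use `general', clear
merge 1:1 hhid using `homegrownh', generate(merge1b)

* 1 = only in general, 2 = only in homegrownh, 3 = in both.
tabulate merge1b

* Inspect the households that appear only in the using file.
list hhid inputs livecons if merge1b == 2

* Assumption from the text: households absent from homegrownh
* had no home production, so their values are set to zero.
replace inputs = 0 if merge1b == 1
replace livecons = 0 if merge1b == 1
```

Keeping the indicator under its own name, rather than dropping it after each merge, makes it possible to trace later which households had values imputed as zero.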

Finally, the merge between general.dta and personh.dta revealed that 46 observations
were only found in general.dta. Whereas with the previous merges this was not a problem
(one could simply assume that the relevant expenditures were zero), it is more problematic
here since demographic information (race, gender, age, province) and employment data are
now missing for 46 observations. This renders these 46 observations virtually unusable. Many
of these ‘mismatched’ observations are dropped from the sample at a later stage.

  general & |
    personh |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         46        0.17        0.17
          3 |      26263       99.83      100.00
------------+-----------------------------------
      Total |      26309      100.00
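One way to handle such cases, sketched here on made-up data, is to flag the households that have expenditure records but no demographic record, so that they can be dropped later. The indicator name merge1c, the flag nodemog and the toy variables are hypothetical, and the `merge 1:1` syntax again assumes Stata 11 or later. This is not the project's actual code.

```stata
* Hypothetical toy data standing in for general.dta.
clear
input double hhid totexp
1 1000
2 2500
3 400
end
tempfile general
save `general'

* Hypothetical toy data standing in for personh.dta
* (household 2 has no demographic record).
clear
input double hhid age gender
1 34 1
3 52 2
end
tempfile personh
save `personh'

use `general', clear
merge 1:1 hhid using `personh', generate(merge1c)

* Flag households found only in general (no demographic data).
generate byte nodemog = (merge1c == 1)
count if nodemog

* Demographics cannot be assumed to be zero, so age and gender
* stay missing; the flag supports a later "drop if nodemog".
list hhid totexp age gender if nodemog
```

Flagging rather than dropping immediately keeps the observations available for any step that does not need demographic data.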

3.2.5. Cleaning the data (cleanup.do)

After merging the datasets, cleanup.do is run. As discussed in section 2.3, the IES 2000 dataset
is plagued by numerous data problems. The do-file cleanup.do aims to rectify some of the minor
ones, such as the simple adding-up problems. It also checks for consistency in the reported


© PROVIDE Project


