PROVIDE Project Technical Paper 2005:1, February 2005
in the database, provided that these levels seemed realistic.29 The entire part 6 is included as a
separate do-file named hphcdrop.do.
Finally, in part 7, household-level variables were created for the value of produce and
livestock sold and consumed (valprodcons, valprodsale, vallivecons, valliveprod). These
values, together with the household-level input costs (P2205TOT), are saved as
homegrownh.dta, which is subsequently merged with the other household-level files.
3.2.3. Person-level data file (person.do)
The next do-file is person.do. It opens person.dta, which contains individual-level
information on every member of each household, such as employment data and general
demographic information. The variable race is slightly problematic because 159 individuals
report their race as ‘unspecified’ (code 5 or 9). Since the SAM household and factor accounts
are all disaggregated along racial lines, information about race is important. One option is to
create a separate racial category labelled ‘undefined’, but this is hard to justify given that only
0.15% of the 104,153 individuals in person.dta do not specify their race. Another option is to
drop observations with unspecified race from the sample, but this is undesirable if the
problem can be worked around.
Closer inspection revealed that some of the ‘unspecified’ individuals live in households
where the head of the household did report his or her racial group. The race of these
individuals was changed to that of the head of the household. If the race of the head of the
household is unspecified, it is changed to that of the second household member (if available).
After this adjustment 134 individuals remain unspecified. They live in 39 households in which
all members are unspecified. Unfortunately, the whole process ‘saves’ only 25 individuals and
5 households.
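The imputation rule described above can be illustrated with a minimal Python sketch. It is not the Stata code in person.do. The field names hhid and personno are hypothetical, and the sketch assumes that the head of the household is the member with the lowest person number, which the text does not state.

```python
# Illustrative sketch only; not the code in person.do.
# Assumptions (not from the source): records are dicts with the hypothetical
# keys 'hhid', 'personno' and 'race', and the household head is the member
# with the lowest person number.

UNSPECIFIED = {5, 9}  # race codes reported as 'unspecified' in the text


def impute_race(persons):
    """Return copies of the person records with unspecified race filled in
    from the household head, or from the second member if the head's race
    is itself unspecified."""
    # Group the records by household.
    households = {}
    for person in persons:
        households.setdefault(person["hhid"], []).append(person)

    result = []
    for members in households.values():
        members = sorted(members, key=lambda p: p["personno"])
        # The donor race is the head's, else the second member's (if any).
        donor = members[0]["race"]
        if donor in UNSPECIFIED and len(members) > 1:
            donor = members[1]["race"]
        for person in members:
            updated = dict(person)
            if updated["race"] in UNSPECIFIED and donor not in UNSPECIFIED:
                updated["race"] = donor
            result.append(updated)
    return result


def count_unspecified(persons):
    """Number of individuals whose race is still unspecified."""
    return sum(1 for p in persons if p["race"] in UNSPECIFIED)
```

Households in which both the head and the second member are unspecified are left unchanged by this sketch, which corresponds to the households that remain unspecified after the adjustment.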
Next, the do-file adds labels to variables and creates a few new ones, such as variable
region, which maps the province variable (prov) to the four SAM regions. New variables are
also created for the number of children (variable K), the number of adults (variable A), the
total household size (variable H)30, and the adult equivalent household size (variable E)31.
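As a rough illustration of how K, A, H and E relate, the following Python sketch applies the adult equivalence formula given in footnote 31 with α = 0.5 and θ = 0.9. It is not the Stata code in person.do. The age cut-off used to separate children from adults (child_age_limit) is a hypothetical parameter, since the text does not state how children are defined.

```python
# Illustrative sketch only; not the code in person.do.
# Assumption (not from the source): a child is anyone younger than
# child_age_limit. The default of 15 is a placeholder, not a documented value.


def adult_equivalents(adults, children, alpha=0.5, theta=0.9):
    """Adult equivalent household size E = (A + alpha * K) ** theta."""
    return (adults + alpha * children) ** theta


def household_sizes(ages, child_age_limit=15):
    """Compute K, A, H and E for one household from its members' ages."""
    k = sum(1 for age in ages if age < child_age_limit)  # number of children
    a = len(ages) - k                                    # number of adults
    h = a + k                                            # total household size
    e = adult_equivalents(a, k)                          # adult equivalent size
    return {"K": k, "A": a, "H": h, "E": e}
```

For example, a household of two adults and two children has H = 4 but E = 3^0.9, or roughly 2.69, so children count for less than adults and larger households are credited with some economies of scale.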
29 In some cases per capita home consumption levels were extremely high. One possible explanation is that
own produce (such as maize) is used for livestock feed, in which case it should have been reported as
an input cost. Consumption levels were truncated at certain levels when they appeared unrealistically
high.
30 Although the original person.dta comes complete with a household size variable, this variable appears to be
incorrect. Consequently, it is recalculated here.
31 The adult equivalence scale adjusts the actual household size to take into account differences in the size and
structure of households. The adjusted household size variable E is constructed using the formula
$E = (A + \alpha K)^{\theta}$. May (1995, cited in Woolard and Leibbrandt, 2001) suggests that $\alpha = 0.5$ and
$\theta = 0.9$ are plausible values for South Africa. Some sensitivity analysis around these values will be done
at a later stage.
© PROVIDE Project