PROVIDE Project Technical Paper 2005:1
February 2005
in the database, provided that these levels seemed realistic.²⁹ The entire part 6 is included as a separate do-file named hphcdrop.do.
Finally, in part 7, household-level variables were created for the value of produce and livestock sold and consumed (valprodcons, valprodsale, vallivecons, valliveprod). These values, together with the household-level input costs (P2205TOT), are saved as homegrownh.dta, which is subsequently merged with the other household-level files.
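The part 7 step amounts to summing item-level values into one row per household and then joining that row to other household-level data. The Python below is a minimal sketch of that logic, not the paper's Stata code. It assumes the values arrive as (household id, variable name, value) records; that record layout and the helper names `household_totals` and `merge_households` are hypothetical. The merge keeps every household that appears in either table, similar to a full outer join on the household identifier.

```python
from collections import defaultdict


def household_totals(records):
    """Sum item-level values into one dict of totals per household.

    records: iterable of (hhid, variable_name, value) tuples.
    This record layout is an assumption made for illustration.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for hhid, var, value in records:
        totals[hhid][var] += value
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {hhid: dict(values) for hhid, values in totals.items()}


def merge_households(left, right):
    """Combine two household-keyed tables, keeping households found in either."""
    merged = {}
    for hhid in left.keys() | right.keys():
        row = {}
        row.update(left.get(hhid, {}))
        row.update(right.get(hhid, {}))
        merged[hhid] = row
    return merged
```

In Stata the same two steps would typically be written as a `collapse (sum)` by household followed by a `merge` on the household identifier.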
3.2.3. Person-level data file (person.do)
The next do-file is person.do. This do-file opens person.dta, which contains all the information about each individual in each household, such as employment data and general demographic information. The variable race is slightly problematic, because 159 individuals report their race as ‘unspecified’ (code 5 or 9). As the SAM household and factor accounts are all disaggregated along racial lines, information about race is important. One option is to create a separate racial category labelled ‘undefined’, but this is not justifiable given that only 0.15% of the 104,153 individuals in person.dta do not specify their race. Another option is to drop observations with unspecified race from the sample, but this is also undesirable if it is possible to work around the problem.
Closer inspection revealed that some of the ‘unspecified’ individuals live in households where the head of the household did report his or her racial group. These individuals were assigned the race of the household head. Where the head’s race was also unspecified, they were assigned the race of the second household member (if available). After this adjustment, 134 individuals remain unspecified; they live in 39 households in which all members are unspecified. Unfortunately, the whole process ‘saves’ only 25 individuals and 5 households.
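The imputation rule above can be sketched in Python as follows. This is an illustration of the rule as described, not the code in person.do. It assumes each household is given as a list of race codes ordered by person number, with the first entry being the household head and the second entry the second household member; that data layout and the function names are hypothetical. Codes 5 and 9 are treated as ‘unspecified’, as in the text.

```python
UNSPECIFIED = {5, 9}  # race codes treated as 'unspecified' (from the text)


def impute_race(households):
    """Fill unspecified race codes from the head, else from the second member.

    households: dict mapping a household id to a list of race codes,
    ordered by person number (index 0 assumed to be the head).
    """
    result = {}
    for hhid, races in households.items():
        head = races[0] if races else None
        second = races[1] if len(races) > 1 else None
        # Choose the donor race: the head if specified, otherwise the second member.
        if head is not None and head not in UNSPECIFIED:
            donor = head
        elif second is not None and second not in UNSPECIFIED:
            donor = second
        else:
            donor = None  # nothing usable; the household stays unspecified
        result[hhid] = [
            donor if (r in UNSPECIFIED and donor is not None) else r
            for r in races
        ]
    return result


def count_unspecified(households):
    """Count individuals whose race code is still unspecified."""
    return sum(r in UNSPECIFIED for races in households.values() for r in races)
```

Comparing `count_unspecified` before and after `impute_race` gives the number of individuals ‘saved’ by the adjustment. Under this literal rule, a household whose head and second member are both unspecified is left unchanged, even if a later member reports a race.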
Next, the do-file adds labels to variables and creates a few new ones, such as the variable region, which maps the province variable (prov) to the four SAM regions. New variables are also created for the number of children (variable K), the number of adults (variable A), the total household size (variable H)³⁰ and the adult-equivalent household size (variable E)³¹.
29 In some cases, per capita home consumption levels were extremely high. One possible explanation is that own produce (such as maize) is used as livestock feed, in which case it should have been reported as an input cost. Consumption levels were truncated at certain levels when they appeared unrealistically high.
30 Although the original person.dta comes complete with a household size variable, this variable appears to be incorrect. Consequently, it is recalculated here.
31 The adult equivalence scale adjusts the actual household size to take into account differences in the size and structure of households. The adjusted household size variable E is constructed using the formula $E = (A + \alpha K)^{\theta}$, where $\alpha$ weights each child relative to an adult and $\theta < 1$ allows for economies of scale in household size. May (1995, cited in Woolard and Leibbrandt, 2001) suggests that $\alpha = 0.5$ and $\theta = 0.9$ are plausible values for South Africa. Some sensitivity analysis around these values will be done at a later stage.
© PROVIDE Project