Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1
February 2005

totals and recalculates them where necessary. Before any of the actual ‘cleaning up’ can start,
the problem of missing values has to be investigated.


By default, Stata codes missing numeric values as a dot (full stop). A large number of the
variables in IES 2000, fortunately only on the expenditure side, contain very large numbers of
missing values. Missing values in a Stata dataset create various problems. Any arithmetic
operation on a missing value yields a missing value. This becomes problematic when, for
example, total expenditure has to be calculated. Closer inspection revealed that large numbers
of missing values occurred only in variables that relate to optional questions. This raised the
suspicion that these are not true missing values but rather the result of incorrect coding by
Statistics South Africa. To clarify matters, the following definitions are used. Observations
that are coded with a full stop in the IES 2000 can fall into one of the following three
categories:

Uncoded - Some questions in the IES 2000 questionnaire were optional. Each optional section
is preceded by a question that asks the respondent whether the expenses relating to that
section are relevant to the household. If the answer is ‘no’, the respondent may skip the
section. In many instances, when a section was skipped, Statistics South Africa coded the
expenses in that section as missing values. These are defined as uncoded observations
and can be changed to zeroes.

Miscoded - In some instances the question preceding an optional section was answered in
the negative, but positive expenses were nevertheless reported in the section that followed.
In these instances it is assumed that the original question was miscoded and should have
been coded as ‘yes’. Consequently, the information in the section is left as is.

‘True’ missing values - The remaining missing values relate to respondents who should
have answered a section, given their response to the preceding question, but failed to do
so. These are therefore true missing values. It can be argued that some of these missing
values are the result of miscoding, i.e. that the preceding question should have been coded
as ‘no’. However, there is no basis for such an assumption, and these values consequently
have to be treated as missing.
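
The three categories amount to a simple decision rule per optional section, based on the answer to the filter question and on the values recorded in the section. The sketch below expresses that rule in Python so the logic can be checked in isolation. It is a hypothetical illustration rather than the procedure used for the IES 2000 database. It assumes `None` stands in for Stata's full stop and that the filter answer is stored as the strings "yes" and "no". The helper names `stata_add` and `clean_section` are invented for this example. `stata_add` mimics the behaviour of Stata's `+` operator described above, in which any missing operand makes the result missing.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: None stands in for Stata's system missing value ".".
Value = Optional[float]


def stata_add(values: List[Value]) -> Value:
    """Mimic Stata's '+' operator: a single missing operand makes the result missing."""
    total = 0.0
    for v in values:
        if v is None:
            return None
        total += v
    return total


def clean_section(filter_answer: str, values: List[Value]) -> Tuple[str, str, List[Value]]:
    """Classify one household's optional section and apply the cleaning rule.

    filter_answer: "yes" or "no", the reply to the question preceding the section.
    values: expenses recorded in the section (None = coded with a full stop).
    Returns (category, corrected filter answer, cleaned values).
    """
    has_positive = any(v is not None and v > 0 for v in values)
    if filter_answer == "no" and not has_positive:
        # Uncoded: the section was legitimately skipped, so full stops become zeroes.
        return "uncoded", "no", [0.0 if v is None else v for v in values]
    if filter_answer == "no":
        # Miscoded: expenses were reported despite a 'no', so the filter is taken
        # to be 'yes' and the section is left as is.
        return "miscoded", "yes", list(values)
    if any(v is None for v in values):
        # True missing: the section should have been answered but was not.
        return "true missing", filter_answer, list(values)
    return "complete", filter_answer, list(values)


if __name__ == "__main__":
    households = [
        ("no", [None, None]),    # skipped optional section
        ("no", [120.0, 30.0]),   # 'no', yet expenses reported
        ("yes", [None, 40.0]),   # should have answered, but one item is missing
    ]
    for answer, items in households:
        category, fixed_answer, cleaned = clean_section(answer, items)
        # Section total before and after cleaning.
        print(category, fixed_answer, stata_add(items), stata_add(cleaned))
```

Only the uncoded section changes its total, from missing to 0.0. The true missing section still produces a missing total, which is the intended outcome. Note that Stata's `egen` function `rowtotal()`, unlike `+`, treats missing values as zero. Using it for total expenditure would therefore silently hide true missing values rather than flag them.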

All variables coded with a full stop were systematically analysed to determine into which
category they fall. Table 8 shows all the missing values (uncoded and true missing values) in
the IES 2000 database, as well as those that were miscoded. The numbers of missing values
reported in the original database are shown in column C. Only expense categories that

© PROVIDE Project