Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1
February 2005

totals and recalculates them where necessary. Before any of the actual ‘cleaning up’ can start,
the problem of missing values has to be investigated.


Stata usually codes missing values as a dot (full stop). A large number of the
variables in IES 2000, fortunately only on the expenditure side, contain very large numbers of
missing values. Missing values in a Stata dataset create various problems. Any arithmetic
operation on a missing value yields a missing value, which becomes problematic if, for
example, total expenditure is to be calculated. Closer inspection revealed that large numbers
of missing values occurred only in those variables that relate to optional questions. This
raised the suspicion that these are not true missing values, but rather the result of incorrect
coding by Statistics South Africa. To clarify matters, the following definitions are used.
Observations that are coded with a full stop in the IES 2000 fall into one of three
categories:

Uncoded - Some questions in the IES 2000 questionnaire were optional. Optional sections
are preceded by a question that asks the respondent whether the expenses relating to that
section are relevant to the household. If the respondent answers ‘no’, the section may be
skipped. In many instances Statistics South Africa coded expenses in these optional
sections as missing values when the section was skipped. These are defined as uncoded
observations and can be changed to zeroes.

Miscoded - In some instances the question preceding an optional section was
answered in the negative, but positive expenses were nevertheless reported in the optional
section following the question. In these instances it is assumed that the preceding question
was miscoded and should have been coded as ‘yes’. Consequently the information content
in the section is left as is.

(True) missing values - The remaining missing values relate to respondents who should
have answered a section given their response to the preceding question, but failed to do
so. These are therefore true missing values. It can be argued that some of these missing
values are a result of miscoding, i.e. that the preceding question should have been coded
as ‘no’. However, there is no basis on which such an assumption can be made, and
consequently these values have to be treated as missing.
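
The three definitions above can be illustrated with a short Stata sketch. It is not the code used to build the database. It runs on a small invented dataset, and every variable name and code in it is hypothetical: screen stands for the question preceding an optional section (1 = yes, 2 = no), and exp_a and exp_b stand for two expense items in that section. The sketch first shows that adding two items yields a missing total whenever either item is missing, and then flags each missing value according to the definitions. It assumes that missing items in a miscoded section are left untouched.

```stata
* Illustrative toy data only; all variable names and codes are hypothetical.
* screen: answer to the question preceding the optional section (1 = yes, 2 = no)
* exp_a, exp_b: expense items within that optional section
clear
input hhid screen exp_a exp_b
1 1 100 50
2 2 . .
3 2 30 .
4 1 . .
5 1 20 .
end

* Arithmetic on a missing value yields a missing value
generate total_naive = exp_a + exp_b

* Largest reported expense in the section (missing if every item is missing)
egen section_max = rowmax(exp_a exp_b)

* Stata treats missing as larger than any number, so exclude it explicitly
generate byte any_positive = (section_max > 0) & !missing(section_max)

* Miscoded: question answered 'no' but positive expenses were reported
generate byte miscoded = (screen == 2) & any_positive

foreach v of varlist exp_a exp_b {
    * Uncoded: section was skipped after a 'no', so the missing value is set to zero
    generate byte uncoded_`v' = missing(`v') & (screen == 2) & !any_positive
    * True missing: section should have been answered but the item is blank
    generate byte truemiss_`v' = missing(`v') & (screen == 1)
    replace `v' = 0 if uncoded_`v'
}

list, clean
```

In this toy dataset household 2 is treated as uncoded and its two expense items are set to zero, household 3 is flagged as miscoded and its values are left as they are, and the missing items of households 4 and 5 are flagged as true missing values and remain missing.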

All variables coded with a full stop were systematically analysed to determine into which
category they fall. Table 8 shows all the missing values (uncoded and true missing values) in
the IES 2000 database, as well as those that were miscoded. The number of missing values
reported in the original database is shown in column C. Only expense categories that

© PROVIDE Project


