Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1

February 2005


When the cluster option is activated, the standard error increases and the confidence
interval widens compared with the previous example, where clustering was ignored. Also,
deff increases substantially because clustering reduces the precision of the estimate,
i.e. the variance increases (see footnote 13). When stratification is also taken into
account, the standard deviation declines and the confidence interval becomes narrower, in
line with expectations (see section 2.2.3). However, deff is still substantially higher
than one.
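
The behaviour of deff described above can be illustrated with a small sketch. It is written in Python rather than Stata so that the arithmetic is explicit, and it is not the formula Stata uses internally: it assumes a linearised variance for the weighted mean, PSUs sampled with replacement within strata, and no finite population correction. The data, the variable names (income, weight, psu, stratum) and the function names are all hypothetical.

```python
from collections import defaultdict


def weighted_mean(y, w):
    """Weighted mean of y using weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def design_variance(y, w, psu, strata):
    """Linearised variance of the weighted mean.

    Assumption: PSUs are sampled with replacement within each stratum
    and no finite population correction is applied.
    """
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    # Sum the linearised score of each observation to PSU level.
    psu_totals = defaultdict(float)
    for yi, wi, p, h in zip(y, w, psu, strata):
        psu_totals[(h, p)] += wi * (yi - ybar) / total_w
    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _p), z in psu_totals.items():
        by_stratum[h].append(z)
    # Add up the between-PSU variability within each stratum.
    variance = 0.0
    for z_list in by_stratum.values():
        n_h = len(z_list)
        z_bar = sum(z_list) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in z_list)
    return variance


def srs_variance(y, w):
    """Variance of the mean under simple random sampling (simplified)."""
    n = len(y)
    ybar = weighted_mean(y, w)
    s2 = n / (n - 1) * sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y)) / sum(w)
    return s2 / n


# Hypothetical toy data: 8 households in 4 PSUs and 2 strata, equal weights.
income = [10.0, 12.0, 14.0, 16.0, 30.0, 32.0, 34.0, 36.0]
weight = [1.0] * 8
psu = ["A", "A", "B", "B", "C", "C", "D", "D"]
stratum = ["low"] * 4 + ["high"] * 4
one_stratum = ["all"] * 8   # ignores stratification
own_psu = list(range(8))    # ignores clustering: each household is its own PSU

var_srs = srs_variance(income, weight)
deff_none = design_variance(income, weight, own_psu, one_stratum) / var_srs
deff_cluster = design_variance(income, weight, psu, one_stratum) / var_srs
deff_both = design_variance(income, weight, psu, stratum) / var_srs

print(deff_none, deff_cluster, deff_both)
```

With equal weights and no design information the ratio is one; adding the PSUs raises it, and adding the strata lowers it again. The magnitudes in this toy example are not meant to mirror the survey results reported in the text.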

In conclusion, the svy-commands are useful, and indeed important, when the distribution
of a variable is of concern. Income distribution data, for example, will only be reliable
when weights, clustering and stratification are specified. Test statistics will also be
more accurate. However, if the only concern is finding the mean or total of income or
expenditure (the mean multiplied by the number of observations), normal analytic or
frequency weights will suffice.
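
The reason weights alone suffice for means and totals is that these point estimates depend only on the weights; the PSU and stratum identifiers affect only the estimated variance. A minimal Python sketch with hypothetical incomes and weights, where the weighted number of observations is the sum of the weights:

```python
# Hypothetical household incomes and sampling weights
# (each weight is the number of households that a sampled household represents).
incomes = [1200.0, 800.0, 5000.0]
weights = [150.0, 200.0, 50.0]

# Weighted number of observations: the sum of the weights.
population = sum(weights)

# Weighted mean income; no PSU or stratum information is needed.
mean_income = sum(w * y for w, y in zip(weights, incomes)) / population

# Total income: the mean multiplied by the weighted number of observations.
total_income = mean_income * population

print(mean_income, total_income)
```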

2.3. Merging the IES 2000 and the LFS 2000:2

2.3.1. Overview

The IES 2000, unlike its predecessor, the IES 1995, contains enough information on
employment activities of household members to determine their occupation codes, industry
codes and wages or salaries. Employment data also appears in the LFS 2000:2 in somewhat
more detail. Therefore, depending on the information requirements, it may be unnecessary to
merge the two files. However, recently education data, which is only available in the LFS
2000:2, was required for the formation of new household groups for the PROVIDE SAM. As
a consequence it was necessary to merge these files, and hence the LFS 2000:2 employment
data became available within the IES 2000 in any event. Furthermore, since the LFS 2000:2 is
designed specifically to gather information on employment and related activities of the
population, the quality of the data is arguably better. For example, the IES 2000 only asks a
single question to determine a person’s occupation or industry code. In contrast, occupation
and industry codes in the LFS 2000:2 are based on a series of questions. Consequently there
are fewer ‘unspecified’ factors and industries in the LFS 2000:2 (see section 2.3.2).

Various researchers have encountered difficulties when merging the IES 2000 and LFS
2000:2 data files. Van der Berg et al. (2003a) find that when merging these datasets there
are a substantial number of observations for which age, gender and race variables do not
match.
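
A check of this kind can be sketched as follows. This is an illustration only, in Python rather than Stata: the matching key (hhid, person), the variable names and the records are hypothetical and are not taken from the IES 2000 or LFS 2000:2 files.

```python
# Hypothetical person-level records from the two surveys.
ies = [
    {"hhid": 1, "person": 1, "age": 34, "gender": "F", "race": "African"},
    {"hhid": 1, "person": 2, "age": 36, "gender": "M", "race": "African"},
    {"hhid": 2, "person": 1, "age": 51, "gender": "M", "race": "White"},
    {"hhid": 3, "person": 1, "age": 27, "gender": "F", "race": "Coloured"},
]
lfs = [
    {"hhid": 1, "person": 1, "age": 34, "gender": "F", "race": "African"},
    {"hhid": 1, "person": 2, "age": 63, "gender": "M", "race": "African"},
    {"hhid": 2, "person": 1, "age": 51, "gender": "F", "race": "White"},
]

KEY = ("hhid", "person")            # assumed identifiers used for matching
CHECK = ("age", "gender", "race")   # variables that should agree after merging


def merge_and_check(left_rows, right_rows, key=KEY, check=CHECK):
    """Match rows on key and report which check variables disagree."""
    right_index = {tuple(row[k] for k in key): row for row in right_rows}
    consistent, mismatched, unmatched = [], [], []
    for row in left_rows:
        ident = tuple(row[k] for k in key)
        other = right_index.get(ident)
        if other is None:
            unmatched.append(ident)            # no partner in the other file
            continue
        diffs = [v for v in check if row[v] != other[v]]
        if diffs:
            mismatched.append((ident, diffs))  # matched, but variables differ
        else:
            consistent.append(ident)
    return consistent, mismatched, unmatched


consistent, mismatched, unmatched = merge_and_check(ies, lfs)
print("consistent:", consistent)
print("mismatched:", mismatched)
print("unmatched:", unmatched)
```

Listing the disagreeing variables for each matched pair, rather than only counting mismatches, makes it easier to judge whether a record is a genuine mismatch or a likely data-entry error.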

13 Incidentally, deff will equal one if none of pweight, psu and strata were specified, since the variance is then
simply equal to the sample variance as computed before in ci totinc. When only pweight is specified, deff
increases to 1.23, which indicates that weighting (in this instance) increases the variability. A tabulation
of average weights by income decile will reveal that the weights attached to high-income households are
higher than those attached to low-income households. Thus, when weights are specified, the inequality in
the distribution of income will increase, since more weight is now attached to high-income households in
the sample.

© PROVIDE Project


