Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1
2.2.6. Survey estimation in Stata

February 2005


Complex surveys typically have three characteristics: (1) the survey weights are inverse
probability weights; (2) the sample is drawn from clusters rather than from the entire
population; and (3) the data are stratified. Sampling weights, whether added to the data ex-
post or designed beforehand, have to be used to adjust for differing selection probabilities
between observations. Failure to use weights will result in biased estimates. When the sample
is drawn from clusters, observations are not independent. Many statistical estimators assume
independence, and using them without the appropriate adjustments will typically produce
standard errors that are too small. Finally, since stratification can reduce the estimated
standard errors, it is also necessary to adjust for it.
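
To make the role of the weights concrete, the inverse probability weighting in point (1) can be written out as below. The notation is introduced here for illustration only and does not appear in the original paper: $\pi_i$ is the selection probability of observation $i$, $w_i$ its sampling weight, $y_i$ the variable of interest and $n$ the sample size.

```latex
w_i = \frac{1}{\pi_i}, \qquad
\bar{y}_w = \frac{\sum_{i=1}^{n} w_i \, y_i}{\sum_{i=1}^{n} w_i}
```

When every observation has the same selection probability, the weights cancel and $\bar{y}_w$ reduces to the ordinary sample mean; otherwise the two generally differ.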

Consider the following example.[10] Suppose we wish to estimate the average total income
(variable totinc) of South African households. We can use the confidence interval command
(ci) to show the mean, standard error and 95% confidence interval. In the Stata output
below, unweighted data are used. This means that the reported mean is simply the sample
mean, which is at best a crude estimate of the population mean.

. ci totinc

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      totinc |      26177    39186.44    638.5181        37934.91    40437.97
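
As a check on how the columns of this output relate to one another, the interval can be reproduced from the reported mean and standard error. This is a sketch that assumes ci bases the interval on the t distribution with $n - 1 = 26176$ degrees of freedom, for which the 97.5th percentile is approximately 1.96006; the symbols are illustrative and not taken from the paper.

```latex
\bar{y} \pm t_{0.975,\,n-1} \times \mathrm{SE}
  \approx 39186.44 \pm 1.96006 \times 638.5181
  \approx 39186.44 \pm 1251.53
  = [\,37934.91,\; 40437.97\,]
```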

If we use weights, Stata will compute a more accurate estimate of the population mean.
Since pweight does not work with the ci command, we allow Stata to choose the type of
weight (it assumes analytic weights).[11] However, if we do wish to use the pweight option,
we have to make use of the svymean command. Initially only the svyset pweight wgtselect
option is set, i.e. clustering and stratification are ignored. The output of these two
examples is listed below:

. ci totinc [weight = wgtselect]
(analytic weights assumed)

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      totinc |      26177    42793.12    653.4643        41512.29    44073.95

. svymean totinc

Survey mean estimation

pweight:  wgtselect                               Number of obs    =     26177
Strata:   <one>                                   Number of strata =         1

[10] The ies2000h.dta database is used for the example (see section 3). The weight variable
wgtselect is used. (The current version of ies2000h.dta has changed slightly since these
examples were run - KP 15/02/2005).

[11] Alternatively, we can specify frequency weights (fweight), but then the truncated version
of the weight, fwgtselect, has to be used since fweight only allows integer weights. This will
give similar means and standard deviations (see section 2.2.5).
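
The svyset and svymean commands used in this example follow the syntax of the Stata release available when the paper was written. In Stata 9 and later, svymean was superseded by the svy: prefix. The following is a minimal sketch of the same weights-only analysis in that newer syntax, not the authors' own code. It assumes ies2000h.dta is in the working directory; psu_id and stratum_id are hypothetical variable names used only to show where clustering and stratification would be declared.

```stata
* Sketch only: assumes Stata 9 or later and ies2000h.dta in the working directory.
use ies2000h, clear

* Declare a weights-only design (no strata or clusters), mirroring the
* "svyset pweight wgtselect" setting described in the text.
svyset [pweight = wgtselect]

* Design-based estimate of mean total household income.
svy: mean totinc

* To reflect the full design, the cluster and stratum variables would be named too.
* psu_id and stratum_id are hypothetical names, not confirmed variables in ies2000h.dta.
* svyset psu_id [pweight = wgtselect], strata(stratum_id)
* svy: mean totinc
```

With only the weights declared, the point estimate reflects the sampling weights, but the standard error still ignores clustering and stratification, which is the limitation the text notes for this first step.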

© PROVIDE Project