Creating a 2000 IES-LFS Database in Stata

PROVIDE Project Technical Paper 2005:1
2.2.6. Survey estimation in Stata

February 2005

Complex surveys typically have three characteristics: (1) the survey weights are inverse
probability weights; (2) the sample is drawn from clusters rather than from the entire
population; and (3) the data are stratified. Sampling weights, whether added to the data
ex post or designed beforehand, have to be used to adjust for differing selection
probabilities between observations. Failure to use weights will result in biased estimates.
When the sample is drawn from clusters, observations are not independent. Many statistical
estimators assume independence, and using these estimators without the correct adjustments
will result in standard errors that are too small. Finally, stratification typically reduces
sampling variability, so the standard errors have to be adjusted for it as well; ignoring
the strata tends to overstate them.
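As an illustration of how all three characteristics are declared together, the following is a minimal sketch only. It uses the svyset and svy: syntax of more recent Stata releases rather than the commands shown in this paper, and the variable names psu and stratum are hypothetical placeholders for the cluster and stratum identifiers; wgtselect and totinc are the weight and income variables used in the example in this section.

```stata
* Sketch only: psu and stratum are hypothetical variable names,
* not variables documented in this paper.
use ies2000h, clear

* Declare clusters (psu), sampling weights and strata in one step
svyset psu [pweight = wgtselect], strata(stratum)

* Summarise the declared design (number of strata and of clusters per stratum)
svydescribe

* Design-based estimate of mean household income
svy: mean totinc
```

Once the design has been declared with svyset, every command prefixed with svy: uses the weights, clusters and strata without their having to be repeated.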

Consider the following example.¹⁰ Suppose we wish to estimate the average total income
(variable totinc) of South African households. We can use the ci command to display the
mean, its standard error and the 95% confidence interval. In the Stata output below the
data are unweighted, so the reported mean is simply the sample mean, which is at best a
crude estimate of the population mean.

. ci totinc

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      totinc |      26177    39186.44    638.5181        37934.91    40437.97
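The interval in the table above can be checked by hand: ci builds it as the mean plus or minus a t critical value times the standard error, with Obs - 1 degrees of freedom. The short sketch below uses only the figures reported in the table; the scalar names are our own.

```stata
* Reproduce the 95% interval from the reported mean and standard error
scalar ci_mean = 39186.44
scalar ci_se   = 638.5181

* Two-sided 95% critical value of the t distribution, Obs - 1 degrees of freedom
scalar ci_t = invttail(26177 - 1, 0.025)

display "lower bound = " ci_mean - ci_t*ci_se
display "upper bound = " ci_mean + ci_t*ci_se
```

With more than 26,000 observations the critical value is very close to the familiar 1.96, so the bounds should agree with the table up to rounding of the reported mean and standard error.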

If we use weights, Stata will compute a more accurate estimate of the population mean.
Since pweight does not work with the ci command, we allow Stata to choose the type of
weight (it assumes analytic weights).¹¹ If we do wish to use pweight, we have to make use
of the svymean command. Initially only the weight is declared, with svyset pweight
wgtselect, i.e. clustering and stratification are ignored. The output of these two
examples is listed below:

. ci totinc [weight = wgtselect]
(analytic weights assumed)

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      totinc |      26177    42793.12    653.4643        41512.29    44073.95

. svymean totinc

Survey mean estimation

pweight:  wgtselect                               Number of obs    =     26177
Strata:   <one>                                   Number of strata =         1
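Readers working with a more recent Stata release may find that the commands in this section have changed. The sketch below shows what we understand to be the current equivalents (ci means in place of ci, and the svy: mean prefix command in place of svymean); it assumes the same variables, totinc and wgtselect, and should be checked against the documentation of the installed version.

```stata
* Unweighted mean with its confidence interval
ci means totinc

* The same with analytic weights (ci does not accept pweights)
ci means totinc [aweight = wgtselect]

* Sampling weights only: declare the weight, then estimate the mean.
* No clusters or strata are declared, so both are ignored here.
svyset [pweight = wgtselect]
svy: mean totinc
```

The point estimate is the same weighted mean under either type of weight; what changes is how the standard error is computed.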

¹⁰ The ies2000h.dta database is used for the example (see section 3). The weight variable wgtselect is used. (The
current version of the ies2000h.dta has changed slightly since these examples were run - KP
15/02/2005).

¹¹ Alternatively, we can specify frequency weights (fweight), but then the truncated version of the weight,
fwgtselect, has to be used since fweight only allows integer weights. This will give similar means and
standard deviations (see section 2.2.5).
