Creating a 2000 IES-LFS Database in Stata



PROVIDE Project Technical Paper 2005:1
February 2005

2.2.6. Survey estimation in Stata


Complex surveys typically have three characteristics: (1) the survey weights are inverse
probability weights; (2) the sample is drawn from clusters rather than from the entire
population; and (3) the data are stratified. Sampling weights, whether added to the data
ex post or designed beforehand, have to be used to adjust for differing selection
probabilities between observations. Failure to use weights will generally result in biased
estimates. When the sample is drawn from clusters, observations are not independent. Many
statistical estimators assume independence, and using these estimators without the correct
adjustments will typically result in standard errors that are too small. Finally,
stratification usually reduces standard errors, so it also has to be taken into account if
the reported standard errors are to reflect the actual design.
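As a rough sketch of how all three design features could be declared together, assuming Stata 9 or later (where svyset takes the whole design in one line and svy: mean replaces the older svymean command), something like the following might be used. The names psuid and stratumid are hypothetical placeholders for the cluster and stratum identifiers; only wgtselect, totinc and ies2000h.dta are taken from this paper.

```stata
* Sketch only: psuid and stratumid are hypothetical variable names,
* standing in for the cluster (PSU) and stratum identifiers.
use ies2000h, clear

* Declare the design: clusters, inverse probability weights and strata
svyset psuid [pweight = wgtselect], strata(stratumid)

* Design-based mean, standard error and 95% confidence interval
svy: mean totinc
```

Once the design has been declared with svyset, any command prefixed with svy: uses the same weights, clusters and strata, so the declaration does not need to be repeated.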

Consider the following example (see footnote 10). Suppose we wish to estimate the average
total income (variable totinc) of South African households. We can use the ci (confidence
interval) command to show the mean, the standard error and the 95% confidence interval. The
Stata output below uses ‘unweighted’ data, so the reported mean is simply the sample mean,
which is at best a crude estimate of the population mean.

. ci totinc

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      totinc |      26177    39186.44    638.5181        37934.91    40437.97
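For reference, the contrast between an unweighted and a weighted mean can be written out as below. The notation is introduced here for illustration and is not the paper's own: y_i is totinc for household i, n is the number of sampled households, w_i is the household's weight and, in the idealised case of pure inverse probability weights, \pi_i is its probability of selection.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\qquad\text{versus}\qquad
\bar{y}_w = \frac{\sum_{i=1}^{n} w_i\, y_i}{\sum_{i=1}^{n} w_i},
\qquad w_i = \frac{1}{\pi_i}
```

The two expressions coincide when all weights are equal. When they differ, households with a low probability of selection count for more in the weighted mean.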

If we use weights, Stata will compute a more accurate estimate of the population mean. Since
pweight does not work with the ci command, we allow Stata to choose the type of weight (see
footnote 11). However, if we do wish to use the pweight option, we have to make use of the
svymean command. Initially only svyset pweight wgtselect is set, i.e. clustering and
stratification are ignored. The output of these two examples is listed below:

. ci totinc [weight = wgtselect]

(analytic weights assumed)

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      totinc |      26177    42793.12    653.4643        41512.29    44073.95

. svymean totinc

Survey mean estimation

pweight: wgtselect                                Number of obs    =     26177
Strata: <one>                                     Number of strata =         1

10 The ies2000h.dta database is used for the example (see section 3). The weight variable wgtselect is used. (The
current version of ies2000h.dta has changed slightly since these examples were run - KP 15/02/2005).

11 Alternatively, we can specify frequency weights (fweight), but then the truncated version of the weight,
fwgtselect, has to be used since fweight only allows integer weights. This will give similar means and
standard deviations (see section 2.2.5).

© PROVIDE Project


