Computing optimal sampling designs for two-stage studies

Stata Technical Bulletin

. clad loffinc lsize urban coastal, ll(0) reps(200)

Initial sample size = 1581

Final sample size = 1580

Pseudo R2 = .05048178

Bootstrap statistics

Variable |   Reps   Observed        Bias   Std. Err.  [95% Conf. Interval]
---------+---------------------------------------------------------------------
   lsize |    200   1.149846    .0554115    .2544479   .6480861   1.651606  (N)
         |                                             .7073701   1.689895  (P)
         |                                             .6859084   1.624102  (BC)
   urban |    200   2.375166    .0128999    .3375226   1.709586   3.040746  (N)
         |                                             1.642076   3.120919  (P)
         |                                             1.677854   3.184893  (BC)
 coastal |    200   1.287741   -.0094159    .2830439   .7295905   1.845891  (N)
         |                                             .7311435   1.863342  (P)
         |                                             .7339153    1.90661  (BC)
   const |    200   6.443694   -.0810437    .6198413   5.221394   7.665994  (N)
         |                                             4.956254   7.557803  (P)
         |                                             5.371459   7.730506  (BC)

N = normal, P = percentile, BC = bias-corrected
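
The (N) limits can be checked by hand. The sketch below assumes they are formed as the observed coefficient plus or minus a t critical value times the bootstrap standard error, with reps - 1 = 199 degrees of freedom. That assumption is inferred from the reported numbers, not stated in the text. invttail() is the inverse t function in current Stata releases; older releases may use a different name.

```stata
* Normal-approximation interval for the first variable in the table,
* using its Observed and Std. Err. entries. The t critical value on
* 199 degrees of freedom is an assumption inferred from the output.
scalar tcrit = invttail(199, 0.025)
display "lower = " 1.149846 - tcrit*.2544479
display "upper = " 1.149846 + tcrit*.2544479
```

Both results should agree with the (N) row for that variable up to rounding of the displayed inputs.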

The first line of output tells us that the original sample size is 1,581, and the second line shows that the estimation
algorithm dropped one case from the sample. An important caveat about the pseudo R-squared reported on the third line is that
it is the statistic from the last iteration of the qreg command on the final sample. It is not the pseudo R-squared
for the original sample, but we have opted to report it to give some indication of how the model is performing.

In the example above, no sample design information is passed to clad and the program calls Stata’s bsample utility to
resample the data 200 times. In order to maintain the same sample size in each bootstrap resample, clad ignores observations
where the dependent variable is missing. The results from bsample are then passed to the bstat command to generate the
standard Stata bootstrap output. For more information about the normal, percentile, and bias-corrected percentile confidence
intervals, see bstrap in the Stata manuals. For an introduction to the bootstrap principle, see Efron and Tibshirani (1993). In
order to reproduce results from clad, it is necessary first to set the random number seed; see generate in the Stata reference
manuals for more information.
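
A minimal sketch of a reproducible run follows. The seed value 12345 is arbitrary and purely illustrative; any fixed value works, provided the same one is used each time the command is rerun.

```stata
* Fix the random-number seed so that the 200 bootstrap resamples, and
* therefore the bias, standard errors, and confidence intervals, can be
* reproduced. The value 12345 is an arbitrary illustration.
set seed 12345
clad loffinc lsize urban coastal, ll(0) reps(200)
```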

The reported standard errors above will be correct if the sample comes from a simple random draw. This is not the case
with the GLSS data, which were collected using a two-stage design. Passing the variable that identifies the primary sampling
unit (PSU) to clad through the psu() option produces bootstrap estimates of the standard errors that are robust to the
two-stage design. The example below applies this correction to the estimates above.

. clad loffinc lsize urban coastal, ll(0) reps(200) psu(clust)

Initial sample size = 1581

Final sample size = 1580

Pseudo R2 = .05048178

Bootstrap statistics

Variable |   Reps   Observed        Bias   Std. Err.  [95% Conf. Interval]
---------+---------------------------------------------------------------------
   lsize |    200   1.149846    .0916958     .395014   .3708959   1.928797  (N)
         |                                             .6573149   2.076703  (P)
         |                                             .6507832   2.053507  (BC)
   urban |    200   2.375166    .0562143    .6152112   1.161996   3.588336  (N)
         |                                             1.285434   3.658858  (P)
         |                                              1.12299   3.495041  (BC)
 coastal |    200   1.287741    .0386539    .5439033   .2151873   2.360294  (N)
         |                                             .2898641   2.466994  (P)
         |                                             .0728349   2.216781  (BC)
   const |    200   6.443694   -.1804084     1.04149   4.389922   8.497466  (N)
         |                                             3.942665   8.130428  (P)
         |                                             4.440762   8.347237  (BC)

N = normal, P = percentile, BC = bias-corrected
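
What resampling by PSU means can be illustrated with bsample directly. This is a sketch only, not a description of clad's internal code. It uses the bsample syntax of current Stata releases, which may differ from the release the bulletin was written for. The seed value and the variable name bsclust are hypothetical.

```stata
* Draw one bootstrap resample of whole PSUs rather than of individual
* observations. clust is the PSU identifier used in the example above;
* bsclust is a hypothetical name for the new cluster identifier.
preserve
set seed 12345
bsample, cluster(clust) idcluster(bsclust)
* A PSU drawn more than once appears under several values of bsclust,
* so the resampled clusters stay distinct from one another.
summarize bsclust
restore
```

Because every observation in a sampled PSU enters the resample together, the within-PSU correlation is carried into each replication, which is why the resulting standard errors reflect the two-stage design.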

It is worth noting that introducing information about the sample design affects only the bootstrap results, most importantly
the standard errors; the observed coefficients are the same in both runs.
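
The size of the correction can be read off the two sets of output. The lines below simply divide each PSU-robust standard error by its simple-random-sample counterpart, using the Std. Err. entries reported above for the four coefficients in table order.

```stata
* Ratio of PSU-robust to simple-random-sample bootstrap standard errors,
* one line per coefficient, in the order the tables list them.
display "first variable  " .395014/.2544479
display "urban           " .6152112/.3375226
display "coastal         " .5439033/.2830439
display "const           " 1.04149/.6198413
```

The ratios fall between roughly 1.5 and 1.9, so ignoring the two-stage design would noticeably understate the sampling variability of every coefficient.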
