Computing optimal sampling designs for two-stage studies



Stata Technical Bulletin



. clad loffinc lsize urban coastal, ll(0) reps(200)

Initial sample size = 1581
Final sample size = 1580
Pseudo R2 = .05048178

Bootstrap statistics

Variable |   Reps   Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+----------------------------------------------------------------------
   lsize |    200   1.149846    .0554115    .2544479    .6480861   1.651606  (N)
         |                                              .7073701   1.689895  (P)
         |                                              .6859084   1.624102  (BC)
   urban |    200   2.375166    .0128999    .3375226    1.709586   3.040746  (N)
         |                                              1.642076   3.120919  (P)
         |                                              1.677854   3.184893  (BC)
 coastal |    200   1.287741   -.0094159    .2830439    .7295905   1.845891  (N)
         |                                              .7311435   1.863342  (P)
         |                                              .7339153    1.90661  (BC)
   const |    200   6.443694   -.0810437    .6198413    5.221394   7.665994  (N)
         |                                              4.956254   7.557803  (P)
         |                                              5.371459   7.730506  (BC)
---------+----------------------------------------------------------------------
N = normal, P = percentile, BC = bias-corrected

The first line of output tells us that the original sample size is 1,581, and the second line shows that the estimation
algorithm dropped one case from the sample. An important caveat about the pseudo R-squared reported on the third line is that
it is the statistic reported by the last iteration of the qreg command on the final sample. It is not the pseudo R-squared
for the original sample, but we have opted to report it to provide some indication of how the model is performing.

In the example above, no sample design information is passed to clad and the program calls Stata’s bsample utility to
resample the data 200 times. In order to maintain the same sample size in each bootstrap resample, clad ignores observations
where the dependent variable is missing. The results from bsample are then passed to the bstat command to generate the
standard Stata bootstrap output. For more information about the normal, percentile, and bias-corrected percentile confidence
intervals, see bstrap in the Stata manuals. For an introduction to the bootstrap principle, see Efron and Tibshirani (1993). In
order to reproduce results from clad, it is necessary first to set the random number seed; see generate in the Stata reference
manuals for more information.
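
The resampling step described above can be illustrated with a minimal Python sketch. It is not clad's implementation: the function name `bootstrap`, its arguments, the toy data, and the use of the sample median as the statistic are all assumptions chosen for illustration. The sketch fixes the random number seed so that results are reproducible, resamples the data with replacement, and reports the bias, the standard error, and the normal and percentile intervals. The bias-corrected interval is omitted for brevity, and the normal interval uses a standard normal critical value, which may differ slightly from what Stata reports.

```python
import random
import statistics


def bootstrap(data, stat, reps=200, seed=12345):
    """Nonparametric bootstrap of stat(data); an illustrative sketch only."""
    rng = random.Random(seed)          # fixed seed makes the run reproducible
    n = len(data)
    observed = stat(data)

    replicates = []
    for _ in range(reps):
        # Draw n observations with replacement, so every resample has size n.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))

    bias = statistics.mean(replicates) - observed
    se = statistics.stdev(replicates)

    # Normal interval: observed +/- z * se (standard normal z is an assumption).
    z = statistics.NormalDist().inv_cdf(0.975)
    normal_ci = (observed - z * se, observed + z * se)

    # Percentile interval: crude order statistics of the replicates.
    ordered = sorted(replicates)
    lo_idx = (reps * 25) // 1000
    hi_idx = max(lo_idx, (reps * 975) // 1000 - 1)
    percentile_ci = (ordered[lo_idx], ordered[hi_idx])

    return {
        "reps": reps,
        "observed": observed,
        "bias": bias,
        "se": se,
        "normal_ci": normal_ci,
        "percentile_ci": percentile_ci,
    }


# Hypothetical toy data, not the GLSS sample.
toy = random.Random(1)
data = [toy.gauss(5, 2) for _ in range(100)]
result = bootstrap(data, statistics.median, reps=200, seed=12345)
print(result)
```

Calling `bootstrap` twice with the same seed returns identical results, which is the point of setting the seed before running a bootstrap.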

The standard errors reported above are correct if the sample is a simple random draw. This is not the case
with the GLSS data, which were collected using a two-stage design. When information about the primary sampling unit (PSU)
is passed to clad, it generates bootstrap estimates of the standard errors that are robust to the two-stage design. In the
example below, we correct the standard errors above for this aspect of the sample.

. clad loffinc lsize urban coastal, ll(0) reps(200) psu(clust)

Initial sample size = 1581
Final sample size = 1580
Pseudo R2 = .05048178

Bootstrap statistics

Variable |   Reps   Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+----------------------------------------------------------------------
   lsize |    200   1.149846    .0916958     .395014    .3708959   1.928797  (N)
         |                                              .6573149   2.076703  (P)
         |                                              .6507832   2.053507  (BC)
   urban |    200   2.375166    .0562143    .6152112    1.161996   3.588336  (N)
         |                                              1.285434   3.658858  (P)
         |                                               1.12299   3.495041  (BC)
 coastal |    200   1.287741    .0386539    .5439033    .2151873   2.360294  (N)
         |                                              .2898641   2.466994  (P)
         |                                              .0728349   2.216781  (BC)
   const |    200   6.443694   -.1804084     1.04149    4.389922   8.497466  (N)
         |                                              3.942665   8.130428  (P)
         |                                              4.440762   8.347237  (BC)
---------+----------------------------------------------------------------------
N = normal, P = percentile, BC = bias-corrected

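It is worth noting that introducing information about the sample design affects only the estimates of the standard errors
and the other statistics computed from the bootstrap resamples; the coefficient estimates in the Observed column are
identical in the two runs.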
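
A minimal Python sketch can show why the point estimates stay fixed while the standard errors change when PSU information is supplied. It is an illustration under stated assumptions, not clad's code: the function `bootstrap_se`, its `psu` argument, the toy clustered data, and the choice of the median as the statistic are all hypothetical. Without PSU identifiers each observation is resampled on its own; with them, whole PSUs are drawn with replacement and every observation in a drawn PSU is kept.

```python
import random
import statistics
from collections import defaultdict


def bootstrap_se(values, psu=None, stat=statistics.median, reps=200, seed=12345):
    """Return (observed statistic, bootstrap standard error).

    If psu is None, individual observations are the resampling units.
    Otherwise whole PSUs (clusters) are the resampling units.
    """
    rng = random.Random(seed)

    if psu is None:
        units = [[v] for v in values]           # one unit per observation
    else:
        groups = defaultdict(list)
        for v, c in zip(values, psu):
            groups[c].append(v)
        units = list(groups.values())           # one unit per PSU

    replicates = []
    for _ in range(reps):
        # Draw as many units as there are in the sample, with replacement.
        drawn = [units[rng.randrange(len(units))] for _ in units]
        resample = [v for unit in drawn for v in unit]
        replicates.append(stat(resample))

    # The observed statistic uses the original data only, so it cannot
    # depend on how the resampling units are defined.
    return stat(values), statistics.stdev(replicates)


# Hypothetical clustered data: 30 PSUs of 10 observations with a shared PSU effect.
toy = random.Random(1)
values, psu = [], []
for c in range(30):
    effect = toy.gauss(0, 2)
    for _ in range(10):
        values.append(5 + effect + toy.gauss(0, 1))
        psu.append(c)

obs_srs, se_srs = bootstrap_se(values)             # ignores the design
obs_psu, se_psu = bootstrap_se(values, psu=psu)    # resamples whole PSUs
print(obs_srs == obs_psu)   # same point estimate by construction
print(se_srs, se_psu)
```

In this sketch the PSUs are of equal size, so every resample has the same number of observations; with unequal PSUs the resample size would vary from one replication to the next.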


