Computing optimal sampling designs for two-stage studies



Stata Technical Bulletin



. clad loffinc lsize urban coastal, ll(0) reps(200)

Initial sample size = 1581

Final sample size = 1580

Pseudo R2 = .05048178

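Bootstrap statistics

Variable |   Reps   Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+-----------------------------------------------------------------------
   lsize |    200   1.149846    .0554115    .2544479    .6480861   1.651606  (N)
         |                                              .7073701   1.689895  (P)
         |                                              .6859084   1.624102  (BC)
---------+-----------------------------------------------------------------------
   urban |    200   2.375166    .0128999    .3375226    1.709586   3.040746  (N)
         |                                              1.642076   3.120919  (P)
         |                                              1.677854   3.184893  (BC)
---------+-----------------------------------------------------------------------
 coastal |    200   1.287741   -.0094159    .2830439    .7295905   1.845891  (N)
         |                                              .7311435   1.863342  (P)
         |                                              .7339153    1.90661  (BC)
---------+-----------------------------------------------------------------------
   const |    200   6.443694   -.0810437    .6198413    5.221394   7.665994  (N)
         |                                              4.956254   7.557803  (P)
         |                                              5.371459   7.730506  (BC)
---------+-----------------------------------------------------------------------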

N = normal, P = percentile, BC = bias-corrected
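As a quick check on how the (N) rows are formed, the normal-approximation limits for the first coefficient can be recomputed by hand. The sketch below assumes the interval is the observed coefficient plus or minus a Student's t critical value with reps - 1 = 199 degrees of freedom times the bootstrap standard error. That assumption is consistent with the numbers in the table, but it is inferred from them rather than stated in the text.

```stata
* Normal-approximation (N) interval: observed +/- t * std. err.
* invttail(199, 0.025) is the upper 2.5% point of Student's t with
* 199 degrees of freedom (roughly 1.972).
* The df choice (reps - 1) is an assumption inferred from the table.
display 1.149846 - invttail(199, 0.025)*.2544479
display 1.149846 + invttail(199, 0.025)*.2544479
```

The two values displayed should agree with the (N) limits reported for the first coefficient, apart from rounding in the inputs.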

The first line of output tells us that the original sample size is 1,581, and the second line shows that the estimation
algorithm dropped one case from the sample. An important caveat about the pseudo R-squared reported on the third line is that
it is the statistic from the last iteration of the qreg command on the final sample. It is not the pseudo R-squared
for the original sample, but we have opted to report it to provide some indication of how the model is performing.

In the example above, no sample design information is passed to clad and the program calls Stata’s bsample utility to
resample the data 200 times. In order to maintain the same sample size in each bootstrap resample, clad ignores observations
where the dependent variable is missing. The results from bsample are then passed to the bstat command to generate the
standard Stata bootstrap output. For more information about the normal, percentile, and bias-corrected percentile confidence
intervals, see bstrap in the Stata manuals. For an introduction to the bootstrap principle, see Efron and Tibshirani (1993). In
order to reproduce results from clad, it is necessary first to set the random number seed; see generate in the Stata reference
manuals for more information.
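A minimal sketch of a reproducible run follows. The seed value 1001 is an arbitrary choice made for this illustration, and the variable and option names are those of the example above, read with lowercase-l spellings (loffinc, lsize, ll()). clad is a user-written command and is assumed to be installed.

```stata
* Fix the random-number seed before the bootstrap so that the same
* 200 resamples, and hence the same standard errors, are drawn each time.
* 1001 is an arbitrary illustrative value, not one used in the original run.
set seed 1001
clad loffinc lsize urban coastal, ll(0) reps(200)
```

Rerunning both lines together should give identical bootstrap output; rerunning clad alone, without resetting the seed, generally will not.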

The standard errors reported above will be correct if the sample is a simple random draw. This is not the case
with the GLSS data, which were collected using a two-stage design. clad can generate bootstrap estimates of the standard errors
that are robust to the two-stage design when the variable identifying the primary sampling unit (PSU) is passed to it through
the psu() option. The example below corrects the standard errors above for this aspect of the sample design.

. clad loffinc lsize urban coastal, ll(0) reps(200) psu(clust)

Initial sample size = 1581

Final sample size = 1580

Pseudo R2 = .05048178

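Bootstrap statistics

Variable |   Reps   Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+-----------------------------------------------------------------------
   lsize |    200   1.149846    .0916958     .395014    .3708959   1.928797  (N)
         |                                              .6573149   2.076703  (P)
         |                                              .6507832   2.053507  (BC)
---------+-----------------------------------------------------------------------
   urban |    200   2.375166    .0562143    .6152112    1.161996   3.588336  (N)
         |                                              1.285434   3.658858  (P)
         |                                               1.12299   3.495041  (BC)
---------+-----------------------------------------------------------------------
 coastal |    200   1.287741    .0386539    .5439033    .2151873   2.360294  (N)
         |                                              .2898641   2.466994  (P)
         |                                              .0728349   2.216781  (BC)
---------+-----------------------------------------------------------------------
   const |    200   6.443694   -.1804084     1.04149    4.389922   8.497466  (N)
         |                                              3.942665   8.130428  (P)
         |                                              4.440762   8.347237  (BC)
---------+-----------------------------------------------------------------------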

N = normal, P = percentile, BC = bias-corrected

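It is worth noting that introducing information about the sample design affects only the estimates of the standard errors and
the confidence intervals built from them; the observed coefficients are identical in the two runs.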
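To illustrate what resampling at the PSU level means, the following sketch draws a single bootstrap replicate in which whole clusters, rather than individual observations, are sampled with replacement. It uses Stata's bsample command with its cluster() and idcluster() options. It is only an illustration of the idea under those assumptions, not a description of clad's internal code, and newclust is a hypothetical variable name introduced here.

```stata
* One bootstrap replicate that resamples entire PSUs with replacement.
* clust is the PSU identifier from the example above.
* newclust (hypothetical name) gives each drawn cluster a unique id,
* so a PSU drawn twice is treated as two separate clusters.
preserve
bsample, cluster(clust) idcluster(newclust)
* ... re-estimate the model on this replicate and save the coefficients ...
restore
```

Because observations within a PSU stay together in every replicate, the spread of the coefficients across replicates reflects between-cluster variation, which is why the standard errors in the second run are larger than those in the first.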


