Computing optimal sampling designs for two-stage studies

Stata Technical Bulletin

STB-58

Description

This insert provides the program clad for estimating Powell’s (1984) censored least absolute deviations estimator (CLAD)
and bootstrap estimates of its sampling variance. The CLAD estimator is a generalization of the least absolute deviations (LAD)
estimator, which is implemented in Stata as the qreg command. Unlike standard estimators of the censored regression model,
such as tobit or other maximum likelihood approaches, the CLAD estimator is robust to heteroscedasticity and is consistent and
asymptotically normal for a wide class of error distributions. See Arabmazar and Schmidt (1981) and Vijverberg (1987) for
empirical examples of the magnitude of the bias resulting from the tobit estimator in the presence of nonnormal error distributions.
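For reference, here is the estimator written out. The model is left-censored at zero, with y_i = max(0, x_i'beta + u_i) and med(u_i | x_i) = 0. Powell's (1984) estimator minimizes the sum of absolute deviations of y_i from the censored median function max(0, x_i'beta). The second expression is the usual generalization to the tau-th quantile using the check function rho_tau, which the quantile() option presumably targets. This is a textbook statement added for clarity, not something taken from clad's source code.

```latex
% Left censoring at zero: y_i = max(0, x_i'beta + u_i), med(u_i | x_i) = 0
\hat{\beta}_{\mathrm{CLAD}} = \arg\min_{\beta} \sum_{i=1}^{N} \left| y_i - \max(0,\; x_i'\beta) \right|

% tau-th quantile version, using the check function rho_tau
\hat{\beta}_{\tau} = \arg\min_{\beta} \sum_{i=1}^{N} \rho_\tau\!\left( y_i - \max(0,\; x_i'\beta) \right),
\qquad \rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
```

At tau = 0.5, rho_tau(u) = |u|/2. The two objectives therefore have the same minimizer, and the median case is exactly CLAD.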

This program sidesteps the issue of programming analytical standard errors and provides instead bootstrapped estimates of
the sampling variance. Rogers (1993) shows that the standard errors reported by Stata for qreg are not robust to violations of
homoscedasticity or independence of the residuals and proposes a bootstrap alternative. We follow Rogers for the CLAD estimator
and propose two bootstrap estimates of the standard errors. The first is the standard bootstrap, which assumes that the sample
was selected using a simple random design. The second is a bootstrap estimate that assumes the sample was selected in
two stages and replicates the design by bootstrapping in two stages.
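The two resampling schemes can be contrasted in a few lines. The sketch below assumes a common two-stage scheme: draw primary sampling units with replacement, then draw observations with replacement within each drawn unit. The exact scheme clad uses is not spelled out here. The function names are hypothetical, and each function returns the observation indices of one bootstrap sample. In practice, the estimator would be rerun on each such sample.

```python
import random
from collections import defaultdict


def simple_bootstrap(n, rng):
    """One-stage bootstrap: draw n observation indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def two_stage_bootstrap(psu_ids, rng):
    """Two-stage bootstrap (assumed scheme): draw PSUs with replacement, then
    draw observations with replacement within each drawn PSU.

    psu_ids[i] is the PSU label of observation i; returns observation indices.
    """
    members = defaultdict(list)
    for i, psu in enumerate(psu_ids):
        members[psu].append(i)
    psus = list(members)
    sample = []
    for _ in range(len(psus)):  # stage 1: draw as many PSUs as the sample has
        units = members[rng.choice(psus)]
        # stage 2: resample households within the drawn PSU
        sample.extend(rng.choice(units) for _ in range(len(units)))
    return sample


if __name__ == "__main__":
    rng = random.Random(1)
    psu_ids = ["a", "a", "a", "b", "b", "c"]  # hypothetical PSU labels
    print(simple_bootstrap(len(psu_ids), rng))
    print(two_stage_bootstrap(psu_ids, rng))
```

Because whole PSUs are drawn in the first stage, the replicates keep the within-PSU correlation that the one-stage scheme ignores.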

An advantage of the two-stage bootstrap estimates available in clad is that if the sample was collected using a two-stage
process, then the estimated standard errors will be robust to this design effect. Kish (1995) and Cochran (1997) show the
importance of correcting mean values for design effects. Scott and Holt (1982) show that the magnitude of the bias for the
estimated variance-covariance matrix for ordinary least squares estimates can be quite large when it is erroneously assumed that
the data were collected using a simple random sample when in fact a two-stage design had been used.
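Kish's design effect gives a rough sense of the magnitudes involved. For a mean estimated from clusters of equal size m with intraclass correlation rho, it is approximately deff = 1 + (m - 1) * rho. A standard error computed under simple random sampling is then too small by a factor of sqrt(deff). The regression case studied by Scott and Holt is more involved, so treat this sketch as an illustration only. The cluster size and correlation in the demo are hypothetical.

```python
import math


def kish_deff(cluster_size, icc):
    """Kish's approximate design effect for clusters of equal size m
    with intraclass correlation rho: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def design_adjusted_se(srs_se, cluster_size, icc):
    """Scale a standard error computed under simple random sampling by sqrt(deff)."""
    return srs_se * math.sqrt(kish_deff(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical design: 16 households per PSU, intraclass correlation 0.2
    print(kish_deff(16, 0.2))
    print(design_adjusted_se(0.05, 16, 0.2))
```

With these hypothetical numbers, deff is about 4. The simple-random-sampling standard error therefore understates the design-based one by a factor of about two.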

Syntax

clad varlist [if exp] [in range] [, reps(#) psu(varname) ll[(#)] ul[(#)] dots saving(filename)
replace level(#) quantile(#) iterate(#) wlsiter(#) ]

Options

reps(#) specifies the number of bootstrap replications to be performed. The default value is 100.

psu(varname) specifies the variable identifying the primary sampling unit. If no variable is specified, the bootstrap replication
is a single-stage, simple random draw from the sample.

ll[(#)] and ul[(#)] are as in Stata’s tobit command and indicate the censoring point. ll() indicates left censoring and
ul() indicates right censoring. If ll or ul is specified without a censoring value, clad assumes that the
lower limit is the minimum observed in the data (if ll is specified) or that the upper limit is the maximum (if ul is specified).
If neither ll nor ul is specified, clad assumes left censoring at zero. clad handles censoring at only one bound;
one cannot specify censoring at both the lower and upper bound.
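The default rules for the censoring point reduce to a small decision function. The helper below is hypothetical and is not part of clad. It encodes each option as a Python value: None when the option is omitted, True when it is given without a number, and a number when an explicit censoring point is given.

```python
def resolve_censoring(y, ll=None, ul=None):
    """Return ("lower" | "upper", limit) following the rules described for
    clad's ll() and ul() options. Hypothetical helper, not clad's code.

    ll/ul: None = option omitted; True = given without a value;
    a number = given with an explicit censoring point.
    """
    if ll is not None and ul is not None:
        raise ValueError("censoring at both bounds is not supported: give ll or ul, not both")
    if ul is not None:
        # ul alone: right censoring at the sample maximum
        return ("upper", max(y) if ul is True else ul)
    if ll is not None:
        # ll alone: left censoring at the sample minimum
        return ("lower", min(y) if ll is True else ll)
    return ("lower", 0)  # neither option given: left censoring at zero


if __name__ == "__main__":
    y = [1.0, 2.0, 5.0]
    print(resolve_censoring(y))           # no option given
    print(resolve_censoring(y, ll=True))  # ll without a value
    print(resolve_censoring(y, ul=4.5))   # explicit upper limit
```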

dots prints a dot on the screen for each bootstrap replication, thereby allowing the user to estimate the time to completion
after a few replications.

saving(filename) creates a Stata datafile (.dta file) containing the bootstrap sample of the parameter estimates.

replace overwrites the Stata datafile specified in saving(), if it already exists.

level(#) specifies the confidence level, in percent, for confidence intervals. The default is level(95) or as set by set level.
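The text does not say how clad turns the bootstrap replicates into standard errors and intervals. The sketch below assumes one common construction. The bootstrap standard error is the standard deviation of the replicate estimates, and the interval is a normal approximation whose critical value depends on level(). The function names and replicate values are hypothetical.

```python
from statistics import NormalDist, stdev


def bootstrap_se(replicates):
    """Bootstrap standard error: sample standard deviation of the replicate estimates."""
    return stdev(replicates)


def normal_ci(estimate, replicates, level=95):
    """Normal-approximation interval: estimate +/- z * bootstrap SE.

    `level` is in percent, as with the level() option.
    """
    z = NormalDist().inv_cdf(0.5 + level / 200)  # about 1.96 when level=95
    se = bootstrap_se(replicates)
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # Hypothetical replicate estimates of one coefficient
    reps = [0.41, 0.38, 0.45, 0.40, 0.43, 0.37, 0.44]
    print(bootstrap_se(reps))
    print(normal_ci(0.42, reps, level=95))
```

A percentile interval built from the sorted replicates is a common alternative. Which form clad reports should be checked against its output.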

quantile(#) specifies the quantile to be estimated and should be a number between 0 and 1, exclusive. Numbers larger than
1 are interpreted as a percent. The default value of 0.5 corresponds to the median.
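The rule for quantile() arguments, together with the check function that a quantile estimator minimizes, can be sketched as follows. Both helpers are hypothetical illustrations, not clad's code.

```python
def normalize_quantile(q):
    """Map a quantile() argument into (0, 1); values above 1 are read as percents."""
    if q > 1:
        q = q / 100
    if not 0 < q < 1:
        raise ValueError("quantile must lie strictly between 0 and 1 (or 0 and 100 as a percent)")
    return q


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1 if u < 0 else 0))


if __name__ == "__main__":
    print(normalize_quantile(25))   # read as the 25th percentile
    print(check_loss(-2.0, 0.5))    # at the median, half the absolute deviation
    print(check_loss(-2.0, 0.25))   # negative residuals weigh more below the median
```

At tau = 0.5 the loss is |u|/2, so minimizing it gives the same estimates as least absolute deviations.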

iterate(#) specifies the maximum number of iterations that will be allowed to find a solution. The default value is 16,000,
and the range is 1 to 16,000.

wlsiter(#) specifies the number of weighted least squares iterations that will be attempted before the linear programming iterations
are started. The default value is 1. If there are convergence problems (something we have never observed), increasing this
value should help.

Examples

To illustrate the use of clad, we use data from the 1988 Ghana Living Standard Survey (GLSS) and consider a somewhat
nonsensical regression. The sample considered is 1,581 households, and the dependent variable, loffinc, is the log of household
nonfarm income. Since some households are fully engaged in farming, 528 observations on this variable are recorded as zero.
This variable is regressed on the log of the size of the household, lsize, and two geographic dummy variables, urban and
coastal. When we issue clad we obtain the results below.
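Written out, the example is a median regression censored from below at zero. In clad's syntax this is left censoring at zero, given either explicitly as ll(0) or by default when neither ll nor ul is specified. The specification below is a sketch implied by the description above, and the coefficient names are illustrative.

```latex
% y_i: log household nonfarm income (the dependent variable above), censored at zero
y_i = \max\bigl(0,\; \beta_0 + \beta_1\,\mathrm{lsize}_i + \beta_2\,\mathrm{urban}_i + \beta_3\,\mathrm{coastal}_i + u_i\bigr),
\qquad \operatorname{med}(u_i \mid \mathrm{lsize}_i, \mathrm{urban}_i, \mathrm{coastal}_i) = 0
```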
