Computing optimal sampling designs for two-stage studies

Stata Technical Bulletin

STB-58

The dramatic increase in the size of the standard errors is not that surprising as the design effect for the dependent variable is
approximately 3.8, and there is little in the observation matrix which will explain the intracluster correlation.

Methods and Formulas

The Powell (1984) CLAD estimator is found by minimizing

$$\sum_{i=1}^{n} \left| y_i - \max(0,\, x_i'\beta) \right| \tag{1}$$

The consistency of this estimator rests on the fact that medians are preserved by monotone transformations of the data, and (1)
is a monotone transformation of the standard least absolute deviations (LAD) regression. The properties of the LAD estimator are
presented in Koenker and Bassett (1978). The LAD estimator is implemented in Stata with the qreg command.
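
As a rough illustration of the criterion in (1) and of the median-preservation property it relies on, the following Python sketch evaluates the objective for a given coefficient vector. The function name clad_objective, the censor argument, and the toy data are assumptions made for this example; they are not part of the clad command.

```python
from statistics import median


def clad_objective(beta, rows, ys, censor=0.0):
    """Sum of absolute deviations of y from max(censor, x'beta)."""
    total = 0.0
    for x, y in zip(rows, ys):
        xb = sum(b * v for b, v in zip(beta, x))  # linear index x'beta
        total += abs(y - max(censor, xb))
    return total


# Hypothetical data: latent y* = -1 + x, observed y = max(0, y*).
rows = [[1.0, float(x)] for x in range(7)]  # constant and one regressor
ys = [max(0.0, -1.0 + r[1]) for r in rows]

at_truth = clad_objective([-1.0, 1.0], rows, ys)   # coefficients that generated the data
elsewhere = clad_objective([0.0, 0.75], rows, ys)  # some other candidate

# Medians commute with the monotone map v -> max(0, v).
latent = [-3.0, -1.0, 2.0, 4.0, 7.0]
censored = [max(0.0, v) for v in latent]
medians_agree = median(censored) == max(0.0, median(latent))
```

In this noise-free toy data set the objective is zero at the generating coefficients and strictly positive at the other candidate, which is the sense in which minimizing (1) recovers the coefficients.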

The estimation technique used in clad for the CLAD estimator is Buchinsky’s (1991) iterative linear programming algorithm
(ILPA). (For a critique of and alternative to this algorithm, see Fitzenberger 1997.) The first step of the ILPA is to estimate a
quantile regression for the full sample, then delete the observations for which the predicted value of the dependent variable is less
than zero. Another quantile regression is estimated on the new sample, and again negative predicted values are dropped. More
generally, observations are dropped if the predicted value is less than the censoring value when the left tail of the distribution is
censored, or they are dropped if the predicted value is greater than the censoring value when the right tail of the distribution is
censored. Buchinsky (1991) shows that if the process converges, then a local minimum is obtained. Convergence occurs when
there are no negative predicted values in two consecutive iterations.
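
The iteration described above can be sketched in Python for the simplest case of a constant and a single regressor. This is only an illustration under stated assumptions: lad_fit is a hypothetical brute-force stand-in for a median regression such as qreg (it relies on the fact that some LAD solution passes through two data points), observations are dropped cumulatively as in the description above, and clad_ilpa is not the code used by clad.

```python
from itertools import combinations


def lad_fit(xs, ys):
    """Brute-force median regression of y on a constant and one regressor.

    Tries every line through two data points with distinct x and keeps
    the one with the smallest sum of absolute deviations.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        sad = sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
        if best is None or sad < best[0]:
            best = (sad, intercept, slope)
    if best is None:
        raise ValueError("need at least two observations with distinct x")
    return best[1], best[2]


def clad_ilpa(xs, ys, censor=0.0, max_iter=100):
    """Iterate: fit, drop rows predicted below the censoring value, refit."""
    keep = list(range(len(xs)))
    for _ in range(max_iter):
        b0, b1 = lad_fit([xs[i] for i in keep], [ys[i] for i in keep])
        still = [i for i in keep if b0 + b1 * xs[i] >= censor]
        if len(still) == len(keep):
            # Nothing was dropped, so the next fit would be identical.
            return b0, b1, keep
        keep = still
    raise RuntimeError("ILPA did not converge")


# Hypothetical data: latent y* = -1 + x, left-censored at zero.
xs = [float(x) for x in range(7)]
ys = [max(0.0, -1.0 + x) for x in xs]
b0, b1, kept = clad_ilpa(xs, ys)
```

Here the first fit predicts a negative value only for the observation at x = 0, which is dropped; the second fit drops nothing, so the loop stops with the observations at x = 1, ..., 6 retained.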

The two bootstraps are implemented as follows. For the simple random sample (SRS) we simply use Stata’s bsample utility
to bootstrap the CLAD point estimates. The SRS, two-stage bootstrap proceeds as follows. In the first stage it counts the number
of unique PSUs, say k, and then, using Stata’s uniform function, randomly selects with replacement k (not necessarily unique)
PSUs. At this point, it counts the number of times each PSU has been selected and stores these counts for later use. To implement
the second stage, the program first counts the number of ultimate sampling units (USUs), say m, in each selected PSU and then
randomly selects m USUs from each selected PSU. If a PSU is selected more than once, say a times, then in the second stage the
program randomly selects a × m USUs from that PSU. As a final note, we warn that clad can be quite time-consuming
since the entire algorithm described above is repeated for each bootstrap resampling of the data.
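
The two-stage resampling scheme can be sketched in Python as follows. The function two_stage_bootstrap and the toy PSU labels are hypothetical, and the sketch assumes that second-stage units are also drawn with replacement, which the description above does not state explicitly.

```python
import random
from collections import Counter, defaultdict


def two_stage_bootstrap(psu_ids, rng):
    """Return the row indices of one two-stage bootstrap resample."""
    members = defaultdict(list)
    for row, psu in enumerate(psu_ids):
        members[psu].append(row)

    # Stage 1: draw k PSUs with replacement, where k is the number of
    # unique PSUs, and count how often each one was selected.
    psus = sorted(members)
    k = len(psus)
    times_selected = Counter(rng.choice(psus) for _ in range(k))

    # Stage 2: from a PSU with m units that was selected a times, draw
    # a * m units (assumed here to be with replacement).
    sample = []
    for psu, a in sorted(times_selected.items()):
        rows = members[psu]
        m = len(rows)
        sample.extend(rng.choice(rows) for _ in range(a * m))
    return sample


# Hypothetical data: three PSUs containing 3, 2, and 4 units.
psu_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "C"]
resample = two_stage_bootstrap(psu_ids, random.Random(58))
```

Each bootstrap replication would then rerun the full CLAD estimation on the rows in resample, which is why the procedure is slow.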

References

Arabmazar, A. and P. Schmidt. 1981. Further evidence on the robustness of the tobit estimator to heteroskedasticity. Journal of Econometrics 17:
253-258.

Buchinsky, M. 1991. Methodological issues in quantile regression. Chapter 1 of The Theory and Practice of Quantile Regression. Ph.D. dissertation,
Harvard University.

——. 1994. Changes in the U.S. wage structure 1963-1987: application of quantile regression. Econometrica 62(2): 405-459.

Cochran, W. 1997. Sampling Techniques. 3d ed. New York: John Wiley & Sons.

Efron, B. and R. Tibshirani. 1993. An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability. New York: Chapman & Hall.

Fitzenberger, B. 1997. Computational aspects of censored quantile regression. In Proceedings of The 3rd International Conference on Statistical Data
Analysis based on the L1-Norm and Related Methods, ed. Y. Dodge, 171-186. Hayward, California: Institute of Mathematical Statistics
Lecture Notes-Monograph Series, Volume 31.

Kish, L. 1995. Survey Sampling. New York: John Wiley & Sons.

Koenker, R. and G. Bassett. 1978. Regression quantiles. Econometrica 46(1): 33-50.

Powell, J. L. 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics 25: 303-325.

Rogers, W. 1993. sg11.2: Calculation of quantile regression standard errors. Stata Technical Bulletin 13: 18-19. Reprinted in Stata Technical Bulletin
Reprints, vol. 3, pp. 77-78.

Scott, A. J. and D. Holt. 1982. The effect of two-stage sampling on ordinary least squares methods. Journal of the American Statistical Association
77(380): 848-854.

Vijverberg, W. 1987. Non-normality as distributional misspecification in single-equation limited dependent variable models. Oxford Bulletin of Economics
and Statistics 49(4): 417-430.

sg154 Confidence intervals for the ratio of two binomial proportions by Koopman’s method

Duolao Wang, London School of Hygiene and Tropical Medicine, London, UK

Abstract: This article introduces the koopman and koopmani commands, which compute confidence intervals for the ratio of
two binomial proportions based on two independent binomially distributed random variables using Koopman’s method.

Keywords: Koopman’s method, odds ratio, confidence intervals.
