Stata Technical Bulletin
STB-57
their spread should be proportional to their variances. This suggested that, when plotted, small studies should be widely spread
about the average effect, and the spread should narrow as sample sizes increase, resulting in a symmetric, funnel-shaped graph.
If the graph revealed a lack of symmetry about the average effect (especially if small, negative studies appeared to be absent)
then publication bias was assumed to exist.
Evaluation of a funnel graph was a very subjective process, with bias—or lack of bias—residing in the eye of the beholder.
Begg and Mazumdar (1994) noted this and observed that the presence of publication bias induced skewness in the plot and a
correlation between the effect sizes and their variances. They proposed that a formal test of publication bias could be constructed
by examining this correlation. More recently, Egger et al. (1997) proposed an alternative, regression-based test for detecting
skewness in the funnel plot and, by extension, for detecting publication bias in the data. Their numerical measure of funnel plot
asymmetry also constitutes a formal test of publication bias. Stata implementations of both the Begg and Mazumdar procedure
and the Egger et al. procedure were provided in metabias (Steichen 1998; Steichen et al. 1998).
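To make the regression-based idea concrete, the following is a minimal Python sketch, not the metabias implementation. It assumes the commonly described form of the Egger et al. regression, in which the standardized effect (effect divided by its standard error) is regressed on the precision (one over the standard error) and the intercept serves as the measure of funnel plot asymmetry. The function name egger_intercept and the use of an unweighted least-squares fit are assumptions made for illustration; the formal significance test on the intercept is omitted.

```python
from math import sqrt


def egger_intercept(effects, variances):
    """Return (intercept, slope) of the least-squares line z = a + b * p,
    where z = effect / se is the standardized effect and p = 1 / se is
    the precision. An intercept far from zero suggests asymmetry."""
    se = [sqrt(v) for v in variances]
    z = [y / s for y, s in zip(effects, se)]
    p = [1.0 / s for s in se]
    n = len(z)
    p_bar = sum(p) / n
    z_bar = sum(z) / n
    sxx = sum((pi - p_bar) ** 2 for pi in p)
    sxy = sum((pi - p_bar) * (zi - z_bar) for pi, zi in zip(p, z))
    slope = sxy / sxx
    intercept = z_bar - slope * p_bar
    return intercept, slope


# Hypothetical data: four studies sharing one effect size of 0.5.
variances = [0.01, 0.04, 0.09, 0.16]
intercept, slope = egger_intercept([0.5] * 4, variances)
print(intercept, slope)
```

When every study reports the same effect, the standardized effects lie on a line through the origin, so the fitted intercept is zero up to rounding and the slope recovers the common effect.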
However, neither of these procedures provided estimates of the number or characteristics of the missing studies, and neither
provided an estimate of the underlying (unbiased) effect. There exist a number of methods to estimate the number of missing
studies, model the probability of publication, and provide an estimate of the underlying effect size. Duval and Tweedie list
some of these and note that all “are complex and highly computer-intensive to run” and, for these reasons, have failed to find
acceptance among meta-analysts. They offer their new method as “a simple technique that seems to meet many of the objections
to other methods.”
The following sections paraphrase some of the mathematical development and discussion in the Duval and Tweedie paper.
Estimators of the number of suppressed studies
Let $(Y_j, V_j)$, $j = 1, \ldots, n$, be the estimated effect sizes and within-study variances from $n$ observed studies in a
meta-analysis, where all such studies attempt to estimate a common global “effect size” $\Delta$. Define the random-effects (RE)
model used to combine the $Y_j$ as
$$Y_j = \Delta + \beta_j + \varepsilon_j$$
where $\beta_j \sim N(0, \tau^2)$ accounts for heterogeneity between studies, and $\varepsilon_j \sim N(0, \sigma_j^2)$ is the within-study variability of study $j$.
For a fixed-effects (FE) model, assume $\tau^2 = 0$.
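The model can be illustrated by simulating study results from it. The following Python sketch is an illustration only: the function name simulate_re_studies and its arguments are hypothetical, and it treats the within-study variances as known, whereas in practice the V_j are estimates.

```python
import random


def simulate_re_studies(delta, tau2, sigma2, seed=0):
    """Draw one (Y_j, V_j) pair per entry of sigma2 from the model
    Y_j = delta + beta_j + eps_j, with beta_j ~ N(0, tau2) and
    eps_j ~ N(0, sigma2[j]). Setting tau2 = 0 gives the FE model.
    Assumption: V_j is reported as the true within-study variance."""
    rng = random.Random(seed)
    studies = []
    for s2 in sigma2:
        beta = rng.gauss(0.0, tau2 ** 0.5)  # between-study deviation
        eps = rng.gauss(0.0, s2 ** 0.5)     # within-study error
        studies.append((delta + beta + eps, s2))
    return studies


# Hypothetical example: five studies, true effect 0.3, modest heterogeneity.
for y, v in simulate_re_studies(0.3, 0.02, [0.25, 0.10, 0.05, 0.02, 0.01]):
    print(round(y, 3), v)
```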
Further, in addition to the $n$ observed studies, assume that there are $k_0$ relevant studies that are not observed due to publication
bias. Both the value of $k_0$, that is, the number of unobserved studies, and the effect sizes of these unobserved studies are
unknown and must be estimated.
Now, for any collection $X_i$, $i = 1, \ldots, N$, of random variables, each with a median of zero and sign generated according
to an independent set of Bernoulli variables taking values $-1$ and $1$, let $r_i$ denote the rank of $|X_i|$ and
$$W_N^+ = \sum_{X_i > 0} r_i$$
be the sum of the ranks associated with positive $X_i$. Then $W_N^+$ has a Wilcoxon distribution.
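As a small illustration of this rank sum, the following Python sketch ranks the absolute values and adds up the ranks that belong to positive observations. The function name positive_rank_sum is hypothetical, and the sketch assumes no ties among the absolute values and no exact zeros.

```python
def positive_rank_sum(xs):
    """Sum of the ranks of |x| (1 = smallest) over the positive x.
    Assumes no ties among the absolute values and no zeros."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    ranks = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return sum(r for x, r in zip(xs, ranks) if x > 0)


# Absolute ranks are 2, 4, 5, 1, 3; the positive values hold ranks 2, 5, 3.
print(positive_rank_sum([0.3, -1.2, 2.5, -0.1, 0.8]))  # prints 10
```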
Assume that among these $N$ random variables, $k_0$ were suppressed, leaving $n$ observed values. Furthermore, assume that
the suppression has taken place in such a way that the $k_0$ values of the $X_i$ with the most extreme negative ranks have been
suppressed. (Note: Duval and Tweedie call this their key assumption and present it italicized, as done here, for emphasis. Further,
they label the model for an overall set of studies defined in this way as a suppressed Bernoulli model and state that it might be
expected to lead to a truncated funnel plot.)
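The key assumption can be sketched in Python as follows. This is an illustration under the reading that "most extreme negative ranks" means the most negative values; the function name suppress_most_negative and the sample data are hypothetical.

```python
def suppress_most_negative(xs, k0):
    """Return the observed values after dropping the k0 most negative
    entries of xs, keeping the remaining values in their original order."""
    negative_idx = sorted(
        (i for i, x in enumerate(xs) if x < 0), key=lambda i: xs[i]
    )
    dropped = set(negative_idx[:k0])  # indices of the k0 most negative values
    return [x for i, x in enumerate(xs) if i not in dropped]


# N = 6 values with k0 = 2 suppressed leaves n = 4 observed values.
print(suppress_most_negative([0.3, -1.2, 2.5, -0.1, 0.8, -3.0], 2))
# prints [0.3, 2.5, -0.1, 0.8]
```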
Rank again the $n$ observed $|X_i|$ as $r_i^*$ running from 1 to $n$. Let $\gamma^* \ge 0$ denote the length of the rightmost run of ranks
associated with positive values of the observed $X_i$; that is, if $h$ is the index of the most negative of the $X_i$ and $r_h^*$ is its absolute
rank, then $\gamma^* = n - r_h^*$. Define the “trimmed” rank test statistic for the observed $n$ values as
$$T_n = \sum_{X_i > 0} r_i^*$$
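The two quantities just defined can be computed directly from the observed values, as in the following Python sketch. The function name trimmed_rank_stats is hypothetical. The sketch assumes no ties among the absolute values and, as a convention not stated in the source, treats the rank of the most negative value as 0 when no negative values are observed.

```python
def trimmed_rank_stats(xs):
    """Return (gamma_star, t_n) for the observed values xs.
    gamma_star = n minus the absolute rank of the most negative value;
    t_n = sum of absolute ranks over the positive values.
    Assumes no ties in |x|; if there are no negative values, the rank of
    the most negative value is taken as 0 (an illustrative convention)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: abs(xs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    t_n = sum(r for x, r in zip(xs, ranks) if x > 0)
    negative_ranks = [r for x, r in zip(xs, ranks) if x < 0]
    r_h = max(negative_ranks) if negative_ranks else 0
    return n - r_h, t_n


# Absolute ranks are 2, 4, 1, 3; the only negative value holds rank 1.
print(trimmed_rank_stats([0.3, 2.5, -0.1, 0.8]))  # prints (3, 9)
```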
Note that though the distributions of $\gamma^*$ and $T_n$ depend on $k_0$, the dependence is omitted in this notation. Based on these
quantities, define three estimators of $k_0$, the number of suppressed studies:
$$R_0 = \gamma^* - 1,$$
$$L_0 = \frac{4T_n - n(n + 1)}{2n - 1},$$