and a probability with respect to repeated simulations.
Recall that scenario S1 favors the HLRM. In particular, the success rates are ordered
according to prognoses as assumed by the HLRM. Not surprisingly, the HLRM
outperforms NEPPM for the subtypes with few patients. The implicit monotonicity
assumption and the fixed grouping (which happens to match the simulation truth)
greatly improve inference for these subtypes. The early stopping probability is
(correctly) high for subtypes with poor prognosis and low for subtypes with good
prognosis. In contrast, under S2 there is no clear winner. See Figure 4.4 (b). The HLRM
is overall more aggressive in the sense that it leads to higher early stopping
probabilities. But it does so even when the treatment is effective (i.e., pi > 0.175). We
observe similar summaries under S3. Figure 4.4 (c) indicates that the HLRM stops
earlier, across study arms, including those with p2 ≥ 0.175. Finally, Figure 4.4 (d)
shows comparable average sample sizes under S4 for both models.
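
As a rough illustration of how early stopping probabilities and average sample sizes of this kind can be estimated by repeated simulation, the following Python sketch simulates a single study arm under a conjugate beta-binomial model. It is not the HLRM or the NEPPM. The stopping rule, the uniform prior, the cohort size, the maximum sample size and the cutoff are all assumptions made for the example; only the threshold 0.175 is taken from the text.

```python
import math
import random

P0 = 0.175  # efficacy threshold quoted in the text


def post_prob_exceeds(y, n, p0=P0):
    """P(p > p0 | y successes in n patients) under an ASSUMED Beta(1, 1) prior.

    The posterior is Beta(y + 1, n - y + 1). For integer parameters its upper
    tail equals the binomial sum P(Bin(n + 1, p0) <= y).
    """
    return sum(
        math.comb(n + 1, j) * p0**j * (1 - p0) ** (n + 1 - j)
        for j in range(y + 1)
    )


def simulate_arm(p_true, n_max, cohort, cutoff, rng):
    """Simulate one arm; return (stopped_early, patients_enrolled).

    ASSUMED rule: after each cohort, stop for futility if the posterior
    probability that p exceeds P0 falls below `cutoff`.
    """
    y = n = 0
    while n < n_max:
        size = min(cohort, n_max - n)
        y += sum(rng.random() < p_true for _ in range(size))
        n += size
        if n < n_max and post_prob_exceeds(y, n) < cutoff:
            return True, n
    return False, n


def operating_characteristics(p_true, n_sim=2000, n_max=40, cohort=5,
                              cutoff=0.05, seed=1):
    """Estimate early stopping probability and average sample size
    as frequencies over repeated simulated trials."""
    rng = random.Random(seed)
    runs = [simulate_arm(p_true, n_max, cohort, cutoff, rng)
            for _ in range(n_sim)]
    stop_prob = sum(stopped for stopped, _ in runs) / n_sim
    avg_n = sum(n for _, n in runs) / n_sim
    return stop_prob, avg_n


if __name__ == "__main__":
    for p_true in (0.05, 0.175, 0.40):
        stop_prob, avg_n = operating_characteristics(p_true)
        print(f"p = {p_true:.3f}: P(stop early) = {stop_prob:.2f}, "
              f"average n = {avg_n:.1f}")
```

Under these assumptions, an arm with a true success rate well below 0.175 should stop early far more often, and enroll fewer patients on average, than an arm with a rate well above it. Curves such as those in Figure 4.4 summarize the same kind of frequencies, computed under the models and scenarios of the simulation study.
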
In summary, we gain precision by incorporating covariate information in the model,
provided the covariates have predictive power. Among the 5 competing
models considered in Section 4.1, the partially exchangeable HLRM and the NEPPM
with random partitions show the best overall performance. Comparing these two
models directly, when the assumed true success rates are monotone increasing with
respect to prognosis, then the HLRM is optimal in terms of bias, MSE and coverage
probability. However, if the assumed simulation truth does not match the grouping
by prognosis xi or the monotonicity is violated, then the NEPPM performs better. By
its nature, the HLRM introduces strong prior beliefs about the similarity of the success
rates. The model groups subtypes by xi, and allows no modification of this fixed
clustering. Inference is precise when these beliefs happen to be right. In contrast,
the NEPPM introduces similar beliefs, but allows for uncertainty. The model allows
the data to speak and correct the clustering in case the prior beliefs were inaccurate.
In terms of early stopping probabilities, the HLRM tends to stop study arms earlier
than the NEPPM. In all scenarios but S1, where the HLRM wins, there is no clear winner