3.4 Model
3.4.1 A Semi-parametric Mixture of Poisson Model
Our approach is model-based. We use Ni = (Ni1, Ni2, Ni3) to denote the observed
counts for tripeptide-tissue pair i across the three stages, for pairs i = 1, ..., n. We
cast the desired selection of tissue-specific tripeptides as inference about the increas-
ing trend of mean counts in a probability model for the observed data Ni. Ji et
al. (2007) have proposed a model-based approach based on a mixture of normal dis-
tributions in the parameters of the assumed Poisson distribution of the observations.
The main strength of their model is its parsimony, which is important for the
relatively small mouse data set. The human data analyzed here are ten-fold larger,
which allows us to consider a more elaborate model and to address a limitation of
the earlier one: the model in Ji et al. assumes a linear relation on
the log-Poisson scale. For example, consider the pairs that are reported as oscillating
in Figure 3.3 (b). Although the data shows a marked difference in slopes from stages
1 to 2 versus from stages 2 to 3, the model assumes one common slope. This is a
concern when the imputed overall slope is positive, e.g., for the pair marked A in
Figure 3.3 (b). Outliers like pair A in Figure 3.3 (b) can inappropriately drive the in-
ference. Taking advantage of the larger sample size, the semiparametric nature of the
model we propose can mitigate this problem. Finally, in Ji et al. binding tripeptides
are reported in terms of statistical significance, formalized as the posterior probability
of the overall slope being greater than zero. We will propose an approach that also
takes into account the size of the overall slope and is better suited to incorporating
biological significance.
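
The concern about a single common slope can be illustrated with a small numerical
sketch. Everything in it is an assumption made for illustration, not the formulation
of Ji et al.: the stages are coded as -1, 0, 1, the log mean count is taken to be
a + b * stage, the counts (2, 30, 10) are hypothetical, and the slope is estimated by
maximum likelihood rather than by posterior inference. The helper fit_common_slope is
likewise a hypothetical name.

```python
import math

def fit_common_slope(counts, stages=(-1.0, 0.0, 1.0), iters=50, tol=1e-10):
    """Newton-Raphson MLE for log(mean_j) = a + b * stage_j with Poisson counts.

    Illustrative parametrization only (assumed, not taken from the source).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one positive count")
    # Start from the constant-mean fit: a = log(average count), b = 0.
    a, b = math.log(total / len(counts)), 0.0
    for _ in range(iters):
        lam = [math.exp(a + b * x) for x in stages]
        # Score vector of the Poisson log-likelihood.
        u_a = sum(n - l for n, l in zip(counts, lam))
        u_b = sum(x * (n - l) for x, n, l in zip(stages, counts, lam))
        # Fisher information matrix (2 x 2, symmetric).
        i_aa = sum(lam)
        i_ab = sum(x * l for x, l in zip(stages, lam))
        i_bb = sum(x * x * l for x, l in zip(stages, lam))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2 x 2 system explicitly.
        da = (i_bb * u_a - i_ab * u_b) / det
        db = (i_aa * u_b - i_ab * u_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical oscillating pair: counts rise sharply, then fall.
counts = (2, 30, 10)
a_hat, b_hat = fit_common_slope(counts)

# Stage-to-stage changes on the log scale (valid here because all counts > 0).
step_12 = math.log(counts[1] / counts[0])
step_23 = math.log(counts[2] / counts[1])

print("common slope:", round(b_hat, 3))
print("log change, stage 1 to 2:", round(step_12, 3))
print("log change, stage 2 to 3:", round(step_23, 3))
```

For these hypothetical counts the two stage-to-stage changes have opposite signs, yet
the fitted common slope is positive because the last count exceeds the first. A summary
based only on the sign of the overall slope would therefore treat such a pair as
increasing.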
In summary, the choice of an appropriate probability model is driven by the follow-
ing considerations. First, we wish to limit the impact of specific parametric modeling
choices on the inference about monotonicity of the mean counts. The large number of
recorded pairs allows us to use a semi-parametric approach that reduces dependence