3.8 Discussion
We have proposed semi-parametric model-based statistical inference for high-dimensional
count data arising from phage experiments with parallel biopanning. The probability
model is extended to a decision problem by adding a utility function for the
choice of reported peptide/tissue pairs.
Previously, Ji et al. (2007) introduced a model for the analysis of phage experiment
data based on mouse data. Analogous to their model, ours takes into account the
correlation that exists between the different stages and detects the tripeptides that
tend to bind with a specific tissue. Since there is just one observation across the three
stages for every tripeptide-tissue pair, we need to impose a hierarchical structure that
allows us to borrow information across all pairs in order to make statistical inferences
about the behavior of any one pair in particular. A visual inspection of the human data, such
as that presented in Figure 3.3, shows the existence of pairs with oscillating
counts and the presence of outliers. The oscillating counts indicate that the assumption
of log-linear means in the model proposed by Ji et al. may not be appropriate in this case.
To avoid inference that can be misled by outliers, we also require a model that is more
robust to them. Our model accounts for both phenomena: the
first through its specific structure and the second, taking advantage of our larger
data set, through its nonparametric nature. In addition, the model we propose has an
easy interpretation of the parameters at the upper level of the hierarchical model:
μi is the mean count of tripeptide-tissue pair i if there were no enrichment of
the tripeptide library at any stage, while βi and δi are the fold changes in this count
at the second and third stages, respectively, due to this enrichment. Moreover, this
parametrization allows an easy description of the phenomenon we are interested in,
namely increasing mean counts through the three stages, in terms of the parameters. If
the biologists are interested in studying other phenomena involving the means, then once
we have simulated a posterior sample of the parameters it is easy to compute the
posterior probability of the events corresponding to these phenomena.
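
As an illustration of this last point, the following minimal sketch estimates the posterior probability of increasing mean counts for a single tripeptide-tissue pair from simulated posterior draws. It assumes that the three stage means are μi, μiβi and μiδi, which is one reading of the parametrization described above and is not confirmed here. The function name prob_increasing and the synthetic draws are hypothetical stand-ins for the output of the actual posterior simulation.

```python
import random


def prob_increasing(mu_draws, beta_draws, delta_draws):
    """Monte Carlo estimate of P(mean1 < mean2 < mean3 | data) for one pair.

    ASSUMPTION: the stage means are mu, mu*beta and mu*delta. Under that
    reading, and for mu > 0, the event is equivalent to 1 < beta < delta.
    """
    hits = 0
    for mu, beta, delta in zip(mu_draws, beta_draws, delta_draws):
        m1, m2, m3 = mu, mu * beta, mu * delta
        if m1 < m2 < m3:
            hits += 1
    # Fraction of posterior draws in which the event occurs.
    return hits / len(mu_draws)


# Synthetic draws standing in for a real posterior sample (illustration only).
rng = random.Random(0)
n_draws = 2000
mu = [rng.gammavariate(2.0, 1.0) for _ in range(n_draws)]
beta = [rng.lognormvariate(0.5, 0.3) for _ in range(n_draws)]
delta = [b * rng.lognormvariate(0.5, 0.3) for b in beta]

p_hat = prob_increasing(mu, beta, delta)
print(f"Estimated posterior probability of increasing means: {p_hat:.3f}")
```

Any other event defined through the means can be handled the same way by replacing the condition inside the loop. For example, a condition such as m3 > 2 * m1 would estimate the probability that the third-stage mean is at least double the first-stage mean.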