3.3 Data
We now return to the biopanning experiment described in Section 3.1, which was
carried out at M. D. Anderson Cancer Center. The data come from three consecutive
human subjects who met the formal criteria for brain-based determination of death
(Wijdicks, 2001). See Arap et al. (2002) for details on patient selection and on the
clinical procedure, which followed clinical ethics criteria.
The purpose of the experiment is to identify peptides whose counts increase over
the consecutive stages. At each stage we record counts for peptide/tissue pairs.
Peptides are denoted as CX7C (here C = cysteine and X = any amino acid, represented
by its one-letter code). Tissues are Bone-Marrow, Fat, Muscle, Prostate and Skin.
At each stage a phage display peptide library was injected into a new patient, and
15 minutes later biopsies were collected from each of the target tissues and the peptide
counts were recorded. For the second and third stages, the injected library was the
phage display library already enriched in the previous stage.
The original data are counts for all unique 7-mers X7. However, we summarize the
data using all implied 3-mers. For example, the 7-mer AGAGADR corresponds to the
four unique tripeptides AGA, GAG, DAG and ADR. Note that we do not distinguish
between a tripeptide and its mirror (e.g., DAG and GAD are counted as the same)
and that each tripeptide contained in a 7-mer is counted only once (e.g., the count
on AGA is incremented only once, although AGA occurs twice in the 7-mer). Thus an
observed 7-mer AGAGADR contributes one count to each of these four tripeptides.
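The summarization of 7-mers into tripeptides described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the choice of the alphabetically smaller spelling as the representative of a mirror pair is an assumption (the text only says the two are counted as the same), and the aggregation assumes that each observed copy of a 7-mer adds one count to each of its distinct tripeptides. The last line enumerates all 20^3 ordered triples over the 20 standard one-letter amino acid codes to check the number of distinct tripeptides after merging mirrors.

```python
from collections import Counter
from itertools import product

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def canonical(tripeptide: str) -> str:
    """Merge a tripeptide with its mirror (e.g. GAD ~ DAG).

    Assumption: the alphabetically smaller spelling is used as the
    representative of the pair.
    """
    return min(tripeptide, tripeptide[::-1])


def implied_tripeptides(seven_mer: str) -> set[str]:
    """Distinct mirror-merged 3-mers contained in a 7-mer (hypothetical helper).

    Using a set means a tripeptide occurring twice (AGA in AGAGADR)
    is counted only once.
    """
    return {canonical(seven_mer[i:i + 3]) for i in range(len(seven_mer) - 2)}


def tripeptide_counts(seven_mer_counts: dict[str, int]) -> Counter:
    """Aggregate 7-mer counts (one stage, one tissue) into tripeptide counts.

    Assumption: each observed copy of a 7-mer adds one count to each of
    its distinct tripeptides.
    """
    totals = Counter()
    for seven_mer, n in seven_mer_counts.items():
        for tri in implied_tripeptides(seven_mer):
            totals[tri] += n
    return totals


# Distinct tripeptides after merging mirrors: (20^3 + 20^2) / 2.
N_TRIPEPTIDES = len({canonical("".join(p)) for p in product(AMINO_ACIDS, repeat=3)})
```

For example, `sorted(implied_tripeptides("AGAGADR"))` gives `['ADR', 'AGA', 'DAG', 'GAG']`, the four tripeptides listed in the text, and `N_TRIPEPTIDES` evaluates to 4200: the 400 palindromic triples (first letter equal to last) are their own mirrors, and the remaining 7600 form 3800 mirror pairs.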
The main reason for recording 3-mers is the sparsity of the counts that would result
from recording all 20^7 (about 1.28 × 10^9) possible 7-mers. In contrast, there are
only 4200 tripeptides: of the 20^3 = 8000 ordered triples, the 400 palindromic ones
are their own mirrors and the remaining 7600 collapse into 3800 mirror pairs. The 4200
distinct 3-mers are believed to be a sufficiently rich class to differentiate between
binding sites. See, for
example, Arap et al. (2002), Ji et al. (2006) and Kolonin et al. (2006) who also use
tripeptides. Finally, the data corresponding to the third stage contains two seven-