Database Search Strategies for Proteomic Data Sets Generated by Electron Capture Dissociation Mass Spectrometry



score) described here; however, no presearch processing was
employed. We reanalyzed this data set, keeping the search and
postsearch steps the same, but employing presearch processing
of DTAs, as described above. This increased the number of
phosphopeptide identifications from 1087 to 1155 (a 6%
increase) with no change in the number of reverse identifications.
This gain was largely due to an increase in the
number of identifications from doubly charged precursor ions:
from 285 to 338 (a 19% increase). Employing Mascot for the
search step resulted in slightly fewer phosphopeptide identifications.

Discussion

We have considered three stages of data processing: presearch,
search, and postsearch.

Postsearch. We confirm that searching using a wide precursor
mass tolerance window, with subsequent filtering by ppm,
substantially improves the identification rate for ECD data as
well as CID data. This improvement is due to the tendency for
false-positive hits to scatter across a wider mass range of the
search space than true hits, which cluster within a window of
approximately 10 ppm.10 We show that postsearch filtering has
a greater effect on identification efficiency for ECD data than for CID data,
which may be due to the lower scores assigned to ECD spectra
by the database search engines employed. We also find that
postsearch processing has the largest effect of any individual
data processing step (6.4% increase in ECD identifications,
compared to 5.3% increase for all preprocessing steps).
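
As an illustration of the postsearch step, the following minimal sketch keeps only those hits from a wide-tolerance search whose precursor mass error falls inside a narrow ppm window. The function names, the tuple layout of a hit, and the 10 ppm default are assumptions made for this example (the default echoes the approximately 10 ppm clustering of true hits noted above); it is not the processing code used in this work.

```python
def precursor_ppm_error(observed_mass, theoretical_mass):
    """Signed precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def filter_hits_by_ppm(hits, max_abs_ppm=10.0):
    """Keep hits whose precursor error lies within +/- max_abs_ppm.

    `hits` is an iterable of (peptide, observed_mass, theoretical_mass)
    tuples. This layout is hypothetical, not a search engine export format.
    """
    kept = []
    for peptide, observed, theoretical in hits:
        if abs(precursor_ppm_error(observed, theoretical)) <= max_abs_ppm:
            kept.append((peptide, observed, theoretical))
    return kept


# Two made-up hits from a wide-tolerance search.
hits = [
    ("PEPTIDEA", 1000.005, 1000.000),  # about +5 ppm, retained
    ("PEPTIDEB", 1000.300, 1000.000),  # about +300 ppm, discarded
]
filtered = filter_hits_by_ppm(hits)
```

In practice the window would be centered on the observed systematic mass error of the instrument rather than on zero; that refinement is omitted here.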

Search. Ideally, both precursor and fragment mass errors
should be specified in ppm, as ppm errors are fairly constant
across the measured m/z range. By contrast, a fixed error of 0.02 m/z
corresponds to 12.5 ppm for a peak at m/z 1600
but to 50 ppm for a peak at m/z 400. Observed fragment ion
errors range from 0 to 12 ppm. Precursor errors can be
converted to ppm for postsearch filtering (as we have shown).
However, fragment errors are less easy to convert and filter.
Both the Mascot and OMSSA ECD searches employed a
fragment tolerance of 0.02 m/z, which is not ideal. Nevertheless,
fragment errors greater than 12 ppm are indicative of an
incorrect match and are useful information for manual validation
of identifications.
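
The relationship between an absolute m/z tolerance and a ppm tolerance can be made concrete with a short sketch. The helper names below are hypothetical and chosen only for this example; the arithmetic simply restates the conversion described in the text.

```python
def mz_error_to_ppm(error_mz, peak_mz):
    """Express an absolute m/z error as ppm of the peak's m/z."""
    return error_mz / peak_mz * 1e6


def ppm_to_mz_error(error_ppm, peak_mz):
    """Express a ppm error as an absolute m/z error at the peak's m/z."""
    return error_ppm * peak_mz / 1e6


# A fixed 0.02 m/z tolerance is much looser, in ppm terms, at low m/z.
for peak_mz in (400.0, 1600.0):
    print(peak_mz, round(mz_error_to_ppm(0.02, peak_mz), 3))
```

The same helpers can be used during manual validation to check whether an individual fragment match exceeds the 12 ppm limit mentioned above.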

We note that the “Mascot decoy” search results in a
significantly lower number of identifications than the other
searches. This observation draws attention to the conservative
nature of the Mascot scoring system, as previously discussed.15
The scoring scheme was developed prior to the use of decoy
searches to estimate FDR. This type of conservative scoring
system is particularly valuable when the number of identifications
is too low to allow a meaningful estimate of FDR.
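
For readers unfamiliar with decoy-based FDR estimation, a minimal sketch follows. It assumes the simple and widely used estimate of reverse hits divided by forward hits at or above a score threshold, and a hypothetical list of (score, is_reverse) pairs; the exact formula and data layout used in this work are not restated here, so the code is illustrative only.

```python
def estimate_fdr(hits, score_threshold):
    """Estimate FDR as reverse hits / forward hits at or above a threshold.

    `hits` is an iterable of (score, is_reverse) pairs (hypothetical layout).
    Returns None when no forward hits pass, since the ratio is undefined.
    """
    forward = sum(1 for score, is_reverse in hits
                  if score >= score_threshold and not is_reverse)
    reverse = sum(1 for score, is_reverse in hits
                  if score >= score_threshold and is_reverse)
    if forward == 0:
        return None
    return reverse / forward


def lowest_threshold_for_fdr(hits, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for threshold in sorted({score for score, _ in hits}):
        fdr = estimate_fdr(hits, threshold)
        if fdr is not None and fdr <= max_fdr:
            return threshold
    return None


# Made-up scores: four forward hits and two reverse hits.
hits = [(50, False), (45, False), (40, False),
        (30, True), (25, False), (20, True)]
threshold = lowest_threshold_for_fdr(hits, max_fdr=0.01)
```

With so few hits the estimate changes in large steps as the threshold moves, which is the situation in which a conservative fixed scoring scheme is most useful.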

ECD fragmentation of peptides results in both z-dot and
z-prime fragment ions.5,8,16 Mascot allows identification of both
types. That might be expected to provide an advantage over
OMSSA, which cannot identify z-prime fragments. We do
observe a slightly higher performance for Mascot for the main
test data set studied here; however, OMSSA slightly outperforms
Mascot for the phosphopeptide data set (data not shown). Both
algorithms perform reasonably well for ECD analysis; however,
there is a clear need for a purpose-built ECD/ETD algorithm.

Presearch. The starting premise for all presearch DTA
processing is that intense peaks corresponding to anticipated
c, z or y fragment ions result in high-scoring identifications;
conversely, intense peaks unattributable to anticipated fragment
ions detract from an identification score. We removed,
or reduced, contributions from three types of uninformative
peaks: electrical noise peaks, coeluting precursor peaks, and
neutral loss peaks from the charge-reduced precursors. Each
of these steps improved both the number of identifications and
the average search engine scores of those identifications. Note
that neutral loss peaks are not necessarily uninformative;6,7
however, the search engines we employed are incapable of
interpreting these peaks. We remove neutral loss peaks only
from ECD mass spectra from doubly charged precursors. This
enables us to retain every possible true c/z/y fragment ion,
while removing all other peaks within a 140 Da region of the
[M + 2H]+• reduced precursor. An alternative strategy described
by Good et al. removes a smaller region around all reduced
precursors, with the side-effect that some true c/z/y fragments
are also removed.9 Our strategy removes fewer true c/z/y
fragments and results in a greater number of identifications
(Table 5). Retention of true fragments is facilitated by the
high-resolution ECD MS/MS data: in the region from [M + 2H]+• - 57
to [M + 2H]+• - 140 m/z (73 m/z region), there are 33 potential true
fragment masses which are retained (Supplementary Table 1).
However, even using a relatively wide retention window of ±12
ppm, we retain less than 0.8 m/z of the 73 m/z region. Eighteen
neutral losses have been described which fall into this 73 m/z
region, and which we remove.6,17
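
The trimming idea described above can be sketched as follows. This is a simplified illustration and not the Perl script supplied as Supporting Information: the function names, the (m/z, intensity) peak layout, and the inclusive region boundaries are assumptions, and the list of retained fragment masses must be supplied by the caller.

```python
def within_ppm(mz, target_mz, tol_ppm):
    """True if mz lies within +/- tol_ppm of target_mz."""
    return abs(mz - target_mz) <= target_mz * tol_ppm / 1e6


def trim_neutral_loss_region(peaks, reduced_precursor_mz, retained_mzs,
                             region_width=140.0, tol_ppm=12.0):
    """Drop peaks in the region below the reduced precursor unless retained.

    peaks: list of (mz, intensity) pairs.
    retained_mzs: m/z values of potential true c/z/y fragments in the region.
    Peaks outside the region are always kept.
    """
    region_low = reduced_precursor_mz - region_width
    trimmed = []
    for mz, intensity in peaks:
        in_region = region_low <= mz <= reduced_precursor_mz
        keep = (not in_region) or any(
            within_ppm(mz, target, tol_ppm) for target in retained_mzs)
        if keep:
            trimmed.append((mz, intensity))
    return trimmed


# Made-up spectrum with the reduced precursor at m/z 1000.
peaks = [(500.0, 10.0), (900.0, 5.0), (930.005, 8.0),
         (930.1, 3.0), (1000.0, 50.0), (1001.0, 20.0)]
trimmed = trim_neutral_loss_region(peaks, 1000.0, retained_mzs=[930.0])
```

Because each retained mass keeps only a window of twice the ppm tolerance, almost the entire region is cleared while peaks matching the listed fragment masses survive.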

In a recent comparison of different search engines for
identification of ETD mass spectra, Kandasamy et al. found that
OMSSA identified far fewer doubly charged peptides than
Mascot.18 We do not find a similarly dramatic difference for
ECD mass spectra: identifications from doubly charged peptides
make up 55% and 61% of the total for OMSSA and Mascot,
respectively. The shortfall observed by Kandasamy et al. may
be related to the fact that y ions were not utilized in their
OMSSA search. The proportion of identifications from doubly
charged peptides approaches the proportion selected for
fragmentation (67%).

In conclusion, we show that search results for ECD data are
highly dependent on the search strategy employed, varying
from an identification rate of 19% (Mascot decoy search) up
to 49% (Mascot search with pre- and postsearch processing).
We also demonstrate that the absolute database search engine
peptide score is unimportant; rather, the relative scores of
forward and reverse hits are more useful in determining correct
identifications.

Acknowledgment. The authors gratefully acknowledge
EU Endotrack (S.M.M.S.), EPSRC (A.W.J.), Cancer Research
UK (D.L.C., J.K.H.) and the Wellcome Trust (074131) (H.J.C.)
for funding.

Supporting Information Available: Supplementary
Table 1: Masses retained in the m/z region ([M + 2H]+• - 57)
> m/z > ([M + 2H]+• - 140). Supplementary Table 2: CID and
ECD pairs giving conflicting IDs. Supplementary File: Perl script
for removal of non-c,z,y peaks from DTA files, Trim_DTAs_May2009.pl.
Supplementary Data: Peptide identifications;
Supplementary Tables Identifications.xlsx. This material is
available free of charge via the Internet at http://pubs.acs.org.


Journal of Proteome Research Vol. 8, No. 12, 2009 5483


