Database Search Strategies for Proteomic Data Sets Generated by Electron Capture Dissociation Mass Spectrometry



Provided by University of Birmingham Research Archive, E-prints Repository

Steve M. M. Sweet,†,# Andrew W. Jones, Debbie L. Cunningham,† John K. Heath,†
Andrew J. Creese, and Helen J. Cooper*

School of Biosciences, College of Life and Environmental Sciences, University of Birmingham,
Edgbaston, Birmingham B15 2TT, United Kingdom

Received May 28, 2009

Large data sets of electron capture dissociation (ECD) mass spectra from proteomic experiments are
rich in information; however, extracting that information in an optimal manner is not straightforward.
Protein database search engines currently available are designed for low resolution CID data, from
which Fourier transform ion cyclotron resonance (FT-ICR) ECD data differ significantly in three
respects: ECD mass spectra contain both z-prime and z-dot fragment ions (and likewise c-prime and
c-dot); they contain abundant peaks derived from neutral losses from charge-reduced precursor ions;
and FT-ICR ECD spectra are acquired with a larger precursor m/z isolation window than their
low-resolution CID counterparts.
Here, we consider three distinct stages of postacquisition analysis: (1) processing of ECD mass spectra
prior to the database search; (2) the database search step itself and (3) postsearch processing of results.
We demonstrate that each of these steps has an effect on the number of peptides identified, with the
postsearch processing of results having the largest effect. We compare two commonly used search
engines: Mascot and OMSSA. Using an ECD data set of modest size (3341 mass spectra) from a complex
sample (mouse whole cell lysate), we demonstrate that search results can be improved from 630
identifications (19% identification success rate) to 1643 identifications (49% identification success rate).
We focus in particular on improving identification rates for doubly charged precursors, which are
typically low for ECD fragmentation. We compare our presearch processing algorithm with a similar
algorithm recently developed for electron transfer dissociation (ETD) data.

Keywords: ECD; neutral loss; OMSSA; Mascot; identification; CID; mass spectrometry; FT-ICR; LTQ-FT

* Address correspondence to: Helen J. Cooper, School of Biosciences,
College of Life and Environmental Sciences, University of Birmingham,
Edgbaston, Birmingham B15 2TT, UK. Telephone: +44 (0)121 414 7527.
Fax: +44 (0)121 414 5925. E-mail: [email protected].

† CRUK Growth Factor Group.

# Current address: Department of Chemistry, University of Illinois at
Urbana-Champaign, Urbana, IL 61801, USA.

10.1021/pr9008282 CCC: $40.75    © 2009 American Chemical Society

Introduction

Electron capture dissociation (ECD) is a radical-driven
fragmentation technique which provides an alternative to
slow-heating collision induced dissociation (CID).1 ECD has
successfully been applied to the small-scale detailed
characterization of various peptides, modified or otherwise.2,3 These
experiments are greatly facilitated by a prior knowledge of the
peptide sequence, allowing manual analysis of the ECD data.
In contrast, large-scale proteomic experiments utilizing ECD
rely on a database search step in order to identify the
fragmented peptide.4,5 The database search engines employed
were originally designed to accept low resolution CID data.
High resolution ECD data presents a significantly different
challenge. The characteristics of FT-ICR ECD data are sub-10
ppm mass accuracy, low noise levels, intense precursor and
charge-reduced precursor peaks, and strong neutral loss peaks
from the charge-reduced precursor.6,7 Furthermore, hydrogen
transfer can occur between ECD c-prime and z-dot fragments,
resulting in c-dot and z-prime products.8
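
The hydrogen transfer described above can be illustrated with a minimal Python sketch. It assumes that a single hydrogen atom (monoisotopic mass 1.007825 Da) moves from the c-prime fragment to the z-dot fragment, so that the c-dot product is lighter and the z-prime product heavier by that amount, divided by the fragment charge on the m/z scale. The function names, the charge arguments, and the example m/z values are hypothetical and are not taken from the data set.

```python
# Monoisotopic mass of a hydrogen atom (1H), in Da.
HYDROGEN_MASS = 1.00782503


def hydrogen_transfer_mz(c_prime_mz, z_dot_mz, c_charge=1, z_charge=1):
    """Return (c_dot_mz, z_prime_mz) after one H atom moves from c' to z-dot.

    The fragment charges are unchanged by the transfer, so the m/z shift
    is the hydrogen atom mass divided by the charge of each fragment.
    """
    c_dot_mz = c_prime_mz - HYDROGEN_MASS / c_charge
    z_prime_mz = z_dot_mz + HYDROGEN_MASS / z_charge
    return c_dot_mz, z_prime_mz


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed peak in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    # Hypothetical singly charged fragment pair, for illustration only.
    c_prime, z_dot = 500.2500, 700.3500
    c_dot, z_prime = hydrogen_transfer_mz(c_prime, z_dot)
    print(f"c-dot   m/z: {c_dot:.4f}")
    print(f"z-prime m/z: {z_prime:.4f}")
    # Matching a z-prime peak against the z-dot mass gives an error far
    # outside a sub-10 ppm window, so both forms need to be considered.
    print(f"error if z-prime is scored as z-dot: {ppm_error(z_prime, z_dot):.0f} ppm")
```

Under these assumptions, a search that scores only one form of each fragment would miss the other form entirely at high mass accuracy, which is one reason the presence of both forms matters for database searching.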

The search engines that have been employed for large-scale
ECD data analysis are Mascot and OMSSA.4,5 These search
engines have certain limitations; for example, the product ion
tolerance cannot be specified in ppm, and the benefits of high
mass accuracy data are not fully realized. We have analyzed
large-scale ECD data sets both manually and using various
search engines. It is apparent from these analyses that certain
generic aspects of ECD mass spectra are likely to be detrimental
to their identification by database search engines. The most
obvious of these is the presence of high-intensity precursor and
charge-reduced precursor peaks. Both search engines tested here
already anticipate these peaks, removing them from
consideration. For example, Mascot removes peaks within the
fragment ion tolerance window about each of the precursor isotope
peaks. However, the search engines do not consider coeluting
peaks in the precursor isolation window and are, in fact,
ignorant of the isolation window size. Another characteristic
of ECD is the generation of various neutral losses from the
charge-reduced precursors. These peaks are not utilized by
currently available search engines. In the case of ECD of doubly
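
Two points from this paragraph, the ppm tolerance limitation and the removal of precursor isotope peaks, can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used by Mascot or OMSSA: the function names, the 0.02 Da tolerance, the peak list, and the precursor m/z are all hypothetical, and the isotope spacing is approximated by the 13C-12C mass difference (1.0033548 Da) divided by the charge.

```python
# Approximate spacing between isotope peaks (13C - 12C mass difference), in Da.
ISOTOPE_SPACING = 1.0033548


def ppm_to_da(mz, ppm):
    """Width in Da (m/z units) of a ppm tolerance at a given m/z."""
    return mz * ppm * 1e-6


def precursor_isotope_mzs(mono_mz, charge, n_isotopes=3):
    """m/z values of the first n_isotopes precursor isotope peaks."""
    return [mono_mz + i * ISOTOPE_SPACING / charge for i in range(n_isotopes)]


def remove_precursor_peaks(peaks, isotope_mzs, tol_da):
    """Drop (mz, intensity) peaks lying within tol_da of any isotope m/z."""
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if all(abs(mz - iso) > tol_da for iso in isotope_mzs)
    ]


if __name__ == "__main__":
    # A 10 ppm tolerance is much narrower at low m/z than at high m/z,
    # which a single fixed Da tolerance cannot reproduce.
    for mz in (300.0, 1500.0):
        print(f"10 ppm at m/z {mz:.0f} = {ppm_to_da(mz, 10.0):.4f} Da")

    # Hypothetical doubly charged precursor at m/z 650.0 and a toy peak list.
    isotopes = precursor_isotope_mzs(650.0, charge=2)
    peaks = [(300.15, 10.0), (650.00, 500.0), (650.50, 300.0),
             (651.40, 5.0), (900.45, 20.0)]
    print(remove_precursor_peaks(peaks, isotopes, tol_da=0.02))
```

In this toy example the peak at m/z 651.40 survives the filter because it does not coincide with a precursor isotope peak. A coeluting species inside the isolation window would likewise be left in place by a filter that only knows the precursor m/z and not the window size.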

Journal of Proteome Research 2009, 8, 5475-5484

Published on Web 10/13/2009


