Database Search Strategies for Proteomic Data Sets Generated by Electron Capture Dissociation Mass Spectrometry




Exported Mascot results were sorted (in Excel) by descending
score, and then by protein accession (A to Z, in this case, to
ensure ###REV... is listed before IPI:...). This was to ensure that,
for a reverse and a forward hit of identical score, the reverse
hit was preferentially retained. Lower scoring identifications
were removed, leaving only the top scoring identification for
each MS/MS event (remove duplicates for Peptide Scan Title
column). The mass error in ppm was calculated for each
identification from the charge state, theoretical mass, and
delta mass:
ppm error = delta mass / (theoretical mass + charge state × 1.00728) × 1 000 000.
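
The sorting, de-duplication and mass error steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' Excel procedure: the `Hit` record and its field names are hypothetical, and it assumes that plain string ordering places `###REV...` accessions before `IPI:...` accessions, as the text describes for the A-to-Z sort.

```python
from dataclasses import dataclass

PROTON_MASS = 1.00728  # Da, the constant quoted in the text


@dataclass
class Hit:
    # Hypothetical field names; a real Mascot export uses its own column headers.
    scan_title: str
    accession: str
    score: float
    charge: int
    theor_mass: float
    delta_mass: float


def top_hit_per_scan(hits):
    """Keep one hit per MS/MS event: highest score, ties broken by accession A to Z."""
    ordered = sorted(hits, key=lambda h: (-h.score, h.accession))
    best = {}
    for hit in ordered:
        # The first hit seen for a scan title is the one that is retained.
        best.setdefault(hit.scan_title, hit)
    return list(best.values())


def ppm_error(hit):
    """Mass error in ppm, following the formula given in the text."""
    return hit.delta_mass / (hit.theor_mass + hit.charge * PROTON_MASS) * 1_000_000
```

Because `#` sorts before `I` in ASCII, a reverse hit that ties with a forward hit on score is the one kept, which matches the conservative choice described above.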

The OMSSA Browser 2.1.1 was employed to search the DTA
files. OMSSA settings were as previously described, with the
exception that phosphorylation was not considered as a
variable modification.5 For the ECD search, the ‘elimination
of charge-reduced precursors in spectrum’ option was selected.

Re-searching the ECD Phosphopeptide Data Set. A total of 6080
ECD DTAs were processed as above, that is, by removal of noise
peaks, the precursor window, and neutral loss peaks. Database
searching was as for the unmodified data set (above), but allowing
STY phosphorylation as a variable modification. Postsearch filtering
was as above.

Results

To test the effect of various search-related parameters, we
employed a test data set consisting of 3341 high quality ECD
mass spectra obtained from the LC-MS/MS analysis of mouse
whole cell lysate. Paired ion trap CID and FT-ICR ECD mass
spectra were acquired, as previously described.4,5 The mouse
IPI database was searched; a concatenated forward-reverse
version of this database was employed, unless stated otherwise.
In all cases, the false-discovery rate (FDR) as estimated by the
number of accepted reverse identifications was controlled at
less than 1%. Full details of the peptide identifications are
supplied as Supplementary Data.
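
As an illustration of how a score cutoff can be chosen against a concatenated forward-reverse database, the sketch below walks down the score-sorted hits and records the lowest cutoff at which the estimated FDR stays under the limit. The text does not give the exact FDR formula, so the ratio of reverse to forward hits used here is an assumption, as are the function names and the `(score, accession)` input layout; only the `###REV` accession prefix comes from the text.

```python
def is_reverse(accession):
    # Reverse-database entries are marked with a ###REV prefix in the text.
    return accession.startswith("###REV")


def score_cutoff_for_fdr(hits, max_fdr=0.01):
    """Find the lowest score cutoff whose accepted set keeps reverse/forward < max_fdr.

    hits: iterable of (score, accession) pairs, one per MS/MS event.
    Returns (cutoff, forward_count, reverse_count), or None if no cutoff qualifies.
    Assumption: FDR is estimated as reverse hits / forward hits.
    """
    ordered = sorted(hits, key=lambda h: -h[0])
    forward = reverse = 0
    best = None
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Accept all hits tied at this score together.
        while i < len(ordered) and ordered[i][0] == score:
            if is_reverse(ordered[i][1]):
                reverse += 1
            else:
                forward += 1
            i += 1
        if forward > 0 and reverse / forward < max_fdr:
            best = (score, forward, reverse)
    return best
```

Under this assumed definition, an accepted set of 1254 forward and 12 reverse hits gives 12/1254, which is just under 1%.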

Initial Search. An initial search, without preprocessing of
the CID or ECD data, was carried out using both search
engines: Mascot and OMSSA. The precursor mass tolerance was
set to 0.02 m/z (OMSSA) or 10 ppm (Mascot).

For the initial Mascot search, a forward-only version of the
mouse IPI database was employed, in combination with the
Mascot ‘decoy’ option. The ‘decoy’ option automatically carries
out a second search using a randomized database, and thereby
gives an estimate of FDR. However, adjusting the FDR to a
particular value (1%) was not possible. The search resulted in
633 ECD identifications and 1712 CID identifications. To better
control the estimated FDR, we repeated the Mascot search,
without the ‘decoy’ option, using the concatenated version of the
database (as used in all subsequent searches), exported all
results into Excel, and manually filtered according to Mascot
score. That resulted in a doubling of the number of accepted
identifications, as shown in Table 1 (ECD Search: row 1 versus
row 3). Manually filtered Mascot and OMSSA searches give
similar numbers of identifications for both ECD and CID data
sets. The identification rates reach 38% for ECD data (1254
identifications) and 69% for CID data (2297 identifications).
Clearly, there is a considerable difference, of approximately
30 percentage points, in identification success rate between
CID and ECD mass spectra.
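
The identification rates quoted above follow from the forward-hit counts and the 3341 spectra in the test data set. The short check below reproduces that arithmetic; the function name is illustrative, and it assumes the rate is simply forward hits divided by total spectra, which is consistent with the values reported in Table 1.

```python
TOTAL_SPECTRA = 3341  # size of the test data set stated in the text


def id_rate(forward_hits, total=TOTAL_SPECTRA):
    """Identification rate in percent: forward hits over total spectra (assumed definition)."""
    return 100.0 * forward_hits / total


ecd_decoy = id_rate(633)    # Mascot "decoy" option, ECD
ecd_manual = id_rate(1254)  # manually filtered Mascot, ECD
cid_manual = id_rate(2297)  # best CID result quoted in the text

print(f"ECD, Mascot decoy option: {ecd_decoy:.1f}%")
print(f"ECD, manual filtering:    {ecd_manual:.1f}%")
print(f"CID, manual filtering:    {cid_manual:.1f}%")
print(f"CID minus ECD:            {cid_manual - ecd_manual:.1f} percentage points")
```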

Table 1. Initial Searches of ECD and CID Data Filtered
According to Database Search Algorithm Score (a)

search                                      postsearch filter   forward hits   reverse hits   ID rate (%)

ECD Search (3341 DTAs)
Mascot; 10 ppm precursor; Mascot “decoy”                        633            2*             18.9
OMSSA; 0.02 Da precursor                    Peptide score       1190           11             35.6
Mascot; 10 ppm precursor                    Peptide score       1254           12             37.5

CID Search (3341 DTAs)
Mascot; 10 ppm precursor; Mascot “decoy”                        1712           16*            51.2
OMSSA; 0.02 Da precursor                    Peptide score       2297           22             68.8
Mascot; 10 ppm precursor                    Peptide score       2283           22             68.3

(a) DTA files are unaltered. Asterisks indicate “decoy” hits, from the Mascot
“decoy” search option.

Table 2. Searches of ECD and CID Data in Which a Wider
Precursor Mass Tolerance Window Was Combined with
Postsearch Precursor Mass Error Filtering (a)

search                     postsearch filter                       forward hits   reverse hits   ID rate (%)

ECD Search (3341 DTAs)
OMSSA; 1.1 Da precursor    Precursor ppm error and peptide score   1447           14             43.3
Mascot; 1.1 Da precursor   Precursor ppm error                     1468           7              43.9

CID Search (3341 DTAs)
OMSSA; 1.1 Da precursor    Precursor ppm error                     2344           9              70.2
Mascot; 1.1 Da precursor   Precursor ppm error                     2385           9              71.4

(a) To achieve the estimated FDR of 1%, results were filtered according
to database search algorithm scores where necessary (E-value cutoff of
8.01 × 10⁻¹ for OMSSA ECD search).

Postsearch Filtering by Precursor Mass Error. Database
searches employing a wide precursor mass tolerance window,
with subsequent filtering of results, have previously been shown
to improve identification rates.5,10 While the benefits of
postsearch filtering are well-established, we were interested in
comparing the magnitude of its effect with that of the other levels
of data processing described here, and its effectiveness for ECD
compared to CID data. We therefore repeated the above
searches with a precursor mass tolerance of 1.1 Da and
exported all results for subsequent manual filtering of
identifications by precursor error (in ppm) and, if the selected ppm
range contained more reverse hits than compatible with a 1%
FDR, by peptide score. This resulted in an increased number
of accepted identifications for all searches (Table 2). We note
that the increase in identification efficiency for ECD data is
greater than that for CID data, for example, increases of 6.4
and 3.1 percentage points for ECD and CID searches using
Mascot, respectively. This characteristic may be the result of the
lower peptide scores for ECD identifications (Mascot average
score of 23 versus 40, for ECD (n = 1468) and CID (n = 2385),
respectively); that is, the true identifications are less readily
distinguished from reverse hits on the basis of peptide score alone.
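
The two-stage postsearch filter described above (precursor ppm error first, peptide score only where needed) might be sketched as follows. This is a hedged illustration, not the authors' manual Excel workflow: the dictionary keys, the `ppm_window` parameter and the use of reverse/forward hits as the FDR estimate are all assumptions, and the text does not state the ppm window that was actually selected.

```python
from itertools import groupby


def estimated_fdr(hits):
    """Reverse hits divided by forward hits (assumed FDR estimate)."""
    reverse = sum(1 for h in hits if h["accession"].startswith("###REV"))
    forward = len(hits) - reverse
    return reverse / forward if forward else float("inf")


def filter_by_ppm_then_score(hits, ppm_window, max_fdr=0.01):
    """Keep hits within +/- ppm_window; fall back to a score cutoff if the FDR is too high.

    hits: list of dicts with hypothetical keys 'ppm', 'score', 'accession'.
    """
    in_window = [h for h in hits if abs(h["ppm"]) <= ppm_window]
    if estimated_fdr(in_window) < max_fdr:
        return in_window  # the ppm filter alone is sufficient

    # Otherwise accept hits from the highest score downward, one score level
    # at a time, and remember the largest set that still satisfies the limit.
    ordered = sorted(in_window, key=lambda h: -h["score"])
    accepted, best = [], []
    for _, group in groupby(ordered, key=lambda h: h["score"]):
        accepted.extend(group)
        if estimated_fdr(accepted) < max_fdr:
            best = list(accepted)
    return best
```

Applying the mass filter before the score filter means that low-scoring but mass-accurate identifications can be retained, which is the mechanism the text proposes for the larger gain seen with ECD data.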

Previous work has shown that it is possible for the precursor
mass recorded in the DTA file to correspond to the second or
third isotopic peak (i.e., one or two 13C more than the
monoisotopic peak).11 This occurrence is particularly common
for low resolution ion-trap only experiments. If this occurs,
identifications can be rescued by searching with a larger
precursor tolerance window (with subsequent narrow mass
filtering around the offset precursor mass). We compared
searches with 1.1, 2.1, and 3.1 Da tolerances. In none of the
cases was there a high-scoring identification resulting from

Journal of Proteome Research Vol. 8, No. 12, 2009 5477


