EBI is an Outstation of the European Molecular Biology Laboratory.
MS Identification
Dr. Juan Antonio VIZCAINO
PRIDE Group coordinator
PRIDE team, Proteomics Services Group
PANDA group
European Bioinformatics Institute
Hinxton, Cambridge
United Kingdom
EBI Bulgaria Roadshow, Rotterdam, 12 June 2012
Juan A. Vizcaíno
Overview …
• Search engines: peptide identification
• Protein inference
• De novo and spectral searches
• Choosing the right protein sequence DB
• You need to learn many things…
It should not be a black box…
From: Lilley et al., Proteomics, 2011
MS proteomics: Shot-gun/bottom-up approaches
[Figure: shotgun workflow. Labels: proteins, peptides, protocol, MS analysis, fragmentation, MS/MS analysis, sequence database. The spectrum panels plot relative intensity (%) against m/z.]
Peptide Mass Fingerprinting (MS)
[Figure: MS spectrum (relative intensity, %, against m/z) from the MS analysis, used for Peptide Mass Fingerprinting (PMF) together with the molecular weight (MW).]
- Each peak in the spectrum represents a peptide (or mixture of peptides)
- Each peak gives information about mass and charge (m/z)
Rarely used at present, except in gel-based approaches
(in this case the molecular weight of the protein is known)
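A minimal sketch of the PMF idea, assuming tryptic cleavage without missed cleavages and singly charged peptides. The function names, the tolerance and the toy protein string are illustrative assumptions, not part of any real search engine.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide
PROTON = 1.00728   # added once per charge


def tryptic_digest(protein):
    """Cleave after K or R, but not before P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mz(peptide, charge=1):
    """Theoretical m/z of a peptide at the given charge state."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def pmf_hits(protein, peaks, tol=0.2):
    """Peptides of the protein whose singly charged m/z matches a peak."""
    hits = []
    for peptide in tryptic_digest(protein):
        mz = peptide_mz(peptide)
        if any(abs(mz - peak) <= tol for peak in peaks):
            hits.append(peptide)
    return hits


# Toy protein assembled from example peptides (illustration only).
protein = "TDMDNQIVVSDYAQMDRLFDQAFGLPRAKPLMELIER"
print(tryptic_digest(protein))
```

A real PMF search would score every protein in the database by the number and quality of such matches, and would also allow missed cleavages and modifications.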
Peptide Mass Fingerprinting (MS) on the web
Aldente (Phenyx): http://www.expasy.org/tools/aldente/
ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
Bupid: http://zlab.bu.edu/Amemee/
Mascot: http://www.matrixscience.com/search_form_select.html
MassSearch: http://www.cbrg.ethz.ch/services/MassSearch
MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe
XProteo: http://xproteo.com:2698/
[Figure: MS analysis leads to Peptide Mass Fingerprinting (PMF); fragmentation followed by MS/MS analysis gives peptide sequence information (on top of mass and charge). The spectrum panels plot relative intensity (%) against m/z.]
Three types of MS/MS identification
• Protein database based comparison: database sequence → theoretical spectrum, compared with the experimental spectrum
• Sequential comparison (de novo approaches): experimental spectrum → de novo sequence, compared with the database sequence
• Spectral comparison: experimental spectrum compared with an experimental spectrum from a spectral library
Modified from: Eidhammer, Flikka, Martens, Mikalsen – Wiley 2007
MS proteomics: peptide IDs and protein IDs
[Figure, built up over three slides: a large set of MS/MS spectra (relative intensity, %, against m/z) is passed to a search engine, which matches them against a sequence database (UniProt, IPI, RefSeq) to identify peptides; the peptides are then linked to proteins.]
Example peptide sequences:
TDMDNQIVVSDYAQMDR
LFDQAFGLPRAKPLMELIER
DESTNVDMSLAQRDIVVQETMEDIDK
NGMFFSTYDRGTAGNALMDGASQL
Search engines
Sequence database matching
[Figure: proteins from the sequence database (UniProt, IPI, RefSeq) are turned into peptides, and the peptides into theoretical spectra, which are compared with the experimental spectra. The spectrum panels plot relative intensity (%) against m/z.]
Proteins → Peptides → Spectra
TDMDNQIVVSDYAQMDRLFDQAFGLPRAKPLMELIER
DESTNVDMSLAQRDIVVQETMEDIDK
NGMFFSTYDRGTAGNALMDGASQL
VDMSLAQRDIVVQETMEDIDK
…
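The step from peptide to theoretical spectrum can be sketched as follows. This is a simplified illustration that only generates singly charged b and y ions from monoisotopic residue masses; real search engines consider more ion types, charge states and modifications. The function name is an assumption.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is the b ion with i N-terminal residues,
    y_ions[i-1] is the y ion with i C-terminal residues.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("LFDQAFGLPR")
print([round(mz, 3) for mz in b])
print([round(mz, 3) for mz in y])
```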
Search engines
How good is the correlation?
- Scores are generated by search engines
- Usually the best match is kept
[Figure: a theoretical spectrum and an experimental spectrum shown on the same m/z axis (800 to 2400).]
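As a rough illustration of "score every candidate, keep the best match", the sketch below uses a plain shared-peak count. This is a deliberately simple stand-in for the scoring functions of real search engines; the function names, tolerance and toy peak lists are assumptions.

```python
def shared_peak_count(experimental, theoretical, tol=0.5):
    """Number of theoretical peaks with an experimental peak within tol."""
    return sum(
        1 for t in theoretical
        if any(abs(t - e) <= tol for e in experimental)
    )


def best_match(experimental, candidates, tol=0.5):
    """Score all candidates and return (name, score) of the top one.

    candidates maps a peptide name to its list of theoretical m/z values.
    """
    scores = {
        name: shared_peak_count(experimental, theoretical, tol)
        for name, theoretical in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


experimental = [100.0, 200.1, 300.0, 455.2]
candidates = {
    "PEP_A": [100.2, 200.0, 300.3, 400.0],
    "PEP_B": [150.0, 200.0, 350.0],
}
print(best_match(experimental, candidates))
```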
Search engines
Taken from Nesvizhskii, J Proteomics, 2010
The most popular algorithms
• MASCOT (Matrix Science): http://www.matrixscience.com
• SEQUEST (Scripps, Thermo Fisher Scientific): http://fields.scripps.edu/sequest
• X!Tandem (The Global Proteome Machine Organization): http://www.thegpm.org/TANDEM
• OMSSA (NCBI): http://pubchem.ncbi.nlm.nih.gov/omssa/
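Overall concept of scores and cut-offs
[Figure: score distributions of incorrect and correct identifications, separated by a threshold score. Incorrect identifications above the threshold are false positives; correct identifications below it are false negatives.]
Adapted from: www.proteomesoftware.com – Wiki pages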
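The trade-off behind the threshold score can be made concrete with a toy example. The scores and labels below are made up for illustration; in a real experiment the correct/incorrect labels are not known and have to be estimated.

```python
def classify(scored_ids, threshold):
    """Count outcomes for identifications given as (score, is_correct)."""
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for score, is_correct in scored_ids:
        accepted = score >= threshold
        if accepted and is_correct:
            counts["true_pos"] += 1
        elif accepted and not is_correct:
            counts["false_pos"] += 1
        elif not accepted and is_correct:
            counts["false_neg"] += 1
        else:
            counts["true_neg"] += 1
    return counts


# Hypothetical identifications: (score, is the identification correct?)
ids = [(10, False), (20, False), (35, False),
       (30, True), (45, True), (60, True)]

# Raising the threshold removes false positives but loses correct hits.
for threshold in (15, 32, 40):
    print(threshold, classify(ids, threshold))
```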
Playing with probabilistic cut-off scores
[Figure: percentage of identifications (axis 0% to 100%) and percentage of false positives (axis 0% to 6%) at cut-offs of p=0.05, p=0.01, p=0.005 and p=0.0005, ordered towards higher stringency.]
SEQUEST
• Very well established search engine
• Can be used for MS/MS (PFF) identifications
• Based on a cross-correlation score (includes experimental peak height)
• Published core algorithm (patented, licensed to Thermo Fisher Scientific)
• Provides preliminary (Sp) score, rank, cross-correlation score (XCorr), and score difference between the top two ranks (deltaCn, ΔCn)
• Thresholding is up to the user, and is commonly done per charge state
• Many extensions exist to perform a more automatic validation of results
$$\mathrm{XCorr} = \frac{\mathrm{CrossCorr}}{\mathrm{avg}\left(\mathrm{AutoCorr},\ \text{offset} = -75 \text{ to } 75\right)}$$
$$\mathrm{deltaCn} = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}$$
Search engines: Sequest
The XCorr is high if the direct comparison is significantly greater than the background.
The deltaCn measures how good the XCorr is relative to the next best match.
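The two statements above can be sketched on binned spectra. This is an illustration only, not the SEQUEST implementation: it uses the ratio of the direct comparison to the average background over offsets -75 to 75, as the formula on the slide reads here, whereas published descriptions of SEQUEST usually subtract the average background instead. The function names and the toy spectra are assumptions.

```python
def correlation_at_offset(experimental, theoretical, offset):
    """Dot product of two binned spectra, the second shifted by offset."""
    n = len(experimental)
    total = 0.0
    for i in range(n):
        j = i + offset
        if 0 <= j < n:
            total += experimental[i] * theoretical[j]
    return total


def xcorr_ratio(experimental, theoretical, max_offset=75):
    """Direct comparison (offset 0) relative to the average background."""
    direct = correlation_at_offset(experimental, theoretical, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(
        correlation_at_offset(experimental, theoretical, o) for o in offsets
    ) / len(offsets)
    if background == 0:
        return 0.0
    return direct / background


def delta_cn(xcorr_1, xcorr_2):
    """Relative difference between the best and second best XCorr."""
    return (xcorr_1 - xcorr_2) / xcorr_1


def binned(peak_bins, size=200):
    """Toy spectrum: unit intensity in the given bins, zero elsewhere."""
    return [1.0 if i in peak_bins else 0.0 for i in range(size)]


experimental = binned({10, 50, 120})
good = binned({10, 50, 120})   # candidate whose peaks line up
bad = binned({11, 70, 150})    # candidate whose peaks do not line up
print(xcorr_ratio(experimental, good), xcorr_ratio(experimental, bad))
print(delta_cn(4.0, 3.0))
```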
Search engines: Mascot
• Very well established search engine
• Can do MS (PMF) and MS/MS (PFF) identifications
• Based on the MOWSE score
• Unpublished core algorithm (trade secret)
• Predicts an a priori threshold score that identifications need to pass
• From version 2.2, Mascot allows integrated decoy searches
• Provides rank, score, threshold and expectation value per identification
• Customizable confidence level for the threshold score
Search engines: Mascot
www.matrixscience.com
Search engines: X!Tandem
• Open source search engine
• Can be used for MS/MS experiments
• Based on a hyperscore that only takes into account b and y ions
• Published core algorithm, freely available
• Fast and able to handle PTMs in an iterative fashion
• Used as an auxiliary search engine
by-Score = sum of intensities of peaks matching b-type or y-type ions
$$\mathrm{HyperScore} = \text{by-Score} \times N_y! \times N_b!$$
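A small sketch of the hyperscore formula as given on this slide, where N_b and N_y are the numbers of matched b and y ions. The matching tolerance, the choice of the most intense peak within tolerance and the function name are assumptions; X!Tandem itself includes further details not covered here.

```python
from math import factorial


def hyperscore(experimental, b_ions, y_ions, tol=0.5):
    """by-Score * Nb! * Ny! for a peak list of (m/z, intensity) pairs."""

    def matched(ions):
        # Count matched ions and sum the intensity of their best peak.
        count, intensity_sum = 0, 0.0
        for ion in ions:
            hits = [inten for mz, inten in experimental
                    if abs(mz - ion) <= tol]
            if hits:
                count += 1
                intensity_sum += max(hits)
        return count, intensity_sum

    n_b, sum_b = matched(b_ions)
    n_y, sum_y = matched(y_ions)
    by_score = sum_b + sum_y
    return by_score * factorial(n_b) * factorial(n_y)


spectrum = [(100.0, 10.0), (200.0, 5.0), (300.0, 2.0)]
print(hyperscore(spectrum, b_ions=[100.1, 250.0], y_ions=[200.2, 300.1]))
```

The factorial terms reward candidates that explain many ions of both series, so a few additional matched ions raise the score sharply.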
Search engines: OMSSA
• Open source search engine
• Can be used for MS/MS experiments
• Relies on a Poisson distribution
• Published core algorithm, freely available
• Provides an expectation score, similar to the BLAST E-value
• Very good performance in comparison with the others
• Used as an auxiliary search engine
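To give a feeling for how a Poisson model leads to an E-value, here is a generic sketch. It is not the actual OMSSA computation: it assumes the number of random peak matches follows a Poisson distribution with a known mean, and multiplies the resulting tail probability by the number of candidate peptides. All names and numbers are illustrative assumptions.

```python
from math import exp, factorial


def poisson_tail(k, mean):
    """P(X >= k) for X following a Poisson distribution with this mean."""
    below_k = sum(exp(-mean) * mean ** i / factorial(i) for i in range(k))
    return 1.0 - below_k


def e_value(matched_peaks, mean_random_matches, n_candidates):
    """Expected number of candidates matching this well by chance."""
    return n_candidates * poisson_tail(matched_peaks, mean_random_matches)


# Hypothetical search: 1000 candidate peptides, 1 random match expected.
for matched in (2, 5, 8):
    print(matched, e_value(matched, 1.0, 1000))
```

As with the BLAST E-value, smaller is better: the value estimates how many equally good matches would be expected purely by chance.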
MS proteomics: peptide IDs and protein IDs
[Figure: MS/MS spectra (relative intensity, %, against m/z) are matched by a search engine against a sequence database (UniProt, IPI, RefSeq), giving peptide sequences; the proteins are still to be determined.]
So far, we have actually identified peptides, not proteins
MS proteomics: peptide IDs and protein IDs
Peptides:
TDMDNQIVVSDYAQMDRTW
LFDQAFGLPRAKPLMELIER
DESTNVDMSLAQRDIVVQETMEDIDK
NGMFFSTYDRGTAGNALMDGASQL
Proteins:
IPI00302927, IPI00025512, IPI00002478, IPI00185600, IPI00014537, IPI00298497, IPI00329236, IPI00002232
Protein Inference is complex!!
Intermezzo: Protein inference
The minimal and maximal explanatory sets
peptide: a b c d
prot X: x x
prot Y: x
prot Z: x x x
Minimal set (Occam)
Maximal set (anti-Occam)
The Truth
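The two explanatory sets can be computed with a few lines of code. The peptide-to-protein assignment below is a made-up example in the spirit of the table above, and the minimal set is found with a greedy set-cover heuristic, which is an assumption of this sketch and is not guaranteed to find the smallest possible set in every case.

```python
def maximal_set(protein_peptides, observed):
    """Anti-Occam: every protein with at least one observed peptide."""
    return sorted(
        protein for protein, peptides in protein_peptides.items()
        if peptides & observed
    )


def minimal_set(protein_peptides, observed):
    """Occam: greedily pick proteins until all peptides are explained."""
    uncovered = set(observed)
    chosen = []
    while uncovered:
        best = max(
            sorted(protein_peptides),
            key=lambda protein: len(protein_peptides[protein] & uncovered),
        )
        gained = protein_peptides[best] & uncovered
        if not gained:
            break  # remaining peptides match no protein
        chosen.append(best)
        uncovered -= gained
    return sorted(chosen)


def unambiguous_peptides(protein_peptides, observed):
    """Observed peptides that map to exactly one protein."""
    result = {}
    for peptide in sorted(observed):
        owners = [p for p, peps in protein_peptides.items() if peptide in peps]
        if len(owners) == 1:
            result[peptide] = owners[0]
    return result


# Hypothetical mapping of peptides a-d to proteins X, Y and Z.
mapping = {"X": {"a", "b"}, "Y": {"b"}, "Z": {"b", "c", "d"}}
observed = {"a", "b", "c", "d"}
print(minimal_set(mapping, observed))
print(maximal_set(mapping, observed))
print(unambiguous_peptides(mapping, observed))
```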
Intermezzo: Protein inference
Slide from J. Cottrell, Matrix Science
Protein inference
[Figure, built up over nine slides: diagram with four items labelled A, B, C and D; the last slide adds the label "Unambiguous peptide".]
OTHER APPROACHES TO PERFORM MS/MS IDENTIFICATION
De novo approaches
Example of manual de novo sequencing of an MS/MS spectrum
No database is needed any more to extract a sequence!
Algorithms: Lutefisk, Sherenga, PEAKS, PepNovo, …
References: Dancik 1999, Taylor 2000, Fernandez-de-Cossio 2000, Ma 2003, Zhang 2004, Frank 2005, Grossmann 2005, …
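Manual de novo sequencing boils down to reading amino acids from the mass differences between consecutive peaks of one ion series. The sketch below assumes a clean, complete ladder of singly charged b ions, which real spectra rarely provide; the function name and tolerance are assumptions, and none of the algorithms listed above is reproduced here.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(b_ions, tol=0.02):
    """Translate mass differences between consecutive b ions to residues.

    Residues with the same mass (L and I) are reported as "L/I";
    a difference that matches no residue is reported as "?".
    """
    sequence = []
    for lower, upper in zip(b_ions, b_ions[1:]):
        diff = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - diff) <= tol]
        sequence.append("/".join(matches) if matches else "?")
    return sequence


# b1 to b4 of a hypothetical peptide starting with G.
ladder = [58.02874, 129.06585, 216.09788, 329.18194]
print(read_ladder(ladder))
```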
Spectral searching
• Concept: To compare experimental spectra to other experimental spectra.
• There are many spectral libraries publicly available (for instance, from NIST)
• Custom ‘search engines’ have been developed:
  • SpectraST (TPP)
  • X!Hunter (GPM)
• It has been claimed that these searches are more sensitive than sequence database approaches
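Comparing an experimental spectrum with library spectra is often illustrated with a normalised dot product (cosine similarity) on binned spectra. The sketch below uses that generic measure as an assumption; it is not the scoring scheme of SpectraST or X!Hunter, and the library entries are made up.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Normalised dot product of two binned spectra (0 to 1)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query, library, bin_width=1.0):
    """Return (name, similarity) of the most similar library spectrum."""
    binned_query = bin_spectrum(query, bin_width)
    scores = {
        name: cosine_similarity(binned_query, bin_spectrum(s, bin_width))
        for name, s in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical library of previously identified experimental spectra.
library = {
    "pepA": [(100.3, 20.0), (200.1, 10.0)],
    "pepB": [(150.0, 10.0), (200.2, 1.0)],
}
query = [(100.2, 10.0), (200.4, 5.0)]
print(search_library(query, library))
```

Because library spectra carry real peak intensities, such a comparison can use intensity information that theoretical spectra from a sequence database lack.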
Spectral searching (2)
http://peptide.nist.gov/
COMBINING DIFFERENT SEARCH APPROACHES
Multi-stage peptide identification strategy
Taken from Nesvizhskii, J Proteomics, 2010
Goal: “Squeeze” your good quality experimental spectra
PROTEIN SEQUENCE DATABASES
What is needed from a protein database
1. Comprehensive (whatever is not in the DB will not be included in your results).
2. Not too redundant at the protein sequence level
- Protein inference gets easier
- It is not very good if the database is too big.
3. Quality of annotation
4. Stability of identifiers
Main databases used
a) UniProt Knowledgebase (UniProtKB): SWISS-PROT (manually curated) / TrEMBL.
b) NCBI non-redundant database: It compiles all protein sequences available from the following databases: ‘GenBank’ translations, the Protein Data Bank (PDB), UniProtKB/Swiss-Prot, PIR and PRF.
c) Ensembl: Genomics centric resource. Integration of the information with genomics is easy.
d) IPI (International Protein Index): It has been discontinued (9/2012). Different builds for different species (Human, Mouse, Cow, Rat, Zebrafish, Dog, Arabidopsis).
e) Model organisms DBs (for instance, TAIR for Arabidopsis).
Databases for non-model organisms
- If the species is not well represented in the protein databases, there is a much stronger need to search ESTs or genomic databases.
- The search engine will translate each nucleotide sequence in all 6 possible reading frames.
- ESTs are not suitable for PMF approaches (incomplete proteins).
- The alternative is to filter comprehensive databases like UniProt by species or genus, or to use a protein DB from a closely related organism.
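The six-frame translation mentioned above can be sketched as follows, assuming the standard genetic code and an unambiguous DNA sequence (A, C, G, T only). The function names and the toy sequence are illustrative; a search engine would then digest and score each translated frame like any other protein sequence.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' is a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translations of the 3 forward and 3 reverse reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```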
Importance of choosing the right DB
- Since each database has a different focus, the databases can vary in completeness, degree of redundancy, and quality of annotations.
- More inclusive, bigger protein databases take longer to search.
- Bigger resources may also result in more false-positive identifications and reduced statistical significance (the probability of a random match is higher).
POST-VALIDATION OF RESULTS
Other concepts that would be nice to learn…
- Concepts of peptide and protein FDR
- Decoy databases
- Software such as PeptideProphet, ProteinProphet, …
- Influence of PTMs on the search
- Scoring of PTM positioning
- …
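As a first taste of the FDR and decoy concepts, here is a minimal sketch of a target-decoy estimate. It assumes that spectra were searched against target and decoy (for example reversed) sequences and uses the common simple estimate "decoy hits divided by target hits above the threshold"; other estimators exist, and the scores below are made up.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR for peptide-spectrum matches (score, is_decoy)."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def threshold_for_fdr(psms, max_fdr=0.01):
    """Lowest observed score at which the estimated FDR is acceptable."""
    for threshold in sorted({score for score, _ in psms}):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical matches: (score, matched a decoy sequence?)
psms = [(50, False), (45, False), (40, False), (35, True),
        (30, False), (25, True), (20, False), (15, True)]
print(fdr_at_threshold(psms, 30))
print(threshold_for_fdr(psms, max_fdr=0.01))
```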
Recommended reading….
Nesvizhskii, J Proteomics, 2010
and many more…
Conclusions
• Approaches to perform peptide and protein identification
• Sequence database based approaches: search engines
• The protein inference problem
• Importance of choosing the right protein database
• Many things to be learnt…
Remember: it should not be a black box…
From: Lilley et al., Proteomics, 2011
And still… we haven’t touched quantification at all
From: Vaudel et al., Proteomics, 2010