Post on 18-Nov-2014
http://www.bits.vib.be/training
BITS MS Data Processing – Protein Inference, UGent, Gent, Belgium – 16 December 2011
Kenny Helsens, kenny.helsens@UGent.be
Lennart MARTENS, lennart.martens@ebi.ac.uk
Proteomics Services Group, European Bioinformatics Institute
Hinxton, Cambridge, United Kingdom, www.ebi.ac.uk
kenny helsens
kenny.helsens@ugent.be
Computational Omics and Systems Biology Group
Department of Medical Protein Research, VIB
Department of Biochemistry, Ghent University
Ghent, Belgium
peptide validation and protein inference
Raw data
Peaklists
Peptide sequences
Protein accession numbers
[Figure axes: data size, ambiguity]
See: Martens and Hermjakob, Molecular BioSystems, 2007
Data processing and information ambiguity
PEPTIDE IDENTIFICATION VALIDATION
Populations and individuals
10,000 peptide-to-spectrum matches
5% decoy hits
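The numbers on this slide can be turned into a population-level estimate. The sketch below is only an illustration and rests on two assumptions not stated on the slide: that the 5% refers to the share of decoy hits among all 10,000 matches of a concatenated target-decoy search, and that each decoy hit stands in for roughly one false target hit. The function name `decoy_summary` is hypothetical.

```python
def decoy_summary(total_psms, decoy_fraction):
    """Estimate the false discovery rate from decoy counts.

    Assumption: concatenated target-decoy search, in which the number of
    decoy hits approximates the number of false target hits.
    """
    decoys = round(total_psms * decoy_fraction)
    targets = total_psms - decoys
    estimated_fdr = decoys / targets
    return decoys, targets, estimated_fdr


decoys, targets, fdr = decoy_summary(10_000, 0.05)
print(f"{decoys} decoy hits, {targets} target hits, estimated FDR {fdr:.2%}")
```

The estimate describes the population of matches: it suggests how many target hits are expected to be wrong, but not which individual identifications they are.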
Suspect peptide identifications happen.
The problem is that finding them requires detailed analysis of a single spectrum and its identifications, amongst thousands of other spectra…
Eliminating false positives
Automated interpretation
The Netherlands??
Manual interpretation
Tyrosine phosphorylation
See: Ghesquière and Helsens, Proteomics, 2010
Peptizer expert system
Aggregation of the votes
Agent a
Agent b
Agent c
Agent d
Agent e
Votes cast: +1, +1, 0, -1, +1
Trusted subset
Suspicious subset
Confident Peptide Identifications
See: Helsens et al, Molecular and Cellular Proteomics, 2008
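The vote aggregation shown on this slide can be sketched as follows. This is a minimal illustration, not Peptizer's actual code: the function name, the threshold value of 2, and the reading that a +1 vote means "this agent finds the identification suspicious" are assumptions made here for the example.

```python
def aggregate_votes(votes, threshold):
    """Sum the agent votes and sort the identification into a subset.

    Assumption: +1 = agent flags the identification, 0 = neutral,
    -1 = agent vouches for it. The threshold is a hypothetical setting.
    """
    total = sum(votes.values())
    subset = "suspicious" if total >= threshold else "trusted"
    return total, subset


# The five votes from the slide, in agent order a to e.
slide_votes = {"a": +1, "b": +1, "c": 0, "d": -1, "e": +1}
print(aggregate_votes(slide_votes, threshold=2))
```

Only the identifications that land in the suspicious subset would then need manual inspection; the trusted subset passes through as confident peptide identifications.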
Peptizer expert system
See: Helsens et al, Molecular and Cellular Proteomics, 2008
Peptizer expert system
See: Helsens et al, Molecular and Cellular Proteomics, 2008
PROTEIN INFERENCE
[Figure: one gene with exons 1a, 1b, 2, 3, 4, 5, 6a and 6b, shown with its transcripts and translations. Legend: intron, exon (UTR), exon (CDS), peptide. Peptide classes: matching all transcripts, matching a transcript subset, matching exactly 1 translation. Labels: Gene, Transcripts, Translations, redundant.]
Not all peptides are created equal
Sample preparation consequences
See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
Sample preparation consequences
Minimal set (Occam)
peptides a b c d
proteins
prot X x x
prot Y x
prot Z x x x

Maximal set (anti-Occam)
peptides a b c d
proteins
prot X x x
prot Y x
prot Z x x x

Minimal set with maximal annotation (true Occam?)
peptides a b c d
proteins
prot X (-) x x
prot Y (+) x
prot Z (0) x x x

See: Martens and Hermjakob, Molecular BioSystems, 2007
Protein inference: a question of conviction
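The three convictions can be compared on a toy example. The sketch below is an assumption-laden illustration: the slide only gives the number of peptides per protein (2, 1 and 3) and the annotation marks, so the exact peptide-to-protein mapping used here is hypothetical, as are the function names and the numeric scores assigned to (+), (0) and (-).

```python
from itertools import combinations

# Hypothetical mapping: only the peptide counts per protein come from the
# slide; which peptide belongs to which protein is assumed here.
PROTEIN_PEPTIDES = {
    "prot X": {"a", "b"},
    "prot Y": {"a"},
    "prot Z": {"b", "c", "d"},
}
# Annotation marks from the slide, scored as (+) = 1, (0) = 0, (-) = -1.
ANNOTATION = {"prot X": -1, "prot Y": 1, "prot Z": 0}


def maximal_set(mapping):
    """Anti-Occam: report every protein that matches at least one peptide."""
    return sorted(p for p, peptides in mapping.items() if peptides)


def minimal_sets(mapping):
    """Occam: all smallest protein sets that explain every peptide."""
    all_peptides = set().union(*mapping.values())
    proteins = sorted(mapping)
    for size in range(1, len(proteins) + 1):
        covers = [
            combo for combo in combinations(proteins, size)
            if set().union(*(mapping[p] for p in combo)) == all_peptides
        ]
        if covers:
            return covers
    return []


def minimal_set_max_annotation(mapping, annotation):
    """Among the minimal sets, prefer the one with the best annotation."""
    return max(
        minimal_sets(mapping),
        key=lambda combo: sum(annotation[p] for p in combo),
    )


print(maximal_set(PROTEIN_PEPTIDES))
print(minimal_sets(PROTEIN_PEPTIDES))
print(minimal_set_max_annotation(PROTEIN_PEPTIDES, ANNOTATION))
```

With this assumed mapping, two minimal sets of equal size exist, and annotation quality is what breaks the tie. The exhaustive search is exponential in the number of proteins and only suits toy examples.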
ALGORITHMS FOR THE
PROTEIN INFERENCE PROBLEM
• IDPicker: Zhang et al, Journal of Proteome Research, 2007
• ProteinProphet: Nesvizhskii AI et al, Analytical Chemistry, 2003
• DBToolkit: Martens et al, Bioinformatics, 2005, http://genesis.UGent.be/dbtoolkit
A few algorithms for protein inference
IDPicker parsimonious protein assembly
(I) Initialize
See: Zhang et al, Journal of Proteome Research, 2007
IDPicker parsimonious protein assembly
(II) Collapse
See: Zhang et al, Journal of Proteome Research, 2007
IDPicker parsimonious protein assembly
(III) Separate
See: Zhang et al, Journal of Proteome Research, 2007
IDPicker parsimonious protein assembly
(IV) Reduce
See: Zhang et al, Journal of Proteome Research, 2007
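The four steps named on these slides can be sketched on a small peptide-protein graph. This is a simplified reading of the procedure rather than IDPicker's own code: it assumes the input is a list of (protein, peptide) pairs, it collapses only proteins (not peptides) with identical connections, and it uses a greedy set cover in the reduce step. All function names are hypothetical.

```python
from collections import defaultdict


def initialize(psm_pairs):
    """(I) Build the bipartite graph as protein -> set of peptides."""
    graph = defaultdict(set)
    for protein, peptide in psm_pairs:
        graph[protein].add(peptide)
    return dict(graph)


def collapse(graph):
    """(II) Merge proteins with identical peptide sets into one group."""
    groups = defaultdict(list)
    for protein, peptides in graph.items():
        groups[frozenset(peptides)].append(protein)
    return {tuple(sorted(prots)): set(peps) for peps, prots in groups.items()}


def separate(groups):
    """(III) Split the graph into connected components (clusters)."""
    remaining = dict(groups)
    clusters = []
    while remaining:
        group, peptides = remaining.popitem()
        cluster = {group: peptides}
        seen = set(peptides)
        grew = True
        while grew:  # keep pulling in groups that share a peptide
            grew = False
            for other in list(remaining):
                if remaining[other] & seen:
                    cluster[other] = remaining.pop(other)
                    seen |= cluster[other]
                    grew = True
        clusters.append(cluster)
    return clusters


def reduce_cluster(cluster):
    """(IV) Greedy set cover: keep groups until all peptides are explained."""
    uncovered = set().union(*cluster.values())
    kept = []
    while uncovered:
        best = max(sorted(cluster), key=lambda g: len(cluster[g] & uncovered))
        kept.append(best)
        uncovered -= cluster[best]
    return kept


def parsimonious_proteins(psm_pairs):
    """Run all four steps and return the retained protein groups."""
    clusters = separate(collapse(initialize(psm_pairs)))
    return sorted(g for cluster in clusters for g in reduce_cluster(cluster))


# Hypothetical example data.
pairs = [("P1", "a"), ("P1", "b"), ("P2", "a"), ("P2", "b"),
         ("P3", "b"), ("P3", "c"), ("P4", "c"), ("P5", "d")]
print(parsimonious_proteins(pairs))
```

In this example P1 and P2 are indistinguishable and are reported as one group, while P4 is dropped because its only peptide is already explained by P3.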
[Formula labels: peptide probability, peptide weight, protein probability]
In iteration 1, all weights w start off as 1/n, with n the degeneracy count for the peptide
See: Nesvizhskii AI et al., Analytical Chemistry, 2003
ProteinProphet: the simplified view
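The interplay of peptide probability, peptide weight and protein probability can be sketched as an iteration. This is a simplified sketch, assuming the protein probability is 1 minus the product of (1 - w × p) over the protein's peptides, and that each weight is then updated to the protein's share of the summed probabilities of all proteins containing that peptide. It leaves out the other adjustments of the published method, and the function and parameter names are hypothetical.

```python
def protein_prophet_sketch(peptide_probs, peptide_proteins, iterations=20):
    """Iterate protein probabilities and peptide weights.

    peptide_probs: peptide -> probability that the identification is correct
    peptide_proteins: peptide -> list of proteins containing that peptide
    """
    proteins = sorted({p for prots in peptide_proteins.values() for p in prots})
    # Iteration 1: w = 1/n, with n the degeneracy count of the peptide.
    weights = {
        (pep, prot): 1.0 / len(prots)
        for pep, prots in peptide_proteins.items()
        for prot in prots
    }
    protein_probs = {}
    for _ in range(iterations):
        # Protein probability: 1 - product over peptides of (1 - w * p).
        for prot in proteins:
            miss = 1.0
            for pep, prots in peptide_proteins.items():
                if prot in prots:
                    miss *= 1.0 - weights[(pep, prot)] * peptide_probs[pep]
            protein_probs[prot] = 1.0 - miss
        # Weight update: share each peptide in proportion to protein probability.
        for pep, prots in peptide_proteins.items():
            total = sum(protein_probs[p] for p in prots)
            if total > 0:
                for prot in prots:
                    weights[(pep, prot)] = protein_probs[prot] / total
    return protein_probs, weights


# Hypothetical example: peptide "a" is unique to X, peptide "b" is shared.
probs, weights = protein_prophet_sketch(
    {"a": 0.9, "b": 0.9}, {"a": ["X"], "b": ["X", "Y"]}
)
print(probs, weights)
```

In this example the shared peptide is progressively assigned to protein X, which also has unique evidence, so the probability of protein Y shrinks with each iteration.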
Minimal set with maximal annotation
peptides a b c d
proteins
prot X (-) x x
prot Y (+) x
prot Z (0) x x x
DBToolkit protein inference
peptides a b c d
proteins
prot X (-) x x
prot Y (+) x
prot Z (0) x x x
Some indications from the HUPO BPP
PROTEIN INFERENCE AND
QUANTIFICATION
Some inference examples (i)
See: Colaert et al, Proteomics, 2010
http://genesis.ugent.be/rover/
Nice and easy, 1/1, only unique peptides (blue) and a narrow distribution
Some inference examples (ii)
See: Colaert et al, Proteomics, 2010
Nice and easy, down-regulated
http://genesis.ugent.be/rover/
Some inference examples (iii)
See: Colaert et al, Proteomics, 2010
A little less easy, up-regulated
http://genesis.ugent.be/rover/
Some inference examples (iv)
See: Colaert et al, Proteomics, 2010
A nice example of the mess of degenerate peptides
http://genesis.ugent.be/rover/
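One way to see why degenerate peptides complicate quantification is to summarize the peptide ratios of a protein separately for unique and degenerate peptides. The sketch below is an illustration only and does not reproduce how Rover works; the input format, the function name and the example ratios are all hypothetical.

```python
import math
import statistics


def summarize_ratios(peptide_ratios):
    """Summarize log2 peptide ratios, split into unique and degenerate.

    peptide_ratios: list of (ratio, is_unique) tuples for one protein.
    """
    def summary(values):
        logs = [math.log2(r) for r in values]
        if not logs:
            return None
        spread = statistics.stdev(logs) if len(logs) > 1 else 0.0
        return {
            "n": len(logs),
            "median_log2": statistics.median(logs),
            "sd_log2": spread,
        }

    unique = [r for r, is_unique in peptide_ratios if is_unique]
    degenerate = [r for r, is_unique in peptide_ratios if not is_unique]
    return {"unique": summary(unique), "degenerate": summary(degenerate)}


# Hypothetical ratios: three unique peptides around 1/1, one degenerate outlier.
print(summarize_ratios([(1.0, True), (2.0, True), (0.5, True), (4.0, False)]))
```

A degenerate peptide whose ratio departs from the unique peptides may reflect the other proteins it maps to rather than the protein under inspection.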
Some inference examples (v)
See: Colaert et al, Proteomics, 2010
A bit of chaos, but a defined core distribution
http://genesis.ugent.be/rover/
Thank you!
Questions?