BITS - Protein inference from mass spectrometry data

Posted on 18-Nov-2014

description

This is the fifth presentation of the BITS training on 'Mass spec data processing'. It reviews the problems of inferring which proteins are present from mass spec data and how to deal with them, and gives an overview of useful tools. Thanks to the Compomics Lab of the VIB for their contribution.

transcript

BITS MS Data Processing – Protein Inference, UGent, Gent, Belgium – 16 December 2011

Kenny Helsens, kenny.helsens@UGent.be

Lennart Martens, lennart.martens@ebi.ac.uk

Proteomics Services Group, European Bioinformatics Institute

Hinxton, Cambridge, United Kingdom, www.ebi.ac.uk

Kenny Helsens

kenny.helsens@ugent.be

Computational Omics and Systems Biology Group

Department of Medical Protein Research, VIB
Department of Biochemistry, Ghent University

Ghent, Belgium

Peptide validation and protein inference

Raw data

Peaklists

Peptide sequences

Protein accession numbers

[Figure axes: data size, ambiguity]

See: Martens and Hermjakob, Molecular BioSystems, 2007

Data processing and information ambiguity

PEPTIDE IDENTIFICATION VALIDATION

Populations and individuals

10,000 peptide-to-spectrum matches

5% decoy hits
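At the population level, decoy hits let you estimate how many accepted peptide-to-spectrum matches (PSMs) are false, but not which individual ones. The sketch below is a minimal illustration, assuming one common target-decoy estimate (decoys divided by targets above a score threshold; other variants exist). The function names and toy scores are hypothetical.

```python
from typing import List, Tuple

# A PSM is represented as (score, is_decoy); higher scores are better.
PSM = Tuple[float, bool]


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among PSMs scoring at or above `threshold`.

    Assumed estimate: the number of decoy hits above the threshold
    approximates the number of false target hits, so FDR ~ decoys / targets.
    Returns 0.0 when no target passes the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def expected_false_positives(n_accepted: int, fdr: float) -> float:
    """Population-level estimate of how many accepted PSMs are wrong."""
    return n_accepted * fdr


if __name__ == "__main__":
    # Hypothetical toy search result: 8 target and 2 decoy PSMs.
    toy = [(52.0, False), (48.5, False), (45.1, False), (44.0, True),
           (41.2, False), (39.9, False), (37.3, False), (36.8, False),
           (33.0, False), (30.5, True)]
    print(estimate_fdr(toy, 35.0))  # 1 decoy and 7 targets above the threshold
    # 10,000 PSMs at 5%: about 500 are expected to be false,
    # but the estimate does not say which ones.
    print(expected_false_positives(10_000, 0.05))
```

This gap between the population and the individual is why the following slides turn to inspecting single identifications.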

Suspect peptide identifications happen.

The problem is that finding them requires detailed analysis of a single spectrum and its identifications, amongst thousands of other spectra…

Eliminating false positives

Automated interpretation

The Netherlands??

Manual interpretation

Tyrosine phosphorylation

See: Ghesquière and Helsens, Proteomics, 2010

Peptizer expert system

Aggregation of the votes

Agents a, b, c, d and e cast votes: +1, +1, 0, -1, +1

Trusted subset

Suspicious subset

Confident Peptide Identifications

See: Helsens et al, Molecular and Cellular Proteomics, 2008
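The agent-voting idea can be sketched in a few lines. This is a minimal sketch, not Peptizer's implementation: it assumes each agent returns +1 (flag the identification as suspicious), 0 (neutral) or -1 (argue for trusting it), and that the summed votes are compared with a user-chosen threshold. The agents, PSM fields and threshold values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical PSM representation: a dict of a few properties.
PSM = Dict[str, float]
# An agent inspects one PSM and votes +1, 0 or -1 (assumed sign convention).
Agent = Callable[[PSM], int]


def short_peptide_agent(psm: PSM) -> int:
    """Hypothetical rule: very short peptides are flagged."""
    return 1 if psm["length"] < 8 else 0


def score_margin_agent(psm: PSM) -> int:
    """Hypothetical rule: scores barely above the identity threshold are
    flagged, scores far above it count in favour of the identification."""
    margin = psm["score"] - psm["threshold"]
    if margin < 5:
        return 1
    if margin > 20:
        return -1
    return 0


def classify(votes: List[int], threshold: int) -> str:
    """Aggregate the votes by summing them and comparing with a threshold."""
    return "suspicious" if sum(votes) >= threshold else "trusted"


def aggregate(psms: List[PSM], agents: List[Agent],
              threshold: int) -> Tuple[List[PSM], List[PSM]]:
    """Split PSMs into (trusted, suspicious) subsets."""
    trusted: List[PSM] = []
    suspicious: List[PSM] = []
    for psm in psms:
        votes = [agent(psm) for agent in agents]
        if classify(votes, threshold) == "suspicious":
            suspicious.append(psm)
        else:
            trusted.append(psm)
    return trusted, suspicious
```

With the votes from the slide (+1, +1, 0, -1, +1) the sum is 2, so `classify([1, 1, 0, -1, 1], 2)` returns "suspicious", while a threshold of 3 would keep the identification in the trusted subset. Only the suspicious subset then needs manual inspection.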

Peptizer expert system

See: Helsens et al, Molecular and Cellular Proteomics, 2008

PROTEIN INFERENCE

[Figure: a gene, its alternative transcripts (built from exons 1a, 1b, 2, 3, 4, 5, 6a and 6b) and their translations, with identified peptides mapped onto them. Legend: intron, exon (UTR), exon (CDS), peptide. Peptide classes: matching all transcripts, matching a transcript subset, matching exactly 1 translation; redundant.]

Not all peptides are created equal
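The peptide classes above can be derived from a peptide-to-translation mapping. A minimal sketch, assuming you already know which translations (isoforms) of one gene contain each peptide; the peptide and translation names are hypothetical.

```python
from typing import Dict, Set


def classify_peptides(peptide_to_translations: Dict[str, Set[str]],
                      all_translations: Set[str]) -> Dict[str, str]:
    """Label each peptide by how specific it is for the gene's translations."""
    labels: Dict[str, str] = {}
    for peptide, hits in peptide_to_translations.items():
        if not hits:
            labels[peptide] = "no match"
        elif len(hits) == 1:
            # Isoform-specific: points to a single translation.
            labels[peptide] = "exactly 1 translation"
        elif hits >= all_translations:
            # Shared by every isoform: confirms the gene, not the isoform.
            labels[peptide] = "all transcripts"
        else:
            labels[peptide] = "transcript subset"
    return labels


if __name__ == "__main__":
    translations = {"T1", "T2", "T3"}
    mapping = {"pep1": {"T1", "T2", "T3"}, "pep2": {"T1", "T2"}, "pep3": {"T3"}}
    for peptide, label in classify_peptides(mapping, translations).items():
        print(peptide, label)
```

Only peptides in the last class can distinguish between isoforms; the others leave the inference ambiguous.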

Sample preparation consequences

See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005

See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005

Sample preparation consequences

Peptides a, b, c and d are matched by three proteins: prot X (2 peptides), prot Y (1 peptide) and prot Z (3 peptides). Three ways to report them:

• Minimal set (Occam)

• Maximal set (anti-Occam)

• Minimal set with maximal annotation, marking prot X (-), prot Y (+), prot Z (0): true Occam?

See: Martens and Hermjakob, Molecular BioSystems, 2007

Protein inference: a question of conviction
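The Occam and anti-Occam readings can be contrasted on a toy example. A minimal sketch, assuming a hypothetical assignment in which prot X contains peptides a and b, prot Y contains b, and prot Z contains b, c and d. The minimal set is found by exhaustive search, which is only practical for tiny examples (minimal set cover is NP-hard).

```python
from itertools import combinations
from typing import Dict, List, Set


def maximal_set(protein_to_peptides: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Anti-Occam: report every protein that contains an observed peptide."""
    return sorted(p for p, peps in protein_to_peptides.items() if peps & observed)


def minimal_set(protein_to_peptides: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Occam: a smallest set of proteins explaining all explainable peptides.

    Exhaustive search over subsets of increasing size; when several minimal
    sets exist, the first one found (in sorted order) is returned.
    """
    candidates = maximal_set(protein_to_peptides, observed)
    explainable = set().union(*(protein_to_peptides[p] for p in candidates)) & observed
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            covered = set().union(*(protein_to_peptides[p] for p in combo))
            if explainable <= covered:
                return list(combo)
    return []


if __name__ == "__main__":
    db = {"prot X": {"a", "b"}, "prot Y": {"b"}, "prot Z": {"b", "c", "d"}}
    seen = {"a", "b", "c", "d"}
    print("maximal:", maximal_set(db, seen))
    print("minimal:", minimal_set(db, seen))
```

Here prot Y disappears from the minimal set because its only peptide is already explained by the others, even though nothing proves it is absent. That arbitrariness is what the "minimal set with maximal annotation" tries to address.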

ALGORITHMS FOR THE PROTEIN INFERENCE PROBLEM

• IDPicker: Zhang et al, Journal of Proteome Research, 2007

• ProteinProphet: Nesvizhskii AI et al, Analytical Chemistry, 2003

• DBToolkit: Martens et al, Bioinformatics, 2005, http://genesis.UGent.be/dbtoolkit

A few algorithms for protein inference

IDPicker parsimonious protein assembly

(I) Initialize

See: Zhang et al, Journal of Proteome Research, 2007

IDPicker parsimonious protein assembly

(II) Collapse

See: Zhang et al, Journal of Proteome Research, 2007

IDPicker parsimonious protein assembly

(III) Separate

See: Zhang et al, Journal of Proteome Research, 2007

IDPicker parsimonious protein assembly

(IV) Reduce

See: Zhang et al, Journal of Proteome Research, 2007
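The four IDPicker steps can be sketched as follows. This is a simplified reading of Zhang et al. (2007), not IDPicker's code: (I) the input dict is the protein-peptide graph, (II) proteins with identical peptide sets are collapsed into one group, (III) groups are separated into connected components through shared peptides, and (IV) each component is reduced by a greedy set cover. As a simplification, the sketch counts individual peptides rather than collapsed peptide groups. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

Groups = Dict[FrozenSet[str], FrozenSet[str]]  # peptide set -> protein group


def collapse(protein_to_peptides: Dict[str, Set[str]]) -> Groups:
    """(II) Merge proteins that have exactly the same peptide set."""
    merged: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        merged[frozenset(peptides)].add(protein)
    return {peps: frozenset(prots) for peps, prots in merged.items()}


def separate(groups: Groups) -> List[List[FrozenSet[str]]]:
    """(III) Split groups into connected components linked by shared peptides."""
    remaining = list(groups)
    components = []
    while remaining:
        component = [remaining.pop()]
        peptides = set(component[0])
        changed = True
        while changed:
            changed = False
            for peps in remaining[:]:  # iterate over a copy while removing
                if peps & peptides:
                    component.append(peps)
                    peptides |= peps
                    remaining.remove(peps)
                    changed = True
        components.append(component)
    return components


def reduce_component(component: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """(IV) Greedy set cover: keep the group explaining most unexplained peptides."""
    uncovered = set().union(*component)
    candidates = list(component)
    kept = []
    while uncovered:
        best = max(candidates, key=lambda peps: len(peps & uncovered))
        kept.append(best)
        uncovered -= best
        candidates.remove(best)
    return kept


def idpicker_like(protein_to_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    groups = collapse(protein_to_peptides)
    result = []
    for component in separate(groups):
        for peps in reduce_component(component):
            result.append(sorted(groups[peps]))
    return sorted(result)


if __name__ == "__main__":
    db = {"P1": {"a", "b"}, "P2": {"a", "b"}, "P3": {"b", "c", "d"},
          "P4": {"c"}, "P5": {"e"}}
    print(idpicker_like(db))
```

In this toy input P1 and P2 are indistinguishable and stay together as one group, while P4 is dropped because P3 already explains its only peptide.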

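Protein probability P, combined from the peptide probability p_i and the peptide weight w_i of each peptide i matching the protein:

$$P = 1 - \prod_i \left(1 - w_i\, p_i\right)$$

In iteration 1, all weights w start off as 1/n, with n the degeneracy count for the peptide.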

See: Nesvizhskii AI et al., Analytical Chemistry, 2003

ProteinProphet: the simplified view
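The simplified view can be turned into a small iteration. This is a sketch under stated assumptions, not ProteinProphet itself: each peptide already has one probability p, further adjustments (such as the number-of-sibling-peptides correction) are ignored, and after each round a shared peptide's weight is redistributed over its proteins in proportion to their current probabilities. Function names and data are hypothetical.

```python
from math import prod
from typing import Dict, List, Set


def protein_probabilities(protein_to_peptides: Dict[str, Set[str]],
                          peptide_prob: Dict[str, float],
                          iterations: int = 20) -> Dict[str, float]:
    # Degeneracy: which proteins contain each peptide.
    holders: Dict[str, List[str]] = {}
    for prot, peps in protein_to_peptides.items():
        for pep in peps:
            holders.setdefault(pep, []).append(prot)
    # Iteration 1: weight 1/n, with n the degeneracy count of the peptide.
    weight = {(pep, prot): 1.0 / len(prots)
              for pep, prots in holders.items() for prot in prots}
    probs: Dict[str, float] = {}
    for _ in range(iterations):
        # P = 1 - prod_i (1 - w_i * p_i)
        probs = {prot: 1.0 - prod(1.0 - weight[(pep, prot)] * peptide_prob[pep]
                                  for pep in peps)
                 for prot, peps in protein_to_peptides.items()}
        # Redistribute each peptide's weight in proportion to protein probability.
        for pep, prots in holders.items():
            total = sum(probs[p] for p in prots)
            for p in prots:
                weight[(pep, p)] = probs[p] / total if total > 0 else 1.0 / len(prots)
    return probs


if __name__ == "__main__":
    db = {"A": {"p1", "p2"}, "B": {"p2"}}
    print(protein_probabilities(db, {"p1": 0.9, "p2": 0.8}, iterations=1))
    print(protein_probabilities(db, {"p1": 0.9, "p2": 0.8}))
```

After one iteration A gets 1 - (1 - 0.9)(1 - 0.5 x 0.8) = 0.94 and B gets 0.4; further iterations shift the shared peptide p2 towards A, so B's probability keeps dropping.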

Peptides a, b, c and d, matched by prot X (-) (2 peptides), prot Y (+) (1 peptide) and prot Z (0) (3 peptides)

Minimal set with maximal annotation

DBToolkit protein inference

Peptides a, b, c and d, matched by prot X (-) (2 peptides), prot Y (+) (1 peptide) and prot Z (0) (3 peptides)

Some indications from the HUPO BPP

PROTEIN INFERENCE AND QUANTIFICATION

Some inference examples (i)

See: Colaert et al, Proteomics, 2010

http://genesis.ugent.be/rover/

Nice and easy, 1/1, only unique peptides (blue) and a narrow distribution
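For a case like this one, a protein ratio can be summarized from its peptide ratios. This is a minimal sketch, not Rover's own algorithm. It assumes ratios are log2-transformed, that only peptides unique to the protein are used, and that the median and population standard deviation describe the center and width of the distribution. The input values are hypothetical.

```python
from math import log2
from statistics import median, pstdev
from typing import List, Tuple


def summarize_protein_ratio(peptides: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (median log2 ratio, spread) over unique peptides.

    Each peptide is (ratio, is_unique). Shared (degenerate) peptides and
    non-positive ratios are ignored; returns (nan, nan) if nothing is left.
    """
    logs = [log2(ratio) for ratio, is_unique in peptides if is_unique and ratio > 0]
    if not logs:
        return float("nan"), float("nan")
    return median(logs), pstdev(logs)


if __name__ == "__main__":
    # Hypothetical peptide ratios for one protein; the 2.0 ratio is shared.
    data = [(1.0, True), (1.05, True), (0.95, True), (1.0, True), (2.0, False)]
    center, spread = summarize_protein_ratio(data)
    print(f"log2 ratio {center:.3f}, spread {spread:.3f}")
```

A median log2 ratio near 0 corresponds to a 1/1 ratio, and a small spread corresponds to the narrow distribution on the slide. Degenerate peptides, as in the later examples, would need a decision about which protein their ratio belongs to.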

Some inference examples (ii)

See: Colaert et al, Proteomics, 2010

Nice and easy, down-regulated

http://genesis.ugent.be/rover/

Some inference examples (iii)

See: Colaert et al, Proteomics, 2010

A little less easy, up-regulated

http://genesis.ugent.be/rover/

Some inference examples (iv)

See: Colaert et al, Proteomics, 2010

A nice example of the mess of degenerate peptides

http://genesis.ugent.be/rover/

Some inference examples (v)

See: Colaert et al, Proteomics, 2010

A bit of chaos, but a defined core distribution

http://genesis.ugent.be/rover/

Thank you!

Questions?