Proteomics Informatics Workshop Part II: Protein Characterization David Fenyö February 18, 2011...


Proteomics Informatics Workshop Part II: Protein Characterization

David Fenyö

February 18, 2011

• Top-down/bottom-up proteomics
• Post-translational modifications
• Protein complexes
• Cross-linking
• The Global Proteome Machine Database

Proteomics Informatics

[Figure: Workflow overview. Elements: biological system; samples; sample preparation; measurements (MS, MS/MS); information about each sample (What does the sample contain? How much?); information about the biological system. Supporting activities: experimental design, data analysis, information integration.]

Sample Preparation

[Figure: Sample preparation. Elements: enrichment, separation, etc.; proteins; digestion; peptides; fragmentation; fragments. Top down works on proteins; bottom up works on peptides.]

Top down / bottom up

[Figure: Top-down and bottom-up mass spectra. Axes: mass/charge vs. intensity.]

Charge distribution

[Figure: Charge distributions, top down vs. bottom up. Labeled charge states: 1+, 2+, 3+, 4+, 27+, 31+. Axes: mass/charge vs. intensity.]

Isotope distribution

[Figure: Isotope distributions, top down vs. bottom up, for m = 1035 Da, m = 1878 Da, and m = 2234 Da. Axes: mass/charge vs. intensity.]
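The relation between a neutral mass and the mass/charge values at different charge states can be sketched as follows. This is a minimal illustration, assuming positive-mode ions formed by adding one proton per charge; the function name `mz` and the 30000 Da protein mass are hypothetical and do not come from the slides.

```python
# Minimal sketch: m/z of an ion formed by adding `charge` protons
# to a molecule of the given neutral mass (assumption: positive mode).
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz(neutral_mass, charge):
    """Return mass/charge for a protonated ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

if __name__ == "__main__":
    # Peptide-sized mass from the isotope distribution slide, low charge states.
    for z in (1, 2, 3, 4):
        print(f"m = 1035 Da, {z}+ : m/z = {mz(1035.0, z):.3f}")
    # Hypothetical 30000 Da protein at the high charge states named on the slide.
    for z in (27, 31):
        print(f"m = 30000 Da, {z}+ : m/z = {mz(30000.0, z):.3f}")
```

Under this assumption, a large protein at a high charge state and a small peptide at a low charge state can fall in a similar mass/charge window.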

Fragmentation

[Figure: Fragmentation, top down vs. bottom up.]

Correlations between modifications

[Figure: Top down vs. bottom up.]

Alternative Splicing

[Figure: Top down vs. bottom up for a gene with exons 1, 2, and 3.]

Top down

[Figure: Protein mass spectra and fragment mass spectra.]

Kellie et al., Molecular BioSystems 2010

Non-Covalent Protein Complexes

Schreiber et al., Nature 2011

Dynamic Range in Proteomics

Large discrepancy between the experimental dynamic range and the range of amounts of different proteins in a proteome

[Figure: Distribution of protein amounts, with the experimental dynamic range marked. Axes: log(protein amount) vs. number of proteins.]

The goal is to identify and characterize all components of a proteome

Desired Dynamic Range

Experimental Designs

[Figure: Simulated experiment. Stages: sample, protein separation, digestion, peptide separation (number of peptides per "retention time" bin), and mass spectrometry. Each peptide fraction is measured within the MS dynamic range. Axis label: protein abundance.]

Parameters in Simulation

● Distribution of protein amounts in sample

● Loss of peptides before binding to the column

● Loss of peptides after elution off the column

● Distribution of mass spectrometric response for different peptides present at the same amount

● Total amount of peptides that are loaded on column (limited by column loading capacity)

● # of peptide fractions

● # of Proteins in each fraction

● Dynamic range of mass spectrometer

● Detection limit of mass spectrometer
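A few of these parameters can be combined in a toy simulation. The sketch below is not the simulation used in the workshop: the log-normal shape of the amount distribution, the random assignment of proteins to fractions, and all default values are assumptions made for illustration, and peptide losses and peptide-specific response are left out.

```python
import random

def simulate_success_rate(n_proteins=1000, n_fractions=10,
                          dynamic_range=1e3, detection_limit=1.0, seed=0):
    """Toy model: fraction of proteins detected after protein separation.

    A protein counts as detected if its amount is above the detection limit
    and within `dynamic_range` of the most abundant protein in its fraction.
    """
    rng = random.Random(seed)
    # Assumed distribution of protein amounts: normal in log10 space.
    amounts = [10 ** rng.gauss(3.0, 1.0) for _ in range(n_proteins)]
    # Assumed separation: each protein lands in one random fraction.
    fractions = [[] for _ in range(n_fractions)]
    for amount in amounts:
        fractions[rng.randrange(n_fractions)].append(amount)
    detected = 0
    for fraction in fractions:
        if not fraction:
            continue
        # Lowest detectable amount in this fraction.
        floor = max(detection_limit, max(fraction) / dynamic_range)
        detected += sum(1 for amount in fraction if amount >= floor)
    return detected / n_proteins

if __name__ == "__main__":
    for dr in (1e2, 1e3, 1e4):
        print(f"dynamic range {dr:.0e}: {simulate_success_rate(dynamic_range=dr):.3f}")
```

Even in this simplified form, widening the instrument dynamic range or lowering the detection limit can only increase the fraction detected, which is the kind of trade-off the parameter list is meant to explore.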


Simulation Results for 1D-LC-MS

[Figure: Simulation results for complex mixtures of proteins. Workflow labels: digestion, RPC, MS analysis. Panels: tissue and body fluid, each with no protein separation and with protein separation into 10 fractions. Axes: log(protein amount) vs. number of proteins.]

Success Rate of a Proteomics Experiment

DEFINITION: The success rate of a proteomics experiment is defined as the number of proteins detected divided by the total number of proteins in the proteome.
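The definition translates directly into a short calculation. The following is a minimal sketch; the function name and the protein identifiers are hypothetical.

```python
def success_rate(detected, proteome):
    """Number of proteins detected divided by the number of proteins in the proteome."""
    proteome = set(proteome)
    if not proteome:
        raise ValueError("proteome must contain at least one protein")
    # Only detections that belong to the proteome are counted.
    return len(set(detected) & proteome) / len(proteome)

if __name__ == "__main__":
    proteome = {"P1", "P2", "P3", "P4"}
    detected = {"P1", "P3"}
    print(success_rate(detected, proteome))
```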

[Figure: Distribution of protein amounts with the detected proteins highlighted. Axes: log(protein amount) vs. number of proteins.]

Relative Dynamic Range of a Proteomics Experiment

DEFINITION: RELATIVE DYNAMIC RANGE, RDRx, where x is e.g. 10%, 50%, or 90%

[Figure: Fraction of proteins detected as a function of log(protein amount), with RDR90, RDR50, and RDR10 marked, shown together with the distribution of protein amounts and the proteins detected.]

Repeat Analysis

[Figure: Proteins detected with 1 to 8 repeat analyses.]

Repeat Analysis: Comparison of Simulations and Experiments

[Figure: Success rate and RDR10 as functions of the number of repeats (0 to 10), experiment vs. simulation.]

Number of Proteins in Mixture

[Figure: Success rate and relative dynamic range (RDR50) as functions of the number of proteins in the mixture (1 to 100,000), for tissue and body fluid, shown with the corresponding distributions of protein amounts. Distribution axes: log(protein amount) vs. number of proteins.]

Amount loaded and peptide separation

[Figure series: Relative dynamic range vs. success rate for tissue (both axes 0 to 1.0), shown with the corresponding distributions of protein amounts (log(protein amount) vs. number of proteins) as the experimental parameters are changed one step at a time. Steps are labeled protein separation, amount loaded, and peptide separation.]

Order: 1. Protein separation 2. Amount loaded 3. Peptide separation

Order: 1. Protein separation 2. Peptide separation 3. Amount loaded

Ranges:
Protein separation: 30000 to 3000 proteins in each fraction
Amount loaded: 0.1 µg to 10 µg
Peptide separation: 100 to 1000 fractions

Localization of modifications

[Figure series: Probability of localization (0 to 1) as a function of the number of fragment ions (0 to 25) for a phosphopeptide. Curves: phosphopeptide identification (ID) and localization for dmin = 3, dmin = 2, dmin = 1, and d = 1*. Parameters: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da, modification: phosphorylation.]

dmin >= 3 for 47% of human tryptic peptides
dmin = 2 for 33% of human tryptic peptides
dmin = 1 for 20% of human tryptic peptides

[Figure series: A peptide with two possible modification sites is matched against its MS/MS spectrum (m/z vs. intensity).]

Which assignment does the data support?

1, 1 or 2, or 1 and 2?

Visualization of evidence for localization

[Figure series: Evidence for localization of a modification on the peptide AAYYQK.]

Estimation of global false localization rate using decoy sites

By counting how often the phosphorylation is localized to amino acids that cannot be phosphorylated, we can estimate the false localization rate as a function of amino acid frequency.
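The counting step can be sketched as below. This is an illustration only: it assumes the search allowed the modification on any residue, that only S, T, and Y count as true phosphorylation sites, and that each assignment is given as a peptide sequence plus the zero-based index of the localized residue. The function name and the example data are hypothetical.

```python
from collections import Counter

PHOSPHO_ACCEPTORS = set("STY")  # assumption: only S, T, Y are real sites

def decoy_site_frequencies(assignments):
    """For each decoy amino acid, return (amino acid frequency,
    fraction of all localizations that landed on that amino acid)."""
    residue_counts = Counter()
    hit_counts = Counter()
    for sequence, index in assignments:
        residue_counts.update(sequence)   # how common each residue is
        hit_counts[sequence[index]] += 1  # where the modification was placed
    n_residues = sum(residue_counts.values())
    n_hits = sum(hit_counts.values())
    return {
        aa: (residue_counts[aa] / n_residues, hit_counts[aa] / n_hits)
        for aa in residue_counts
        if aa not in PHOSPHO_ACCEPTORS
    }

if __name__ == "__main__":
    example = [("AASK", 2), ("AAYK", 0)]  # second localization falls on a decoy site
    for aa, (freq, false_freq) in sorted(decoy_site_frequencies(example).items()):
        print(aa, freq, false_freq)
```

Plotting the second value against the first for every decoy amino acid gives the kind of relation the slide describes, from which a rate for the real sites could be read off at their own frequencies.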

[Figure: False localization frequency (0 to 0.02) vs. amino acid frequency (0 to 0.15), two panels; Y is labeled.]

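How much can we trust a single localization assignment?

If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

[Figure: Distribution F(S_21) of the score S_21 for assignment 1 when 2 is correct, with the measured score S_m1 marked.]

[Equation: the probability p is written as a ratio of integrals of F(S_21) dS_21, with the measured score S_m1 as an integration limit. The exact limits are not recoverable from the transcript.]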
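One common way to turn a generated score distribution into a probability is an empirical tail fraction. The sketch below is a stand-in and not necessarily the formula on the slide: it assumes that higher scores mean better matches and that the chance distribution is available as a list of sampled scores. The function name and the numbers are hypothetical.

```python
def empirical_p_value(chance_scores, measured_score):
    """Fraction of chance scores that are at least as high as the measured score."""
    if not chance_scores:
        raise ValueError("need at least one chance score")
    n_as_extreme = sum(1 for s in chance_scores if s >= measured_score)
    return n_as_extreme / len(chance_scores)

if __name__ == "__main__":
    # Hypothetical scores for assignment 1 generated with assignment 2 as the truth.
    chance = [1.0, 2.0, 3.0, 4.0]
    print(empirical_p_value(chance, 3.0))
```

A small value indicates that the measured score for the assignment would rarely arise if the other site were the correct one.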

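Is it a mixture or not?

If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

[Figure: Distribution F(S_12) of the score S_12 for assignment 2 when 1 is correct, with the measured score S_m2 marked.]

[Equation: the probability p is written as a ratio of integrals of F(S_12) dS_12, with the measured score S_m2 as an integration limit. The exact limits are not recoverable from the transcript.]

[Decision rules: the two probabilities are each compared with a threshold p_th, and the combination determines the reported assignment, with outcomes such as "1 and 2", "1", or "1 or 2". The individual inequalities are not recoverable from the transcript.]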

Localization of modifications

[Figure: A peptide with two possible modification sites is matched against its MS/MS spectrum (m/z vs. intensity).]

Which assignment does the data support?

1, 1 or 2, or 1 and 2?

Protein Complexes

[Figure: Protein complexes (proteins labeled A, B, C, D) analyzed by digestion and mass spectrometry.]

Tackett et al., JPR 2005

Protein Complexes – specific/non-specific binding

Sowa et al., Cell 2009

Protein Complexes – specific/non-specific binding

Protein Complexes – specific/non-specific binding

Choi et al., Nature Methods 2010

Analysis of Non-Covalent Protein Complexes

Taverner et al., Acc Chem Res 2008

Determining the architectures of macromolecular assemblies

Alber et al., Nature 2007

Interaction Partners by Chemical Cross-Linking

[Figure: Workflow. Elements: protein complex; chemical cross-linking; cross-linked protein complex; isolation; enzymatic digestion; proteolytic peptides; MS (peptides, m/z); fragmentation; MS/MS (fragments, m/z).]

Interaction Sites by Chemical Cross-Linking

[Figure: The same workflow elements, shown for the determination of interaction sites.]

Cross-linking

A protein with n peptides with reactive groups has (n-1)n/2 potential ways to cross-link peptides pairwise, plus many additional uninformative forms.

Protein A + IgG heavy chain: 990 possible peptide pairs

Yeast NPC: ~10^6 possible peptide pairs
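The pair count grows quadratically with the number of reactive peptides, which a short calculation makes concrete. This is a minimal sketch; the function names are hypothetical, and the second function simply inverts the formula to show how many peptides a given pair count implies.

```python
import math

def pairwise_crosslinks(n):
    """Number of distinct peptide pairs among n peptides: (n-1)n/2."""
    return n * (n - 1) // 2

def peptides_for_pairs(pairs):
    """Smallest n whose pair count reaches `pairs`."""
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    while pairwise_crosslinks(n) < pairs:
        n += 1
    return n

if __name__ == "__main__":
    print(pairwise_crosslinks(45))        # pair count for 45 reactive peptides
    print(peptides_for_pairs(990))        # peptides implied by 990 pairs
    print(peptides_for_pairs(1_000_000))  # peptides implied by a million pairs
```

By this formula, 990 pairs corresponds to 45 reactive peptides, while a count on the order of a million pairs needs only about 1400, so the search space expands much faster than the sample complexity.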

Cross-linking

Mass spectrometers have a limited dynamic range, and it is therefore important to limit the number of possible reactions so as not to dilute the cross-linked peptides.

To identify a cross-linked peptide pair, both peptides have to be sufficiently long and have to give informative fragmentation.

High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides.

Because the cross-linked peptides are often large, CAD is not ideal; ETD is recommended instead.

Search Results

[Figures: three slides showing search results.]

GPMDB

[Figure: Sequence-spectrum assignments in GPMDB. Assigned spectra (0 to 250,000,000) by year, 2005 to 2011 (as of Jan 1st).]

Human Genes Observed in GPMDB

[Figure: % genes observed (0 to 100) for chromatin, cytoskeleton, E.R., Golgi, lysosome, mitochondrion, nuclear membrane, plasma membrane, and ribosome.]

Proteotypic peptide relative composition

[Figure: Composition difference (percent, -40 to 40) by amino acid, in the order N G P D E A V I S T L Y M F H Q K C R W.]

Comparison with GPMDB

Most proteins show very reproducible peptide patterns

Comparison with GPMDB

Global frequency of observing a peptide

Peptide Sequence          Observations
FSTVAGESGSADTVR           2633
FNTANDDNVTQVR             2432
AFYVNVLNEEQR              1722
LVNANGEAVYCK              1701
GPLLVQDVVFTDEMAHFDR       1637
LSQEDPDYGIR               1560
LFAYPDTHR                 1499
NLSVEDAAR                 1400
FYTEDGNWDLVGNNTPIFFIR     1386
ADVLTTGAGNPVGDK           1338

If the number of times a peptide sequence (i) has been observed is n_i, then for a particular protein:

$$N_{\mathrm{total}} = \sum_i n_i$$

Global frequency of observing a peptide

Define a normalized global frequency of observation for a particular peptide sequence from a particular protein as:

$$\omega_i = \frac{n_i}{N_{\mathrm{total}}}$$
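The normalization can be sketched in a few lines. The function name is hypothetical, and the example uses only the ten observation counts listed on the slide; because the total for the protein presumably includes peptides not shown there, the values printed here will not match the slide's ω values.

```python
def global_frequencies(observations):
    """Map each peptide sequence to n_i / N_total for one protein."""
    n_total = sum(observations.values())
    if n_total == 0:
        raise ValueError("no observations for this protein")
    return {peptide: n / n_total for peptide, n in observations.items()}

if __name__ == "__main__":
    counts = {
        "FSTVAGESGSADTVR": 2633, "FNTANDDNVTQVR": 2432,
        "AFYVNVLNEEQR": 1722, "LVNANGEAVYCK": 1701,
        "GPLLVQDVVFTDEMAHFDR": 1637, "LSQEDPDYGIR": 1560,
        "LFAYPDTHR": 1499, "NLSVEDAAR": 1400,
        "FYTEDGNWDLVGNNTPIFFIR": 1386, "ADVLTTGAGNPVGDK": 1338,
    }
    for peptide, omega in global_frequencies(counts).items():
        print(f"{peptide:24s}{omega:.3f}")
```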

Global frequency of observing a peptide (ω)

Peptide Sequence          ω
FSTVAGESGSADTVR           0.08
FNTANDDNVTQVR             0.07
AFYVNVLNEEQR              0.05
LVNANGEAVYCK              0.05
GPLLVQDVVFTDEMAHFDR       0.05
LSQEDPDYGIR               0.04
LFAYPDTHR                 0.04
NLSVEDAAR                 0.04
FYTEDGNWDLVGNNTPIFFIR     0.04
ADVLTTGAGNPVGDK           0.04

Global frequency of observation (ω), catalase

[Figure: ω (0.00 to 0.08) for catalase peptide sequences 1 to 20.]

For any set of peptides observed in an experiment and assigned to a particular protein (1 to j):

$$\Omega(\mathrm{protein}) = \sum_j \omega_j$$

$$\Omega(\mathrm{protein}) \le 1$$

Omega (Ω) value for a protein identification
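Given a table of global frequencies for a protein, the Omega value for one identification is the sum over the peptides that were actually observed. A minimal sketch follows; the function name is hypothetical, and treating peptides that are absent from the frequency table as contributing zero is an assumption.

```python
def omega_protein(omega_by_peptide, observed_peptides):
    """Sum the global frequencies of the distinct peptides observed for a protein."""
    return sum(omega_by_peptide.get(peptide, 0.0)  # unknown peptides add nothing
               for peptide in set(observed_peptides))

if __name__ == "__main__":
    # Frequencies for the two most often observed peptides in the table above.
    omega = {"FSTVAGESGSADTVR": 0.08, "FNTANDDNVTQVR": 0.07}
    observed = ["FSTVAGESGSADTVR", "FNTANDDNVTQVR", "FSTVAGESGSADTVR"]
    print(omega_protein(omega, observed))
```

Repeated observations of the same peptide are counted once, so the value reflects which of the commonly seen peptides were found and cannot exceed 1.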

Protein Ω’s for a set of identifications

Protein ID   Ω (z=2)   Ω (z=3)
SERPINB1     0.88      0.82
SNRPD1       0.88      0.59
CFL1         0.81      0.87
SNRPE        0.8       0.81
PPIA         0.79      0.64
CSTA         0.79      0.36
PFN1         0.76      0.61
CAT          0.71      0.78
GLRX         0.66      0.8
CALM1        0.62      0.76
FABP5        0.57      0.17

Proteomics Consultation

Part of Best Practices Integrative Informatics Consultation Service (BPIC) at the NYU Center for Health Informatics and Bioinformatics (CHIBI)

Contact: InformaticsConsultation@nyumc.org or David.Fenyo@nyumc.org

Walk-in Clinic: Wednesday, February 23, 3-5 pm

227 E 30th Street, 7th Floor, Room #739

Proteomics Informatics Workshop Part III: Protein Quantitation

February 25, 2011

• Metabolic labeling – SILAC
• Chemical labeling
• Label-free quantitation
• Spectrum counting
• Stoichiometry
• Protein processing and degradation
• Biomarker discovery and verification

Proteomics Informatics Workshop

Part I: Protein Identification, February 4, 2011

Part II: Protein Characterization, February 18, 2011

Part III: Protein Quantitation, February 25, 2011