Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)
Transcript
Page 1: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence

Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)

Page 2:

General Criteria for a Good Protein Identification Algorithm

The response to random input data should be random.

Maximum number of correct identifications and minimum number of incorrect identifications for any data set.

Maximal separation between scores for correct identifications and the distribution of scores for randomly matching proteins for any data set.

The statistical significance of the results should be calculated.

The searches should be fast.

Page 3:

Search Parameters

Parent tolerance: +/- daltons/ppm

Fragment tolerance: +/- daltons/ppm

Complete mods: Cys alkylation

Potential mods (artifacts): Met/Trp oxidation, Gln/Asn deamidation

Potential mods (PTMs): Phosphoryl, sulfonyl, acetyl, methyl, glycosyl, GPI

Cleavage: Trypsin ([KR]|{P})

Scoring method: Scores or statistics

Sequences: FASTA files
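As a rough illustration of two of these parameters, the sketch below applies the trypsin rule `[KR]|{P}` (cleave after K or R unless the next residue is P) and checks a mass against a tolerance given in ppm or daltons. The function names, the `missed_cleavages` argument, and the example sequence are assumptions made for this example; they are not taken from any search engine.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless followed by P; allow some missed sites."""
    # End positions of every K/R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Peptide boundaries: start, internal cleavage sites, end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides

def within_tolerance(observed, theoretical, tol, unit="ppm"):
    """True if the observed mass lies within +/- tol of the theoretical mass."""
    delta = abs(observed - theoretical)
    if unit == "ppm":
        return delta <= theoretical * tol * 1e-6
    return delta <= tol  # tolerance given in daltons

if __name__ == "__main__":
    print(tryptic_digest("AKRPGKM"))
    print(tryptic_digest("AKRPGKM", missed_cleavages=1))
```

Allowing more missed cleavages enlarges the list of candidate peptides, which is one reason the choice of search parameters affects both speed and significance.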

Page 4:

Identification – Peptide Mass Fingerprinting

[Workflow diagram. Experimental side: MS. Database side: Sequence DB, Pick Protein, Digestion, All Peptide Masses. Both sides meet at Compare, Score, Test Significance, which is repeated for each protein and yields Identified Proteins.]
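The compare-and-score loop of peptide mass fingerprinting can be sketched as follows. This is a deliberately naive illustration that ranks proteins by the number of observed masses matched within a tolerance; it is not the scoring used by ProFound or Mascot. The names `peptide_mass`, `count_matches`, `rank_proteins`, and the toy peptide lists are assumptions for this example, and masses are neutral monoisotopic values without modifications.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def count_matches(observed_masses, peptides, tol_da=0.5):
    """Number of observed masses explained by at least one peptide."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(
        any(abs(obs - theo) <= tol_da for theo in theoretical)
        for obs in observed_masses
    )

def rank_proteins(observed_masses, digests, tol_da=0.5):
    """Repeat the comparison for each protein and sort by match count."""
    scores = {
        name: count_matches(observed_masses, peptides, tol_da)
        for name, peptides in digests.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    digests = {"P1": ["GG", "AK"], "P2": ["WW"]}  # hypothetical digests
    print(rank_proteins([132.05, 217.14], digests))
```

A raw match count says nothing about significance; a real search engine would convert the comparison into a score with a testable null distribution.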

Page 5:

Response to Random Data

[Figure: y-axis Normalized Frequency.]

Page 6:

ProFound – Search Parameters

http://prowl.rockefeller.edu/

Page 7:

ProFound – Protein Identification by Peptide Mapping

$$
P(k \mid D I) \propto P(k \mid I)\,\frac{(N-r)!}{N!}\,\prod_{i=1}^{r}\left\{\sqrt{\frac{2}{\pi}}\,\frac{m_{\max}-m_{\min}}{\sigma_i}\,\sum_{j=1}^{g_i}\exp\left[-\frac{(m_i-m_{ij0})^2}{2\sigma_i^2}\right]\right\} F_{\text{pattern}}
$$

W. Zhang & B.T. Chait, Analytical Chemistry 72 (2000) 2482-2489

Page 8:

ProFound Results

Page 9:

Peptide Mapping – Mass Accuracy

[Figure, two panels. ProFound: -log(e) (0 to 7) versus Mass Tolerance (Da), 0 to 2. Mascot: Score (0 to 140) versus Mass Tolerance (Da), 0 to 2.]

Page 10:

Peptide Mapping - Database Size

[Figure: panels for S. cerevisiae, Fungi, and All Taxa.]

Expectation values, peptide mapping example:

Database         Expectation value
S. cerevisiae    4.8e-7
Fungi            8.4e-6
All Taxa         2.9e-4

Page 11:

Missed Cleavage Sites

[Figure: panels for u = 1, u = 2, and u = 4.]

Expectation values, peptide mapping example:

u      Expectation value
1      4.8e-7
2      1.1e-5
4      6.8e-4

Page 12:

Peptide Mapping - Partial Modifications

[Figure: panels for No Modifications and Phosphorylation (S, T, or Y).]

Protein      Searched without modifications    Searched with possible phosphorylation of S/T/Y
DARPP-32     0.00006                           0.01
CFTR         0.00002                           0.005

Even if the protein is modified, it is usually better to search a protein sequence database without specifying possible modifications when using peptide mapping data.

Page 13:

Peptide Mapping - Ranking by Direct Calculation of the Significance

Page 14:

Tandem MS – Database Search

[Workflow diagram. Experimental side: Lysis, Fractionation, Digestion, LC-MS, MS/MS. Database side: Sequence DB, Pick Protein, Pick Peptide, All Fragment Masses. Both sides meet at Compare, Score, Test Significance, which is repeated for all peptides and for all proteins.]

Page 15:

Algorithms

Page 16:

Comparing and Optimizing Algorithms

[Figure: for Algorithm 1 and Algorithm 2, the distributions of Score for True and False identifications, and the corresponding curves of Sensitivity versus 1-Specificity.]
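A sensitivity versus 1-specificity curve of the kind used to compare algorithms can be computed from two lists of scores, one for known true identifications and one for known false ones. The sketch below is a minimal illustration; the function names and the toy score lists are assumptions for this example.

```python
def roc_points(true_scores, false_scores):
    """Return (1 - specificity, sensitivity) pairs for every score threshold."""
    thresholds = sorted(set(true_scores) | set(false_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # Fraction of true identifications accepted at this threshold.
        sensitivity = sum(s >= t for s in true_scores) / len(true_scores)
        # Fraction of false identifications accepted at this threshold.
        false_rate = sum(s >= t for s in false_scores) / len(false_scores)
        points.append((false_rate, sensitivity))
    return points

def area_under_curve(points):
    """Trapezoidal area under the curve; 1.0 means perfect separation."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

if __name__ == "__main__":
    algorithm_1 = roc_points([3, 4], [1, 2])      # scores separate cleanly
    algorithm_2 = roc_points([2, 4], [1, 3])      # scores overlap
    print(area_under_curve(algorithm_1), area_under_curve(algorithm_2))
```

The algorithm whose true and false score distributions overlap less gives the larger area, which matches the criterion of maximal separation between correct and random matches.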

Page 17:

MS/MS - Parent Mass Error and Enzyme Specificity

)!!( ybIII nnxx

Expectation values, MS/MS example:

Δm      Enzyme specificity    Expectation value
2       Trypsin               2.5e-5
100     Trypsin               2.5e-5
2       non-specific          7.9e-5
100     non-specific          1.6e-4

Page 18:

Sequest

Cross-correlation

Page 19:

X! Tandem - Search Parameters

http://www.thegpm.org/

Page 20:

X! Tandem - Search Parameters

Page 21:

X! Tandem - Search Parameters

Page 22:

Conventional, single stage searching

[Diagram labels: spectra, sequences, sequences. Generic search engine: test all cleavages, modifications, & mutations for all sequences.]

Page 23:

Some hard problems in MS/MS analysis in proteomics

Determining potential modifications

- e.g., oxidation, phosphorylation, deamidation

- calculation order 2^n

- NP complete

Allowing for unanticipated peptide cleavages

- e.g., chymotryptic contamination in trypsin

- calculation order ~ 200 × tryptic cleavage

- “unfortunate” coefficient

Detecting point mutations

- e.g., sequence homology

- calculation order 18N

- NP complete
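The exponential cost of potential modifications can be seen by enumerating every modified form of a peptide: with n modifiable sites, each site is either modified or not, so there are 2^n candidate forms to score. The sketch below assumes a single modification type on S, T, or Y; the function name and example peptides are hypothetical.

```python
from itertools import product

def modification_states(peptide, modifiable="STY"):
    """Yield every combination of modified site positions in the peptide."""
    sites = [i for i, aa in enumerate(peptide) if aa in modifiable]
    # Each site is independently off (False) or on (True).
    for flags in product((False, True), repeat=len(sites)):
        yield tuple(site for site, on in zip(sites, flags) if on)

if __name__ == "__main__":
    for peptide in ("ASTK", "SSTTYYK", "SSSSTTTTYYYYK"):
        n_forms = sum(1 for _ in modification_states(peptide))
        print(peptide, n_forms)
```

Every additional modifiable residue doubles the number of forms, which is why testing all modifications for all sequences in a single pass becomes impractical.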

Page 24:

Multi-stage searching

[Diagram labels: spectra, sequences, sequences. X! Tandem stages: Tryptic cleavage, Modifications #1, Modifications #2, Point mutation.]

Page 25:

Search Results

Page 26:

Search Results

Page 27:

Sequence Annotations

Page 28:

Search Results

Page 29:

Search Results

Page 30:

Mascot

http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS

Page 31:

Identification – Spectrum Library Search

[Workflow diagram. Experimental side: Lysis, Fractionation, Digestion, LC-MS/MS, MS/MS. Library side: Spectrum Library, Pick Spectrum. Both sides meet at Compare, Score, Test Significance, which is repeated for all spectra and yields Identified Proteins.]

Page 32:

Steps in making an Annotated Spectrum Library (ASL):

1. Find the best 10 spectra for a particular sequence, with the same PTMs and charge.

2. Add the spectra together and normalize the intensity values.

3. Assign a “quality” value: the median expectation value of the 10 spectra used.

4. Record the 20 most intense peaks in the averaged spectrum, its parent ion z, m/z, sequence, protein accessions & quality.
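The four library-building steps can be sketched in a few lines. This is only an illustration under stated assumptions: spectra are represented as dictionaries mapping an integer m/z bin to an intensity, "best" is taken to mean lowest expectation value, and intensities are normalized to a maximum of 1. None of these representation choices, nor the function name, come from the slides.

```python
from collections import Counter
from statistics import median

def build_library_entry(spectra, n_best=10, n_peaks=20):
    """Build one library entry from spectra of the same sequence, PTMs, charge.

    spectra: list of (expectation_value, {mz_bin: intensity}) pairs.
    """
    # Step 1: keep the best spectra (assumed: lowest expectation values).
    best = sorted(spectra, key=lambda s: s[0])[:n_best]
    # Step 2: add the spectra together and normalize the intensities.
    summed = Counter()
    for _, peaks in best:
        summed.update(peaks)
    top = max(summed.values())
    normalized = {mz: inten / top for mz, inten in summed.items()}
    # Step 3: quality is the median expectation value of the spectra used.
    quality = median(e for e, _ in best)
    # Step 4: record only the most intense peaks.
    kept = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)[:n_peaks]
    return {"quality": quality, "peaks": dict(kept)}

if __name__ == "__main__":
    spectra = [
        (1e-3, {100: 2.0, 200: 1.0}),
        (1e-5, {100: 2.0, 300: 4.0}),
    ]
    print(build_library_entry(spectra, n_peaks=2))
```

A full entry would also carry the parent ion charge and m/z, the sequence, and the protein accessions listed in step 4; they are omitted here to keep the sketch short.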

Page 33:

Spectrum Library Characteristics – Peptide Length

[Figure: fraction of library (%), 0 to 10, versus peptide length, 0 to 50.]

Page 34:

Spectrum Library Characteristics – Protein Coverage

[Figure: % coverage (0 to 50) versus protein Mr (kDa), 10 to 190, shown for residues and for peptides.]

Page 35:

Identification – Spectrum Library Search

[Figure: Library spectrum (5:25) compared with Test spectrum (5:25).]

Results: 4 peaks selected, 1 peak missed

Page 36:

Identification – Spectrum Library Search

Apply a hypergeometric probability model:

- 25 possible m/z values;

- 5 peaks in the library spectrum; and

- 4 selected by the test spectrum.

How likely is this?

Matches    Probability
1          0.45
2          0.15
3          0.016
4          0.00039
5          0.0000037
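A textbook hypergeometric calculation for this kind of peak matching can be sketched as below. The assumption here is that the test spectrum picks `n_test` of the `n_bins` possible m/z values at random, and the question is how many of them coincide with the `n_library` library peaks. The slides do not state exactly how their probabilities were derived, so this form is not guaranteed to reproduce the tabulated values; the parameter names are chosen for this example.

```python
from math import comb

def hypergeom_pmf(k, n_bins, n_library, n_test):
    """Probability of exactly k random matches between test and library peaks."""
    return (
        comb(n_library, k)
        * comb(n_bins - n_library, n_test - k)
        / comb(n_bins, n_test)
    )

def hypergeom_tail(k, n_bins, n_library, n_test):
    """Probability of k or more random matches."""
    upper = min(n_library, n_test)
    return sum(hypergeom_pmf(i, n_bins, n_library, n_test) for i in range(k, upper + 1))

if __name__ == "__main__":
    # 25 possible m/z values, 5 library peaks, 5 test peaks.
    for k in range(1, 6):
        print(k, hypergeom_pmf(k, 25, 5, 5))
```

Whatever the exact parameterization, the qualitative behaviour is the same: the probability of a random match falls steeply as the number of matched peaks grows.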

Page 37:

Identification – Spectrum Library Search

If you have 1000 possible m/z values and 20 peaks in test and library spectrum?

[Figure: p (log scale, 1.0E-14 to 1.0E+00) versus matches (1 to 10).]

1 matched: p = 0.6

5 matched: p = 0.0002

10 matched: p = 0.0000000000001

Page 38:

Identification – Spectrum Library Search

[Diagram: an Experimental Mass Spectrum is compared with a Library of Assigned Mass Spectra (axis: M/Z) to give the Best search result.]

Page 39:

X! Hunter

Page 40:

X! Hunter algorithm:

1. Use dot product to find a library spectrum that best matches a test spectrum.

2. Calculate p-value with hypergeometric distribution.

3. Use p-value to calculate expectation value, given the identification parameters.

4. If expectation value is less than the median expectation value of the library spectrum, report the median value.
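Steps 1 and 4 of this procedure can be illustrated with a short sketch. It assumes spectra stored as dictionaries from m/z bin to intensity and uses a length-normalized dot product; whether X! Hunter normalizes in this way is an assumption, and the function names are hypothetical rather than taken from its source code.

```python
from math import sqrt

def normalized_dot(a, b):
    """Cosine-style dot product between two binned spectra."""
    shared = set(a) & set(b)
    numerator = sum(a[mz] * b[mz] for mz in shared)
    denominator = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return numerator / denominator if denominator else 0.0

def best_library_match(test, library):
    """Step 1: name of the library spectrum with the highest dot product."""
    return max(library, key=lambda name: normalized_dot(test, library[name]))

def reported_expectation(expectation, library_median):
    """Step 4: never report a value better than the library spectrum's median."""
    return max(expectation, library_median)

if __name__ == "__main__":
    library = {"PEPTIDEA": {1: 1.0, 2: 1.0}, "PEPTIDEB": {3: 1.0}}  # toy entries
    test = {1: 1.0, 2: 0.9}
    print(best_library_match(test, library))
    print(reported_expectation(1e-9, 1e-6))
```

The cap in step 4 reflects the idea that a match cannot be more trustworthy than the spectra from which the library entry was built.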

Page 41:

X! Hunter Result

Query Spectrum

Library Spectrum

Page 42:

Dynamic Range In Proteomics

Large discrepancy between the experimental dynamic range and the range of amounts of different proteins in a proteome.

The goal is to identify and characterize all components of a proteome.

[Figure: Distribution of Protein Amounts, Number of Proteins versus Log (Protein Amount), with the Experimental Dynamic Range and the Desired Dynamic Range marked.]

Page 43:

[Figure: effect of each workflow step on the observable range of Protein Abundance. Steps shown: Sample Extraction, Protein Labeling, Protein Separation, Digestion, Peptide Labeling, Peptide Separation, Ionization, Mass Separation, Fragmentation, Mass Separation, Detection. Effects labeled: Loss of material, Limit of amount of material, Separation of material, Detection limit, Dynamic range.]

Page 44:

Experimental Designs Simulated

[Figure: simulated workflow with the stages Sample, Protein Separation, Digestion, Peptide Separation ("Retention time" bins 1 to k; y-axis: # of peptides per bin), and Mass Spectrometry (MS dynamic range; masses m1 to m6). Axis: Protein Abundance.]

Page 45:

Parameters in Simulation

● Distribution of protein amounts in sample

● Loss of peptides before binding to the column

● Loss of peptides after elution off the column

● Distribution of mass spectrometric response for different peptides present at the same amount

● Total amount of peptides that are loaded on column (limited by column loading capacity)

● # of peptide fractions

● # of proteins in each fraction

● Dynamic range of mass spectrometer

● Detection limit of mass spectrometer

[Figure: simulated workflow with the stages Sample, Protein Separation, Digestion, Peptide Separation ("Retention time" bins 1 to k; y-axis: # of peptides per bin), and Mass Spectrometry (MS dynamic range; masses m1 to m6). Axis: Protein Abundance.]

Page 46:

Simulation Results for 1D-LC-MS

[Figure: workflow labels Complex Mixtures of Proteins, Digestion, RPC, MS Analysis. Four panels of Number of Proteins versus log(Protein Amount): Tissue and Body Fluid, each with No Protein Separation and with Protein Separation: 10 fractions.]

Page 47:

Success Rate of a Proteomics Experiment

DEFINITION: The success rate of a proteomics experiment is defined as the number of proteins detected divided by the total number of proteins in the proteome.

[Figure: Number of Proteins versus Log (Protein Amount), showing the Distribution of Protein Amounts and the Proteins Detected.]
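The success-rate definition can be tried out on a toy simulation. The sketch below is not the simulation behind these slides: it assumes protein amounts are log-normally distributed, and it treats a protein as detected when its amount is above both a fixed detection limit and the lower edge of the instrument's dynamic range measured from the most abundant protein. Separation, sample losses, and peptide-specific response are ignored, and all parameter names and default values are invented for illustration.

```python
import random

def success_rate(n_detected, n_total):
    """Number of proteins detected divided by the total number in the proteome."""
    return n_detected / n_total

def simulate_success_rate(n_proteins=10000, mean_log=3.0, sd_log=1.0,
                          detection_limit_log=2.0, dynamic_range_log=3.0, seed=0):
    """Toy model: detect proteins inside the instrument's observable window."""
    rng = random.Random(seed)
    # Assumed: log10(protein amount) follows a normal distribution.
    amounts = [rng.gauss(mean_log, sd_log) for _ in range(n_proteins)]
    # The dynamic range extends downward from the most abundant protein.
    floor = max(detection_limit_log, max(amounts) - dynamic_range_log)
    detected = sum(a >= floor for a in amounts)
    return success_rate(detected, n_proteins)

if __name__ == "__main__":
    for dynamic_range in (2.0, 3.0, 4.0):
        print(dynamic_range, simulate_success_rate(dynamic_range_log=dynamic_range))
```

Even in this crude model, widening the dynamic range or lowering the detection limit can only raise the success rate, because both changes lower the floor below which proteins go undetected.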

Page 48:

Relative Dynamic Range of a Proteomics Experiment

DEFINITION: RELATIVE DYNAMIC RANGE, RDRx, where x is e.g. 10%, 50%, or 90%

[Figure: Number of Proteins and Fraction of Proteins Detected versus Log (Protein Amount), showing the Distribution of Protein Amounts and the Proteins Detected, with RDR90, RDR50, and RDR10 marked.]

Page 49:

Number of Proteins in Mixture

[Figure: Success Rate (0 to 1) and Relative Dynamic Range (RDR50) (0 to 1) versus Number of Proteins in Mixture (1 to 100000, log scale), for Tissue and Body Fluid, with points numbered 1 and 2. Insets: Number of Proteins versus log(Protein Amount) for Tissue and Body Fluid.]

Page 50:

Amount of Peptides Loaded on the Column

[Figure: Success Rate (0 to 1) and Relative Dynamic Range (RDR50) (0 to 1) versus Amount Loaded [mg] (0.01 to 100, log scale), for Tissue and Body Fluid, with points numbered 2 and 3. Insets: Number of Proteins versus log(Protein Amount) for Tissue and Body Fluid.]

Page 51:

Peptide Separation

[Figure: Success Rate (0 to 1) and Relative Dynamic Range (RDR50) (0 to 1) versus Number of Peptide Fractions (10 to 100000, log scale), for Tissue and Body Fluid, with points numbered 3 and 4. Insets: Number of Proteins versus log(Protein Amount) for Tissue and Body Fluid.]

Page 52:

Amount loaded and peptide separation

Order:

1. Protein separation

2. Amount loaded

3. Peptide separation

[Figure (Tissue): Relative Dynamic Range (0 to 1.0) versus Success Rate (0 to 1.0), with points numbered 1 to 4 linked by the steps Protein separation, Amount loaded, and Peptide separation. Insets: Number of Proteins versus log(Protein Amount) at each numbered point.]

1. Protein separation

2. Peptide separation

3. Amount loaded

[Figure (Tissue): Relative Dynamic Range (0 to 1.0) versus Success Rate (0 to 1.0), with points numbered 1 to 4 linked by the steps Protein separation, Peptide separation, and Amount loaded. Insets: Number of Proteins versus log(Protein Amount) at each numbered point.]

Ranges:

Protein separation: 30000 – 3000 proteins in each fraction

Amount loaded: 0.1 µg – 10 µg

Peptide separation: 100 – 1000 fractions

Page 53:

Repeat Analysis

1 Analysis

Page 54:

Repeat Analysis

2 Analyses

Page 55:

Repeat Analysis

3 Analyses

Page 56:

Repeat Analysis

4 Analyses

Page 57:

Repeat Analysis

5 Analyses

Page 58:

Repeat Analysis

6 Analyses

Page 59:

Repeat Analysis

7 Analyses

Page 60:

Repeat Analysis

8 Analyses

Page 61:

Repeat Analysis: Simulations

[Figure, two panels comparing Experiment and Simulation. Success Rate (0 to 0.3) versus Number of Repeats (0 to 10); RDR10 (0 to 0.5) versus Number of Repeats (0 to 10).]

Page 62:

Summary

• The success rate of proteome analysis is influenced by the following factors (listed in order of importance):

• Amount of peptides loaded on column or mass spectrometric detection limit

• The degree of peptide separation or mass spectrometric dynamic range

• The degree of protein separation

Page 63:

Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)