Posted on 13-Jan-2016


MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests

Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein

Mass Spectrometry Data Center

Library searching is not new

Organize for Reuse

MS Library Searching

• Hertz, Hites and Biemann, Anal. Chem. (1971).

• PBM: McLafferty, Hertel, Villwock, Org. Mass Spectrom. (1974).

• SISCOM: Damen, Henneberg, Weimann, Anal. Chim. Acta (1978).

• INCOS: Sokolow, Karnofsky, Gustafson, Finnigan Application Report 2 (March 1978).

• Stein, Scott, J. Amer. Soc. Mass Spectrom. (1994).

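‘Dot Product’ (cosine of the ‘angle’ between a pair of spectra)

• Measured: M = f(m/z, abundance)
• Reference: R = f(m/z, abundance)
• f(abundance): weight as you like

$$\cos\theta = \frac{\sum_i M_i R_i}{\sqrt{\sum_i M_i^2 \sum_i R_i^2}}$$

Numerator: sum over all peaks in common. Denominator: normalize.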
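A minimal Python sketch of the dot-product score, assuming spectra are lists of (m/z, abundance) pairs, peaks are pooled into unit-width m/z bins, and f(abundance) is the square root (one common choice; the slide deliberately leaves the weighting open). The helper names `binned` and `dot_product` are illustrative, not taken from any NIST software.

```python
import math
from collections import defaultdict


def binned(spectrum, bin_width=1.0):
    """Sum abundances of (m/z, abundance) peaks into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, abundance in spectrum:
        bins[round(mz / bin_width)] += abundance
    return bins


def dot_product(measured, reference, weight=math.sqrt, bin_width=1.0):
    """Cosine of the angle between two spectra after weighting each abundance."""
    m = {k: weight(v) for k, v in binned(measured, bin_width).items()}
    r = {k: weight(v) for k, v in binned(reference, bin_width).items()}
    # Numerator: sum over peaks (bins) present in both spectra.
    numerator = sum(m[k] * r[k] for k in m.keys() & r.keys())
    # Denominator: normalize by the length of each weighted spectrum vector.
    norm = math.sqrt(sum(x * x for x in m.values()) * sum(x * x for x in r.values()))
    return numerator / norm if norm else 0.0
```

With a power-law weighting such as the square root, multiplying every abundance in one spectrum by a constant leaves the score unchanged, so only the relative pattern of shared peaks matters; identical spectra score 1 and spectra with no peaks in common score 0.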

Traditional GC/MS Library Search

Variability Depends on S/N

[Figure: ~7,000 Radiodurans peptides, LCQ (PNNL/NCRR); medians]

Library Searching for Peptides

• LIBQUEST (Yates) – Yates et al., Anal. Chem. 1998, 70, 3557

• X!Hunter (Beavis) – Craig et al., J. Proteome Res. 2006, 5, 1843

• BiblioSpec (MacCoss) – Frewen et al., Anal. Chem. 2006, 78, 5678

• Spectral Comparison (Kearney) – Liu et al., Proteome Science 2007, 5:3

• SpectraST (Aebersold) – Lam et al., Proteomics 2007, 7, 655-667

• NIST Peptide Ion Fragmentation Library – June 2006 release (US-HUPO – March 2004)

Why Spectrum Libraries?

• More sensitive

• Better scoring

• Faster

• Annotation

• Unrestricted precursor ion

Fraction of MS/MS Spectra Identified vs S/N

[Figure: fraction identified (log scale, 0.001 to 1) vs. S/N (log scale, 1 to 10,000); series: All Peptides, HSA Peptides, HSA-OMSSA]

Identification by Spectrum Matching is More Sensitive than by Spectrum/Sequence Matching

Simple Protein Mix

Spectrum/Spectrum Scores are More Robust than Sequence/Spectrum Scores

[Figure: Sequence score; 99% Confidence]

0.005 s vs. 6.2 s per query spectrum

Matching Spectra is Faster than Matching Sequence

Reference Library Building

• Extract identified spectra from sequence search

– Multiple search engines

– Instrument-class specific

• Create ‘consensus’ spectra

– Two or more matching spectra, also save best

• Assign probability of being correct

– Refine confidence starting from decoy FDR

– Classify peptides – tryptic, missed cleavage, semi, mods

• Create searchable spectral library

– Resolve conflicts, add annotation
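The "create consensus spectra" step might look like the following sketch. Assumptions not stated in the slides: each replicate is normalized to its base peak, peaks are matched by unit m/z bin, and a peak is kept only if it appears in at least half of the replicates (`min_fraction=0.5` is a hypothetical default). Saving the best-scoring replicate alongside the consensus, as the slide suggests, is separate bookkeeping not shown here.

```python
from collections import defaultdict
from statistics import mean


def consensus_spectrum(replicates, bin_width=1.0, min_fraction=0.5):
    """Merge two or more (m/z, abundance) spectra of the same peptide ion."""
    if len(replicates) < 2:
        raise ValueError("a consensus needs two or more matching spectra")
    peaks = defaultdict(list)  # m/z bin -> [(m/z, normalized abundance), ...]
    for spectrum in replicates:
        top = max(a for _, a in spectrum)  # base peak of this replicate
        seen = {}
        for mz, a in spectrum:
            b = round(mz / bin_width)
            # Keep only the most intense peak per bin within one replicate.
            if b not in seen or a > seen[b][1]:
                seen[b] = (mz, a / top)
        for b, peak in seen.items():
            peaks[b].append(peak)
    needed = min_fraction * len(replicates)
    consensus = []
    for b in sorted(peaks):
        group = peaks[b]
        if len(group) >= needed:  # drop peaks seen in too few replicates
            avg_mz = mean(mz for mz, _ in group)
            # Average over all replicates, counting a missing peak as zero.
            avg_abundance = sum(a for _, a in group) / len(replicates)
            consensus.append((avg_mz, avg_abundance))
    return consensus
```

Requiring a peak to recur across replicates is what suppresses one-off noise peaks, which is one reason a consensus spectrum can make a cleaner library entry than any single acquisition.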

Three Classes of Libraries

I. Conventional Target Identification

– Peptides (Proteins)

II. Identifiable

– By unconventional searching

III. Not Identifiable

– Account for all recurring spectra

– QA/QC

I. OMSSA overlap with MS/MS Library Search

[Figure: overlap counts 747 | 1350 | 353 (34K, 6/06) and 318 | 1752 | 833 (78K, 6/07)]

Identified spectra (1% FDR) for 1-D Yeast NCI/CPTAC – Vanderbilt

Identified Spectra: Yeast - 1 D

[Figure: pie chart; categories: Tryptic, Tryptic missed cleavage, Tryptic bad miss, Semitryptic]

II. Identify What We Can: Derive Class-specific FDR

• Tryptic
  – Simple
  – Expected missed cleavages
  – Unexpected missed cleavages

• Semitryptic (cleaved tryptic)
  – No missed cleavage
    • In source (with parent at same retention time)
    • In sample
  – Missed cleavage
    • In source (with parent)
    • In sample (obey rules)
    • Uncommon – reject

• Others …
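Deriving a class-specific FDR can be sketched with the standard target-decoy estimate (decoy hits divided by target hits above a score cutoff), applied separately to each peptide class. The slides do not say exactly how this was computed; the `(score, is_decoy)` input layout, the 1% default, and the function names are assumptions for illustration.

```python
def score_threshold(psms, max_fdr=0.01):
    """Lowest score cutoff whose decoy-estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs for one peptide class.
    Returns None when no cutoff satisfies the bound.
    """
    targets = decoys = 0
    threshold = None
    # Walk from best to worst score, estimating FDR as decoys / targets.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


def class_thresholds(psms_by_class, max_fdr=0.01):
    """Apply score_threshold separately to each class (tryptic, semitryptic, ...)."""
    return {cls: score_threshold(psms, max_fdr) for cls, psms in psms_by_class.items()}
```

Because rarer classes such as semitryptic or unexpected missed cleavages usually have a higher proportion of decoy-like hits, their cutoffs tend to come out stricter than the one for simple tryptic peptides.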

Atypical Peptide Ions: Use Sequence Search Method

• Tryptic only, with many mods
• Less common: Methylation, Phosphorylation, …
• Artifacts: Na, K, Carbamyl
• InsPecT/Pevzner (Unidentified, +70)

• High charge states, >2 missed cleavages

• Use class-specific score thresholds

HSA/Fibrinogen/Transferrin Mix

6124 Consensus Peptide Spectra, IT, Qtof, TofTof

Ion Trap Peptide Ions: 1300 HSA, 1100 Fibrinogen, 700 Transferrin

Identified Peptide Spectra - Simple Protein Mix

[Figure: pie chart; contiguous = tryptic, exploded = semitryptic; categories: Simple, Missed, Bad miss, In source, 'In sample', Unknown mod]

III. Library of Recurring, Unidentified Spectra

• Create consensus spectra
  – From similar spectra from an experiment

• Combine from multiple experiments

• Identify spectra in other experiments
  – QA/QC: Artifacts, in standards, …
  – Apply other sequencing methods
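Grouping similar unidentified spectra from an experiment into recurring clusters could be sketched as a greedy single pass: a spectrum joins the first cluster whose seed has a close precursor m/z and a high dot-product score, otherwise it seeds a new cluster. The greedy strategy, the 1.0 m/z precursor tolerance, the 0.7 cosine cutoff, and square-root weighting are all hypothetical choices, not the authors' settings.

```python
import math
from collections import defaultdict


def _vector(peaks, bin_width=1.0):
    """Unit-m/z binned, square-root-weighted abundance vector (assumed weighting)."""
    bins = defaultdict(float)
    for mz, abundance in peaks:
        bins[round(mz / bin_width)] += abundance
    return {k: math.sqrt(v) for k, v in bins.items()}


def _cosine(a, b):
    """Dot product of two weighted vectors, normalized to their lengths."""
    num = sum(a[k] * b[k] for k in a.keys() & b.keys())
    den = math.sqrt(sum(x * x for x in a.values()) * sum(x * x for x in b.values()))
    return num / den if den else 0.0


def cluster_spectra(spectra, mz_tol=1.0, min_cosine=0.7):
    """Greedy single-pass clustering of (precursor_mz, peaks) spectra.

    Returns a list of clusters, each a list of input indices.
    """
    clusters = []  # (seed precursor m/z, seed vector, member indices)
    for i, (precursor, peaks) in enumerate(spectra):
        vec = _vector(peaks)
        for seed_mz, seed_vec, members in clusters:
            if abs(precursor - seed_mz) <= mz_tol and _cosine(vec, seed_vec) >= min_cosine:
                members.append(i)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append((precursor, vec, [i]))
    return [members for _, _, members in clusters]
```

Clusters with two or more members (for example `[c for c in cluster_spectra(run) if len(c) >= 2]`) are the recurring spectra that could then be merged into consensus entries for the unidentified library.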

Assign All Spectra

• Identified Spectrum
  – Matches library peptide or unidentified spectrum
  – Subset of peaks match library spectrum (impure)
  – Similar to a matched spectrum (cluster)

• Not a Peptide
  – Low S/N
    • Maximum/Median < 15
  – High charge state (many large peaks)
    • Proteins, large fragments, …
  – One dominant peak
    • Stable ion, not peptide
  – Singly charged (high/low abund < 1.2)
    • Probable artifact, lower probability of identification
  – Narrow m/z range

• Peptide?
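A few of the "Not a Peptide" rules can be sketched as simple checks on a peak list. Only the low-S/N rule (maximum/median abundance < 15) comes from the slide; the dominant-peak fraction (0.5) and the minimum m/z span (200) are hypothetical placeholders, and the charge-state rules are omitted because the slide does not define their measures precisely enough to reproduce.

```python
from statistics import median


def not_a_peptide_reason(peaks, snr_cutoff=15.0, dominant_fraction=0.5, min_span=200.0):
    """Return a reason label if a spectrum looks like a non-peptide, else None.

    peaks: list of (m/z, abundance). snr_cutoff follows the slide's
    'Maximum/Median < 15' rule; the other two thresholds are hypothetical.
    """
    if not peaks:
        return "empty"
    intensities = [a for _, a in peaks]
    top = max(intensities)
    # Low S/N: the largest peak barely rises above the typical peak.
    if top / median(intensities) < snr_cutoff:
        return "low S/N"
    # One dominant peak: a single ion carries most of the total abundance.
    if top / sum(intensities) > dominant_fraction:
        return "one dominant peak"
    # Narrow m/z range: all peaks fall in a small window.
    mzs = [mz for mz, _ in peaks]
    if max(mzs) - min(mzs) < min_span:
        return "narrow m/z range"
    return None  # candidate peptide spectrum
```

The checks run in a fixed order, so a spectrum that fails several rules is reported under the first one; a real classifier would likely record every rule a spectrum trips.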

Spectrum Classification - Yeast - 1D

[Figure: pie chart; exploded = identified, contiguous = unidentified; categories: Peptide, Peptide/Impure, NoID Lib, NoID Lib/Impure, Low S/N, Other, 1+ No ID, Peptide?]

Spectrum Classification - Simple Protein Mix

[Figure: pie chart; exploded = identified, contiguous = unidentified; categories: Peptide, Pep/Impure, Pep/Cluster, NoID Lib, NoID Lib/Impure, NoID Lib/Cluster, Low S/N, Dominant Peak, Complex, Narrow, 1+ NoID, Peptide?]

Library Pipeline of the Future

[Flowchart: Mass spectrometer; Pep. Lib; Unass. Lib; Garbage filter; Sequence Search, De Novo, Theoretical Spec, Similarity, ...; outcomes at each stage: assigned, unassigned, No ID]
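One possible reading of the pipeline is a tiered cascade: cheap library lookups (identified peptides, then recurring unidentified spectra) run first, a garbage filter removes spectra unlikely to be peptides, and only the remainder goes to slower methods. The stage order here is an interpretation of the flowchart, and `assign_spectrum` and the callable-stage interface are hypothetical.

```python
from typing import Callable, Optional

# A search stage takes a spectrum record and returns an identification or None.
Search = Callable[[dict], Optional[str]]


def assign_spectrum(spectrum: dict,
                    library_stages: list[tuple[str, Search]],
                    garbage_filter: Callable[[dict], bool],
                    search_stages: list[tuple[str, Search]]) -> tuple[str, Optional[str]]:
    """Return (stage name, identification) for one spectrum, or a 'garbage'/'No ID' label."""
    # 1. Fast library lookups first: peptide library, then unassigned-spectrum library.
    for name, search in library_stages:
        hit = search(spectrum)
        if hit is not None:
            return name, hit
    # 2. Drop spectra that are unlikely to be peptides before the expensive steps.
    if garbage_filter(spectrum):
        return "garbage", None
    # 3. Slower methods (sequence search, de novo, ...) on what remains.
    for name, search in search_stages:
        hit = search(spectrum)
        if hit is not None:
            return name, hit
    return "No ID", None
```

Ordering stages from cheapest to most expensive means most spectra in a well-characterized sample never reach the slow methods, which is where the speed advantage of library matching would pay off.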

NCI/NIH - CPTAC: Clinical Proteomic Technology Assessment for Cancer

http://proteomics.cancer.gov

Technology assessment; develop standard protocols and clinical reference sets; and evaluate methods to ensure data reproducibility.

Broad Institute of MIT and Harvard, Memorial Sloan-Kettering Cancer Center, Purdue University, University of California, San Francisco, and Vanderbilt University School of Medicine.

NCI grants (U24CA126476-01, U24CA126485-01, U24CA126480-01, U24CA126477-01, and U24CA126479-01).

Run-to-Run Chromatographic Reproducibility

[Figure: TIC chromatograms (relative abundance vs. time, min; ITMS ESI full MS, m/z 300-2000) for two runs: NCI_study2_021607_sample1B228_vial_03 and NCI_study2_021607_sample1B33_vial_01]

Lab-to-Lab Chromatography

[Figure: chromatograms (relative abundance vs. time, 0-100 min) with selected full-scan MS spectra (m/z 400-2000) from seven instruments: BroadOrbitrap, VandyOrbitrap, NYUOrbitrap, INCAPSLTQ, NISTLTQ, VandyLTQ, PurdueLTQ; peptide label: YICENQDSISSK]

Measures of Reproducibility

• Identified ions
  – Unique peptides, Ions, Spectrum counts

• Unidentified components
  – Classify by type, link to origin

• Ion cluster analysis
  – MS1 linked to MS2

• Chromatography
  – Time evolution of ion clusters
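The identified-ion measures (unique peptides, ions, spectrum counts) can be tallied per run and compared between runs as in this sketch. It assumes each run is a list of (peptide, charge) identifications, one per identified spectrum; the Jaccard index used for run-to-run overlap is a swapped-in, commonly used measure, not one named in the slides.

```python
def run_summary(psms):
    """Count unique peptides, distinct ions, and identified spectra in one run.

    psms: list of (peptide sequence, charge), one entry per identified spectrum.
    """
    return {
        "unique_peptides": len({peptide for peptide, _ in psms}),
        "ions": len(set(psms)),  # an ion is a (peptide, charge) pair
        "spectrum_count": len(psms),
    }


def ion_overlap(run_a, run_b):
    """Jaccard index of the identified ions in two runs (1.0 = identical sets)."""
    a, b = set(run_a), set(run_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Spectrum counts respond to sampling depth and chromatography, while ion and peptide counts are less sensitive to repeated sampling, so reporting all three helps separate instrument behavior from sample differences.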

Ion Component Analysis

Ion Component Analysis (Yeast)

[Figure: counts vs. relative component intensity (log scales, 1E-3 to 0.1); series: Components, All MS2, Sampled Peptides; regions marked Oversampling and Undersampling]

Components in Replicate Runs

[Figure: number of components vs. component intensity (log scales, 1E-4 to 0.1); series: total, sampled, identified; ▲▼ run 1, 2; ■ in both]
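The comparison of total, sampled, and identified components across intensity could be tabulated by decade of component intensity, as in this sketch. It assumes MS1 ion components have already been detected upstream and each is described by (relative intensity, sampled by MS2?, identified?); the function name and the per-decade binning are illustrative choices.

```python
import math
from collections import defaultdict


def sampling_by_intensity(components):
    """Count total, MS2-sampled, and identified components per intensity decade.

    components: iterable of (relative_intensity > 0, sampled: bool, identified: bool).
    Returns {decade: [total, sampled, identified]}, where decade = floor(log10(intensity)).
    """
    bins = defaultdict(lambda: [0, 0, 0])
    for intensity, sampled, identified in components:
        decade = math.floor(math.log10(intensity))
        counts = bins[decade]
        counts[0] += 1
        counts[1] += sampled      # bools add as 0 or 1
        counts[2] += identified
    return dict(sorted(bins.items()))
```

A falling sampled/total ratio in the low-intensity decades is the signature of undersampling, while many MS2 scans on the same high-intensity components points to oversampling.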