Page 1: MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein Mass.

MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests

Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein

Mass Spectrometry Data Center

Page 2:

Library searching is not new

Organize for Reuse

Page 3:

MS Library Searching

• Hertz, Hites and Biemann Anal. Chem. (1971).

• PBM: McLafferty, Hertel, Villwock Org. Mass Spectrom. (1974).

• SISCOM: Damen, Henneberg, Weimann, Anal. Chim. Acta (1978).

• INCOS: Sokolow, Karnofsky, Gustafson, Finnigan Application Report 2 (March 1978).

• Stein, Scott J. Amer. Soc. Mass Spectrom., (1994).

Page 4:

‘Dot Product’ (cosine of ‘angle’ between a pair of spectra)

• Measured: M = f(m/z, abundance)

• Reference: R = f(m/z, abundance)

• f(abundance): weight as you like

$$\cos\theta = \frac{\sum M R}{\sqrt{\sum M^{2}\,\sum R^{2}}}$$

• Numerator: sum over all peaks in common

• Denominator: normalize
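The dot product on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: spectra are represented as dictionaries mapping a binned m/z value to an abundance, and the weighting f is taken to be the square root of abundance (the slide leaves the weighting open). The function names `weighted` and `cosine_score` are hypothetical.

```python
import math

def weighted(spectrum, f=math.sqrt):
    """Apply the weighting f to every abundance.

    spectrum maps a binned m/z value to an abundance (assumed representation).
    """
    return {mz: f(ab) for mz, ab in spectrum.items()}

def cosine_score(measured, reference, f=math.sqrt):
    """Cosine of the 'angle' between two weighted spectra."""
    m = weighted(measured, f)
    r = weighted(reference, f)
    # Numerator: sum over all peaks the two spectra have in common.
    common = set(m) & set(r)
    numerator = sum(m[mz] * r[mz] for mz in common)
    # Denominator: normalize by the lengths of both spectrum vectors.
    norm = math.sqrt(sum(v * v for v in m.values()) *
                     sum(v * v for v in r.values()))
    return numerator / norm if norm else 0.0

measured = {100: 4.0, 200: 9.0}
reference = {100: 4.0, 300: 16.0}
score = cosine_score(measured, reference)
```

Identical spectra score 1 and spectra with no peaks in common score 0; a different weighting can be tried by passing another function as `f`.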

Page 5:

Traditional GC/MS Library Search

Page 6:

Variability Depends on S/N

[Figure: medians for ~7,000 Radiodurans peptides, LCQ (PNNL/NCRR)]

Page 7:

Library Searching for Peptides

• LIBQUEST (Yates) – Yates et al., Anal. Chem. 1998, 70, 3557

• X!Hunter (Beavis) – Craig et al., J. Proteome Res. 2006, 5, 1843

• BiblioSpec (MacCoss) – Frewen et al., Anal. Chem. 2006, 78, 5678

• Spectral Comparison (Kearney) – Liu et al., Proteome Science 2007, 5:3

• SpectraST (Aebersold) – Lam et al., Proteomics 2007, 7, 655-667

• NIST Peptide Ion Fragmentation Library – June 2006 release (US-HUPO – March 2004)

Page 8:
Page 9:

Why Spectrum Libraries?

• More sensitive

• Better scoring

• Faster

• Annotation

• Unrestricted precursor ion

Page 10:

Fraction of MS/MS Spectra Identified vs S/N

[Figure: log-log plot. x-axis: S/N (1 to 10000); y-axis: Fraction IDed (0.001 to 1); series: All Peptides, HSA Peptides, HSA-OMSSA]

Identification by Spectrum Matching is More Sensitive than by Spectrum/Sequence Matching

Simple Protein Mix

Page 11:

Spectrum/Spectrum Scores are More Robust than Sequence/Spectrum Scores

[Figure: axis label "Sequence score"; annotation "99% Confidence"]

Page 12:

0.005 s vs. 6.2 s per query spectrum

Matching Spectra is Faster than Matching Sequence

Page 13:

Reference Library Building

• Extract identified spectra from sequence search

– Multiple search engines

– Instrument-class specific

• Create ‘consensus’ spectra

– Two or more matching spectra, also save best

• Assign probability of being correct

– Refine confidence starting from decoy FDR

– Classify peptides – tryptic, missed cleavage, semi, mods

• Create searchable spectral library

– Resolve conflicts, add annotation
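The 'consensus' step above can be illustrated with a minimal Python sketch. It is not the library-building algorithm used by the authors: it assumes spectra are dictionaries of binned m/z to abundance, scales each replicate to its base peak, keeps peaks seen in more than half of the replicates, and averages their abundances. The function names and the `min_fraction` parameter are hypothetical.

```python
def normalize(spectrum):
    """Scale abundances so the base (largest) peak equals 1.0."""
    base = max(spectrum.values())
    return {mz: ab / base for mz, ab in spectrum.items()}

def consensus(spectra, min_fraction=0.5):
    """Build a consensus spectrum from two or more matching spectra.

    A peak is kept when it occurs in more than min_fraction of the
    replicates; its abundance is the mean over the replicates that
    contain it. Both choices are assumptions made for illustration.
    """
    if len(spectra) < 2:
        raise ValueError("need two or more matching spectra")
    counts, totals = {}, {}
    for spectrum in map(normalize, spectra):
        for mz, ab in spectrum.items():
            counts[mz] = counts.get(mz, 0) + 1
            totals[mz] = totals.get(mz, 0.0) + ab
    n = len(spectra)
    return {mz: totals[mz] / counts[mz]
            for mz in counts if counts[mz] / n > min_fraction}

replicates = [
    {100: 10.0, 200: 5.0, 300: 1.0},
    {100: 20.0, 200: 10.0},
    {100: 5.0, 200: 2.5, 400: 1.0},
]
merged = consensus(replicates)
```

Peaks at 300 and 400 appear in only one of three replicates and are dropped. Saving the best single replicate alongside the consensus, as the slide mentions, would need a per-spectrum quality score that this sketch does not model.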

Page 14:

Three Classes of Libraries

I. Conventional Target Identification

– Peptides (Proteins)

II. Identifiable

– By unconventional searching

III. Not Identifiable

– Account for all recurring spectra

– QA/QC

Page 15:

I. OMSSA overlap with MS/MS Library Search

[Figure: overlap diagrams, region counts in the order shown]

34K, 6/06: 747 | 1350 | 353

78K, 6/07: 318 | 1752 | 833

Identified spectra (1% FDR) for 1-D Yeast NCI/CPTAC – Vanderbilt

Page 16:

Identified Spectra: Yeast - 1D

[Pie chart. Slice labels: Semitryptic, Tryptic bad miss, Tryptic missed cleavage, Tryptic]

Page 17:

II. Identify What We Can: Derive Class-specific FDR

• Tryptic

– Simple

– Expected missed cleavages

– Unexpected missed cleavages

• Semitryptic (cleaved tryptic)

– No missed cleavage

  • In source (with parent at same retention)

  • In sample

– Missed cleavage

  • In source (with parent)

  • In sample (obey rules)

  • Uncommon – reject

• Others …
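The top two levels of this classification (tryptic vs. semitryptic, with or without missed cleavages) can be sketched as below. The sketch assumes the commonly used trypsin rule (cleavage after K or R, but not before P), which the slides do not state, and it needs the residues flanking the peptide in the protein. It does not attempt the in-source/in-sample or expected/unexpected distinctions. The function names are hypothetical, as are the flanking residues in the examples.

```python
def is_tryptic_site(left, right):
    """True if the bond between residues left|right is a trypsin site.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    return left in "KR" and right != "P"

def classify(peptide, prev_aa, next_aa):
    """Return (cleavage class, number of missed cleavages).

    prev_aa / next_aa are the residues flanking the peptide in the
    protein; use '-' at a protein terminus.
    """
    n_ok = prev_aa == "-" or is_tryptic_site(prev_aa, peptide[0])
    c_ok = next_aa == "-" or is_tryptic_site(peptide[-1], next_aa)
    # Internal trypsin sites that were not cleaved.
    missed = sum(is_tryptic_site(peptide[i], peptide[i + 1])
                 for i in range(len(peptide) - 1))
    if n_ok and c_ok:
        cleavage = "tryptic"
    elif n_ok or c_ok:
        cleavage = "semitryptic"
    else:
        cleavage = "nonspecific"
    return cleavage, missed

# Flanking residues here are made up for illustration.
full = classify("YICENQDSISSK", "K", "L")
semi = classify("ICENQDSISSK", "Y", "L")
```

With these made-up flanks the first call yields a tryptic peptide with no missed cleavages, and the second, which has lost its N-terminal residue, is classed as semitryptic.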

Page 18:

Atypical Peptide Ions: use Sequence Search Method

• Tryptic only with many mods

  • Less common: Methylation, Phosphorylation, …

  • Artifacts: Na, K, Carbamyl

  • InsPecT/Pevzner (Unidentified, +70)

• High charge states, >2 missed cleavages

• Use class specific score thresholds
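One way to picture class-specific score thresholds is a target/decoy estimate computed separately for each peptide class. The sketch below is an assumption-laden illustration, not the authors' procedure: it estimates FDR as decoys divided by targets at or above a score, and reports for each class the lowest score at which that estimate stays within the limit. The input format and the name `class_thresholds` are hypothetical.

```python
from collections import defaultdict

def class_thresholds(hits, max_fdr=0.01):
    """Find a score threshold per peptide class from decoy hits.

    hits: iterable of (peptide_class, score, is_decoy).
    Returns {class: lowest score with estimated FDR <= max_fdr},
    or None for a class where no score meets the limit.
    """
    by_class = defaultdict(list)
    for cls, score, is_decoy in hits:
        by_class[cls].append((score, is_decoy))

    thresholds = {}
    for cls, items in by_class.items():
        # Walk from the best score down, tracking the running FDR estimate.
        items.sort(key=lambda item: item[0], reverse=True)
        targets = decoys = 0
        best = None
        for score, is_decoy in items:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            if targets and decoys / targets <= max_fdr:
                best = score
        thresholds[cls] = best
    return thresholds

hits = [
    ("tryptic", 10, False), ("tryptic", 9, False), ("tryptic", 8, False),
    ("tryptic", 7, False), ("tryptic", 6, True), ("tryptic", 5, False),
    ("semitryptic", 10, False), ("semitryptic", 9, True),
    ("semitryptic", 8, False),
]
limits = class_thresholds(hits, max_fdr=0.2)
```

In this toy data the semitryptic class needs a stricter threshold than the tryptic class at the same FDR limit, which is the motivation for treating classes separately.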

Page 19:

HSA/Fibrinogen/Transferrin Mix

6124 Consensus Peptide Spectra, IT, Qtof, TofTof

Ion Trap Peptide Ions: 1300 HSA, 1100 Fibrinogen, 700 Transferrin

Page 20:

Identified Peptide Spectra - Simple Protein Mix

[Pie chart: contiguous = tryptic, exploded = semitryptic. Slice labels in the order shown: Bad miss, Missed, 'Insample', Insource, Unknown mod, Bad miss, Missed, Simple]

Page 21:

III. Library of Recurring, Unidentified Spectra

• Create consensus spectra

– From similar spectra from an experiment

• Combine from multiple experiments

• Identify spectra in other experiments

– QA/QC: Artifacts, in standards, …

– Apply other sequencing methods

Page 22:

Assign all Spectra

• Identified Spectrum

– Matches library peptide or unidentified spectrum

– Subset of peaks match library spectrum (impure)

– Similar to a matched spectrum (cluster)

• Not a Peptide

– Low S/N

  • Maximum/Median <15

– High charge state (many large peaks)

  • Proteins, large fragments, …

– One dominant peak

  • Stable ion, not peptide

– Singly charged (high/low abund < 1.2)

  • Probable artifact, lower probability of identification

– Narrow m/z range

• Peptide?
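A few of the "Not a Peptide" checks can be sketched as a simple filter. Only the low-S/N rule (maximum/median abundance below 15) comes from the slide. The cutoffs for a dominant peak and for a narrow m/z range are hypothetical placeholders, and the high-charge-state and singly-charged tests are left out because the slide does not define them precisely enough to code.

```python
from statistics import median

def classify_unidentified(peaks, dominant_fraction=0.9, min_mz_span=200.0):
    """Assign an unidentified spectrum to a coarse category.

    peaks: list of (mz, abundance) with positive abundances.
    dominant_fraction and min_mz_span are illustrative assumptions,
    not values taken from the slides.
    """
    mzs = [mz for mz, _ in peaks]
    abundances = [ab for _, ab in peaks]
    # Slide rule: Maximum/Median < 15 means low signal-to-noise.
    if max(abundances) / median(abundances) < 15:
        return "low S/N"
    # One peak carries nearly all of the signal (stable ion, not peptide).
    if max(abundances) / sum(abundances) > dominant_fraction:
        return "one dominant peak"
    # All peaks crowded into a small m/z window.
    if max(mzs) - min(mzs) < min_mz_span:
        return "narrow m/z range"
    return "peptide?"

noisy = [(300.0, 10.0), (400.0, 8.0), (500.0, 9.0), (600.0, 12.0)]
label = classify_unidentified(noisy)
```

The checks run in a fixed order, so a spectrum that fails several tests is reported under the first one it fails.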

Page 23:

Spectrum Classification - Yeast - 1D

[Pie chart: exploded = identified, contiguous = unidentified. Slice labels in the order shown: Peptide?, 1+ No ID, Other, Low S/N, NoID Lib/Impure, NoID Lib, Peptide/Impure, Peptide]

Page 24:

Spectrum Classification - Simple Protein Mix

[Pie chart: exploded = identified, contiguous = unidentified. Slice labels in the order shown: Peptide?, 1+ NoID, Narrow, Complex, Dominant Peak, Low S/N, NoID Lib/Cluster, NoID Lib/Impure, NoID lib, Pep/Cluster, Pep/Impure, Peptide]

Page 25:

Library Pipeline of the Future

[Flow diagram. Boxes: Mass spectrometer; Garbage filter; Pep. Lib; Unass. Lib; Sequence Search, De Novo, Theoretical Spec, Similarity, ... Arrow labels: assigned, unassigned, No ID]

Page 26:

NCI/NIH - CPTAC: Clinical Proteomic Technology Assessment for Cancer

http://proteomics.cancer.gov

Technology assessment; develop standard protocols and clinical reference sets; and evaluate methods to ensure data reproducibility.

Broad Institute of MIT and Harvard, Memorial Sloan-Kettering Cancer Center, Purdue University, University of California, San Francisco, and Vanderbilt University School of Medicine.

NCI grants (U24CA126476-01, U24CA126485-01, U24CA126480-01, U24CA126477-01, and U24CA126479-01).

Page 27:
Page 28:

Run-to-Run Chromatographic Reproducibility

[Figure: two total ion chromatograms (TIC), Relative Abundance vs. Time (min), scan filter ITMS + c ESI Full ms [300.00-2000.00].
Top: NCI_study2_021607_sample1B228_vial_03, RT: 10.01 - 70.06, NL: 6.73E6.
Bottom: NCI_study2_021607_sample1B33_vial_01, RT: 9.99 - 70.13, NL: 5.53E6.]

Page 29:

Lab-to-Lab Chromatography

[Figure: seven chromatograms, Relative Abundance vs. Time (min), RT: 0.00 - 100.00, with inset mass spectra (m/z 400 to 2000) for some runs. Peptide annotation: YICENQDSISSK.

Panel labels: Broad Orbitrap, Vandy Orbitrap, NYU Orbitrap, INCAPS LTQ, NIST LTQ, Vandy LTQ, Purdue LTQ.

Raw files shown:
• CPTAC_STUDY2_WEEK1_1B144_01 (2/27/2007), TIC, FTMS + p NSI Full ms [300.00-2000.00], NL: 6.63E8
• nw_022207o_liebler_study2_Vanderbilt_Orib2_week1_1B035_070224060826 (2/24/2007), TIC, FTMS + p NSI Full ms [300.00-2000.00], NL: 4.30E7
• 20070511_CPTAC_1B100 (5/11/2007), TIC, FTMS + p NSI Full ms [300.00-2000.00], NL: 1.08E8
• IN_LTQm_041907_1B274_02 (4/20/2007), Base Peak m/z 400.00-2000.00, ITMS + c ESI Full ms, NL: 5.50E6
• liebler_Vanderbilt_1B121_100 (2/24/2007), Base Peak m/z 400.00-2000.00, ITMS + c NSI Full ms [300.00-2000.00], NL: 6.09E6
• NCI_study2_021607_sample1B228_vial_03 (2/16/2007), Base Peak m/z 400.00-2000.00, ITMS + c ESI Full ms, NL: 2.51E6
• 0703141B289 (3/15/2007), Base Peak m/z 400.00-2000.00, ITMS + c NSI Full ms [300.00-2000.00], NL: 5.80E5]

Page 30:

HSA_CAM_SigmaA9511_5H_8MS2_m2_10de_040406_05

Page 31:

Measures of Reproducibility

• Identified ions

– Unique peptides, Ions, Spectrum counts

• Unidentified components

– Classify by type, link to origin

• Ion cluster analysis

– MS1 linked to MS2

• Chromatography

– Time evolution of ion clusters
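The first measure (unique peptides, ions, spectrum counts) lends itself to a small worked example. The sketch below compares two runs by the overlap of their identified peptides and peptide ions. The Jaccard index is my choice of overlap statistic, not something the slides specify, and the input format (one `(sequence, charge)` tuple per identified spectrum) and the sequence AAAK are made up for illustration.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def reproducibility(run1, run2):
    """Compare identified ions between two runs.

    run1 / run2: lists with one (sequence, charge) entry per
    identified spectrum (assumed input format).
    """
    ions1, ions2 = set(run1), set(run2)
    peptides1 = {seq for seq, _ in ions1}
    peptides2 = {seq for seq, _ in ions2}
    return {
        "unique_peptide_overlap": jaccard(peptides1, peptides2),
        "ion_overlap": jaccard(ions1, ions2),
        "spectrum_counts": (len(run1), len(run2)),
    }

run_a = [("YICENQDSISSK", 2), ("YICENQDSISSK", 2), ("AAAK", 1)]
run_b = [("YICENQDSISSK", 3), ("AAAK", 1)]
summary = reproducibility(run_a, run_b)
```

Here both runs identify the same two peptides, but only one of the three distinct peptide ions is shared, which shows why peptide-level and ion-level overlap are worth reporting separately.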

Page 32:

Ion Component Analysis

Page 33:

Ion Component Analysis (Yeast)

[Figure: log-log plot. x-axis: Relative Component Intensity (1E-3 to 0.1); y-axis: Counts (10 to 1000); legend: Components, All MS2 Sampled, Peptides; annotations: Oversampling, Undersampling]

Page 34:

Components in Replicate Runs

[Figure: log-log plot. x-axis: Component Intensity (1E-4 to 0.1); y-axis: Number of Components (10 to 1000); series: total, sampled, identified; markers: ▲▼ run 1, 2; ■ in both]

