Page 1

Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS

Karl R. Clauser
Broad Institute of MIT and Harvard

BioInfoSummer 2012
University of Adelaide

December, 2012

Page 2

Topics Covered

• Basics of phospho site identification and localization
• Evolution of phosphoproteomic literature MS/MS reporting
• Modification site localization algorithm development
• 2010 ABRF-iPRG study of phosphopeptide ID and site localization
• Emerging false localization rate (FLR) metrics

Page 3

Localizing a Phosphorylation Site

L/F|P/A/D|T/s/P/S T A\T K

L/F|P/A/D|t S/P/S T A\T K
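The two candidate localizations above differ only in whether the phosphate sits on the sixth residue (T) or the seventh (S), so only fragments that separate those two residues can tell them apart. The following is a minimal Python sketch of that idea. It assumes monoisotopic residue masses, singly charged b and y ions only, and lowercase s/t/y as the phospho notation; the function names are illustrative and not taken from any particular tool.

```python
# Monoisotopic residue masses (Da) for the residues used here.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
PHOSPHO = 79.96633  # HPO3 added to a phosphorylated residue
WATER = 18.01056
PROTON = 1.00728


def residue_masses(seq):
    """Lowercase s, t or y marks a phosphorylated residue."""
    return [MONO[c.upper()] + (PHOSPHO if c in "sty" else 0.0) for c in seq]


def by_ions(seq):
    """Singly charged b and y ion m/z values keyed by ion name."""
    masses = residue_masses(seq)
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON
    return ions


def site_determining_ions(seq_a, seq_b, tol=0.01):
    """Names of ions whose m/z differs between two localizations."""
    a, b = by_ions(seq_a), by_ions(seq_b)
    differing = [name for name in a if abs(a[name] - b[name]) > tol]
    return sorted(differing, key=lambda name: (name[0], int(name[1:])))


print(site_determining_ions("LFPADTsPSTATK", "LFPADtSPSTATK"))  # ['b6', 'y7']
```

For adjacent candidate sites only one b ion and one y ion distinguish the two placements, which is why a single missing peak can make such a localization ambiguous.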

Page 4

PTM Site Localization
Test all Locations, Examine Score Gaps

No possible ambiguity (# PO4 sites = # S, T, or Y)
Locations tested: AVsEEQQPALK
Conclusion: AVS(1.0)EEQQPALK

Single site
Locations tested: APsLTDLVK *, APSLtDLVK -
Conclusion: APS(0.99)LT(0.0)DLVK

Locations tested: sSSAGPEGPQLDVPR *, SsSAGPEGPQLDVPR *, SSsAGPEGPQLDVPR -
Conclusion: S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR

Multiple sites
Locations tested: VTNDIsPEsSPGVGR *, VTNDIsPESsPGVGR *, VTNDISPEssPGVGR -, VtNDIsPESSPGVGR -, VtNDISPEsSPGVGR -, VtNDISPESsPGVGR -
Conclusion: VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
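"Test all locations" amounts to enumerating every way of placing the observed number of phosphates on the S, T and Y residues of the peptide. A short Python sketch of that enumeration, assuming lowercase letters mark phosphorylated residues as in the slide; the function name is hypothetical:

```python
from itertools import combinations


def candidate_localizations(sequence, n_phospho, residues="STY"):
    """Yield every placement of n_phospho phosphates on S/T/Y residues."""
    sites = [i for i, aa in enumerate(sequence) if aa in residues]
    for chosen in combinations(sites, n_phospho):
        chars = list(sequence)
        for i in chosen:
            chars[i] = chars[i].lower()  # lowercase marks the phosphosite
        yield "".join(chars)


for candidate in candidate_localizations("VTNDISPESSPGVGR", 2):
    print(candidate)
```

With four S/T residues and two phosphates this yields the six candidates listed for the multiple-site example; each candidate would then be scored against the spectrum.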

Page 5

PTM Site Localization – Confident Localization

(K)A/P|s|L/T D|L\V K(S)

APS(0.99)LT(0.0)DLVK

Page 6

PTM Site Localization – Ambiguous Localization

(R)S s/S/A/G/P E/G/P Q L|D|V|P R(E)

S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR

Page 7

PTM Site Localization – Ambiguous Localization
2 sites: 1 confident, 1 ambiguous

(R)V T N D|I|s/P E|s S/P G V\G R(R)

VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
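The annotated form used on these slides, with a value in parentheses after each candidate residue, can be read back into per-site records. The sketch below is one way to do that in Python. The 0.99 cutoff for "confident" simply mirrors the values shown in the examples and is an assumption, as are the function names.

```python
import re

# One residue letter, optionally followed by a value in parentheses.
_TOKEN = re.compile(r"([A-Z])(?:\((\d*\.?\d+)\))?")


def parse_site_probabilities(annotated):
    """Return (plain_sequence, [(position, residue, value), ...]); positions are 1-based."""
    sequence = []
    sites = []
    for match in _TOKEN.finditer(annotated):
        residue, value = match.groups()
        sequence.append(residue)
        if value is not None:
            sites.append((len(sequence), residue, float(value)))
    return "".join(sequence), sites


def summarize(annotated, confident_at=0.99):
    """Split annotated sites into confident and ambiguous (cutoff is an assumption)."""
    _, sites = parse_site_probabilities(annotated)
    confident = [s for s in sites if s[2] >= confident_at]
    ambiguous = [s for s in sites if 0.0 < s[2] < confident_at]
    return confident, ambiguous


print(summarize("VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR"))
```

For the two-site example this separates the one confidently placed site (S6) from the pair of residues (S9, S10) that share the second phosphate.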

Page 8

Reliability of LC/MS/MS Phosphoproteomic Literature ~2005

Ballif, BA, … Gygi, SP, 2004, MCP, 3, 1093-1101
Approach: 1D gel, digest, SCX, LC/MS/MS
Instrument: LCQ Deca XP
# sites: 546; # ambiguous sites: 86
Scores shown: yes; Site ambiguity labeled: yes; Supplemental spectra shown: no

Rush, J, … Comb, MJ, 2005, Nat Biotech, 23, 94-101
Approach: digest lysate, pTyr Ab, LC/MS/MS
Instrument: LCQ Deca XP
# sites: 628; # ambiguous sites: 0
Scores shown: yes; Site ambiguity labeled: no; Supplemental spectra shown: no

Collins, MO, … Grant, SGN, 2005, J Biol Chem, 280, 5972-5982
Approach: protein IMAC, peptide IMAC, LC/MS/MS
Instrument: Q-Tof Ultima
# sites: 331; # ambiguous sites: 42
Scores shown: no; Site ambiguity labeled: yes; Supplemental spectra shown: no

Gruhler, A, … Jensen, ON, 2005, MCP, 4, 310-327
Approach: digest lysate, SCX, IMAC, LC/MS/MS
Instrument: LTQ-FT
# sites: 729; # ambiguous sites: 0
Scores shown: yes; Site ambiguity labeled: no; Supplemental spectra shown: no

“Resulting sequences were inspected manually …. When the exact site of phosphorylation could not be assigned for a given phosphopeptide, it was tabulated as ambiguous.”

“All identified phosphopeptides were manually validated, and localization of phosphorylated residues within the individual peptide sequences were manually assigned…”

“All spectra supporting the final list of assigned peptides used to build the tables shown here were reviewed by at least three people to establish their credibility.”

“Assignment of phosphorylation sites was verified manually with the aid of PEAK Studio (Bioinformatics Solutions) software.”

Page 9

MCP Guideline for publishing PTM data ~2010

III. POST-TRANSLATIONAL MODIFICATIONS
Studies focusing on posttranslational modifications (PTMs) require specialized methodology and documentation to assign the type(s) and site(s) of the modification(s). The guidelines in this section apply to PTMs that occur under physiological conditions and to which biological significance may be assigned, such as phosphorylation, glycosylation, etc. as well as purposefully induced chemical modifications of central importance to the results of the study, such as chemical cross-linking. These guidelines do not apply to common modifications arising from sample handling or preparation such as oxidation of Met or alkylation of Cys. In addition to the tabular presentation(s) of the data described in guideline II, the following information is required:

• The site(s) of modification: Within each peptide sequence, all modifications must be clearly located (unless ambiguous; see below) and the manner in which this was accomplished (through computation or manual inspection) must be described.
• A justification for any localization score threshold employed.
• Ambiguous assignments: Peptides containing ambiguous PTM site localizations must be listed in a separate table from those with unambiguous site localizations. In cases where there are multiple modification sites and at least one is ambiguous, then these peptides should be listed with the ambiguous assignments. Ambiguous assignments must be clearly labeled as such. Examples of ambiguities include:
– Modified peptides in which one or more modification sites are ambiguous.
– Instances where the peptide sequence is repeated in the same protein so the specific modification site cannot be assigned.
– Instances in which the same peptide is repeated in multiple proteins, e.g. paralogs and splice variants (See also Section IV).
– Isobaric modifications (e.g., acetylation vs. trimethylation, phosphorylation vs. sulfonation etc), where the possibilities may not be distinguished. Examples of methods able to distinguish between these include mass spectrometric approaches such as accurate mass determination, observation of signature fragment ions (e.g. m/z 79 vs. m/z 80 in negative ion mode for assignment of phosphorylation over sulfonation), or biological or chemical strategies.
• Annotated, mass labeled spectra: Spectra for ALL modified peptides must be either submitted to a public repository or accompany the manuscript as described in guideline II.

http://www.mcponline.org/

Page 10

Supplemental Table Links to Each Labeled Spectrum

Page 11

Spectrum Mill Scoring of MS/MS Interpretations

Peak Selection: De-Isotoping, S/N thresholding, Parent - neutral removal, Charge assignment

Match to Database Candidate Sequences

Score = Assignment Bonus (Ion Type Weighted) + Marker Ion Bonus (Ion Type Weighted) - Non-assignment Penalty (Intensity Weighted)

SPI (%) = Scored Peak Intensity

Example values shown: Score 12.68, SPI 92%
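As a rough illustration of the SPI figure, the sketch below computes the percentage of selected peak intensity that an interpretation accounts for. This reading of "Scored Peak Intensity", the matching tolerance and the function name are assumptions made for the example; the slide does not give Spectrum Mill's exact definition.

```python
def scored_peak_intensity(peaks, assigned_mz, tol=0.5):
    """Percent of total peak intensity explained by assigned fragment ions.

    peaks: list of (mz, intensity) after peak selection.
    assigned_mz: theoretical m/z values of the interpretation being scored.
    tol: matching tolerance in Da (illustrative value).
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - target) <= tol for target in assigned_mz)
    )
    return 100.0 * matched / total


peaks = [(100.0, 10.0), (200.0, 30.0), (300.0, 60.0)]
print(scored_peak_intensity(peaks, [200.1, 300.2]))
```

A high value means little intensity is left unexplained, which complements the score itself when judging an interpretation.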

Page 12

Spectrum Mill Variable Modification Localization Score

VML score = Difference in Score of same identified sequences with different variable modification localizations

VML score > 1.1 indicates confident localization

Why a threshold value of 1.1?
• 1 implies that there is a distinguishing ion of b or y ion type
• 0.1 means that when unassigned, the peak is 10% the intensity of the base peak
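The VML rule can be sketched as a score gap between the best and second-best placement of the modification on the same sequence. In the Python below the scores are made-up inputs and the function names are hypothetical; only the 1.1 threshold comes from the slide.

```python
def vml_score(scores_by_localization):
    """Gap between the best and second-best localization of one sequence."""
    ranked = sorted(scores_by_localization.values(), reverse=True)
    if len(ranked) < 2:
        return None  # a single possible placement leaves nothing to compare
    return ranked[0] - ranked[1]


def is_confident(scores_by_localization, threshold=1.1):
    """Confident when there is no alternative or the gap exceeds the threshold."""
    gap = vml_score(scores_by_localization)
    return gap is None or gap > threshold


# Hypothetical scores for the two placements of one phosphate.
print(is_confident({"APsLTDLVK": 12.68, "APSLtDLVK": 9.10}))
# Hypothetical tie between two of three placements.
print(is_confident({"sSSAGPEGPQLDVPR": 10.0, "SsSAGPEGPQLDVPR": 10.0,
                    "SSsAGPEGPQLDVPR": 7.5}))
```

A fixed gap keeps the decision simple, at the cost of treating a gap just under the threshold the same as a complete tie.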

Page 14

VML Scoring - Room for Improvement

S(0.50)QT(0.50)PPGVAT(0.0)PPIPK

VML score: 1.09

Labeled ions: y12, b2

Page 15

VML Scoring - Room for Improvement

VML score: 0.49

S(0.0)T(0.0)S(0.25)T(0.25)PT(0.25)S(0.25)PGPR

S(0.0)T(0.0)[S(0.5)T(0.5)]P[T(0.5)S(0.5)]PGPR

Page 16

Phosphosite Localization Scoring - Ascore

http://ascore.med.harvard.edu/
Supports Sequest results only, Linux only
Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP (2006) Nat Biotechnol 24:1285–1292.

Page 17

Phosphosite Localization Scoring - Andromeda

$$P = \frac{n!}{k!\,(n-k)!}\; p^{k}\,(1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\; 0.04^{k}\; 0.96^{\,n-k}$$

$$\text{PTM score} = -10 \times \log_{10}(P)$$

p: 0.04 - use the 4 most intense fragment ions per 100 m/z units
n: total number of possible b/y ions in the observed mass range for all possible combinations of PO4 sites in a peptide
k: number of peaks matching n

Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell (2006), 127 (3), 635–48.
Olsen, J.V., and Mann, M. Proc. Natl. Acad. Sci. USA. (2004) 101, 13417–13422.
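The PTM score is a binomial probability turned into a log-scaled score. A minimal Python sketch, assuming the standard binomial form with p = 0.04 and a base-10 logarithm; the function names are illustrative:

```python
from math import comb, log10


def ptm_probability(n, k, p=0.04):
    """Binomial probability of exactly k matches among n candidate fragment ions."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ptm_score(n, k, p=0.04):
    """-10 * log10(P): the less likely the matches are by chance, the higher the score."""
    return -10 * log10(ptm_probability(n, k, p))


# With 20 candidate ions, matching 8 scores far higher than matching 4.
print(round(ptm_score(20, 4), 1), round(ptm_score(20, 8), 1))
```

Because p is fixed by the peak-selection rule (4 peaks per 100 m/z), the score depends only on how many of the candidate ions are matched.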

Page 18

True Probability or Just Effective Scores?

Peak selection assumptions
• All regions of spectrum equally likely
– multiply charged fragments below precursor
– some 100-300 m/z values not possible, dipeptide AA combinations
– tolerance in Da, not ppm
• Tall and short peak intensities equally diagnostic

Fragment ion type assumptions
• All ion types equally probable
• Neutral losses ignored, y-H3PO4, y-H2O

Page 19

Phosphosite Localization Scoring - PhosphoRS

Taus, T., Kocher, T., Pichler, P., Paschke, C., Schmidt, A., Henrich, C., and Mechtler, K. (2011) J Proteome Res. 10(12): 5354-62.

N: total # of extracted peaks
d: fragment ion mass tolerance
w: full mass range of spectrum

Score all theoretical fragment ions, not just site determining ions.
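The slide defines N, d and w, but the formula that combines them did not survive in this transcript. One reading, assumed here and not confirmed by the slide, is that the chance of a random match to a single theoretical fragment is p = N·d/w, and that a candidate is scored by the cumulative binomial probability of matching at least k of its n theoretical fragments. A Python sketch under those assumptions, with hypothetical function names:

```python
from math import comb


def random_match_probability(n_peaks, tolerance, mass_range):
    """Assumed form p = N * d / w, capped at 1."""
    return min(1.0, n_peaks * tolerance / mass_range)


def cumulative_binomial(n, k, p):
    """Probability of k or more random matches out of n theoretical fragments."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative numbers only: 8 extracted peaks, 0.5 Da tolerance, 100 m/z window.
p = random_match_probability(8, 0.5, 100.0)
print(p, cumulative_binomial(12, 6, p))
```

Unlike a fixed p, this ties the random-match probability to how crowded the spectrum is and to the instrument's mass tolerance.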

Page 20

Key Aspects of Scoring Localizations

• Select peaks in spectrum to be used for identification/localization
• Test all sequence/location possibilities
• Assign fragment ion types to peaks
– Allow for peaks to have different ion type assignments for conflicting localization possibilities
• Use score differences to make decision on localization certainty/ambiguity
• Decide upon conservative/aggressive thresholds.
• Provide a clear representation of the certainty/ambiguity in localization of each site
• Allow for multiple sites with mix of certainty and ambiguity in localization
• Distinguish between:
– Ambiguity – no distinguishing evidence, i.e. either possibility
– Ambiguity – conflicting evidence, multiple co-eluting isoforms present

How can we calculate a false localization rate as a standard measure of certainty for phosphosite assignment across a dataset?

Page 21

ABRF Proteome Informatics Research Group

iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization

ABRF 2010, Sacramento, CA
March 22, 2010

Page 22

Study Goals

1. Evaluate the consistency of reporting phosphopeptide identifications and phosphosite localization across laboratories

2. Characterize the underlying reasons why result sets differ

3. Produce a benchmark phosphopeptide dataset, spectral library and analysis resource

Page 23

Study Design

• Use a common dataset
• Use a common sequence database
• Allow participants to use the bioinformatic tools and methods of their choosing
• Use a common reporting template
• Fix the identification confidence (1% FDR)
• Require an indication of phosphosite ambiguity per spectrum
• Ignore protein inference – for now

Page 24

Study Materials and Instructions to Participants

Study materials
• 1 Orbitrap XL dataset (3 files)
– RAW, mzML, mzXML, MGF, pkl or dta
– conversions by ProteoWizard
• 1 FASTA file (SwissProt human seq’s. v57.1)
• 1 template (Excel)
• 1 on-line survey (Survey Monkey)

Instructions to participants
1. Analyze the dataset
2. Report the phosphopeptide spectrum matches in the provided template
3. Complete an on-line survey
4. Attach a 1-2 page description of your methodology

Page 25

Reporting Template

ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis

Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide spectrum match. Multiple rows MAY be used to report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the 'Peptide Sequence' field, and results should be sorted by 'Peptide Identification Score' from most to least confident. Additional instructions can be found above each field header. Results should be emailed to '[email protected]' no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey.

REQUIRED FIELDS

File: Name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03)

Spectrum Identifier: Identifiers should be unique scan numbers from data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).

Precursor m/z: Precursor m/z as submitted to search engine

Precursor Charge: Precursor charge reported by search engine

Peptide Sequence: Use lowercase s, t or y (e.g. SLsGSsPCPK) OR a trailing symbol (e.g. SLS#GS#PCPK) OR a string in parentheses (e.g. SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T or Y are phosphorylations.

Accession(s): Protein identifier(s) from Fasta file. Use multiple values if peptide is found in multiple proteins, e.g., Q9NZ18; Q9UQ35. Protein inference will not be scored.

Num. Phospho sites: Total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.

Peptide Identification Certainty: Is this match above 1% FDR identification threshold (Y|N)? 'Y' indicates this match is BETTER than the confidence threshold. 'N' indicates the match is WORSE. Please report BOTH types of identifications in your ranked list.

Phosphosite Localization Certainty: Are ALL phosphosites unambiguously localized (Y|N)? Indicate 'Y' if ALL phosphorylations have been confidently localized. 'N' if one or more have not.

Peptide Identification Score: Peptide identification score reported by search engine (e.g., E-value, p-value, probability, Mascot score, etc.)

Example rows (fields in the order listed above):

D20090930_PM_K562_SCX-IMAC_fxn03  Scan:908  558.7576  2  qGsPVAAGAPAK  Q9NZI8  1  Y  Y  0.0002097
D20090930_PM_K562_SCX-IMAC_fxn04  Scan:2017  710.82233  2  TsPDPSPVSAAPSK  Q13469  1  Y  N  45.41
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:683  692.28891  2  _APQTS(ph)S(ph)SPPPVR_  Q8IYB3  2  Y  N  30.09
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:4832  775.3548  2  SQtPPGVAtPPIPK  Q15648  2  Y  N  31.79
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:641  590.2127  2  SLsGSsPcPK  Q9UQ35  2  Y  N  0.0112023
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:641  590.2127  2  sLSGSsPcPK  Q9UQ35  2  Y  N  0.0915611
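Because the template accepts three notations for a phosphorylated residue, submissions have to be brought to one form before they can be compared. The Python sketch below normalizes all three to lowercase s/t/y and discards other modification marks, following the rule that only S, T and Y phosphorylation is compared. The set of accepted trailing symbols and the function name are assumptions for illustration.

```python
import re

# S, T or Y followed by a parenthesised tag or a single trailing symbol.
_MARKED = re.compile(r"([STY])(?:\([^)]*\)|[#*@^~])")
# Anything left over that is not a residue letter (tags on other residues, flanking "_").
_OTHER = re.compile(r"\([^)]*\)|[^A-Za-z]")


def normalize_phospho(sequence):
    """Return the sequence with phosphosites as lowercase s/t/y and nothing else marked."""
    sequence = _MARKED.sub(lambda m: m.group(1).lower(), sequence)
    sequence = _OTHER.sub("", sequence)
    # Lowercase letters other than s/t/y denote other modifications and are ignored.
    return "".join(ch if ch in "sty" else ch.upper() for ch in sequence)


for raw in ("SLS#GS#PCPK", "SLS(ph)GS(ph)PCPK", "_APQTS(ph)S(ph)SPPPVR_", "qGsPVAAGAPAK"):
    print(raw, "->", normalize_phospho(raw))
```

After normalization, two participants' rows for the same spectrum can be compared with a plain string equality test.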

Page 26

[Pie chart: Membership (n=33). Categories: ABRF Member, Non-member. Values: 55%, 45%]

[Pie chart: Type of Lab. Categories: Academic, Biotech/Pharma/Industry, Contract Research Org, Government, Other. Values: 73%, 9%, 6%, 6%, 6%]

[Pie chart: Location. Categories: Asia, Australia/New Zealand, Europe, North America. Values: 9%, 6%, 15%, 70%]

[Pie chart: Resource Lab Status. Categories: Conduct both core functions and non-core lab research, Core only, Non-core research lab, Software development only. Values: 39%, 15%, 42%, 3%]

[Pie chart: Primary Job Function. Categories: Bioinformatician/Developer, Director/Manager, Lab Scientist, Mass Spectrometrist, Other. Values: 58%, 9%, 12%, 18%, 3%]

[Bar chart: Proteomics Experience. Categories: 1-2 years, 3-4 years, 5-10 years, >10 years, Unanswered; y-axis 0 to 16]

• 59 requests / 32 submissions (54% return), 2 retractions, + 7 iPRG members and 1 guest

Page 27

Software Tools Used

[Bar chart: Phosphosite Localization; y-axis 0 to 6. Tools: Ascore, custom, In-house, MaxQuant, msInspect, MyriMatch, NNScore PLS, Phosphinator, PhosphoScore, Prophossi, Spectrum Mill]

[Bar chart: Peptide Identification; y-axis 0 to 16. Tools: Mascot, X!Tandem, OMSSA, SEQUEST, MyriMatch, in-house, PeptideProphet, Scaffold, InsPecT, PepARML, Peptizer, pFind, TPP, iProphet, MaxQuant, msInspect, MSPepSearch, OpenMS/TOPP, ProteinProphet, Pview, SpectraST, SpectrumMill, thegpm]

Page 28

The SCX/IMAC Enrichment Approach for Phosphoproteomics

Sample: 7.5 x 10^7 human K562 chronic myelogenous leukemia cells, 4 mg lysate
Protocol: Villen, J, and Gygi, SP, Nat Prot, 2008, 3, 1630-1638.
Lysis: 8 M urea, 75 mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors
SCX: PolyLC - Polysulfoethyl A 9.4 mm x 200 mm, elute: 0-105 mM KCl, 30% Acn
IMAC: Sigma - PhosSelect Fe IMAC beads, bind: 40% Acn, 0.1% formic acid, elute: 500 mM K2HPO4 pH 7
MS/MS: Thermo Fisher Orbitrap XL, high-res MS1 scans in the Orbitrap (60k), Top-8 fragmented in LTQ, exclude +1 and precursors w/ unassigned charges, 20 s exclusion time, precursor mass error +/- 10 ppm

Page 29

Preliminary Analysis of SCX Fractions and Dataset Selection

[Three charts over SCX fractions 2 to 12: # spectra by precursor z (z2, z3, z4); % distinct peptides by # phosphosites (1P, 2P, 3P); % distinct peptides by solution charge (-1SC to 6SC)]

Frxn 3: multi-phosphosites
Frxn 4: single phospho, single basic
Frxn 12: multi-basic residues (RHK)

Page 30

From 30,000 Ft.

[Bar chart per participant alias: # spectra Id Yes, # spectra Loc Yes, # unique Peptides UC ID Yes; y-axis 0 to 8000]

Participant alias: 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 870484i, 85246, 13867, 20441v, 40816i, 20109, 50308i, 29850v, 56365, 66398, 91943i, 47587, 71263, 65211, 63103, 97219i, 20814, 61963v, 18621, 74637, 15769, 77114, 66514, 77115

Spectral pre-processing Ih IhRr, Ih Ih Ih Bw Ih Ih Mq Sm   Sm         Mc

Rr, Xc     Mq

Di, Mq Bw Ih       Em     Ih    

R, Xc Ih

precursor m/z adjusted     Y Y Y Y Y Y Y Y   Y         Y         Y                 Y   Y  

nterm acetyl Y     Y Y   Y   Y Y         Y           Y Y   Y                      

Peptide identification

My, Om, Se, Xt, Pp

Om, Xt, Pp, TPP, Ip, Sp Se Pf Pf

Se, Pp Mp Om

Ma, Mq Sm

Ma, My, Om, Xt, Pl Sm

Ma, My, Om, Xt, Pl

Ma, Om, Xt Xt Ma

My, Xt, In

Ma, Ih Ma In Ma Ma Se

Ma, In, Op, Pz Se

Ma, Sc, Xt

Xt*, Sc Ma

Xt, Gp

Ma, Xt, Sc Pv

Ma, Ih  

Se, Pp, Ih

Ma, Xt, Sc

Phosphosite localization Ih Ih As IhPf, As As Ih Ph Mq Sm Ih Sm Ih As     Id

As, Ih Ma In Mq   Ps In Ih   Ih As Ih   Ih

Ih, Pr   As Ih

Page 31

Software Program Abbreviations

Software Program: Key
Ascore: As
Bioworks: Bw
Distiller: Di
extract_msn: Em
TheGPM: Gp
in-house: Ih
Inspect: In
IdPicker: Ip
iProphet: Id
Mascot: Ma
msconvert: Mc
msInspect: Mi
MyriMatch: Mm
MSPepSearch + Spec Lib.: Mp
MaxQuant: Mq
msInspect: Ms
OMSSA: Om
OpenMS: Op
pFind: Pf
Phosphinator: Ph
pepARML: Pl
PeptideProphet: Pp
Peptizer: Pz
Prophossi: Pr
PhosphoScore: Ps
Pview: Pv
ReAdW: Rr
Scaffold: Sc
SEQUEST: Se
Spectrum Mill: Sm
SpectraST + Spec Lib.: Sp
Xcalibur: Xc
X!Tandem: Xt
X!Tandem (k-score): Xt*

The data analysis tools used by the participants were collected from the on-line survey as reported by the participants. Many participants used multiple search engines and most used a software tool to localize the phosphosites. Moreover, many in-house (Ih) or custom software tools were used in the study, only some of which are published. The key at the left can be used to decode the names of the software tools in the table above, and the table is sorted (by number of confident peptide identifications), exactly as in the histogram above.

Page 32

Relative Performance: Identification By Fraction

[Bar chart per participant alias: # spectra Id Yes for Frxn 3, Frxn 4 and Frxn 12; y-axis 0 to 4000]

Performance was not equivalent across the 3 fractions for all participants.

[Bar chart per participant alias: # unique peptides UC Id Yes for Frxn 3, Frxn 4 and Frxn 12; y-axis 0 to 4000]

Some participants saw more unique peptides than others.

Page 33

Room for Improvement in ID Certainty Thresholds

[Bar charts of # spectra per participant alias, one panel per fraction: Frxn 3 – most multiple phos per peptide; Frxn 12 – highest precursor charges; Frxn 4 – most phosphopeptides. Series: #DN Diff Id No, #SN Same Id No, #DY Diff Id Yes, #SY Same Id Yes, #Y1P Id Yes single]

Gray means – Number of spectra where < 2 people agreed on the Id

85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15
20814: ?, Frxn 12 >> Frxn 3, 4
77114, 77115: merged multiple scans, so can’t be compared with other 33

Page 34

Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4

YY: Y – identification, Y – localization
YN: Y – identification, N – localization
NS: N – identification, but top sequence same as consensus
ND: N – identification, and top sequence different than consensus

Page 35

Subset of Participants Used for Localization Analysis

Excluded
0: 0% localization
1: 100% localization
F: FDR - very high?
R: Replicate submission
M: Merged spectra
C: Categorization Errors
A: Y Loc only when no possible ambiguity

[Bar charts per participant alias: # spectra Id Yes and # spectra Loc Yes, for all 35 participants and for the subset of 22; y-axis 0 to 8000]

Exclusion codes marked on the 35-participant chart: R F 1 0 1 A 0 F 1 C M 0 M

Participant aliases in the subset of 22: 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 13867, 20441v, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 61963v, 18621

Page 36

If Participants Agree on the Identity, Do They Also Agree Site Localization Can be Certain?

Frxn 4: subset of 472 spectra for which 20/22 participants all agree on identity

[Histogram: % of spectra (y-axis, 0.0% to 10.0%) versus % participants indicating localization Yes (x-axis: NPA, 10%, 25%, 40%, 55%, 70%, 85%, 100%). NPA: No possibility of ambiguity]

Page 37

What Fraction of the Time Do They Agree On Localization(s)?

8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12)
• # Y loc 2-22 partic: 5918
• # Y loc 1 partic: 798
• # N loc all partic: 498
• no ambiguity: 836

5918/8050 spectra with > 2/22 Loc Yes and Site Ambiguity Possible
• 100% partic agree: 4685, 79%
• 67-99% partic agree: 563, 10%
• < 67% partic agree: 670, 11%

For all of the participants that agree on identity when
• site ambiguity is possible (#S,T,Y > # phos)
• >2 participants mark Loc=Y

For 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites
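The agreement categories can be illustrated with a small Python sketch that, for one spectrum, takes the localized sequences reported by the participants who marked Loc=Y and bins the share that sides with the most common answer. How the study computed its bins is not spelled out on the slide, so the majority-fraction definition, the 0.67 boundary handling and the function name are assumptions.

```python
from collections import Counter


def agreement_bin(localizations):
    """Bin one spectrum by the fraction of Loc=Y participants backing the top localization."""
    counts = Counter(localizations)
    top_count = counts.most_common(1)[0][1]
    fraction = top_count / len(localizations)
    if fraction == 1.0:
        return "100%"
    if fraction >= 0.67:
        return "67-99%"
    return "<67%"


# Illustrative inputs: three participants agree, one picks a different site.
print(agreement_bin(["SLsGSsPCPK", "SLsGSsPCPK", "SLsGSsPCPK", "sLSGSsPCPK"]))

# The slide's totals are consistent with its percentages.
total = 4685 + 563 + 670
print(total, round(100 * 4685 / total))
```

Applied over all spectra where ambiguity is possible, counts per bin give the kind of breakdown shown above.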

Page 38

Which Participants are More Likely to Disagree on Localization?

[Three bar charts: % of spectra in minority localization choice (y-axis) per participant alias (x-axis)]

# Spectra with Loc Agreement 50.1-99.9%
Frxn 3: 154
Frxn 4: 498
Frxn 12: 227

x-axis is sorted in descending order of # identified

Page 39

Liberal Localizers are More Disagreeable

The participants who are the most willing to localize are more likely to disagree with the majority view.

x-axis is sorted in descending order of # localized / # identified

Page 40: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.


A Challenging Problem

40

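[Figure: annotated MS/MS spectrum. Labels recovered: "P(m/z) -H3PO4", "879"]

14/21 participants said they could identify the peptide but could not localize the site.

Of the 7 who did localize:

3/7 DSAIPVESDtDDEGAPR

4/7 DSAIPVEsDtDDEGAPR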

Page 41: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.


Primary Observations from iPRG 2010 study

41

1. Wide range of spectra marked confidently identified.

2. Wide range of spectra marked confidently localized.

3. If all of the participants agree on the identification, that phosphosite ambiguity is possible, and that localization is possible, then for 79% of the spectra the participants unanimously agree on the localization(s).

4. For the remaining 21%, the participants who are liberal localizers are more likely to disagree with the majority view.
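The 79% figure in observation 3 is the share of spectra on which every localizing participant reported the same site(s): 4,685 of 5,918. A minimal sketch of that calculation follows; the data layout and the function name `unanimous_fraction` are assumptions made for illustration.

```python
def unanimous_fraction(calls):
    """calls: {spectrum_id: {participant_id: localized_peptide_string}}.

    Returns the fraction of spectra on which all participants who
    localized gave the same answer.
    """
    choice_sets = [set(v.values()) for v in calls.values() if v]
    if not choice_sets:
        return 0.0
    unanimous = sum(1 for choices in choice_sets if len(choices) == 1)
    return unanimous / len(choice_sets)

# Counts reported in the study: 4,685 unanimous spectra out of 5,918.
reported_percent = round(100 * 4685 / 5918)  # 79
```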

Page 42: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.


Acknowledgements

42

iPRG Members
• Paul A. Rudnick (chair) – NIST
• Manor Askenazi - Dana-Farber Cancer Institute
• Karl R. Clauser - Broad Institute of MIT and Harvard
• William S. Lane - Harvard University
• Lennart Martens - Ghent University, Belgium
• Karen Meyer-Arendt - University of Colorado
• W. Hayes McDonald - Vanderbilt University
• Brian C. Searle - Proteome Software, Inc.
• Jeffrey A Kowalak (EB Liaison) – NIMH

Additional Contributors
• Philipp Mertins, The Broad Institute
  – All wet lab work and an analysis
• Steve Gygi, Harvard Medical School
  – Test datasets
• Matthew Chambers, Vanderbilt University Medical Center
  – Data format conversions (ProteoWizard)
• Steve Stein and Yuri Mirokhin, NIST
  – A K562 phosphopeptide spectral library
• Renee Robinson, Harvard University
  – "The Anonymizer"

Page 43: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

Emerging False Localization Rate (FLR) Metrics

43

Target/Decoy for localization
Decoy - AAs that can not biologically bear the modification

Issues
• Allow decoys only during localization, not during identification; otherwise will bias identification FDR
• Ambiguity – more allowed sites will yield more ambiguous assignments, so may need to score targets and decoys separately, then compare
• Frequency - decoy AA occurrence should be similar to target AAs; otherwise FLR will be inaccurate
• Proximity – a decoy AA nearer the site of a target AA has a better chance of matching. Pro and Glu are often found in the consensus motifs of many kinases
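One way to turn decoy-residue matches into a global FLR estimate is sketched below. It is not taken from any specific tool: it assumes that a false localization is equally likely to land on any candidate residue, so decoy hits are scaled by the ratio of target to decoy candidate residues (the "Frequency" issue above). The function names and the scaling rule are illustrative assumptions, and the proximity and ambiguity issues are ignored.

```python
def count_residues(peptides, residues):
    """Count occurrences of the given residues across unmodified sequences."""
    return sum(seq.count(aa) for seq in peptides for aa in residues)

def estimate_flr(target_hits, decoy_hits, target_candidates, decoy_candidates):
    """Estimate the false localization rate among target-site assignments.

    target_hits / decoy_hits: sites assigned to target (e.g. S, T, Y) and
    decoy (e.g. P, E) residues above some score threshold.
    target_candidates / decoy_candidates: how many such residues occur in
    the identified peptides, used to correct for unequal frequency.
    """
    if target_hits <= 0 or decoy_candidates <= 0:
        raise ValueError("need at least one target hit and one decoy candidate")
    # Assumption: false assignments fall on targets and decoys in
    # proportion to the number of candidate residues of each kind.
    expected_false_targets = decoy_hits * target_candidates / decoy_candidates
    return min(1.0, expected_false_targets / target_hits)

# Hypothetical numbers, for illustration only.
peptides = ["APSLTDLVK", "AVSEEQQPALK"]
n_target = count_residues(peptides, "STY")  # S, T, S
n_decoy = count_residues(peptides, "PE")    # P, P, E, E
flr = estimate_flr(target_hits=1000, decoy_hits=20,
                   target_candidates=3000, decoy_candidates=2000)
```

With these made-up counts, 20 decoy hits scale to 30 expected false target assignments, giving an estimated FLR of 3%.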

Page 44: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

AA Frequency in the Proteome

44

http://proteomics.broadinstitute.org/millhtml/faindexframe.htm

Select the Calculate statistics utility
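For readers without access to that utility, relative amino acid frequencies can be approximated from any set of protein sequences. The sketch below is a generic stand-in, not the Spectrum Mill implementation; `aa_frequencies` is a hypothetical helper and the input is assumed to be plain one-letter sequences already parsed from a FASTA file.

```python
from collections import Counter

def aa_frequencies(sequences):
    """Return {amino_acid: relative frequency} over all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}

# Toy input; a real comparison would use the full proteome database.
freqs = aa_frequencies(["AVSEEQQPALK"])
decoy_vs_target = {aa: freqs.get(aa, 0.0) for aa in "STYPE"}
```

Comparing the frequencies of candidate decoy residues (here P and E) with those of S, T, and Y indicates how closely the decoy choice meets the frequency requirement.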

Page 45: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

• Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos (IT-CID): 70,000 phosphopeptide spectra identified

• Altered Batch-Tag to allow for phosphorylation of Pro and Glu

• Filtered results to only phosphopeptide IDs containing one S, T or Y

• Modification site known

• Local FLR: SLIP score of 6 = 95% correct

• Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data.

Baker, P.R., Trinidad, J.C., and Chalkley, R.J. (2011) Mol Cell Proteomics. M111.008078.

ProteinProspector SLIP Scoring and Local FLR
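The local FLR idea on this slide can be sketched as follows: among peptides with a single S, T, or Y (so the true site is known), any site assigned to another residue such as Pro or Glu is wrong, and the error rate is measured within a narrow score window. This is an illustrative sketch, not ProteinProspector code; the record layout and the name `local_flr` are assumptions, and the SLIP score is treated as an opaque number.

```python
TARGET_RESIDUES = frozenset("STY")

def local_flr(results, lo, hi):
    """results: iterable of (slip_score, assigned_residue) pairs from
    peptides containing exactly one S, T, or Y.

    Returns the fraction of assignments with lo <= score < hi that landed
    on a non-target residue, or None if the window is empty.
    """
    in_window = [residue for score, residue in results if lo <= score < hi]
    if not in_window:
        return None
    wrong = sum(1 for residue in in_window if residue not in TARGET_RESIDUES)
    return wrong / len(in_window)

# Hypothetical results, for illustration only.
results = [(6.2, "S"), (6.8, "P"), (6.5, "T"), (6.1, "S"), (12.0, "S")]
flr_at_6 = local_flr(results, 6, 7)
```

Repeating the calculation over successive score windows shows how the error rate falls as the score rises, which is the kind of analysis that could support a statement such as "SLIP score of 6 = 95% correct".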

Page 46: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

Closing Thoughts

46

• More research in the area of FLR metric calculation is critical to the field for developing standard confidence thresholds for modification site localization.

• An ambiguous modification localization decision for a particular peptide spectrum match is far preferable to getting it wrong.

• As more raw LC-MS/MS data from PTM studies is deposited in the public domain, it becomes increasingly possible for knowledgebases to undertake efforts to reprocess the data with the most recent algorithms and scoring metrics and enforce uniform quality standards on the information they disseminate.

• PHOSIDA (www.phosida.com) disseminates modification sites identified and localized in publications emerging only from research in the laboratory of Matthias Mann. So all MS/MS data has been analyzed through a common software platform and subject to consistent scoring thresholds.

Review Article
Modification Site Localization Scoring: Strategies and Performance
Chalkley, RJ and Clauser, KR
Mol Cell Proteomics 2012 11: 3-14. doi:10.1074/mcp.R111.015305.
http://www.mcponline.org/

Page 47: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

Canonical pathways in lung cancer are being aggressively targeted for drug development

47

Janku et al. J Thoracic Oncol 2011; 6: 1601-1612

Page 48: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

EML4-ALK fusion

Crystal, Clinical Advances in Hematology & Oncology, 2011, 9, 207-214.

Page 49: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

Targeted therapy development time

49

Gerber and Minna Cancer Cell 2010; 18: 548-551

Page 50: Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012.

The future of lung cancer management

50

• Diagnose earlier
• Prognosticate better
• Treat more precisely
• Monitor more effectively

Herbst et al. NEJM 2008

