
Post on 10-Jun-2020


Proteomics Informatics – Protein characterization: post-translational modifications and protein-protein interactions (Week 10)

Top down / bottom up

[Figure: top down and bottom up panels; intensity vs. mass/charge]

Charge distribution

[Figure: top down and bottom up panels; intensity vs. mass/charge, with charge-state labels 1+, 2+, 3+, 4+, 27+ and 31+]
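The charge-state labels on this slide can be related to peak positions with a small calculation. The following is a minimal sketch, assuming positive-mode protonation (each charge adds one proton); the function names and the two example masses (2000 Da and 30000 Da) are hypothetical and are not taken from the slides.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """Return m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_envelope(neutral_mass: float, charges) -> dict:
    """Map each charge state to its m/z for one neutral mass."""
    return {z: mz(neutral_mass, z) for z in charges}


# Hypothetical peptide-sized molecule (bottom up): low charge states.
peptide_envelope = charge_envelope(2000.0, range(1, 5))

# Hypothetical protein-sized molecule (top down): high charge states.
protein_envelope = charge_envelope(30000.0, range(27, 32))

for z, value in peptide_envelope.items():
    print(f"peptide {z}+: m/z = {value:.3f}")
for z, value in protein_envelope.items():
    print(f"protein {z}+: m/z = {value:.3f}")
```

Under these assumptions, a high charge state brings a large intact protein into the same m/z window as a peptide with only a few charges.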

Isotope distribution

[Figure: top down and bottom up panels; intensity vs. mass/charge, with panels labeled m = 1035 Da, m = 1878 Da and m = 2234 Da]

Fragmentation

[Figure: top down and bottom up panels]

Correlations between modifications

Top down

Bottom up

Alternative Splicing

Top down

Bottom up

Exon 1 2 3

Top down

Kellie et al., Molecular BioSystems 2010

Protein mass spectra

Fragment mass spectra

Protein Complexes

[Figure: complexes built from proteins A, B, C and D, followed by digestion and mass spectrometry]

Sowa et al., Cell 2009

Protein Complexes – specific/non-specific binding

Protein Complexes – specific/non-specific binding

Choi et al., Nature Methods 2010

Tackett et al. JPR 2005

Protein Complexes – specific/non-specific binding

Analysis of Non-Covalent Protein Complexes

Taverner et al., Acc Chem Res 2008

Non-Covalent Protein Complexes

Schreiber et al., Nature 2011

More / better quality interactions

Affinity Capture Optimization Screen

[Figure: workflow labels: Cell extraction; Filtration; Lysate clearance / Batch Binding; Binding/Washing/Eluting; SDS-PAGE]

LaCava, Hakhverdyan, Domanski, Rout

Over 20 different extraction and washing conditions ~ 10 years of art.

(41 pullouts are shown)

Molecular Architecture of the NPC

Actual model

Alber F. et al. Nature (450) 683-694. 2007

Alber F. et al. Nature (450) 695-700. 2007

Cloning nanobodies for GFP pullouts

• Atypical heavy chain-only IgG antibodies are produced in the camelid family – they retain high affinity for antigen without a light chain

• Aimed to clone individual single-domain VHH antibodies against GFP – only ~15 kDa, can be recombinantly expressed, used as bait for pullouts, etc.

• To capture the full repertoire, GFP binders are identified through a combination of high-throughput DNA sequencing and mass spectrometry

Cloning llamabodies for GFP pullouts

[Figure: workflow after llama GFP immunization, in two branches:
Bone marrow aspiration → lymphocyte total RNA → RT / nested PCR → VHH amplicon → 454 DNA sequencing → VHH DNA sequence library
Serum bleed → crude serum → IgG fractionation & GFP affinity purification → GFP-specific VHH fraction → LC-MS/MS
Both branches combine to give GFP-specific VHH clones → VHH clone for recombinant expression]

[Figure: gel with size markers at 1000, 500, 400 and 300 bp and bands labeled VH and VHH; histogram of no. of reads (0 to 500,000) vs. read length (bp)]

Fridy, Li, Keegan, Chait, Rout

CDR3: 100.0% (14/14); combined CDR: 100.0% (33/33); DNA count: 10

MAQVQLVESGGGLVQAGGSLRLSCVASGRTFSGYAMGWFRQTPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33); DNA count: 1

MADVQLVESGGGLVQSGGSRTLSCAASGRVLATYHLGWFRQSPGREREAVAAITWSAHSTYYSDSVKGRFTISIDNARNTGYLQMNSLKPEDTAVYYCTVRHGTWFTVSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33); DNA count: 1

MAQVQLVESGGALVQAGASLSVSCAASGGTISKYNMAWFRRAPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1

MAQVQLEESGGGLVQAGDSLTLSCSASGRTFTNYAMAWSRQAPGKERELLAAIDAAGGATYYSDSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1

MAQVQLVESGGGRVQAGGSLTLSCVGSEGIFWNHVMGWFRQSPGKDREFVARISKIGGTTNYADSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS

CDR1 CDR2 CDR3

Underlined regions are covered by MS

Rank sequences according to: CDR3 coverage; overall coverage; combined CDR coverage; DNA counts.
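The ranking criteria can be sketched as a multi-key sort. This is a minimal illustration, not the authors' implementation: the priority order of the keys is an assumption, overall coverage is left out because its values did not survive in the transcript, and the names `Candidate`, `rank_candidates` and `seq1` to `seq5` are hypothetical. The coverage fractions and DNA counts are the ones listed for the five sequences above.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Candidate:
    name: str               # hypothetical label, in slide order
    cdr3: Fraction          # fraction of CDR3 residues covered by MS peptides
    combined_cdr: Fraction  # fraction of all CDR residues covered by MS peptides
    dna_count: int          # number of times the sequence was seen in the DNA reads


def rank_candidates(candidates):
    """Sort best-first by CDR3 coverage, then combined CDR coverage, then DNA count.

    The key order is an assumption; the slide only lists the criteria.
    """
    return sorted(
        candidates,
        key=lambda c: (c.cdr3, c.combined_cdr, c.dna_count),
        reverse=True,
    )


candidates = [
    Candidate("seq1", Fraction(14, 14), Fraction(33, 33), 10),
    Candidate("seq2", Fraction(14, 14), Fraction(24, 33), 1),
    Candidate("seq3", Fraction(14, 14), Fraction(24, 33), 1),
    Candidate("seq4", Fraction(14, 14), Fraction(14, 33), 1),
    Candidate("seq5", Fraction(14, 14), Fraction(14, 33), 1),
]

ranked = rank_candidates(candidates)
for c in ranked:
    print(c.name, f"CDR3={float(c.cdr3):.1%}", f"CDRs={float(c.combined_cdr):.1%}", c.dna_count)
```

Because all five sequences share the same fully covered CDR3, the combined CDR coverage and the DNA count are what separate them in this sketch.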

Identifying full-length sequences from peptides

Sequence diversity of 26 verified anti-GFP nanobodies

• Of ~200 positive sequence hits, 44 high confidence clones were synthesized and tested for expression and GFP binding: 26 were confirmed GFP binders.

• Sequences have characteristic conserved VHH residues, but significant diversity in CDR regions.

FR1 CDR1 FR2 CDR2 CDR3 FR3 FR4

HIV-1

[Figure: HIV-1 particle with gp120, gp41, lipid bilayer, MA, CA, NC, PR, IN, RT and RNA labeled]

[Figure: HIV-1 genome, 9,200 nucleotides: 5' LTR, gag (MA, CA, NC, p6), pol (PR, RT, IN), vif, vpr, tat, rev, vpu, env (gp120, gp41), nef, 3' LTR]

Genetic-Proteomic Approach

[Figure: workflow labels: tagged viral protein (tag marked *); protein complex; SDS-PAGE; mass spectrometry]

I-DIRT for Specific Interaction

I-DIRT = Isotopic Differentiation of Interactions as Random or Targeted

[Figure: workflow: infection with 3xFLAG-tagged HIV-1 (light) and WT HIV-1 (heavy; 13C-labeled Lys and Arg, +6 daltons each) → 1:1 mix → immunoisolation → MS]

Modified from Tackett AJ et al., J Proteome Res. (2005) 4, 1752-6.
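The I-DIRT readout can be sketched as a light fraction per protein. This sketch assumes the tagged sample is the light one and the wild-type sample the heavy one, so that a protein bound to the bait before mixing is expected to be almost entirely light, while a protein that binds after the 1:1 mix is expected near 0.5. The function names and the 0.8 cutoff are hypothetical, chosen only for illustration.

```python
def light_fraction(light_intensity: float, heavy_intensity: float) -> float:
    """Return light / (light + heavy) for one protein or peptide."""
    total = light_intensity + heavy_intensity
    if total <= 0:
        raise ValueError("need a positive summed intensity")
    return light_intensity / total


def classify_interactor(fraction: float, specific_cutoff: float = 0.8) -> str:
    """Label an interactor by its light fraction.

    The cutoff is a hypothetical value; in practice it would be chosen
    from the observed distribution of fractions.
    """
    return "specific" if fraction >= specific_cutoff else "non-specific"


# Made-up intensities, for illustration only.
examples = {
    "protein_a": (950.0, 50.0),   # nearly all light
    "protein_b": (510.0, 490.0),  # close to the 1:1 mix
}

for name, (light, heavy) in examples.items():
    f = light_fraction(light, heavy)
    print(name, f"{f:.2f}", classify_interactor(f))
```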

IDIRT and Reverse IDIRT

[Figure: specificity, reverse vs. specificity, forward; both axes 0.40 to 1.00]

gp160 IDIRT: Forward-Reverse Ratio Comparison

[Figure: reverse ratio vs. forward ratio; both axes 0.30 to 1.00; panels labeled Env-3xFLAG and Vif-3xFLAG. Forward and reverse ratio comparison: N = 273, ≥ 3 peptides quantified, S/N = 10.0]

Luo, Jacobs, Greco, Cristae, Muesing, Chait, Rout

Protein Exchange

[Figure: workflow: IP of Vif-3F in heavy labeled Vif-3F lysate → incubation with light labeled wt lysate for 5 min, 15 min or 60 min. Outcomes shown: stable interactor; interactor with fast exchange]

Env Time Course SILAC

• Differentially labeled infections are harvested at an early or late stage of infection

• Distinguishes proteins that interact with Env at an early or late stage during infection

[Figure: workflow: early during infection (light) and late during infection (heavy; 13C-labeled Lys, Arg) → 1:1 mix → immunoisolation → MS, separating early interactors from late interactors]

Interaction Partners by Chemical Cross-Linking

[Figure: workflow labels: protein complex; chemical cross-linking; cross-linked protein complex; isolation; enzymatic digestion; proteolytic peptides; MS (peptides vs. M/Z); fragmentation; MS/MS (fragments vs. M/Z)]

Interaction Sites by Chemical Cross-Linking

[Figure: the same workflow labels as on the previous slide]

Cross-linking

protein

n peptides with reactive groups

(n-1)n/2 potential ways to cross-link peptides pairwise

+ many additional uninformative forms

Protein A + IgG heavy chain 990 possible peptide pairs

Yeast NPC ~10^6 possible peptide pairs
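The pair count on this slide is easy to check numerically. The sketch below evaluates (n-1)n/2 and inverts it to find how many reactive peptides a given pair count implies; the function names are hypothetical. For example, 45 reactive peptides give exactly 990 pairs, matching the Protein A + IgG heavy chain figure.

```python
import math


def peptide_pairs(n: int) -> int:
    """Number of ways to pick two of n peptides with reactive groups: (n-1)n/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def peptides_for_pairs(pairs: int) -> int:
    """Smallest n for which (n-1)n/2 is at least `pairs`."""
    if pairs < 0:
        raise ValueError("pairs must be non-negative")
    n = math.isqrt(2 * pairs)  # never larger than the answer
    while peptide_pairs(n) < pairs:
        n += 1
    return n


print(peptide_pairs(45))            # pair count for 45 reactive peptides
print(peptides_for_pairs(990))      # peptides implied by 990 pairs
print(peptides_for_pairs(10 ** 6))  # peptides implied by about 10^6 pairs
```

The quadratic growth is the point: going from tens of reactive peptides to well over a thousand raises the search space from about a thousand pairs to about a million.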

Protein Crosslinking by Formaldehyde

[Figure: two treatment conditions: ~1% w/v formaldehyde (FAl) for 20 – 60 min; ~0.3% w/v FAl for 5 – 20 min; label "1/100 the volume"]

LaCava

Protein Crosslinking by Formaldehyde

[Figure: RED: triplicate experiments, FAl-treated grindate. BLACK: duplicate experiments, FAl-treated cells (then ground). SCORE: log ion current / log protein abundance]

Akgöl, LaCava, Rout

Cross-linking

Mass spectrometers have a limited dynamic range, so it is important to limit the number of possible reactions in order not to dilute the cross-linked peptides. To identify a cross-linked peptide pair, both peptides have to be sufficiently long and have to give informative fragmentation. High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from the two peptides. Because cross-linked peptides are often large, CAD is not ideal; ETD is recommended instead.

Localization of modifications

[Figure series: probability of localization (0 to 1.2) vs. number of fragment ions (0 to 25) for phosphorylation, with m_precursor = 2000 Da, Δm_precursor = 1 Da and Δm_fragment = 0.5 Da. The first plot shows the curve for phosphopeptide identification (ID); later plots add localization curves for d_min = 3, d_min = 2, d_min = 1 and d = 1*.]

d_min >= 3 for 47% of human tryptic peptides

d_min = 2 for 33% of human tryptic peptides

d_min = 1 for 20% of human tryptic peptides

Peptide with two possible modification sites

[Figure: MS/MS spectrum, intensity vs. m/z, matched against the peptide with the modification at site 1 and at site 2]

Which assignment does the data support? 1, 1 or 2, or 1 and 2?

Visualization of evidence for localization

[Figure: evidence for localization shown on the peptide AAYYQK, built up over three slides; labels 3, 2, 1]
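Which fragment ions carry localization evidence for a peptide such as AAYYQK can be worked out directly. The sketch below assumes a single phosphorylation placed on either tyrosine (positions 3 and 4) and singly charged b and y ions; the function names are hypothetical, and the mass table covers only the residues in this peptide. An ion is site-determining if its m/z differs between the two placements.

```python
# Monoisotopic residue masses (Da) for the residues in AAYYQK only.
RESIDUE_MASS = {"A": 71.03711, "Y": 163.06333, "Q": 128.05858, "K": 128.09496}
PHOSPHO = 79.96633  # HPO3
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(sequence: str, mod_position: int) -> dict:
    """Singly charged b and y ion m/z values with one phospho group.

    mod_position is 1-based, counted from the N-terminus.
    """
    masses = [
        RESIDUE_MASS[aa] + (PHOSPHO if i == mod_position else 0.0)
        for i, aa in enumerate(sequence, start=1)
    ]
    ions = {}
    for k in range(1, len(sequence)):
        ions[f"b{k}"] = sum(masses[:k]) + PROTON           # first k residues
        ions[f"y{k}"] = sum(masses[-k:]) + WATER + PROTON  # last k residues
    return ions


def site_determining_ions(sequence: str, site_a: int, site_b: int, tol: float = 0.01):
    """Names of ions whose m/z differs between the two candidate sites."""
    ions_a = fragment_ions(sequence, site_a)
    ions_b = fragment_ions(sequence, site_b)
    return sorted(name for name in ions_a if abs(ions_a[name] - ions_b[name]) > tol)


informative = site_determining_ions("AAYYQK", 3, 4)
print(informative)
```

For two adjacent candidate sites only one b ion and one y ion separate the assignments, which is why closely spaced sites are the hardest to localize.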

Estimation of global false localization rate using decoy sites

By counting how many times the phosphorylation is localized to amino acids that cannot be phosphorylated, we can estimate the false localization rate as a function of amino acid frequency.
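The decoy-site counting can be sketched as follows. This is a minimal illustration that assumes S, T and Y are the only residues allowed to carry the phosphorylation and that the localization search was allowed to place it on any residue; the function names and the toy input are hypothetical, and the numbers are made up for illustration.

```python
from collections import Counter

PHOSPHO_ACCEPTORS = frozenset("STY")  # assumed set of true acceptor residues


def decoy_localization_frequencies(localized_residues):
    """Fraction of all localizations that land on each non-acceptor residue type."""
    counts = Counter(localized_residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no localizations given")
    return {
        aa: n / total
        for aa, n in counts.items()
        if aa not in PHOSPHO_ACCEPTORS
    }


def amino_acid_frequencies(peptides):
    """Background frequency of each residue type in the searched peptides."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues given")
    return {aa: n / total for aa, n in counts.items()}


# Toy input: the residue each phosphorylation was localized to (made-up data).
localized = ["S"] * 90 + ["T"] * 6 + ["A"] * 2 + ["L"] * 2
decoy_freq = decoy_localization_frequencies(localized)
background = amino_acid_frequencies(["AAYYQK", "SLTAK"])

for aa in sorted(decoy_freq):
    print(aa, f"decoy localization frequency = {decoy_freq[aa]:.3f}")
```

Plotting each decoy residue's localization frequency against its background frequency gives the relationship the slide describes, which can then be read off at the frequency of a true acceptor residue.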

[Figure: two panels of false localization frequency (0 to 0.02) vs. amino acid frequency (0 to 0.15); Y is labeled]

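How much can we trust a single localization assignment?

If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

[Figure: distribution $F(S^{1}_{2})$ of the scores $S^{1}_{2}$ for assignment 1 when 2 is correct, with the measured score $S^{1}_{m}$ marked]

$$p^{1}_{2} = \frac{\int_{S^{1}_{m}}^{\infty} F(S^{1}_{2})\, dS^{1}_{2}}{\int_{0}^{\infty} F(S^{1}_{2})\, dS^{1}_{2}}$$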
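When the score distribution is available as a set of simulated or resampled scores rather than a closed form, the chance probability can be estimated empirically. The sketch below assumes that higher scores are better and that the probability of interest is the fraction of the null distribution at or above the measured score; the function name and the toy numbers are hypothetical.

```python
def empirical_p_value(null_scores, measured_score: float) -> float:
    """Fraction of null scores that are at least as high as the measured score.

    null_scores: scores for one assignment generated under the assumption
    that the other assignment is the correct one.
    """
    null_scores = list(null_scores)
    if not null_scores:
        raise ValueError("need at least one null score")
    hits = sum(1 for s in null_scores if s >= measured_score)
    return hits / len(null_scores)


# Toy null distribution (made-up scores, for illustration only).
null_scores = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0]

print(empirical_p_value(null_scores, 5.5))  # measured score in the upper tail
print(empirical_p_value(null_scores, 2.0))  # measured score well inside the null
```

With a finite sample the smallest non-zero value this estimator can return is 1/len(null_scores), so the size of the generated distribution limits how small a probability can be resolved.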

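Is it a mixture or not?

If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

[Figure: distribution $F(S^{2}_{1})$ of the scores $S^{2}_{1}$ for assignment 2 when 1 is correct, with the measured score $S^{2}_{m}$ marked]

$$p^{2}_{1} = \frac{\int_{S^{2}_{m}}^{\infty} F(S^{2}_{1})\, dS^{2}_{1}}{\int_{0}^{\infty} F(S^{2}_{1})\, dS^{2}_{1}}$$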

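Here $p^{1}_{2}$ is the probability of obtaining the measured score for assignment 1 by chance when 2 is correct, $p^{2}_{1}$ is the converse, and $p_{th}$ is a threshold:

$p^{1}_{2} < p_{th}$ and $p^{2}_{1} < p_{th}$: 1 and 2

$p^{1}_{2} < p_{th}$ and $p^{2}_{1} \geq p_{th}$: 1

$p^{1}_{2} \geq p_{th}$ and $p^{2}_{1} < p_{th}$: 2

$p^{1}_{2} \geq p_{th}$ and $p^{2}_{1} \geq p_{th}$: 1 or 2

[A further expression involving the measured scores $S^{1}_{m}$ and $S^{2}_{m}$ and the two probabilities is not recoverable from the transcript.]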
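One reading of the choice between "1", "2", "1 and 2" and "1 or 2" is a pair of threshold tests on the two chance probabilities. The sketch below is written under that assumption: an assignment counts as supported when its measured score is unlikely to arise by chance if only the other assignment were present. The function name, the parameter names and the 0.05 threshold are hypothetical.

```python
def supported_assignment(p_1_if_2_correct: float,
                         p_2_if_1_correct: float,
                         p_threshold: float = 0.05) -> str:
    """Decide which localization the spectrum supports.

    p_1_if_2_correct: chance probability of the measured score for
        assignment 1, computed from scores generated with 2 as the truth.
    p_2_if_1_correct: the converse, for assignment 2.
    p_threshold: hypothetical significance threshold.
    """
    for p in (p_1_if_2_correct, p_2_if_1_correct, p_threshold):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    one_supported = p_1_if_2_correct < p_threshold
    two_supported = p_2_if_1_correct < p_threshold
    if one_supported and two_supported:
        return "1 and 2"  # both sites are supported: a mixture
    if one_supported:
        return "1"
    if two_supported:
        return "2"
    return "1 or 2"       # the data cannot separate the two sites


print(supported_assignment(0.001, 0.40))
print(supported_assignment(0.001, 0.002))
print(supported_assignment(0.30, 0.40))
```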
