Transcript
Page 1: Prediction of protein localization and membrane protein topology

Prediction of protein localization and membrane protein topology

Gunnar von Heijne

Department of Biochemistry and Biophysics

Stockholm Bioinformatics Center

Stockholm University

Page 2: Prediction of protein localization and membrane protein topology

Stockholm Bioinformatics Center

www.sbc.su.se

sorting

Page 3: Prediction of protein localization and membrane protein topology

Protein localization

Page 4: Prediction of protein localization and membrane protein topology

Protein sorting in a eukaryotic cell

SP

Page 5: Prediction of protein localization and membrane protein topology

The ’canonical’ signal peptide

n h c

-3 -1

n-region: positively charged

h-region: hydrophobic

c-region: more polar, small residues in -1, -3
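The c-region description above (small residues in -1 and -3) can be turned into a crude check. The sketch below is illustrative only: the function name, the set of residues treated as "small" (A, G, S, C, T) and the example sequence are assumptions, and this is not any published predictor.

```python
# Residues treated as "small" here; this set is an assumption for illustration.
SMALL = set("AGSCT")

def obeys_minus3_minus1(seq, cleavage_index):
    """Return True if positions -3 and -1 relative to the cleavage site are small.

    cleavage_index is the 0-based index of the first residue of the mature
    protein, so position -1 is seq[cleavage_index - 1].
    """
    if cleavage_index < 3 or cleavage_index > len(seq):
        return False
    return (seq[cleavage_index - 1] in SMALL
            and seq[cleavage_index - 3] in SMALL)

# Hypothetical signal peptide: n-region MKK, h-region LLLLALLLLA, c-region SASA,
# followed by a made-up start of the mature protein.
example = "MKKLLLLALLLLASASA" + "DTPQ"
print(obeys_minus3_minus1(example, 17))
```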

mTP

Page 6: Prediction of protein localization and membrane protein topology

mTPs are rich in R & K and can form amphiphilic helices

(Abe et al., Cell 100:551)

cTP

mTP bound to Tom20

Page 7: Prediction of protein localization and membrane protein topology

Typical chloroplast transit peptide

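[Figure: schematic of a chloroplast transit peptide running from the N-terminal MA- to the cleavage-site motif IV X A A and the mature protein. Composition labels along the peptide, in order: no G,P,K,R; no D,E; high S,T; no D,E; high S,T; high R; no D,E; high S,T]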

ANN

Page 8: Prediction of protein localization and membrane protein topology

A simple artificial neural network (ANN)

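[Figure: a network whose input layer reads the sequence window A A G in sparse encoding over the alphabet A C G T (1 0 0 0 1 0 0 0 0 0 1 0) and whose output layer has two units, ACG and not ACG]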
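The 0/1 vector on this slide is a sparse (one-hot) encoding of a three-letter window over the alphabet A C G T. A minimal sketch of that encoding follows; the function name is hypothetical, and a real network for protein sequences would use a 20-letter alphabet instead.

```python
ALPHABET = "ACGT"

def one_hot(seq):
    """Encode each letter as len(ALPHABET) bits with a single 1."""
    bits = []
    for base in seq:
        bits.extend(1 if base == letter else 0 for letter in ALPHABET)
    return bits

# The window A A G gives the input vector shown on the slide.
print(one_hot("AAG"))  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
```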

Inside ANN

Page 9: Prediction of protein localization and membrane protein topology

Artificial neural networks: a summary

- a high-quality dataset (positive and negative examples)

- an ANN architecture (can be optimized)

- all internal parameters in the ANN are systematically optimized during a training session

- evaluate the predictive performance using cross-validation
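As an illustration of the last point, the sketch below splits a dataset into k folds so that every example is used for testing exactly once. The function name and the interleaved fold assignment are assumptions made for this example; the slides do not say how the folds were built.

```python
def k_fold_indices(n_examples, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    # Interleaved assignment: example j goes to fold j % k.
    folds = [list(range(i, n_examples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Train on four folds, test on the fifth, and rotate.
for train, test in k_fold_indices(10, 5):
    print(len(train), test)
```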

ChloroP

Page 10: Prediction of protein localization and membrane protein topology

ChloroP (Prot.Sci. 8:978)

[Figure: network score (-0.2 to 1.0) and MEME score (-30 to 10) plotted against residue position (0 to 100)]

TargetP

Page 11: Prediction of protein localization and membrane protein topology

TargetP - a four-state SP/mTP/cTP/other predictor

(JMB 300:1105)

performance

Page 12: Prediction of protein localization and membrane protein topology

TargetP sensitivity/specificity

         sens   spec
SP       .91    .96
mTP      .82    .90
cTP      .85    .69
other    .85    .78

sens = tp/(tp+fn)    spec = tp/(tp+fp)
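The two formulas can be written out directly. Note that spec as defined on this slide, tp/(tp+fp), is the quantity often called precision or positive predictive value. The counts below are hypothetical and chosen only to give values of the same size as the SP row; they are not the TargetP data.

```python
def sensitivity(tp, fn):
    """Fraction of true members of a class that are predicted as such."""
    return tp / (tp + fn)

def specificity_as_defined(tp, fp):
    """Fraction of predictions for a class that are correct (slide's definition)."""
    return tp / (tp + fp)

# Hypothetical counts: 91 true positives, 9 false negatives, 4 false positives.
print(round(sensitivity(91, 9), 2), round(specificity_as_defined(91, 4), 2))
```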

Other predictors

Page 13: Prediction of protein localization and membrane protein topology

Other ways to predict localization

- amino acid composition

- sequence homology

- domain structure

- phylogenetic profiles

- expression profiles

Membrane proteins

Page 14: Prediction of protein localization and membrane protein topology

Popular prediction programs

SignalP (NN, HMM)

ChloroP

TargetP

LipoP

-------

MitoProt

PSORT

Membrane proteins

www.cbs.dtu.dk

Page 15: Prediction of protein localization and membrane protein topology

Membrane protein topology

Page 16: Prediction of protein localization and membrane protein topology

A simulated lipid bilayer (Grubmüller et al.)


Page 17: Prediction of protein localization and membrane protein topology

Only two basic structures (Quart.Rev.Biophys. 32:285)

Helix bundle, β-barrel

Lipid/prot interactions

Page 18: Prediction of protein localization and membrane protein topology

Most MPs are synthesized at the ER

SP

Page 19: Prediction of protein localization and membrane protein topology

The basic model (courtesy Bill Skach)

prediction

Page 20: Prediction of protein localization and membrane protein topology

Topology prediction

Page 21: Prediction of protein localization and membrane protein topology

TM helix lengths are typically 20-30 residues

(Bowie, JMB 272:780)

Trp, Tyr

Page 22: Prediction of protein localization and membrane protein topology

Trp & Tyr are enriched in the region near the lipid headgroups

(Prot.Sci. 6:808; 7:2026)

Loop lengths

Page 23: Prediction of protein localization and membrane protein topology

Loops tend to be short (Tusnady & Simon, JMB 283:489)

PI rule

Page 24: Prediction of protein localization and membrane protein topology

The ’positive inside’ rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)

[Figure: schematic membrane protein with N- and C-termini, "in" and "out" sides, and positive charges (+ + +)]

Membrane              in        out
Bacterial IM          16% KR    4% KR
Eukaryotic PM         17% KR    7% KR
Thylakoid membrane    13% KR    5% KR
Mitochondrial IM      10% KR    3% KR
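The percentages above are the pooled Lys+Arg content of loops on each side of the membrane. A minimal sketch of that calculation is given below; the function name and the loop sequences are hypothetical, and the original studies may have restricted the analysis to loops of certain lengths.

```python
def kr_percent(loops):
    """Percentage of K and R residues pooled over a list of loop sequences."""
    total = sum(len(loop) for loop in loops)
    if total == 0:
        return 0.0
    n_kr = sum(1 for loop in loops for residue in loop if residue in "KR")
    return 100.0 * n_kr / total

# Made-up loops for one protein, grouped by side of the membrane.
inside_loops = ["MKRSG", "TRKNA"]
outside_loops = ["GSDTQ", "NAEKG"]
print(kr_percent(inside_loops), kr_percent(outside_loops))
```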

prediction

Page 25: Prediction of protein localization and membrane protein topology

The positive-inside rule applies to all organisms

(Nilsson, Persson & von Heijne, submitted)

[Figure: number of genomes (0 to 110) per amino acid (A C D E F G H I K L M N P Q R S T V W Y), plus the groups (D+E), (K+R) and (W+Y)]

Page 26: Prediction of protein localization and membrane protein topology

Topology can be manipulated (Nature 341:456)

Lep constructs expressed in E. coli

[Figure: membrane topologies of three constructs, Lep wt, Lep' and Lep'-inv, drawn between periplasm and cytoplasm. Each shows helices H1 and H2 and domains P1 and P2, with charged residues (+/-) marked. Sequence labels in the figure: f-Met-Ala-Asn-Met-Phe-, Ala-Asn-Met-(Lys)-Phe- and QSLNASASE. Positive-charge labels in the figure: 10+, 2+, 2+, 4+, 0+, 0+]

PK

Page 27: Prediction of protein localization and membrane protein topology

Topology prediction - a classical problem in bioinformatics

MDSQRNLLVIALLFVSFMIWQAWE....

4 characteristics

Page 28: Prediction of protein localization and membrane protein topology

Three important characteristics

~20 hydrophobic residues

predictors

’Positive inside’ rule

Trp, Tyr
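The first characteristic, a stretch of about 20 hydrophobic residues, is what a hydrophobicity plot looks for. The sketch below slides a window along a sequence and reports the most hydrophobic position. It uses the Kyte-Doolittle scale and a 19-residue window as assumptions for illustration; the predictors discussed in these slides may use other scales and window shapes. The sequence is the fragment shown on the previous slide.

```python
# Kyte-Doolittle hydropathy values (a swapped-in scale, chosen for illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def best_hydrophobic_window(seq, window=19):
    """Return (start, mean_hydropathy) of the most hydrophobic window."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[start:start + window]) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean

start, mean = best_hydrophobic_window("MDSQRNLLVIALLFVSFMIWQAWE")
print(start, round(mean, 2))
```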

Page 29: Prediction of protein localization and membrane protein topology

Popular topology predictors

TMHMM (HMM)

HMMTOP (HMM)

TopPred (h-plot + PI-rule)

MEMSAT (dynamic programming)

TMAP (h-plot, mult. alignment)

PHD (NN, mult. alignment)

toppred

Page 30: Prediction of protein localization and membrane protein topology

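TopPred (JMB 225:487)

http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html

[Figure: hydrophobicity plot for E. coli LacY, <H> (-3 to 3) vs. position (0 to 400), with the number of positively charged residues in each loop shown for two alternative topologies. First topology: 2, 3, 5, 4, 2, 2, 2 on one side and 1, 0, 0, 1, 1, 0 on the other, ∆+ = 17. Second topology: 2, 1, 3, 0, 5, 0, 4, 1, 2, 3, 0, 2 along the chain, ∆+ = 9]

- construct all possible topologies

- rank based on ∆+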
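The two steps listed on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the TopPred implementation: helices are given as (start, end) index pairs, "certain" helices are always kept, every subset of the "putative" helices is tried, and only K and R are counted as positive charges. All names and the toy sequence are hypothetical.

```python
from itertools import product

def count_kr(segment):
    return sum(1 for residue in segment if residue in "KR")

def loops_for(seq, helices):
    """Cut seq into the loops that flank the given helices (sorted, non-overlapping)."""
    loops, prev = [], 0
    for start, end in helices:
        loops.append(seq[prev:start])
        prev = end
    loops.append(seq[prev:])
    return loops

def orient(loops):
    """Return (side of the N-terminus, charge difference) for one set of loops."""
    even = sum(count_kr(loop) for loop in loops[0::2])  # same side as N-terminus
    odd = sum(count_kr(loop) for loop in loops[1::2])   # opposite side
    return ("in" if even >= odd else "out"), abs(even - odd)

def rank_topologies(seq, certain, putative):
    """Construct all candidate topologies and rank them by charge difference."""
    results = []
    for mask in product([False, True], repeat=len(putative)):
        chosen = sorted(certain + [h for h, keep in zip(putative, mask) if keep])
        n_term, delta = orient(loops_for(seq, chosen))
        results.append((delta, n_term, chosen))
    results.sort(key=lambda r: r[0], reverse=True)
    return results

# Toy sequence: two 20-residue hydrophobic stretches, the second only putative.
toy = "KKR" + "L" * 20 + "GS" + "L" * 20 + "RRK"
for delta, n_term, helices in rank_topologies(toy, [(3, 23)], [(25, 45)]):
    print(delta, n_term, helices)
```

In the toy case the two-helix topology puts all six positive charges on one side, so it ranks above the one-helix alternative.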

TMHMM

Page 31: Prediction of protein localization and membrane protein topology

TMHMM (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567)

h & l models

www.cbs.dtu.dk

www.sbc.su.se

A hidden Markov model-based method

Page 32: Prediction of protein localization and membrane protein topology

HMMTOP (Tusnady & Simon, JMB 283:489)

performance

Page 33: Prediction of protein localization and membrane protein topology

Helix & loop models in TMHMM

HMMTOP

Page 34: Prediction of protein localization and membrane protein topology

TMHMM performance (Krogh et al., JMB 305:567; Melén et al., JMB 327:735)

Discrimination globular/membrane: sens & spec > 98%

Correct topology: 55-60%

Single TM identification: sensitivity 96%, specificity 98%

Training set: 160 membrane proteins, 650 globular proteins

# of TM proteins

Page 35: Prediction of protein localization and membrane protein topology

Can performance be improved?

Consensus predictions

Multiple alignments

Experimental constraints

# of TM proteins

Page 36: Prediction of protein localization and membrane protein topology

’Consensus’ predictions indicate reliability

(FEBS Lett. 486:267)

[Figure: fraction correct and coverage (0 to 1) plotted against majority level (5/0, 4/1, 3/2 & 3/1/1, 2/1/1/1) for 60 E. coli proteins]

5 prediction methods used

46% of 764 predicted E. coli IM proteins are in the 5/0 or 4/1 classes
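The majority levels (5/0, 4/1, ...) describe how the five methods split over the predicted topologies for one protein. A minimal sketch of that bookkeeping is shown below; the function name and the way a topology is represented (number of helices plus side of the N-terminus) are assumptions for illustration.

```python
from collections import Counter

def majority_level(predictions):
    """Summarize agreement among predictions as a string such as '4/1'."""
    counts = sorted(Counter(predictions).values(), reverse=True)
    if len(counts) == 1:
        counts.append(0)  # full agreement is written as n/0
    return "/".join(str(c) for c in counts)

# Five hypothetical predictions for one protein: (number of TM helices, N-terminus side).
predictions = [(12, "in"), (12, "in"), (12, "in"), (12, "in"), (11, "out")]
print(majority_level(predictions))  # 4/1
```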

Partial consensus

Page 37: Prediction of protein localization and membrane protein topology

TMHMM reliability scores (Melén et al., JMB 327:735)

TMHMM output:

1. Mean probability pmean

2. Minimum probability pmin(label)

3. PbestPath/PallPaths

Sequence:   M     C     Y     G     K     C     I
p(i):       0.78  0.78  0.78  0.76  0.76  0.08  0.03
p(h):       0.00  0.00  0.02  0.02  0.15  0.85  0.93
p(o):       0.22  0.22  0.20  0.20  0.08  0.07  0.04
Label:      i     i     i     i     i     h     h
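Scores 1 and 2 can be read off the table above. The sketch below assumes that both are computed from the probability of the assigned label at each position (mean and minimum respectively); that reading is an interpretation of the slide, and the function name is hypothetical. Score 3 needs the path probabilities from the model itself and is not reproduced here.

```python
def reliability_scores(probs, labels):
    """Return (p_mean, p_min) over the probability of the assigned label."""
    label_probs = [probs[label][pos] for pos, label in enumerate(labels)]
    return sum(label_probs) / len(label_probs), min(label_probs)

# Values from the slide for the residues M C Y G K C I.
probs = {
    "i": [0.78, 0.78, 0.78, 0.76, 0.76, 0.08, 0.03],
    "h": [0.00, 0.00, 0.02, 0.02, 0.15, 0.85, 0.93],
    "o": [0.22, 0.22, 0.20, 0.20, 0.08, 0.07, 0.04],
}
labels = "iiiiihh"

p_mean, p_min = reliability_scores(probs, labels)
print(round(p_mean, 3), p_min)
```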

S3 results

Page 38: Prediction of protein localization and membrane protein topology

TMHMM (score 3): prediction accuracy vs. coverage

Test set bias

[Figure: percent correct (60 to 100) vs. coverage (0 to 100) for 92 bacterial proteins, annotated with ~70% and ~45%]
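An accuracy-versus-coverage curve of this kind is obtained by ranking the proteins by reliability score and asking how many of the top-scoring predictions are correct. The sketch below shows the idea on made-up data; the function name and the (score, correct) pairs are hypothetical and unrelated to the 92 proteins in the plot.

```python
def accuracy_at_coverage(scored, coverage):
    """Percent correct among the top-scoring fraction `coverage` of predictions.

    scored is a list of (score, is_correct) pairs; coverage is between 0 and 1.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_kept = max(1, int(round(coverage * len(ranked))))
    n_correct = sum(1 for _, is_correct in ranked[:n_kept] if is_correct)
    return 100.0 * n_correct / n_kept

# Made-up predictions: higher scores tend to be correct more often.
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
for coverage in (0.4, 1.0):
    print(coverage, accuracy_at_coverage(scored, coverage))
```

Lowering the coverage keeps only the most reliable predictions, which is why accuracy rises as coverage falls.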

Page 39: Prediction of protein localization and membrane protein topology

”Experimentally known topologies” is a biased sample

[Figure: percent of proteins (0 to 40) in each score interval (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1) for the test set, C. elegans, S. cerevisiae and E. coli]

Estimate true performance

Page 40: Prediction of protein localization and membrane protein topology

Correlation between accuracy and TMHMM S3 score

[Figure: percent correct (0 to 100) vs. mean score (0 to 1)]

genomes

Page 41: Prediction of protein localization and membrane protein topology

Expected TMHMM performance on proteomes

[Figure: percent correct (40 to 100) vs. coverage (0 to 100) for E. coli, S. cerevisiae, the test set and C. elegans]

Add C-term.

Page 42: Prediction of protein localization and membrane protein topology

Original TMHMM prediction, one TM helix missing

TMHMM prediction with C-terminus fixed to inside

Experimental information helps (JMB 327:735)

improvement

Page 43: Prediction of protein localization and membrane protein topology

When the location of the C-terminus is known, the correct topology is predicted for an estimated ~70% of all membrane proteins (~55% when not known)

Reporter fusions

Experimental information helps (JMB 327:735)

