Protein Prediction - Part 1: Structure · 2011. 5. 26. · Today: Secondary structure prediction 1...

Protein Prediction - Part 1: Structure
© Burkhard Rost (TU Munich)
Wednesday May 25, 2011
Page 2

Announcements

• Videos: SciVee, www.rostlab.org (THANKS: Tim Karl + Haitam Sohby)
• NO lectures: Tue May 31 (!) (student general assembly, "studentische Vollversammlung"); Thu Jun 2 (Ascension); Thu Jun 16 (?)

• LAST lecture: Jul 7
• Exam: Jul 12 (?), 10:30 (likely this room)

• Make-up exam: likely October 13, morning

CONTACT: Marlena Drabik

Page 3

Today: Secondary structure prediction 1

LAST WEEKS
• Secondary structure prediction: principles on white board

THIS WEEK
• Secondary structure prediction methods: details

NEXT WEEK
• Student Assembly & Ascension Day -> no lecture

2 WEEKS FROM NOW
• Marc Ofmann: Molecular Dynamics (MD)
• Comparative modeling

Page 4

1D prediction: gory details

Page 5

PHDsec: the un-g(l)ory details

• 76% is the average over a distribution with a standard deviation of ≈ 10%

Page 6

Prediction accuracy varies!

[Figure: histogram; x-axis: per-residue accuracy (Q3), 0-100%; y-axis: number of protein chains, 0-70. <Q3> = 72.3%; σ = 10.5%. Labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm.]
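The histogram is built from one Q3 value per chain (the percentage of residues whose predicted state matches the observed one), summarised by the mean <Q3> and the spread σ. A minimal Python sketch of that bookkeeping, assuming observed and predicted states are equal-length strings over H, E, L; the function names and toy chains are hypothetical, and since the slide does not say whether σ is the population or the sample standard deviation, the population version (`pstdev`) is used here.

```python
from statistics import mean, pstdev


def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state (H, E or L) matches the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def q3_distribution(chains: dict[str, tuple[str, str]]) -> tuple[float, float]:
    """Mean and population standard deviation of per-chain Q3 (each chain weighted equally)."""
    scores = [q3(obs, pred) for obs, pred in chains.values()]
    return mean(scores), pstdev(scores)


# Hypothetical toy chains: (observed, predicted).
toy = {
    "chainA": ("HHHHLLEEEL", "HHHLLLEEEL"),  # 9 of 10 residues correct
    "chainB": ("LLEEEELL", "LLEEHHLL"),      # 6 of 8 residues correct
}
print(q3_distribution(toy))  # (82.5, 7.5)
```

Averaging per chain (rather than pooling all residues) is what makes a per-protein spread like σ ≈ 10% visible at all.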

Page 7

PHDsec: the un-g(l)ory details

• 76% is the average over a distribution with a standard deviation of ≈ 10%
• stronger predictions are more accurate

Page 8

Stronger predictions more accurate!

[Figure: Q3 per protein (y-axis, 0-100%) vs. reliability index averaged over protein (x-axis, 3-9), with linear fit Q3fit = 21 + 8.7 · (reliability index averaged over protein). Inset: the letters ACDEFGHIKLMNPQRSTVWY. and the states H, E, L; example residues with states D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E); example output values H=0.5, E=0.4, L=0.1 and H=0.8, E=0.1, L=0.1.]
[Figure: histogram of per-residue accuracy (Q3) per protein chain, as on the previous slide: <Q3> = 72.3%; σ = 10.5%; labelled chains 1spf, 1bct, 1stu, 3ifm, 1psm.]
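The reliability index summarises how far the strongest output beats the runner-up, so H=0.5 / E=0.4 / L=0.1 is a weak call and H=0.8 / E=0.1 / L=0.1 a strong one. A sketch under two assumptions: RI = int(10 × (highest output − second-highest output)) clipped to 0..9, as we read the PHD papers, and the fit on the slide using the per-protein average RI as its variable. Function names are hypothetical.

```python
def reliability_index(outputs: dict[str, float]) -> int:
    """Reliability index 0..9 from the output values for H, E and L.

    Assumption: RI = int(10 * (highest - second highest)), clipped to 0..9.
    """
    top, second = sorted(outputs.values(), reverse=True)[:2]
    # Round before truncating so float noise (0.5 - 0.4 = 0.0999...) does not lose a unit.
    ri = int(round(10 * (top - second), 6))
    return min(max(ri, 0), 9)


def expected_q3(mean_ri: float) -> float:
    """Per-protein Q3 predicted by the linear fit on the slide: 21 + 8.7 * <RI>."""
    return 21.0 + 8.7 * mean_ri


weak = reliability_index({"H": 0.5, "E": 0.4, "L": 0.1})    # 1
strong = reliability_index({"H": 0.8, "E": 0.1, "L": 0.1})  # 7
print(weak, strong, expected_q3((weak + strong) / 2))
```

In this reading, a protein whose residues average RI ≈ 6 would be expected near Q3 ≈ 73%, and one averaging RI ≈ 8 near 91%.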

Page 9

PHDsec: the un-g(l)ory details

• 76% is the average over a distribution with a standard deviation of ≈ 10%
• stronger predictions are more accurate
• WARNING: for single sequences, the reliability index is almost a factor of 2 too large

Page 10

Details PHDsec: Multiple alignment

single sequences => accuracy clearly lower

AA:   KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS:  EEEE E E EEEEEE EEEEEE EEEEEEHHHEEEE

id     nali   Q3sec   Q2acc   predicted
30 N   26     70      77      EEEEEEE EEE EEEEE EEEE EE EEE
self   1      63      72      EEEEEEE EEEE EEEEE EEEEEE HHHHH
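One thing the alignment adds over the single sequence ("self", nali = 1) is a per-position view of which amino acids occur in the family. A minimal sketch of turning a gapped alignment into column frequencies, assuming plain counts with no sequence weighting; PHD's actual input encoding is not reproduced here, and the names and toy alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Relative amino-acid frequencies for each column of a gapped alignment.

    Characters not in AMINO_ACIDS (e.g. '-' or '.') count as gaps and are skipped;
    all rows must have the same length. No sequence weighting is applied.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned rows must have equal length")
    profile = []
    for i in range(width):
        counts = Counter(row[i] for row in alignment if row[i] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return profile


# Hypothetical three-sequence alignment (first row = query).
prof = column_profile(["KELV", "KDLV", "RE-V"])
print(prof[0]["K"], prof[2]["L"])
```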

Page 11

FAQ of secondary structure prediction

• What is the best alignment?
• Limit of prediction accuracy reached?
• Comparative modeling or de novo?
• Ultimate rôle in structure prediction (1D-3D)? Will secondary structure and 3D prediction merge completely?

Page 12

FAQ of secondary structure prediction

What is the best alignment?

Page 13

Evolution has it!

[Figure: percentage sequence identity (y-axis, 0-100%) vs. number of residues aligned (x-axis, 0-250). Above the curve: "Sequence identity implies structural similarity!"; below it: "Don't know region".]

C Sander & R Schneider (1991) Proteins 9:56-68
B Rost (1999) Prot Engin 12:85-94
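A curve like the one in the figure can be written as a length-dependent identity threshold. The sketch below uses the form as we recall it from Rost (1999): pI(L) = n + 480 · L^(-0.32 · (1 + e^(-L/1000))) for 11 < L ≤ 450, 100% for shorter and 19.5% for longer alignments, where n shifts the curve. Treat the constants as an assumption to check against the paper; the function names are hypothetical.

```python
import math


def hssp_threshold(length: int, n: float = 0.0) -> float:
    """Percentage identity above which an alignment of `length` residues is taken
    to imply structural similarity (assumed form, after Rost 1999).

    `n` shifts the curve by n percentage points (n = 0: the curve itself).
    """
    if length <= 11:
        return 100.0
    if length > 450:
        return n + 19.5
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return min(100.0, n + 480.0 * length ** exponent)


def above_curve(identity: float, length: int) -> bool:
    """True if the point (length, identity) lies on or above the curve."""
    return identity >= hssp_threshold(length)


for L in (20, 50, 100, 200, 450):
    print(L, round(hssp_threshold(L), 1))
```

Under this form, 100 aligned residues need roughly 29% identity, while short alignments need far more, which is why the "don't know region" widens to the left of the plot.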

Page 14

Different alignment strategies

Method                   SWISS-PROT          BIG
                         E<1     E<10^-3     E<1     E<10^-3
BLAST                    8.2     7.6         9.7     9.2
simple ClustalW          4.4     5.4
profile ClustalW         5.4     7.1
MaxHom with McLachlan    7.2     7.5         9.0     8.9
MaxHom with BLOSUM62     8.3     7.9         9.5     9.1
BLAST-filter             7.9     7.6         9.5     9.2
profile-based BLAST      8.2     7.8         9.6     9.1
significant difference   > 0.44  > 0.44      > 0.44  > 0.44

Page 16

Small vs. big

[Figure: Q3(NRDB) - Q3(single sequence) (y-axis, -30 to 40) vs. Q3(SWISS-PROT) - Q3(single sequence) (x-axis, -20 to 30). Panel A: BLAST E-value cut-off < 1; panel B: BLAST E-value cut-off < 10^-20.]

Page 18

Accuracy vs. E-value BLAST

E-value   PHDsec   PHDacc
100       8.7      4.4
20        9.1      4.9
10        9.5      5.0
1         9.7      5.2
10^-1     9.5      5.3
10^-2     9.2      5.3
10^-3     9.1      5.2
10^-4     8.9      5.2
10^-7     8.5      5.0
10^-20    6.9      4.5
significant difference: > 0.44 (PHDsec), > 0.39 (PHDacc)
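Reading "significant difference > 0.44" as a threshold, one can ask which E-value cut-offs are statistically indistinguishable from the best PHDsec value. A small sketch with the PHDsec column copied from the table above; the helper name is hypothetical.

```python
# PHDsec column of the BLAST E-value table (E-value cut-off -> value).
PHDSEC = {100: 8.7, 20: 9.1, 10: 9.5, 1: 9.7, 1e-1: 9.5, 1e-2: 9.2,
          1e-3: 9.1, 1e-4: 8.9, 1e-7: 8.5, 1e-20: 6.9}
SIGNIFICANT = 0.44  # "significant difference > 0.44"


def tied_with_best(values: dict[float, float], threshold: float) -> list[float]:
    """Cut-offs whose value lies within `threshold` of the maximum,
    i.e. not significantly worse than the best one (largest cut-off first)."""
    best = max(values.values())
    return sorted((e for e, v in values.items() if best - v <= threshold), reverse=True)


print(tied_with_best(PHDSEC, SIGNIFICANT))  # [10, 1, 0.1]
```

So by this criterion, any cut-off between 10 and 10^-1 is as good as E < 1, while both very permissive (100, 20) and very strict (10^-20) cut-offs are significantly worse.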

Page 19

Accuracy vs. E-value PSI-BLAST

Iteration E-value   PHDsec   PHDacc
10                  7.3      3.0
1                   9.3      4.2
10^-1               10.1     4.8
10^-2               10.1     5.0
10^-3               10.0     5.0
10^-4               10.1     5.1
10^-7               9.9      5.1
10^-20              9.6      5.2
10^-60              9.4      5.0
significant difference: > 0.44 (PHDsec), > 0.3 (PHDacc)

Page 20

Accuracy vs. pollution

Number of     h = 10^-4                  h = 10^-10
iterations    filtered  non-filtered     filtered  non-filtered
1             9.5       9.7              9.5       9.7
2             9.9       10.0             9.8       10.0
3             10.1      9.8              10.1      10.0
4             9.6       9.3              10.1      9.8
6             9.3       8.8              9.9       9.7
10            8.1       7.4              9.7       9.5
significant difference   > 0.44   > 0.44   > 0.44   > 0.44

Page 21

[Figure: prediction accuracy using iterated PSI-BLAST (y-axis, 50-90%) vs. prediction accuracy using pairwise BLAST (x-axis, 50-90%).]

PSI-BLAST not always best

Page 22

[Figure: Q3(alignment) - Q3(single sequence) (y-axis, -5 to 25) vs. number of proteins aligned (x-axis, 1 to 1000, logarithmic).]

More = better?

Page 23

[Figure: number of sequences in alignment (y-axis, 0-100) vs. percent of proteins (x-axis, 0-100), for BLAST on SWISS-PROT, BLAST on NRDB, PSI-BLAST on NRDB, and PSI-BLAST excluding BLAST hits.]

Data deluge not enough for greedy bioinformaticians

Page 25

FAQ of secondary structure prediction

What is the best alignment? ... that depends

Page 26

FAQ of secondary structure prediction

• What is the best alignment? that depends
• Limit of prediction accuracy reached?
• Comparative modeling or de novo?
• Ultimate rôle in structure prediction (1D-3D)? Will secondary structure and 3D prediction merge completely?

Page 27

1D secondary structure prediction: Quo vadis?

Page 28

How to assess secondary structure prediction methods?

Page 29

How to assess performance?

?

Page 30

How to assess performance?

[Figure: "?" with five labels reading "groups".]

Page 31

Art 2 Assess: Truth from where

Standard of truth: PDB -> DSSP -> string of H, E, L (helix, strand, other)
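The standard of truth is a three-letter string per protein, so the eight DSSP states have to be collapsed to H, E, L. The sketch assumes one widely used reduction (H, G, I -> H; E, B -> E; everything else -> L); other papers use slightly different mappings, which shifts Q3, so treat the table as an assumption. The function name is hypothetical.

```python
# One common 8-to-3 reduction (assumption: conventions differ between papers,
# e.g. some map G, I or B to L instead).
DSSP_TO_HEL = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def dssp_to_hel(dssp: str) -> str:
    """Map a per-residue DSSP string to H (helix), E (strand), L (everything else).

    T, S, blank and '-' (no assignment) all become L.
    """
    return "".join(DSSP_TO_HEL.get(state, "L") for state in dssp)


print(dssp_to_hel("  HHHHGGGTTEEEEBS-"))  # LLHHHHHHHLLEEEEELL
```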

Page 32

Art 2 Assess: data set size

many many many?

Page 34

Art 2 Assess: data set size

[Figure: histogram; x-axis: per-residue accuracy (Q3), 0-100%; y-axis: number of protein chains, 0-70. <Q3> = 72.3%; σ = 10.5%. Labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm.]

Page 35

Art 2 Assess: data set size

Some 500+ proteins appear to work.
ANY 500+ do?

Page 36

Art 2 Assess: data set size

ANY 500+ do? NO: we need to sample the “true distribution”

(redundancy reduction/bias reduction)
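Redundancy reduction is often done greedily: keep a protein only if it is sufficiently different from every protein already kept. The sketch assumes that strategy (not necessarily the one used for the data sets here); identity is a toy ungapped percentage over the shorter sequence, where a real pipeline would align first (e.g. with BLAST), and the threshold and names are hypothetical.

```python
from typing import Callable


def ungapped_identity(a: str, b: str) -> float:
    """Toy measure: percent identical positions over the shorter sequence, no alignment."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def reduce_redundancy(seqs: dict[str, str], max_identity: float,
                      identity: Callable[[str, str], float] = ungapped_identity) -> list[str]:
    """Greedy filter: visit proteins from longest to shortest and keep one only if
    its identity to every protein kept so far is below max_identity."""
    kept: list[str] = []
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(identity(seqs[name], seqs[other]) < max_identity for other in kept):
            kept.append(name)
    return kept


# Hypothetical toy set: p2 differs from p1 at one position only.
toy = {"p1": "KELVLALYDY", "p2": "KELVLALYDF", "p3": "MSTNPKPQRK"}
print(reduce_redundancy(toy, max_identity=25.0))  # ['p1', 'p3']
```

The result depends on visiting order; longest-first is one arbitrary but common choice.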

Page 37

Art 2 Assess: standard error

• standard deviation ~ 10% (in Q3)
• assume 2500 “effective” proteins in data set

Page 38

Art 2 Assess: standard error

• standard deviation ~ 10% (in Q3)
• assume 2500 “effective” proteins in data set

method 1: Q3 = 76.521%
method 2: Q3 = 76.301%

Do they differ?

Page 39

Art 2 Assess: standard error

σ ~ 10% (in Q3); nprot = 2500

method 1: Q3 = 76.521%
method 2: Q3 = 76.301%

Rule of thumb: StdError = σ / √nprot = 10% / √2500 = 0.2%

YES, the difference (0.22 percentage points) is statistically significant, although borderline: it barely exceeds one standard error.
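The rule of thumb can be checked directly for both data-set sizes used in this lecture (2500 effective proteins, and 100 new ones). A sketch assuming "differ" means the gap exceeds k standard errors with k = 1, which matches the borderline call above; stricter tests use a larger k. Function names are hypothetical.

```python
import math


def standard_error(sigma: float, n_proteins: int) -> float:
    """Rule of thumb: StdError = sigma / sqrt(n_proteins)."""
    return sigma / math.sqrt(n_proteins)


def differs(q3_a: float, q3_b: float, sigma: float, n_proteins: int, k: float = 1.0) -> bool:
    """True if |Q3_a - Q3_b| exceeds k standard errors (k = 1 mirrors the
    'borderline' reading; a stricter test would use a larger k)."""
    return abs(q3_a - q3_b) > k * standard_error(sigma, n_proteins)


for n in (2500, 100):
    print(n, standard_error(10.0, n), differs(76.521, 76.301, 10.0, n))
```

With 100 proteins the standard error grows to 1%, five times the 0.22-point gap, so the same two numbers no longer separate the methods.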

Page 45

Art 2 Assess: data set size

ANY 500+ do? NO: we need to sample the “true distribution”

(redundancy reduction/bias reduction)

is that ENOUGH?

Page 46

Art 2 Assess: data set

anything else to consider in choosing the data set?

?

Page 47

Art 2 Assess: data set

A set of sequence-unique/unbiased proteins that, ideally, is also sequence-unique/unbiased with respect to anything used to develop the methods being assessed

-> NEW proteins

Page 48

Art 2 Assess: standard error

σ ~ 10% (in Q3); nprot = 100

method 1: Q3 = 76.521%
method 2: Q3 = 76.301%

Rule of thumb: StdError = σ / √nprot = 10% / √100 = 1%

NO, the difference (0.22 percentage points) is well below one standard error, so it is not statistically significant.

Page 50

Art 2 Assess: anything else?

Anything else to consider?

?

Page 51

Evaluation alternatives

• Method 1 predicts proteins P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11
• Method 2 predicts P2, P4, P6, P8, P10, P11
• Method 3 predicts P1, P3, P5, P7, P9, P10, P11
• Method 4 predicts P0, P10, P11
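Comparing these four methods fairly means scoring all of them on the proteins they share. A sketch with the protein lists copied from the slide; the function names are hypothetical.

```python
# Proteins predicted by each method, as listed on the slide.
PREDICTED = {
    "Method 1": {f"P{i}" for i in range(1, 12)},  # P1 ... P11
    "Method 2": {"P2", "P4", "P6", "P8", "P10", "P11"},
    "Method 3": {"P1", "P3", "P5", "P7", "P9", "P10", "P11"},
    "Method 4": {"P0", "P10", "P11"},
}


def common_proteins(predicted: dict[str, set[str]]) -> set[str]:
    """Proteins predicted by every method."""
    return set.intersection(*predicted.values())


def pairwise_common(predicted: dict[str, set[str]], a: str, b: str) -> set[str]:
    """Proteins shared by two methods, for head-to-head comparisons."""
    return predicted[a] & predicted[b]


print(sorted(common_proteins(PREDICTED)))                      # ['P10', 'P11']
print(len(pairwise_common(PREDICTED, "Method 1", "Method 3")))  # 7
```

Only two proteins are common to all four methods, which is exactly the tension between "many proteins" and "identical proteins" on the following slides.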

Page 52

Ranking not stable!

Evaluating on 29 different proteins is worse than evaluating on 11 identical proteins

VA Eyrich, IYY Koh, D Przybylski, O Graña, F Pazos, A Valencia and B Rost (2003) Proteins 53 Suppl 6 548-60

© Burkhard Rost (Columbia New York)

Page 53

Compare methods on identical data sets!!

Page 54

EVA: automatic continuous EVAluation of structure prediction
[Figure: EVA workflow. Get PDB; send sequences to prediction servers (secondary structure, fold recognition, inter-residue contacts / distances, comparative modelling); store in EVA-DB; analyse with pairwise BLAST and with PSI-BLAST, MaxHom and sequence-unique sets; results per protein (PDB vs. prediction) and as a weekly summary; compile results, collect HTML, update central pages and static pages; satellites/mirrors; users browse, query and ftp. Steps are labelled "every day" and "every week".]

Page 55

EVA: secondary structure

Method     Q3     Q3 claim        SOV   Info   CorrH   CorrE   CorrL   Class   BAD
PROF       76.0                   72    0.35   0.67    0.63    0.55    82      2.7
PSIPRED    76.0   76.5-78.3 (M)   72    0.36   0.65    0.62    0.55    78      2.8
SSpro      76.0   76              71    0.35   0.67    0.63    0.56    83      2.8
JPred2     75.0   76.4            69    0.34   0.65    0.60    0.54    76      2.6
PHDpsi     75.0                   71    0.33   0.65    0.60    0.54    81      3.0
PHD        71.4   71.6            68    0.28   0.59    0.58    0.49    77      4.3

Copenhagen: 78 (N), 77.8
Wang/Yuan: 53 (O)

76%
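The CorrH, CorrE and CorrL columns are per-state correlation coefficients. Assuming they are Matthews correlation coefficients computed over all residues (the table's footnotes are not part of this transcript), one state can be scored as below; the function name and toy strings are hypothetical.

```python
import math


def state_mcc(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation for one state, treating `state` as positive
    and the other two states as negative; 0.0 if undefined."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == state and obs == state:
            tp += 1
        elif pred == state:
            fp += 1
        elif obs == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


obs, pred = "HHHHLLEEEL", "HHHLLLEEEL"  # hypothetical toy strings
print({s: round(state_mcc(obs, pred, s), 2) for s in "HEL"})
```

Unlike Q3, this score penalises a method that simply over-predicts the most common state, which is one reason tables like the one above report both.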

Page 56

Secondary structure predictions differ

Page 57

Accuracy varies for proteins!

[Figure: percentage of all 150 proteins (y-axis, 0-25) vs. percentage of correctly predicted residues per protein (x-axis, 30-100), for PSIPRED, SSpro, PROF, PHDpsi, JPred2 and PHD.]

Page 58

Some proteins predicted better

[Figure: accuracy per protein (Q3; y-axis, 30-90%) vs. cumulative percentage of proteins (x-axis, 0-100).]

Page 59

[Figure: deviation of each method from the average (y-axis, -30 to 30) vs. per-protein prediction accuracy averaged over 6 methods (x-axis, 55-95%); series ave-PSIPRED, ave-SSpro, ave-PROF, ave-PHDpsi, ave-JPred2, ave-PHD.]

Averaging over many methods not always good!

Page 60

Reliability correlates with accuracy!

[Figure: percentage of correctly predicted residues (y-axis, 70-100%) vs. percentage of residues predicted (x-axis, 0-100%), for JPred2, PHD, PROF and PSIPRED.]
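Such a curve is obtained by ranking residues by reliability and measuring accuracy on the most reliable fraction only. A sketch assuming each residue comes as a pair (reliability index, prediction correct?); the function name and toy data are hypothetical.

```python
def accuracy_vs_coverage(residues: list[tuple[int, bool]], fractions: list[float]) -> list[float]:
    """For each fraction f, keep the f most reliable residues and return the
    percentage of them that are correctly predicted."""
    ranked = sorted(residues, key=lambda r: r[0], reverse=True)  # most reliable first
    result = []
    for f in fractions:
        k = max(1, round(f * len(ranked)))
        top = ranked[:k]
        result.append(100.0 * sum(correct for _, correct in top) / k)
    return result


# Hypothetical residues: (reliability index 0..9, prediction correct?)
toy = [(9, True), (8, True), (7, True), (6, False), (5, True),
       (4, True), (3, False), (2, True), (1, False), (0, False)]
print(accuracy_vs_coverage(toy, [0.3, 0.5, 1.0]))  # [100.0, 80.0, 60.0]
```

Reading the curve from right to left trades coverage for accuracy: predicting fewer residues, but only the reliable ones, raises the fraction that is correct.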

Page 61

Secondary structure prediction 2005

history
• 1st generation: 50-55%
• 2nd generation: 55-62%
• 3rd generation: 1992 70-72%; 2000 > 76%; 2010 > 78%

Page 62

Secondary structure prediction 2005

history
• 1st generation, 1970s: 50-55% (55)
• 2nd generation, 1980s: 55-62% (62, +7)
• 3rd generation: 1992 70-72% (72, +10); 2000 > 76% (76, +4); 2011 > 78% (78, +2)

Page 63

Secondary structure prediction 2005

history
• 1st generation, 1970s: 50-55% (55)
• 2nd generation, 1980s: 55-62% (62, +7)
• 3rd generation: 1992 70-72% (72, +10); 2000 > 76% (76, +4); 2011 > 78% (78, +2)

what improves (2002)?
• database growth: +3
• PSI-BLAST: +0.5
• new training: +1
• ‘clever method’: +1

Page 64

Quo vadis?

• 1980: 55% (simple)
• 1990: 60% (less simple)
• 1993: 70% (evolution)
• 2000: 76% (more evolution)
• 2011: 78% (even more evolution)

what is the limit?

Page 65

Quo vadis?

• 1980: 55% (simple)
• 1990: 60% (less simple)
• 1993: 70% (evolution)
• 2000: 76% (more evolution)
• 2011: 78% (even more evolution)

what is the limit?
• 88% for proteins of similar structure
• 80% for 1/5th of proteins with families > 100

Page 66

Quo vadis?

• 1980: 55% (simple)
• 1990: 60% (less simple)
• 1993: 70% (evolution)
• 2000: 76% (more evolution)
• 2011: 78% (even more evolution)

what is the limit?
• 88% for proteins of similar structure
• 80% for 1/5th of proteins with families > 100

missing:
• better definition of secondary structure, including long-range interactions
• structural switches
• chameleon / folding

Page 67

Conclusion: secondary structure prediction

• big gain through using evolutionary information
• are we going to reach above 80%? How high?
• continuous secondary structure
• better methods
• other features
• use secondary structure: ASP (M Young, Kirshenbaum, Dill, S Highsmith: Predicting conformational switches in proteins. Protein Sci 1999, 8:1752-1764)

Page 68

FAQ of secondary structure prediction

• What is the best alignment? that depends
• Limit of prediction accuracy reached? no
• Comparative modeling or de novo? specialist is best
• Ultimate rôle in structure prediction (1D-3D)? Will secondary structure and 3D prediction merge completely?
