A Four-Body Statistical Potential For Protein Fold Recognition


Bala Krishnamoorthy and Alex Tropsha

UNC Chapel Hill

Nov 17, 2003

Outline

Motivation

Hypothesis

Four-body statistical potentials

Application to folding simulations

Application to predictions from CASP5 and Livebench 6

Motivation

Knowledge of protein structure is essential to understanding protein function

The number of proteins with known sequences is growing exponentially

Traditional methods for determining protein structure (X-ray crystallography, NMR, etc.) do not yield quick results

Need to develop statistical methods that help with protein fold recognition

Hypothesis

Specific nearest neighbor residue contacts in protein structures have non-random propensities for occurrence.

The propensities of occurrence of nearest neighbor clusters can be used to score compatibility between protein sequence and structure

SNAPP

Simplicial Neighborhood Analysis of Protein Packing

2-D packing: 3 neighbors in mutual contact

3-D packing: 4-neighbor clusters


Objective definition of the nearest neighborhood of each residue is needed

Use the Voronoi diagram of the protein

- gives a convex cell around each residue (represented as a point) that defines the nearest neighborhood of the residue

Delaunay triangulation – defined as the dual of the Voronoi diagram

Tessellation of protein structure (in 3D)

Residues are represented by their side-chain centers (or by their C-α atoms)

Protein structure is represented as an aggregate of space-filling, non-overlapping, irregular tetrahedra

Nearest-neighbor residues are identified as unique sets of four residues each (tetrahedral quadruplets)
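The defining property of a Delaunay tetrahedron is that its circumsphere contains no other point of the set. The sketch below applies that test by brute force to every set of four points, which is only practical for a handful of points; a real protein would call for a library routine such as `scipy.spatial.Delaunay`. The function names, the tolerance `eps` and the toy coordinates are assumptions for illustration, not taken from the slides.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(a, b, c, d):
    """Return (center, radius squared), or None for a flat tetrahedron."""
    # The center x is equidistant from a and each of b, c, d:
    # 2 (p - a) . x = |p|^2 - |a|^2 for p in (b, c, d).
    rows = [[2 * (p[k] - a[k]) for k in range(3)] for p in (b, c, d)]
    rhs = [sum(p[k] ** 2 - a[k] ** 2 for k in range(3)) for p in (b, c, d)]
    det = det3(rows)
    if abs(det) < 1e-9:
        return None
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / det)
    r2 = sum((center[k] - a[k]) ** 2 for k in range(3))
    return center, r2


def delaunay_quadruplets(points, eps=1e-9):
    """Brute-force Delaunay tetrahedra as tuples of point indices."""
    quads = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, r2 = sphere
        # Keep the tetrahedron only if no other point lies inside the sphere.
        empty = all(
            sum((points[m][k] - center[k]) ** 2 for k in range(3)) >= r2 - eps
            for m in range(len(points)) if m not in quad
        )
        if empty:
            quads.append(quad)
    return quads


# Toy coordinates standing in for five residue centers.
toy_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1.2)]
print(delaunay_quadruplets(toy_points))  # [(0, 1, 2, 3), (1, 2, 3, 4)]
```

Each returned tuple is one tetrahedral quadruplet; mapping the point indices back to residue types gives the {i, j, k, l} composition used for scoring.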

Four-body Statistical Potentials

Denote each quadruplet by {i, j, k, l}

i, j, k and l can be any of the 20 amino acids

Total number of possible quadruplets (unordered, repeats allowed) is 8855

AALVVALITLKMYYYY …
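The count of 8855 follows from treating a quadruplet as an unordered selection of four residue types with repetition allowed, i.e. C(20 + 4 - 1, 4) = C(23, 4). A short check, using the standard one-letter amino acid codes as the alphabet (the variable names are illustrative):

```python
from itertools import combinations_with_replacement
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes

# A quadruplet is an unordered multiset of four residue types,
# so {A, A, L, V} and {L, A, V, A} are counted once.
quadruplets = list(combinations_with_replacement(AMINO_ACIDS, 4))

print(len(quadruplets))     # 8855
print(comb(20 + 4 - 1, 4))  # 8855, the closed form C(23, 4)
```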


Based on the backbone connectivity of {i, j, k, l}, there can be five types of tetrahedra (indexed 0, 1, 2, 3 and 4)

The propensities of the {i,j,k,l} quadruplets of each type t could be used to develop four-body statistical potentials
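One way to see why there are exactly five types: four sequence positions can be grouped into runs of consecutive residues in five ways (1-1-1-1, 2-1-1, 2-2, 3-1 and 4). The sketch below classifies a quadruplet this way. The assignment of the indices 0 to 4 to these patterns is an assumption here, since the slide does not spell it out, and the code treats all positions as belonging to one unbroken chain.

```python
# Assumed mapping from run-length pattern to type index (not stated on the slide).
TYPE_BY_PATTERN = {
    (1, 1, 1, 1): 0,  # no two residues adjacent in sequence
    (2, 1, 1): 1,     # one pair of sequence neighbors
    (2, 2): 2,        # two separate pairs
    (3, 1): 3,        # three consecutive residues plus one distant
    (4,): 4,          # four consecutive residues
}


def tetrahedron_type(positions):
    """Classify four sequence positions by backbone connectivity."""
    pos = sorted(positions)
    if len(set(pos)) != 4:
        raise ValueError("need four distinct sequence positions")
    runs = [1]
    for prev, cur in zip(pos, pos[1:]):
        if cur == prev + 1:
            runs[-1] += 1   # extends the current run of neighbors
        else:
            runs.append(1)  # starts a new run
    return TYPE_BY_PATTERN[tuple(sorted(runs, reverse=True))]


print(tetrahedron_type([5, 12, 30, 47]))  # 0
print(tetrahedron_type([8, 5, 7, 6]))     # 4
```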

Four-body compositional propensities of Delaunay simplices

$$q_{ijkl,t} = \log \frac{f_{ijkl,t}}{p_{ijkl,t}}$$

$f_{ijkl,t}$ - observed frequency of occurrence in the training set of quad {ijkl} in a type t tetrahedron

$p_{ijkl,t}$ - expected frequency of occurrence in the training set of residues i, j, k and l in a type t tetrahedron

$$p_{ijkl,t} = C \, a_i a_j a_k a_l \, p_t$$

$a_i$ - individual AA frequency

$p_t$ - frequency of type t tetrahedra

$C$ - combinatorial factor
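A minimal sketch of the log-likelihood calculation, under two assumptions that the slide does not state: the combinatorial factor C is taken to be the multinomial count 4!/(n_1! n_2! ...), where the n's are the multiplicities of each distinct residue type in the quadruplet, and the logarithm is natural (another base only rescales the scores). The frequencies in the example are toy values, not training-set statistics.

```python
from collections import Counter
from math import factorial, log


def combinatorial_factor(quad):
    """Number of distinct orderings of the four residue types (assumed form of C)."""
    c = factorial(4)
    for multiplicity in Counter(quad).values():
        c //= factorial(multiplicity)
    return c


def expected_frequency(quad, aa_freq, type_freq, t):
    """p = C * a_i * a_j * a_k * a_l * p_t."""
    p = combinatorial_factor(quad) * type_freq[t]
    for residue in quad:
        p *= aa_freq[residue]
    return p


def log_likelihood(observed, quad, aa_freq, type_freq, t):
    """q = log(f / p) for one quadruplet composition in a type-t tetrahedron."""
    return log(observed / expected_frequency(quad, aa_freq, type_freq, t))


# Toy frequencies for illustration only.
aa_freq = {"A": 0.08, "L": 0.09, "V": 0.07}
type_freq = {0: 0.5}

print(combinatorial_factor("AALV"))  # 12
print(log_likelihood(0.0005, "AALV", aa_freq, type_freq, 0))
```

A positive q means the quadruplet occurs more often in the training set than its composition alone would predict; a negative q means it is under-represented.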


Training set: a diverse set of 1166 protein chains with known structure

For a test conformation, the total log-likelihood score is calculated by adding the score for each tetrahedron in its Delaunay tessellation.

Higher Score ↔ better structure
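The summation can be sketched as a table lookup per tetrahedron. The key layout (sorted residue letters plus type index), the toy table values, and the choice to score unseen quadruplets as 0 are all assumptions made for this illustration.

```python
def total_score(tetrahedra, q_table, default=0.0):
    """Sum the log-likelihood scores over all tetrahedra of a tessellation.

    Each tetrahedron is a pair (residues, t): four one-letter residue codes
    and the tetrahedron type. Quadruplets missing from the table contribute
    `default` (an assumption; the slides do not say how these are handled).
    """
    score = 0.0
    for residues, t in tetrahedra:
        key = ("".join(sorted(residues)), t)  # composition is order-independent
        score += q_table.get(key, default)
    return score


# Toy table and tessellation, for illustration only.
q_table = {("AALV", 1): 0.4, ("YYYY", 4): -0.1}
tessellation = [
    (("L", "A", "V", "A"), 1),
    (("Y", "Y", "Y", "Y"), 4),
    (("W", "W", "W", "W"), 0),  # not in the table, scored as 0
]
print(total_score(tessellation, q_table))
```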

Comparison of pre- and post-TS (transition state) structures of CI2 vs. native CI2 *

* Structures courtesy of Dr. E. Shakhnovich, Harvard (Ref: J. Mol. Biol. 296 (2000) p1183-1188)

Pre-TS (six structures) Post-TS (20 structures) Native

MD Simulation of proteins


Go potentials (native structure specific) fail to discriminate between the three!

[Figure: total score (y-axis, 20 to 120) for each of 27 instances (x-axis): pre-TS in red (6), post-TS in yellow (20), native in green.]

N.B. - The 5th pre-TS instance actually had a 0.10 probability of folding (the other five pre-TS structures had ~ 0 probability of folding)

Comparison of total scores for pre- and post-TS structures of CI2 vs. native CI2


[Figure: per-residue log-likelihood score profile for the pre-TS, post-TS and native structures (x-axis: residue #, 0 to 64; y-axis: log-likelihood score, 0 to 20), with a ProCAM of the post-TS structure. Labeled residues: L8, V13, A16, I20, I29, V31, V47, L49, V51, I57.]

Structure profiles of pre-TS vs. post-TS structure of CI2

[Figure panels: Pre-TS, Post-TS]

SNAPP analysis of pre-TS vs. post-TS structure of CI2


[Figure: per-residue log-likelihood score profile for the pre-TS, post-TS and native structures (x-axis: residue #, 0 to 56; y-axis: log-likelihood score, 0 to 18). Labeled residues: Y8, L16, F18, W35, A37, G46, I48, Y52.]

Structure profiles of pre-TS vs. post-TS structure of SH3



Scoring Livebench 6 and CASP5 predictions

Livebench: automated evaluation of structure prediction servers

Set 6 had 32 “easy” and 66 “hard” targets

CASP5

3D coordinate models submitted for 56 targets

Native structure of 33 targets has been released

- rank 3D predictions using four-body potentials

- compare with the ranking using global structural similarity measures (like MaxSub)


To compare rankings, use predictive index (PI)

Here, E – experimental values, P – predicted values
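The PI formula itself did not survive on this slide. The sketch below assumes the commonly used pairwise definition of the predictive index (attributed to Pearlman and Charifson): each pair of items contributes +1 if the prediction orders it the same way as experiment, -1 if it reverses it, and 0 if the predictions tie, weighted by the absolute experimental difference. Whether the slides use exactly this form is an assumption, and the function name is illustrative.

```python
def predictive_index(experimental, predicted):
    """Weighted pairwise rank agreement between E and P, in [-1, 1]."""
    if len(experimental) != len(predicted):
        raise ValueError("E and P must have the same length")
    numerator = 0.0
    denominator = 0.0
    n = len(experimental)
    for i in range(n):
        for j in range(i + 1, n):
            d_exp = experimental[j] - experimental[i]
            d_pred = predicted[j] - predicted[i]
            weight = abs(d_exp)  # large experimental gaps matter more
            if d_exp * d_pred > 0:
                agreement = 1    # pair ranked in the same order
            elif d_exp * d_pred < 0:
                agreement = -1   # pair ranked in the opposite order
            else:
                agreement = 0    # tie
            numerator += weight * agreement
            denominator += weight
    if denominator == 0:
        raise ValueError("all experimental values are identical")
    return numerator / denominator


print(predictive_index([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(predictive_index([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

Under this definition, PI = 1 means the scores rank the models in exactly the experimental order, 0 is what a random ranking gives on average, and -1 is a fully reversed ranking.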

Livebench 6

10 models for each target made by PMODELLER

PI for 28 “easy” targets and 38 “hard” targets (at least one model had a non-zero MaxSub score)

Easy     <PI>   Std(PI)
4B pot   0.83   0.20
MJ       0.70   0.39
PMOD     0.80   0.19

Hard     <PI>   Std(PI)
4B pot   0.83   0.11
MJ       0.74   0.18
PMOD     0.84   0.15

CASP5

For 18 targets (out of 33), the native structure ranked better than all predictions

For 26 (out of 33) targets, the native structure was ranked within the top 3.5% of all the predictions

CASP5    <PI>   Std(PI)
4B pot   0.61   0.18
MJ       0.39   0.20
CRMSD    0.63   0.22


Conclusions

A four-body statistical scoring function is developed based on the Delaunay tessellation of proteins

Discriminates native from decoy structures in most cases

Distinguishes pre- and post-transition state structures and the native structure from MD folding simulation trajectories

Highly effective in the accurate ranking of Livebench 6 and CASP5 predictions
