2017/12/6 1

Computational Methods for Protein Structure Prediction

Ying Xu

Outline

introduction to protein structures

the problem of protein structure prediction

why it is possible to predict protein structures

protein tertiary structure prediction

– protein threading

Protein Sequence, Structure and Function

>1MBN:_ MYOGLOBIN (154 AA)
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKAGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG

Protein sequence → protein structure → protein function (here, oxygen storage)

Protein Structure

A protein sequence folds into a “unique” shape (“structure”) that minimizes its free energy

Protein Structures

Primary sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

Secondary structure: α-helix, β-sheet (anti-parallel and parallel)

Protein Structures

Tertiary structure

Quaternary structure

Why Protein Structure?

Importance of protein structure: knowledge of the structure of a protein enables us to
– understand its function and functional mechanism
– design better mutagenesis experiments
– carry out structure-based rational drug design

Protein Structure Prediction

Problem: given the amino acid sequence of a protein, computationally predict its 3D shape.

MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKAGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG

Why We Can Predict Structure

In theory, a protein structure can be solved computationally:
– a protein folds into the 3D structure that minimizes its free energy (Anfinsen’s classic experiments on ribonuclease A folding in the 1960s)
– given an energy function, the problem can be formulated as an optimization problem: the protein folding problem, or ab initio folding
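To make the optimization view concrete, here is a minimal sketch using the 2D HP lattice model, a standard toy model swapped in purely for illustration (it is not the energy function used in these slides). Residues are either hydrophobic (H) or polar (P), each H-H contact between non-consecutive residues scores -1, and "folding" becomes an exhaustive search for the lowest-energy self-avoiding walk on a square grid. The names `hp_energy` and `fold` are hypothetical.

```python
from itertools import combinations

# Toy 2D HP lattice model (an illustrative assumption, not the slides' energy function).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(seq, coords):
    """-1 for every H-H pair that are lattice neighbours but not chain neighbours."""
    energy = 0
    for i, j in combinations(range(len(seq)), 2):
        if j - i > 1 and seq[i] == "H" and seq[j] == "H":
            (xi, yi), (xj, yj) = coords[i], coords[j]
            if abs(xi - xj) + abs(yi - yj) == 1:
                energy -= 1
    return energy


def fold(seq):
    """Exhaustive search over self-avoiding walks; returns (min energy, coordinates)."""
    if not seq:
        return 0, []
    best = [float("inf"), None]

    def extend(coords, occupied):
        if len(coords) == len(seq):
            e = hp_energy(seq, coords)
            if e < best[0]:
                best[0], best[1] = e, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # self-avoidance: one residue per lattice site
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best[0], best[1]


energy, coords = fold("HPPHPPHH")
print(energy, coords)
```

Exhaustive enumeration like this only works for very short chains; the number of walks grows exponentially with chain length.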

Why We Can Predict Structure

The problem is exceedingly difficult to solve:
– the search space is defined by the backbone φ/ψ angles and the side-chain positions
– the search space is enormous even for small proteins, and the number of local minima increases exponentially with the number of residues

Theoretically solvable but practically infeasible!
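A back-of-the-envelope calculation shows the scale. Assuming, Levinthal-style, that each residue's backbone can adopt only 3 discrete (φ, ψ) states, and that a computer could check 10^12 conformations per second (both numbers are illustrative assumptions, not figures from the slides), the search space grows as 3^n:

```python
def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of backbone conformations if each residue has a fixed number of states."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_residues: int, per_second: float = 1e12) -> float:
    """Years needed to visit every conformation at a hypothetical rate of `per_second`."""
    seconds = conformation_count(n_residues) / per_second
    return seconds / (365 * 24 * 3600)


for n in (10, 50, 100):
    print(f"{n:>3} residues: {conformation_count(n):.2e} conformations, "
          f"{years_to_enumerate(n):.1e} years at 1e12 per second")
```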

Methods for Protein Structure Prediction

ab initio: uses first principles to fold proteins; does not require templates; high computational complexity

homology modeling: similar sequences have similar structures; very useful in practice, but needs homologues

protein threading: many proteins share the same structural fold, so a folding problem becomes a fold recognition problem

Why We Can Predict Structure

Theoretical studies suggest that the vast majority of proteins in nature fall into not many more than 1,000 structural folds.

This realization has fundamentally changed how protein structures can be predicted: for a given protein sequence, the structure prediction problem becomes finding which of the structural folds the protein can fold into, plus possibly some structural refinement.

MTYKLILN …. NGVDGEWTYTE

Protein Threading

Basic premise

Statistics from the Protein Data Bank (~37,229 structures):
– the chance that a protein has a native-like structural fold in the PDB is quite good (estimated at 60-70%)
– proteins with similar structural folds could be homologues or analogues
– the number of unique structural (domain) folds in nature is fairly small (possibly a few thousand)
– 90% of new structures submitted to the PDB in the past three years have similar structural folds already in the PDB

Protein Threading

The goal: find the “correct” sequence-structure alignment between a target sequence and its native-like fold in the PDB

Energy function: knowledge (or statistics) based rather than physics based. It should be able to
– distinguish correct structural folds from incorrect structural folds
– distinguish correct sequence-fold alignments from incorrect sequence-fold alignments

MTYKLILN …. NGVDGEWTYTE

Protein Threading – energy function

MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

how well a residue fits a structural environment: $E_s$

how preferable it is to put two particular residues near each other: $E_p$

alignment gap penalty: $E_g$

total energy: $E_{total} = E_s + E_p + E_g$

Goal: find the sequence-structure alignment that minimizes the total energy.
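The three terms can be evaluated for one candidate alignment as in the sketch below. All energies, environments and the contact list are hypothetical toy values; the alignment is assumed to be order-preserving (not checked), and the gap term assumes a simple linear penalty per unaligned residue or empty template position.

```python
def threading_energy(seq, alignment, env, contacts, e_s, e_p, gap_penalty):
    """Total energy E_s + E_p + E_g of one sequence-structure alignment.

    alignment[i] is the template position of residue seq[i], or None for a gap.
    env[j] is the structural environment of template position j.
    contacts holds template-position pairs (j, k) that are close in 3D.
    """
    placed = {j: seq[i] for i, j in enumerate(alignment) if j is not None}
    es = sum(e_s[(res, env[j])] for j, res in placed.items())
    ep = sum(e_p[frozenset((placed[j], placed[k]))]
             for j, k in contacts if j in placed and k in placed)
    # Assumed linear gap model: each unaligned residue or empty template position costs the same.
    n_gaps = alignment.count(None) + (len(env) - len(placed))
    return es + ep + gap_penalty * n_gaps


# Hypothetical toy energies: leucine (L) prefers buried sites, lysine (K) exposed ones.
E_S = {("L", "buried"): -1.0, ("L", "exposed"): 1.0,
       ("K", "buried"): 1.0, ("K", "exposed"): -1.0}
E_P = {frozenset({"L"}): -0.5, frozenset({"K", "L"}): 0.2}
env = ["buried", "exposed", "buried"]
contacts = {(0, 2)}  # template positions 0 and 2 touch in 3D

print(threading_energy("LKL", [0, 1, 2], env, contacts, E_S, E_P, 0.3))  # -3.5
print(threading_energy("LKL", [0, None, 2], env, contacts, E_S, E_P, 0.3))  # gapped: higher energy
```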

Protein Threading – energy function

A singleton energy measures each residue’s preference for specific structural environments (secondary structure, solvent accessibility).

It compares the actual occurrence of a residue in an environment against its “expected value” by chance.
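One common way to turn "actual vs. expected occurrence" into an energy, assumed here rather than taken from the slides, is a negative log-odds score: E_s(a, env) = -ln(N(a, env) / N_expected), with N_expected = N(a) · N(env) / N. A minimal sketch from a list of (residue, environment) observations, with hypothetical toy counts:

```python
import math
from collections import Counter


def singleton_energies(observations):
    """Return {(residue, environment): energy} for every observed combination.

    Assumed form: E_s = -ln(observed / expected), expected = N(a) * N(env) / N.
    Negative energy: the residue appears in that environment more often than chance.
    Unobserved combinations are skipped; real tables would need pseudocounts.
    """
    pair_counts = Counter(observations)
    residue_counts = Counter(res for res, _ in observations)
    env_counts = Counter(env for _, env in observations)
    total = len(observations)
    energies = {}
    for (res, env), observed in pair_counts.items():
        expected = residue_counts[res] * env_counts[env] / total
        energies[(res, env)] = -math.log(observed / expected)
    return energies


# Toy counts: leucine mostly buried, lysine mostly exposed.
obs = [("L", "buried")] * 3 + [("L", "exposed")] + [("K", "buried")] + [("K", "exposed")] * 3
for key, e in sorted(singleton_energies(obs).items()):
    print(key, round(e, 3))
```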

Protein Threading – energy function

A simple definition of structural environment:
– secondary structure: alpha-helix, beta-strand, loop
– solvent accessibility: 0, 10, 20, …, 100% of accessibility
– each combination of secondary structure and solvent accessibility level defines a structural environment, e.g., (alpha-helix, 30%), (loop, 80%), …

Singleton energy term $E_s$: a scoring matrix of 30 structural environments by 20 amino acids, e.g., $E_s((\text{loop}, 30\%), \text{A})$
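A sketch of how such a 30 × 20 matrix could be indexed. It assumes 10 accessibility bins (0-9%, 10-19%, …, 90-100%), which gives 3 × 10 = 30 environments; the bin edges, the helper names and the placeholder matrix are all hypothetical.

```python
SS_TYPES = ("alpha-helix", "beta-strand", "loop")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, alphabetical
N_ACC_BINS = 10  # assumed: 0-9%, 10-19%, ..., 90-100% -> 3 x 10 = 30 environments


def environment_index(ss, accessibility):
    """Map (secondary structure, % solvent accessibility) to a row 0..29."""
    if not 0 <= accessibility <= 100:
        raise ValueError("accessibility must be a percentage in [0, 100]")
    acc_bin = min(int(accessibility // 10), N_ACC_BINS - 1)  # 100% joins the top bin
    return SS_TYPES.index(ss) * N_ACC_BINS + acc_bin


def singleton_score(matrix, ss, accessibility, aa):
    """Look up E_s((ss, accessibility), aa) in a 30 x 20 matrix (list of lists)."""
    return matrix[environment_index(ss, accessibility)][AMINO_ACIDS.index(aa)]


# Placeholder matrix of zeros with one hypothetical entry filled in.
matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(len(SS_TYPES) * N_ACC_BINS)]
matrix[environment_index("loop", 30)][AMINO_ACIDS.index("A")] = -0.2
print(singleton_score(matrix, "loop", 30, "A"))
```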

Protein Threading – energy function

Singleton energy table (20 amino acids × 9 structural environments):

            Helix                   Sheet                    Loop
     Buried   Inter Exposed  Buried   Inter Exposed  Buried   Inter Exposed
ALA  -0.578  -0.119  -0.160   0.010   0.583   0.921   0.023   0.218   0.368
ARG   0.997  -0.507  -0.488   1.267  -0.345  -0.580   0.930  -0.005  -0.032
ASN   0.819   0.090  -0.007   0.844   0.221   0.046   0.030  -0.322  -0.487
ASP   1.050   0.172  -0.426   1.145   0.322   0.061   0.308  -0.224  -0.541
CYS  -0.360   0.333   1.831  -0.671   0.003   1.216  -0.690  -0.225   1.216
GLN   1.047  -0.294  -0.939   1.452   0.139  -0.555   1.326   0.486  -0.244
GLU   0.670  -0.313  -0.721   0.999   0.031  -0.494   0.845   0.248  -0.144
GLY   0.414   0.932   0.969   0.177   0.565   0.989  -0.562  -0.299  -0.601
HIS   0.479  -0.223   0.136   0.306  -0.343  -0.014   0.019  -0.285   0.051
ILE  -0.551   0.087   1.248  -0.875  -0.182   0.500  -0.166   0.384   1.336
LEU  -0.744  -0.218   0.940  -0.411   0.179   0.900  -0.205   0.169   1.217
LYS   1.863  -0.045  -0.865   2.109  -0.017  -0.901   1.925   0.474  -0.498
MET  -0.641  -0.183   0.779  -0.269   0.197   0.658  -0.228   0.113   0.714
PHE  -0.491   0.057   1.364  -0.649  -0.200   0.776  -0.375  -0.001   1.251
PRO   1.090   0.705   0.236   1.249   0.695   0.145  -0.412  -0.491  -0.641
SER   0.350   0.260  -0.020   0.303   0.058  -0.075  -0.173  -0.210  -0.228
THR   0.291   0.215   0.304   0.156  -0.382  -0.584  -0.012  -0.103  -0.125
TRP  -0.379  -0.363   1.178  -0.270  -0.477   0.682  -0.220  -0.099   1.267
TYR  -0.111  -0.292   0.942  -0.267  -0.691   0.292  -0.015  -0.176   0.946
VAL  -0.374   0.236   1.144  -0.912  -0.334   0.089  -0.030   0.309   0.998
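The sketch below sums singleton energies for a short hypothetical threading, using three rows copied from the table. It assumes, as the signs in the table suggest (e.g., buried LEU is negative, exposed LEU positive), that lower values are more favorable.

```python
# Three rows copied from the singleton table; columns are ordered
# (Helix, Sheet, Loop) x (Buried, Inter, Exposed).
COLUMNS = [(ss, acc) for ss in ("Helix", "Sheet", "Loop")
           for acc in ("Buried", "Inter", "Exposed")]
TABLE = {
    "LEU": [-0.744, -0.218, 0.940, -0.411, 0.179, 0.900, -0.205, 0.169, 1.217],
    "LYS": [1.863, -0.045, -0.865, 2.109, -0.017, -0.901, 1.925, 0.474, -0.498],
    "GLY": [0.414, 0.932, 0.969, 0.177, 0.565, 0.989, -0.562, -0.299, -0.601],
}


def singleton_energy(threading):
    """Sum E_s over (residue, secondary structure, burial) assignments."""
    return sum(TABLE[res][COLUMNS.index((ss, acc))] for res, ss, acc in threading)


# Hypothetical threading: buried helical LEU, exposed helical LYS, exposed loop GLY.
threading = [("LEU", "Helix", "Buried"), ("LYS", "Helix", "Exposed"), ("GLY", "Loop", "Exposed")]
print(round(singleton_energy(threading), 3))  # -2.21
```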

Protein Threading – energy function

Pairwise interaction energy term $E_p$: it measures the preference of a pair of amino acids to be close in 3D space.

It compares the observed occurrence of a pair with its “expected” occurrence.

uniform state model
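To apply a pairwise term during threading, one needs to decide which template positions count as "close". The sketch below assumes contacts are C-alpha pairs within a hypothetical 7 Å cutoff, at least two positions apart in the chain; the two energy values are the CYS-CYS and ALA-CYS entries of the pairwise table.

```python
import math
from itertools import combinations


def pairwise_energy(residues, coords, pair_table, cutoff=7.0, min_separation=2):
    """Sum E_p over residue pairs placed close together in the template.

    residues:   three-letter codes threaded onto template positions
    coords:     (x, y, z) per template position, e.g. C-alpha atoms (assumed)
    pair_table: symmetric energies keyed by frozenset({res1, res2})
    cutoff:     contact distance in angstroms (hypothetical default)
    """
    total = 0.0
    for i, j in combinations(range(len(residues)), 2):
        if j - i < min_separation:
            continue  # chain neighbours are always close; skip them
        if math.dist(coords[i], coords[j]) <= cutoff:
            total += pair_table[frozenset((residues[i], residues[j]))]
    return total


# Two entries from the pairwise table (CYS-CYS and ALA-CYS).
PAIRS = {frozenset({"CYS"}): -1923, frozenset({"ALA", "CYS"}): 330}
residues = ["CYS", "ALA", "CYS"]
print(pairwise_energy(residues, [(0, 0, 0), (3.8, 0, 0), (5.0, 0, 0)], PAIRS))
```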

Protein Threading – energy function

Pairwise interaction energies (symmetric; lower triangle shown):

      ALA   ARG   ASN   ASP   CYS   GLN   GLU   GLY   HIS   ILE   LEU   LYS   MET   PHE   PRO   SER   THR   TRP   TYR   VAL
ALA  -140
ARG   268   -18
ASN   105   -85  -435
ASP   217  -616  -417    17
CYS   330    67   106   278 -1923
GLN    27   -60  -200    67   191  -115
GLU   122  -564  -136   140   122    10    68
GLY    11   -80  -103  -267    88   -72   -31  -288
HIS    58  -263    61  -454   190   272  -368    74  -448
ILE  -114   110   351   318   154   243   294   179   294  -326
LEU  -182   263   358   370   238    25   255   237   200  -160  -278
LYS   123   310  -201  -564   246  -184  -667    95    54   194   178   122
MET   -74   304   314   211    50    32   141    13    -7   -12  -106   301  -494
PHE   -65    62   201   284    34    72   235   114   158   -96  -195   -17  -272  -206
PRO   174   -33  -212   -28   105   -81  -102   -73   -65   369   218   -46    35   -21  -210
SER   169   -80  -223  -299     7  -163  -212  -186  -133   206   272   -58   193   114  -162  -177
THR    58    60  -231  -203   372  -151  -211   -73  -239   109   225   -16   158   283   -98  -215  -210
TRP    51  -150   -18   104    52   -12   157   -69  -212   -18    81    29    -5    31  -432   129    95   -20
TYR    53  -132    53   268    62   -90   269    58    34  -163   -93  -312  -173    -5   -81   104   163   -95    -6
VAL  -105   171   298   431   196   180   235   202   204  -232  -218   269   -50   -42    46   267    73   101   107  -324
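Because the pairwise table is symmetric, only its lower triangle is stored: row i holds i + 1 entries. A small sketch of a symmetric lookup over the first five rows of the table (the helper name `pair_energy` is hypothetical):

```python
ORDER = ["ALA", "ARG", "ASN", "ASP", "CYS"]
# First five rows of the lower-triangular pairwise table (row i has i + 1 entries).
ROWS = [
    [-140],
    [268, -18],
    [105, -85, -435],
    [217, -616, -417, 17],
    [330, 67, 106, 278, -1923],
]


def pair_energy(a, b):
    """Symmetric lookup: E_p(a, b) == E_p(b, a), read from the lower triangle."""
    i, j = ORDER.index(a), ORDER.index(b)
    if i < j:
        i, j = j, i  # the value lives in the row of the later residue
    return ROWS[i][j]


print(pair_energy("ALA", "ARG"), pair_energy("ARG", "ALA"), pair_energy("CYS", "CYS"))
```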

Threading Parameter Optimization

How should the weights of the different energy terms be determined?

$E_{total} = \omega_s E_{singleton} + \omega_p E_{pairwise} + \omega_g E_{gap} + \omega_{ss} E_{ss}$

Select the weights to give the “best” threading performance on a training set (fold recognition and alignment accuracy)

Different weights for different classes (superfamily, fold)?
– pairwise terms may contribute more in fold-level threading
– mutation/profile terms dominate in superfamily-level threading
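One simple way to "select the weights that give the best threading performance on a training set" is a grid search scored by fold-recognition accuracy. This is a minimal sketch under assumed conditions: each training target comes with precomputed energy terms for a few candidate templates, one of which is marked correct; the data and weight grid are hypothetical, and real training would also score alignment accuracy.

```python
from itertools import product


def recognition_accuracy(weights, training_set):
    """Fraction of targets whose lowest weighted-energy candidate is the correct template."""
    hits = 0
    for candidates in training_set:
        # Each candidate is (energy_terms, is_correct); E_total = sum(w_k * E_k).
        best = min(candidates, key=lambda c: sum(w * e for w, e in zip(weights, c[0])))
        hits += best[1]
    return hits / len(training_set)


def grid_search(training_set, grid):
    """Try every combination of per-term weights and keep the most accurate one."""
    best_weights, best_acc = None, -1.0
    for weights in product(*grid):
        acc = recognition_accuracy(weights, training_set)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Hypothetical training data: energy terms are (singleton, pairwise, gap).
training_set = [
    [((1.0, -5.0, 0.0), True), ((-1.0, 0.0, 0.0), False)],
    [((0.0, -1.0, 2.0), True), ((0.0, 0.0, 0.0), False)],
]
grid = [(1.0,), (0.1, 1.0), (0.1, 1.0)]  # candidate weights for each term
print(grid_search(training_set, grid))  # ((1.0, 1.0, 0.1), 1.0)
```

In this toy data the first target is only recognized when the pairwise weight is large, which mirrors the slide's point that the right balance of terms depends on the kind of threading being done.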

Protein Threading – mathematical formulation

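Figure: E_total(…) for a target sequence (S) threaded onto a template structure (T); residues S_i … S_{i+8} (S T K Y Q C D D A) are aligned to template positions such as T_j, T_{j+2}, T_{j+4} and T_{j+7}.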

Protein Threading – algorithm

Dynamic programming

Heuristic algorithms for pairwise interactions:
– Frozen approximation algorithm (A. Godzik et al.)
– Double dynamic programming (D. Jones et al.)
– Monte Carlo sampling (S.H. Bryant et al.)

Rigorous algorithms for pairwise interactions:
– Branch-and-bound (R.H. Lathrop and T.F. Smith)
– Divide-and-conquer (Y. Xu et al.): PROSPECT
– Linear programming (J. Xu et al.): RAPTOR
– Tree decomposition (L. Cai et al.)

Rigorous algorithm for treating backbone and side chains simultaneously (Li et al.)
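When only the singleton and gap terms are used, the optimal order-preserving alignment can be found exactly by dynamic programming, in the style of Needleman-Wunsch. The sketch below makes that assumption (no pairwise term, linear gap penalty, toy energies); adding pairwise interactions breaks this simple recurrence, which is why the heuristic and rigorous methods listed above are needed.

```python
def align_singleton(seq, env, e_s, gap):
    """Minimum of sum(E_s) + gap * (#gaps) over order-preserving alignments.

    Needleman-Wunsch-style DP; exact only when pairwise terms are ignored.
    """
    n, m = len(seq), len(env)
    # cost[i][j]: best energy aligning seq[:i] with template positions env[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + e_s[(seq[i - 1], env[j - 1])],  # residue i on position j
                cost[i - 1][j] + gap,  # residue i left unaligned
                cost[i][j - 1] + gap,  # template position j left empty
            )
    return cost[n][m]


# Hypothetical toy energies: L prefers buried sites, K prefers exposed ones.
E_S = {("L", "buried"): -1.0, ("L", "exposed"): 1.0,
       ("K", "buried"): 1.0, ("K", "exposed"): -1.0}
print(align_singleton("LK", ["buried", "exposed"], E_S, 0.3))  # -2.0
print(align_singleton("KL", ["buried", "exposed"], E_S, 0.3))  # gaps beat bad placements
```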

