NEW APPROACHES TO PROTEIN STRUCTURE PREDICTION AND DESIGN

Joe DeBartolo

An overview of my thesis

[Diagram: amino acid sequence and native protein structure, linked by structure prediction and protein design]

Why do prediction and design matter?

Structure prediction. The number of known sequences is growing faster than experimental characterization can keep up with. Knowing a protein's structure provides insight into its function and interactions.

Protein design. Understanding design principles can allow the creation of new proteins with therapeutic and industrial applications.

Protein structure prediction and design

PART I ItFix: Homology-free structure prediction
PART II SPEED: ItFix enhanced with evolution
PART III Future directions in prediction
PART IV Protein design

1° structure: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR

Protein structure prediction

2° and 3° structure

local 2° structure

topology diagram

3D model

Residue-residue contact map

The challenge: Distill the folding problem down to its basic principles, code them into an algorithm, and predict pathways and structure without using homology.

[Diagram: amino acid sequence (…LEKVQLN…) → native structure]

Capturing the interrelated forces of protein structure

• Ramachandran angles
• backbone hydrogen bonds

• long-range sterics
• van der Waals
• electrostatics
• hydrophobic effect

• local sterics
• solvation
• backbone entropy

[Diagram: backbone φ and ψ torsion angles]

local structures

[Figure: Ramachandran plots (φ and ψ from -180° to 180°) for α-helix, β-strand, and turn]

The overlapping features of local protein structure

• backbone Ramachandran torsion angles (φ, ψ)
• backbone H-bonds
• amphipathic sidechain patterning

[Figure labels: polar, apolar, mostly polar]

• sterics
• van der Waals
• electrostatics
• long-range hydrogen bonding

• Ramachandran angles (φ, ψ)
• backbone hydrogen bonds
• solvation

Capturing the interrelated forces of protein structure

long range effects

3° packing specificity of the chain

solvent exposed residues

apolar buried residues

hydrophobic effect surface residue placement

salt bridges and other favorable pairings

long-range hydrogen bonding

contacts that are highly separated in sequence

The structure prediction challenge: To integrate all of these features into an algorithm

requirements

• a way to sample conformations
• a way to evaluate conformations

[Figure: Ramachandran plot, φ and ψ from -180° to 180°]

Sample Ramachandran space

Rama (φ, ψ) angle pairs describe the entire conformation. There is NO sidechain rotamer sampling, and sidechains beyond Cβ are excluded.

[Figure: a Rama angle pair (φ, ψ) on the backbone, and the Rama map of the PDB with φ and ψ from -180° to 180°]

1° and 2° structure information refines the Rama search space

Library restriction          1° structure   2° structure
Entire PDB                   ALL-ALL-ALL    ALL-ALL-ALL
add amino acid identity      ALL-ASN-ALL    ALL-ALL-ALL
add neighbor identity        ALL-ASN-GLY    ALL-ALL-ALL
add 2° structure identity    ALL-ASN-GLY    BETA-ALL-ALL

[Figure: Ramachandran plot (φ and ψ from -180° to 180°) for each restriction level]
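The progressive restriction of the Rama library by 1° and 2° structure context can be illustrated with a small filter. This is a minimal sketch, not the thesis implementation: the record layout, the `select_angles` and `matches` names, and the toy data are all assumptions made for illustration; only the `ALL` wildcard and the three-position patterns come from the slide.

```python
# Each toy record: (amino acid trimer, 2° structure trimer, phi, psi).
# The values are made up for illustration only.
RECORDS = [
    (("ALA", "ASN", "GLY"), ("BETA", "COIL", "COIL"), 55.0, 40.0),
    (("LEU", "ASN", "GLY"), ("HELIX", "HELIX", "COIL"), -65.0, -40.0),
    (("ALA", "ASN", "PRO"), ("BETA", "BETA", "BETA"), -120.0, 130.0),
    (("ALA", "LEU", "GLY"), ("HELIX", "HELIX", "HELIX"), -60.0, -45.0),
]

def matches(pattern, triple):
    """True if a pattern such as 'ALL-ASN-GLY' matches a 3-tuple; 'ALL' is a wildcard."""
    return all(p == "ALL" or p == t for p, t in zip(pattern.split("-"), triple))

def select_angles(records, aa_pattern="ALL-ALL-ALL", ss_pattern="ALL-ALL-ALL"):
    """Return the (phi, psi) pairs whose sequence and 2° structure context match."""
    return [
        (phi, psi)
        for aa3, ss3, phi, psi in records
        if matches(aa_pattern, aa3) and matches(ss_pattern, ss3)
    ]

entire_pdb = select_angles(RECORDS)
with_identity = select_angles(RECORDS, "ALL-ASN-ALL")
with_neighbor = select_angles(RECORDS, "ALL-ASN-GLY")
with_ss = select_angles(RECORDS, "ALL-ASN-GLY", "BETA-ALL-ALL")
```

Each added condition can only shrink the set of candidate (φ, ψ) pairs, which is the sense in which the search space is refined.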

The structure prediction challenge: To integrate all of these features into one algorithm

requirements

[Figure: Ramachandran plot, φ and ψ from -180° to 180°]


The DOPE statistical potential

Discrete Optimized Protein Energy (DOPE): knowledge-based modeling of the energy of a conformation, derived from the PDB.

Energy_PDB(r_ij) = -ln( Prob_PDB(r_ij) )

r_ij is the distance between atoms i and j.

The DOPE atom pair energy is defined between (residue i, amino acid i, atom type i) and (residue j, amino acid j, atom type j).

Shen and Sali, Protein Sci. (2006)
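The relation Energy = -ln(Prob) can be sketched for a single atom-pair type by binning observed distances. This is an illustrative simplification: the bin width, distance cutoff, pseudocount, and the function name `pair_energy` are assumptions, and the published DOPE potential also includes a reference state that is omitted here.

```python
import math
from collections import Counter

def pair_energy(distances, bin_width=1.0, max_dist=15.0, pseudocount=1.0):
    """Map each distance bin to -ln(P(bin)) for one atom-pair type.

    distances: observed distances (in Å) for that pair type, e.g. LEU-Cβ to LEU-Cβ.
    A pseudocount keeps empty bins at a finite (high) energy.
    """
    n_bins = int(max_dist / bin_width)
    counts = Counter()
    for r in distances:
        if 0.0 <= r < max_dist:
            counts[int(r / bin_width)] += 1
    total = sum(counts.values()) + pseudocount * n_bins
    return {
        b: -math.log((counts[b] + pseudocount) / total)
        for b in range(n_bins)
    }

# Toy observations: four contacts near 4 to 5 Å and one near 9 Å.
energies = pair_energy([4.2, 4.5, 4.8, 4.1, 9.0])
```

Frequently observed distances receive low energies and never-observed distances receive the highest energies.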

[Figure: DOPE energy vs. distance (Å) for GLU-Cβ to GLU-Cβ and LEU-Cβ to LEU-Cβ pairs]

I have added to DOPE…

• orientation dependence
• 2° structure dependence
• elimination of local biases

[Figure: DOPE-PW energy vs. distance (Å)]

Capturing sidechain orientation in a sidechain-free model

DeBartolo et al., PNAS 2009

[Diagram: residue 1 and residue 2, each with its Cα and Cβ atoms, and the angles ρ1-2 and ρ2-1 marked; examples of high ρ (in-line) and low ρ]

PW = ρ = [equation garbled in the source; its recoverable terms are (90 − ρ1-2)² and (90 − ρ2-1)²]

ρ1-2 is the angle between two vectors.
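Computing an angle such as ρ1-2 from coordinates can be sketched as follows. Which two vectors define ρ1-2 is not stated in the slide text, so the choice made in the hypothetical `rho` helper (the Cα→Cβ vector of one residue against the vector from its Cβ to the partner's Cβ) is purely an assumption for illustration.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))

def rho(ca_a, cb_a, cb_b):
    """Hypothetical rho_a-b: angle between Ca_a->Cb_a and Cb_a->Cb_b (assumed vectors)."""
    side_chain = tuple(b - a for a, b in zip(ca_a, cb_a))
    to_partner = tuple(b - a for a, b in zip(cb_a, cb_b))
    return angle_between(side_chain, to_partner)

in_line = rho((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0))
perpendicular = rho((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
```

Under this assumed definition, a sidechain vector pointing straight at its partner gives an angle of 0°, and one pointing sideways gives 90°.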

DOPE-PW (uniquely) captures the hydrophobic effect

[Diagram: potential Cα→Cβ orientations of high PW]

[Figure: DOPE energy vs. distance (Å) for GLU-Cβ/GLU-Cβ and LEU-Cβ/LEU-Cβ pairs, annotated "buried in the core" and "large distance preferred"]

Hydrophobic residue pairs have lower energy at smaller distances.

DOPE-PW captures the amphipathic nature of β-sheets

[Diagram: potential Cα→Cβ orientations of low PW]

Polar and apolar residues prefer opposing sides of the β-sheet.

[Figure: DOPE energy vs. distance (Å) for GLU-Cβ/LYS-Cβ and GLU-Cβ/LEU-Cβ pairs, annotated "same side of β-sheet" and "opposite side of β-sheet"]

The challenge: To integrate all of these features into one algorithm

requirements

[Figure: Ramachandran plot, φ and ψ from -180° to 180°]


ItFix: Iterative Fixing to reduce the conformational search

DeBartolo et al., PNAS 2009

1. "U": Starting configuration, 1° structure only (no 2° structure restriction).
2. "I1": Fold with (φ, ψ) from Library_Initial.
3. Remove trimers of lowly populated 2° structure. Each 2° structure option removed restricts the search space of the sampling library.
4. "I2": Fold with (φ, ψ) from Library_Restricted 1, remove trimers, then fold with (φ, ψ) from Library_Restricted 2 and repeat the removal.
5. Repeat until no further fixing is possible.
6. Final round, "N": Fold with (φ, ψ) from Library_Restricted final.

2° structure options: helix, strand, Not(Helix), Not(Strand), coil subtypes.

[Figure: Ramachandran plots (φ and ψ from -180° to 180°) of the sampling library at each stage]
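The fix-and-repeat loop can be sketched in a few lines. This is a toy illustration under stated assumptions: the 0.05 threshold, the `fix_round` and `itfix` names, and the `toy_fold_and_count` stand-in (which simply renormalizes fixed preferences instead of folding anything) are all hypothetical; only the idea of removing lowly populated 2° structure options until nothing more can be fixed comes from the slide.

```python
def fix_round(allowed, frequencies, threshold=0.05):
    """Per position, drop 2° structure options observed less often than threshold."""
    restricted = []
    for options, freq in zip(allowed, frequencies):
        kept = {s for s in options if freq.get(s, 0.0) >= threshold}
        restricted.append(kept or set(options))  # never leave a position empty
    return restricted

def itfix(allowed, fold_and_count, threshold=0.05, max_rounds=20):
    """Repeat folding and fixing until no further option can be removed."""
    for round_index in range(max_rounds):
        frequencies = fold_and_count(allowed)
        restricted = fix_round(allowed, frequencies, threshold)
        if restricted == allowed:
            return allowed, round_index
        allowed = restricted
    return allowed, max_rounds

# Toy stand-in for a folding round: fixed per-position preferences,
# renormalized over whatever options are still allowed.
PREFS = [
    {"H": 0.90, "E": 0.02, "C": 0.08},
    {"H": 0.03, "E": 0.60, "C": 0.37},
]

def toy_fold_and_count(allowed, prefs=PREFS):
    out = []
    for options, pref in zip(allowed, prefs):
        total = sum(pref[s] for s in options)
        out.append({s: pref[s] / total for s in options})
    return out

start = [{"H", "E", "C"}, {"H", "E", "C"}]
final_allowed, rounds_used = itfix(start, toy_fold_and_count)
```

In the toy run, strand is removed at the first position and helix at the second after one round, and the second round removes nothing, so the loop stops.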


Homology-free ItFix 2° and 3° structure prediction results

Native  ---HHHHHHHHHHHHHHH-----GGGHHHHHHHHHHHHHHHT---HHHHHHHHHH-TT-THHHHHHHH-
ItFix   ---HHHHHHHHHHHHHHHT-----S-HHHHHHHHHHHHHHHT-S--HHHHHHHHHT---HHHHHHHHH-
SSPro   ---HHHHHHHHHHHHHHHHHHE-TTHHHHHHHHHHHHHHHHT--HHHHHHHHHHT-TTHHHHHHHHHH-
PSIPRED ---HHHHHHHHHHHHHHH-----HHHHHHHHHHHHHHHHHH----HHHHHHHHH----HHHHHHHHH--

Native  -HHHHHHHHHHHTT-SS--HHHHHHHHHHHT--HHHHHHHHHHHHHHHH-
ItFix   --HHHHHHHHHHHH-----HHHHHHHHHHHH--S-HHHHHHHHHHHHHH-
SSPro   -HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-HHHHEEHEHHHHHHH--
PSIPRED -HHHHHHHHHHHHH-----HHHHHHHHHHHHHHHHHHHHHHH-HHHH---

Native  -EEEEEEEEETTTTEEEEE-TTS--EEEEGGGB-SSSS----TT-EEEEEEEEETTEEEEEEEEE--
ItFix   -EEEEEEEE-STTTEEEEEEET-T-EEEEEEE--SSS-----TS--EEEEEEES--S----EEEEE-
SSPro   --TEEEEEE-TTTTEEEE--TT--EEEEEEEHEETTT--E--TT-EEEEEEEE-TT--E-EE-----
PSIPRED --EEEEEEEE----EEEEE-----EEEEEEE--------------EEEEEEEE-----EEEEEE---

Native  --BGGG---SEEEEE-TTS-EEEEEEHHHHHHHHHHTT-EEEEEETTSSS-EEEEE-
ItFix   -EEE-SSSSEEEEEE-TTS-EEEEEEHHHHHHHHHHHT--EEEE-TTSSS-EEEEE-
SSPro   --BBTEEE-EEEEEEETTT-EEEEE-HHHHHHHHHHHT--EEEE-TT----EEEE--
PSIPRED ----------EEEEE-----EEEEE-HHHHHHHHHH----EEEE-------EEEE--

Native  -HHHHHHHHHHHTT--HHHHHHHHTS-HHHHHHHHTTS-SS-TTHHHHHHHTT--HHHHH-
ItFix   -HHHHHHHHHHHHT--HHHHHHHHT--HHHHHHHHTT--SS----HHHHHHHT--HHHHH-
SSPro   ---HHHHHHHHHHHHHHHHHHHHHT-HHHHHHHHHTT-------HHHHHHHHHT--HHHH-
PSIPRED -HHHHHHHHHHH----HHHHHHHH---HHHHHHHH------HHHHHHHHHHH---HHHH--

Native  -EEEEEETTS-EEEEE--TTSBHHHHHHHHHHHH---GGGEEEEETTEE--TTSBTGGGT--TT-EEEEEE-
ItFix   -EEEEEETTS-EEEEEE---S-B-HHHHHHHHHSS---SSEEEEETT----TT-B----------EEEEEE-
SSPro   -EEEEEEETTEEEEEEE---SHHHHHHHHHHHTTT---T--E--ETT-E--TT-EEEEEE--TT-EEEEEE-
PSIPRED -EEEEEE----EEEEEE-----HHHHHHHHHHHH---HHHEEEEE--EE------HHH-------EEEEEE-

3D models:
1af7 2.5 Å
1b72 1.6 Å
1csp 6.0 Å
1tif 4.2 Å
1r69 2.4 Å
1ubq 3.1 Å

Mimicking folding pathways

[Figure: 2° structure frequency (0 to 1) vs. residue index (1 to 73) for Rounds 0, 1, 2, 3, 4, 6, and 9, with the structural elements β1, β2, helix, β3, β4, β5, and 3₁₀ helix marked]

[Diagram: major pathway (from experiment), from the unfolded state to the native state, with steps labeled β1-β2 hairpin, + helix, + β3, + β4, + β5, + 3₁₀ helix]


Part I Conclusions

Challenge: Distill the folding problem down to its basic principles, code them into an algorithm, and predict pathways and structure without using homology.

What is novel about how we approached this challenge?

Use basic principles of protein structure and folding.

Search strategies: mimic true folding behavior

i) Coupled 2° & 3° structure formation

ii) Iterative fixing to reduce the search

iii) Outputs pathway information

Energy functions: orientational and 2° structure dependence

PART I ItFix: Homology-free structure prediction
PART II SPEED: ItFix enhanced with evolution
PART III Future directions in prediction



Cover image of Protein Science, March 2010

SPEED: Structure Prediction Enhanced by Evolutionary Diversity

Increase φ, ψ diversity and accuracy

target sequence: MQIFVKTLTGKTITLEV

sequence database → multiple sequence alignment (~1000 sequences):

IEIKIRDIYSKTYKFMA
IEITCNDRLGKKVRVKC
MRLFIRSHLHDQVVISA
MKLSVKSPNGRIEIFNE
LQFFVRLLDGKSVTLTF
IEITLNDRLGKKIRVKC
IEIWVNDHLSHRERIKC
MDVFLMIRRQKTTIFDA
IIVTVNDRLGTKAQIPA
MRISVIKLDSTSFDVAV
MNVNFRTILGKTYTITV
MLLTVRDRSELTFSLQV
MQIFVTTPSENVFGLEV
MSLTIKF-GAKSIALSL
MKYRIRTISNDEAVIEL
…

[Figure: Ramachandran plots (φ and ψ from -180° to 180°) comparing homology-free sampling with SPEED sampling]

Uses the sequence database: about 10^7 sequences and growing fast, whereas the PDB holds only about 10^4 structures and is growing slowly.
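The idea of widening a position's φ, ψ sampling with aligned homologs can be sketched as follows. This is a minimal illustration, not the published SPEED procedure: the function names, the toy alignment, the toy trimer library, and the equal weighting of every homolog trimer are all assumptions.

```python
def trimers_at(msa, position):
    """Distinct gap-free trimers centered on a 0-based alignment column."""
    seen = []
    for seq in msa:
        trimer = seq[position - 1:position + 2] if position >= 1 else ""
        if len(trimer) == 3 and "-" not in trimer and trimer not in seen:
            seen.append(trimer)
    return seen

def pooled_angles(msa, position, library):
    """Pool the (phi, psi) samples of every homolog trimer found at one position."""
    angles = []
    for trimer in trimers_at(msa, position):
        angles.extend(library.get(trimer, []))
    return angles

# Toy alignment (target first) and a toy trimer -> (phi, psi) library.
TOY_MSA = ["MQIFV", "MKLSV", "MS-TI", "MQIFV"]
TOY_LIBRARY = {
    "QIF": [(-120.0, 130.0)],
    "KLS": [(-110.0, 140.0), (-60.0, -45.0)],
}

homolog_trimers = trimers_at(TOY_MSA, 2)
enriched = pooled_angles(TOY_MSA, 2, TOY_LIBRARY)
```

Here the target trimer alone would contribute one (φ, ψ) sample, while pooling over the alignment contributes three; trimers containing a gap are skipped.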

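ItFix-SPEED overview

DeBartolo et al., Protein Sci. 2010

1. Start from the homology-free Rama distribution for the target sequence (…AGTYEFRKAKIT…).
2. SPEED: use the multiple sequence alignment (MSA) to build the Rama distribution. Example, 1tif position 4: the target trimer INE is supplemented by the MSA trimers {IND, IGD, VGN, …}.
3. ItFix: fold 500x with E_radial, then analyze the 2° structure statistics.
4. 2° structure converged? If no, update the Rama distribution (e.g. the Round 2 Rama distribution) and repeat step 3.
5. If yes, fold 10000x with E_radial or DOPE-PW (all α), giving the final 2° structure and the final Rama distribution.

[Figure: Ramachandran plots (φ and ψ from -180° to 180°) for 1tif position 4: homology-free, SPEED, and final Rama distributions]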


ItFix-SPEED overview (continued)

Once the 2° structure has converged and the final folding round is complete:

1. Cluster the models and take the largest cluster.
2. Refine 100X each with DOPE-PW; reject ∆E_radial > 0.
3. The prediction is the model with min<Energy>_100.


Clustering predicts model accuracy and confidence

Assaying accuracy (i.e. we know whether we got it right or wrong)

Workflow: fold with the ItFix-predicted 2° structure → cluster → identify best cluster

Local accuracy

[Figure: models for 1af7, 1b72, and 1r69]

Global accuracy

[Figure: mean Cα-RMSD to native of cluster (Å, 2 to 8) vs. mean Cα-RMSD between models in cluster (Å, 1.5 to 4.5); R² = 0.85]

Performance in CASP8

[Figures: Global Distance Test plots, cut-off distance (Å) vs. percentage of residues, comparing ItFix with RAPTOR]

Targets: T0405 D1 (6.4 Å), T0464 D1 (4.5 Å), T0429 D2 (6.8 Å), T0482 (4.8 Å)

Applications shown: free modeling, loop insertion modeling, and template identification using folding ("better template")
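A Global Distance Test style curve (percentage of residues within each cut-off distance) can be sketched as below. This is a simplification under stated assumptions: it takes per-residue Cα distances from a single, already computed superposition, whereas the full GDT procedure searches over superpositions for each cut-off; the function names and the toy distances are illustrative.

```python
def percent_within(distances, cutoff):
    """Percentage of residues whose model-to-native distance is within cutoff (Å)."""
    return 100.0 * sum(1 for d in distances if d <= cutoff) / len(distances)

def gdt_curve(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """(cutoff, percentage) points of a GDT-style curve."""
    return [(c, percent_within(distances, c)) for c in cutoffs]

def gdt_ts_like(distances):
    """Mean percentage over the 1, 2, 4 and 8 Å cut-offs."""
    curve = gdt_curve(distances)
    return sum(p for _, p in curve) / len(curve)

# Toy per-residue distances in Å.
toy_distances = [0.5, 1.5, 3.0, 9.0]
curve = gdt_curve(toy_distances)
score = gdt_ts_like(toy_distances)
```

A model whose curve reaches a high percentage of residues at small cut-offs is closer to the native structure over more of its length.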


Aashish Adhikari

Part II Conclusions

• Adding evolutionary information to ItFix improves the accuracy of the conformational search

• Clustering permits global and local prediction of cluster accuracy and uncertainty

• SPEED is successful in the CASP8 experiment

PART I ItFix: Homology-free structure prediction
PART II SPEED: ItFix enhanced with evolution
PART III Future directions in prediction




2° and 3° structure

local 2° structure

topology diagram

3D model

3D contacts


Invert the structure prediction problem

Current designs are very similar to parent sequences

design        length  fold  wt % id (wt % sim)  top % id (top % sim)  top-wt % id (top-wt % sim)
protein L1    62      αβ    35 (61)             50 (62)               73 (86)
protein L2    62      αβ    45 (60)             45 (60)               73 (86)
ACP           98      αβ    41 (54)             39 (57)               67 (69)
PCP           70      αβ    31 (56)             33 (56)               73 (84)
S6            94      αβ    26 (43)             32 (46)               33 (52)
U1A           96      αβ    32 (57)             33 (57)               97 (100)
FKB           107     αβ    42 (59)             44 (62)               96 (96)
zinc-finger   28      αβ    21 (38)             N/A                   N/A
tenascin      89      β     42 (64)             42 (64)               100 (100)

Can we design a more unique protein sequence?

Design method

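1. Restrict AA possibilities by burial in the native structure, for the hydrophobic effect (e.g. burial pattern 01010111).
2. Find the best sequences for maximum Rama propensity, giving a positional sequence library (e.g. MKLFVKTP…LTVTIR, with alternative residues such as L, I, V, R, E listed per position).
3. Monte Carlo search of the statistical potential.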
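The Monte Carlo step over a positional sequence library can be sketched with a standard Metropolis search. Everything specific here is an assumption for illustration: the `monte_carlo_design` name, the temperature, the step count, the toy library, and especially `toy_energy`, which stands in for the statistical potential with made-up per-residue scores.

```python
import math
import random

def monte_carlo_design(library, energy, steps=2000, temperature=1.0, seed=0):
    """Metropolis search over sequences drawn from a per-position residue library."""
    rng = random.Random(seed)
    seq = [rng.choice(options) for options in library]
    current = energy(seq)
    best_seq, best_energy = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(library[pos])  # propose a residue allowed at this position
        proposed = energy(seq)
        accept = proposed <= current or rng.random() < math.exp(
            -(proposed - current) / temperature
        )
        if accept:
            current = proposed
            if current < best_energy:
                best_seq, best_energy = list(seq), current
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_energy

# Toy stand-in for a statistical potential: made-up per-residue scores.
TOY_SCORES = {"L": -1.0, "I": -0.5, "V": -0.2, "R": -0.3, "E": -0.8, "K": -0.6}

def toy_energy(seq):
    return sum(TOY_SCORES[aa] for aa in seq)

TOY_LIBRARY = ["LIV", "RE", "LIV", "KE"]
designed, designed_energy = monte_carlo_design(TOY_LIBRARY, toy_energy)
```

Because the library restricts each position before the search begins, the Monte Carlo step only has to explore combinations of residues that already satisfy the burial and Rama criteria.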


[Figure: DOPE-PW energy vs. distance (Å)]

Hello Jello

Preliminary wetlab analysis

• 1ds0 expresses in inclusion bodies
• mutations enhance in vitro solubility
• further experiments needed

[Figure labels: soluble at induction, insoluble at induction, soluble at 3 hrs, insoluble at 3 hrs]

[Figure: CD vs. wavelength (nm) for design, design-sol, and native]

Thesis defense

Conclusions

• Homology-free structure prediction can provide accurate models by mimicking folding pathways

• Adding evolutionary information improves the accuracy of the conformational search

• Inverting our homology-free prediction method into a design algorithm aims to generate unique amino acid sequences

Acknowledgements

Prof. Tobin Sosnick, Prof. Karl Freed, Prof. Jinbo Xu

Glen Hocky, Andres Colubri, James Fitzgerald, Abhishek Jha, Esmael Haddadian, James Hinshaw, Aashish Adhikari, Jouko Virtanen, Chloe Antoniou, Josiah Zayner, Feng Zhao, Jian Peng, Grzegorz Gawlak, Srikanth Aravamuthan

Funding: NIH, NSF, Joint Theory Institute

Enhancement of Ramachandran propensity

[Figure: native Rama probability]

Enhancement in energy and structure prediction

• ∆∆E = -120 (arb. units)
• 2X enhancement in native-like models in prediction

[Figure: secondary structure frequency (0 to 1) vs. residue index over successive ItFix rounds (Round 0 through Round 8) for 1af7 (2.7 Å), 1di2 (4.6 Å), 1r69 (2.4 Å), and 1b72 (1.6 Å)]

[Figure labels: AA and SecStr by position; "2° structure by position"; "Amino acid by position"]

SPEED increases the native Rama probability

• SPEED improves native φ, ψ probability across the sequence.

[Figure: native basin probability by position for 1b72]

• SPEED reduces cases where the native φ, ψ has a very low probability.

[Figure: % positions with P_native > 0.25 vs. PDB id of target]

[Figure: native Rama regions 1 to 4 on a Ramachandran plot, φ and ψ from -180° to 180°]

Radial energy terms enforce productive chain collapse (global terms)

[Diagram: center of mass (CM) with Cα and Cβ atoms, and Rg-Cα, Rg-phil, and Rg-phob marked]

• Rg-Cα: root-mean-squared distance of Cα from CM. Measures the compactness of the model.
• Ru-Cα: root-mean-squared deviation of Cα from CM. Enforces a spherical model.
• Rg-phob/Rg-phil (burial ratio): best packing of hydrophobic residues.
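Two of the radial terms, the Cα radius of gyration and the burial ratio, can be sketched directly from coordinates. This is an illustrative sketch under stated assumptions: equal masses for all Cα atoms, both Rg-phob and Rg-phil measured from the center of mass of all Cα atoms, and function names chosen here for illustration. How these quantities are turned into energy terms is not shown on the slide and is not modeled.

```python
import math

def center_of_mass(coords):
    """Unweighted centroid of 3D points (equal masses assumed)."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def radius_of_gyration(coords, center):
    """Root-mean-squared distance of the points from a given center."""
    squared = [sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords]
    return math.sqrt(sum(squared) / len(squared))

def burial_ratio(coords, is_hydrophobic):
    """Rg of hydrophobic residues divided by Rg of polar residues, about the overall CM."""
    cm = center_of_mass(coords)
    phob = [c for c, h in zip(coords, is_hydrophobic) if h]
    phil = [c for c, h in zip(coords, is_hydrophobic) if not h]
    return radius_of_gyration(phob, cm) / radius_of_gyration(phil, cm)

# Toy Cα coordinates: two hydrophobic residues near the center, two polar ones outside.
toy_ca = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (-3.0, 0.0, 0.0)]
toy_hydrophobic = [True, True, False, False]

rg_all = radius_of_gyration(toy_ca, center_of_mass(toy_ca))
ratio = burial_ratio(toy_ca, toy_hydrophobic)
```

A burial ratio below 1 means the hydrophobic residues sit closer to the center of mass than the polar ones, which is the arrangement the burial term is meant to favor.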

Eliminating the fixing thresholds from ItFix

[Diagram: starting from the sequence MQIFVKT…STLHLVLR, the Rama distribution at each position (e.g. pos. 67) is updated over rounds: round 0 → fold 2000X → round 1 → fold 2000X → round 2 → fold 2000X → round 3; plots span φ and ψ from -180° to 180°]

An evolution-enhanced energy function: DOPE-PW-SPEED

[Figures: energy (0 to 10) vs. distance (Å) comparing DOPE-PW with DOPE-PW-SPEED; panel labels PHE4 and THR14; annotations "WT: ILE, homologs: polar" and "WT: Ala, homologs: polar"]
