BL5203: Molecular Recognition & Interaction Lecture 6: Modeling Protein Structure and...

Date post: 16-Jan-2016
Upload: dennis-sullivan
Transcript
Page 1: BL5203: Molecular Recognition & Interaction Lecture 6: Modeling Protein Structure and Protein-Protein Interaction Y.Z. Chen Department of Pharmacy National.

BL5203: Molecular Recognition & Interaction

Lecture 6: Modeling Protein Structure and Protein-Protein Interaction

Y.Z. Chen
Department of Pharmacy
National University of Singapore
Tel: 65-6616-6877; Email: [email protected]; Web: http://bidd.nus.edu.sg

Content

• Protein fold and structure

• Homology modeling

• Protein-protein docking

Page 2

Sizes of protein databases

[Figure: bar chart on a logarithmic scale (1 to 10,000,000,000) of database sizes. Protein residues: 500M; protein sequences: 1.6M; protein structures: 26K; protein complexes: 1K.]

Page 3

Swiss-Prot database

Page 4

Protein structure classification

[Figure: classification diagram with the labels protein world, protein fold, protein superfamily, protein family and new fold.]

Page 5

PDB New Fold Growth

• The number of unique folds in nature is fairly small (possibly a few thousand)

• 90% of new structures submitted to PDB in the past three years have similar structural folds in PDB

[Figure: new PDB structures, split into new folds and old folds.]

Page 6

Protein classification

• The number of protein sequences grows exponentially

• The number of solved structures grows exponentially

• The number of new folds identified is very small (and close to constant)

• Protein classification can
– Generate an overview of structure types
– Detect similarities (evolutionary relationships) between protein sequences

Page 7

Problems in Protein Bioinformatics

• 20,000 protein entries in the PDB

• 1000 - 2000 distinct protein folds in nature

• Thought to be only several thousand unique folds in all

• Prediction of structure from sequence
– Fold recognition
– Fragment construction

• Proteome annotation

• Protein-protein docking

Page 8

Protein folding code

[Figure: protein sequence, protein folding code, protein structure.]

Page 9

Prediction of correct fold

• Fold recognition: match the query sequence against a library of known folds to find the matched fold

• Eisenberg et al.; Jones, Taylor, Thornton

Page 10

Computational Requirements

• 1 sequence search takes 12 mins (3 GHz)

• Benchmarking on 100 proteins with 100 runs for a simplex search of parameter space = 80 days

• 30 approaches explored = 7 years (on 1 CPU)
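
The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that each of the 100 runs repeats one 12-minute search for each of the 100 proteins (an interpretation, not something the slide states); the constant and function names are illustrative.

```python
# Values taken from the slide; the way they are combined is an assumption.
MINUTES_PER_SEARCH = 12   # one sequence search on a 3 GHz CPU
PROTEINS = 100            # size of the benchmark set
RUNS = 100                # runs of the simplex search over parameter space
APPROACHES = 30           # number of approaches explored

MINUTES_PER_DAY = 60 * 24


def benchmark_days(minutes_per_search, proteins, runs):
    """CPU days for one benchmark, assuming one search per protein per run."""
    return minutes_per_search * proteins * runs / MINUTES_PER_DAY


one_benchmark = benchmark_days(MINUTES_PER_SEARCH, PROTEINS, RUNS)
all_approaches_years = one_benchmark * APPROACHES / 365

print(f"one benchmark: {one_benchmark:.1f} CPU days")                      # 83.3
print(f"{APPROACHES} approaches: {all_approaches_years:.1f} CPU years")    # 6.8
```

Under this reading the slide's "80 days" and "7 years" are rounded values of roughly 83 CPU days and 6.8 CPU years.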

Page 11

Types of Structure Prediction

• De novo protein methods
– Seek to build three-dimensional protein models "from scratch"
– Example: Rosetta

• Comparative protein modeling
– Uses previously solved structures as starting points, or templates
– Example: protein threading

Page 12

Factors that Make Protein Structure Prediction a Difficult Task

• The number of possible structures that proteins may possess is extremely large, as highlighted by the Levinthal paradox

• The physical basis of protein structural stability is not fully understood.

• The primary sequence may not fully specify the tertiary structure.
– Chaperones

• Direct simulation of protein folding is not generally tractable for both practical and theoretical reasons.
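
To give a feel for the Levinthal paradox mentioned in the first bullet, here is a back-of-the-envelope sketch. All three inputs (3 conformations per residue, a 100-residue chain, 10^13 conformations sampled per second) are illustrative assumptions and do not come from the slides.

```python
# Illustrative assumptions, not values from the lecture.
STATES_PER_RESIDUE = 3        # assumed backbone conformations per residue
RESIDUES = 100                # assumed chain length
SAMPLES_PER_SECOND = 1e13     # assumed conformational sampling rate

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Number of conformations if every residue varies independently.
conformations = STATES_PER_RESIDUE ** RESIDUES

# Time needed to visit each conformation once at the assumed rate.
years = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"conformations: {conformations:.2e}")
print(f"exhaustive search: {years:.1e} years")
```

Even with these modest assumptions an exhaustive search would take on the order of 10^27 years, which is why prediction methods rely on templates, fragments or guided search rather than enumeration.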

Page 13

Homology Modeling

• Homolog: a protein related to another by divergent evolution from a common ancestor

• A target with 40% amino-acid identity to its homolog
– NO large insertions or deletions
– Produces a predicted structure equivalent to that of a medium-resolution experimentally solved structure

• 25% of known protein sequences fall in a safe area, implying they can be modeled reliably

Page 14

Homology Modeling Defined

• Homology modeling
– Based on the reasonable assumption that two homologous proteins will share very similar structures.
– Given the amino acid sequence of an unknown structure and the solved structure of a homologous protein, each amino acid in the solved structure is mutated computationally into the corresponding amino acid from the unknown structure.

Page 15

Homology Modeling Limitations

• Cannot study conformational changes

• Cannot find new catalytic/binding sites

• Brainstorm: lack of activity vs. activity
– Chymotrypsinogen, trypsinogen and plasminogen
– 40% homologous
– 2 active, 1 with no activity; cannot explain why

• Large bias towards structure of template

• Models cannot be docked together

Page 16

Why Homology Modeling?

• Value in structure-based drug design

• Find common catalytic sites/molecular recognition sites

• Use as a guide to planning and interpreting experiments

• 70-80% chance that a protein has a similar fold to a target protein already solved by X-ray crystallography or NMR spectroscopy

• Sometimes it's the only option or best guess

Page 17

Protein Threading

• A target sequence is threaded through the backbone structure of a collection of template proteins (fold library)

• Quantitative measure of how well the sequence fits the fold

• Based on assumptions
– 3-D structures of proteins have characteristics that are semi-quantitatively predictable
– These characteristics reflect the physical-chemical properties of amino acids
– Limited types of interactions are allowed within folding

Page 18

Fold Recognition Methods

• Bowie, Lüthy and Eisenberg (1991)

• 2 approaches to recognition methods

• Derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles
– Identify amino acids based on core or external positions
– Part of secondary structure

• Consider the full 3-D structure of the protein template
– Modeled as a set of inter-atomic distances
– NP-hard (if interactions of multiple residues are included)

Page 19

Protein Threading

• The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on each template
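
The "dragging" idea can be sketched as a toy, gap-free threading routine. Everything below is an illustrative assumption rather than a method from the lecture: templates are reduced to strings of buried ("B") and exposed ("E") positions, the hydrophobic residue set is a rough grouping, and the +1/-1 score is invented for the example.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residues (rough grouping)


def position_score(residue, environment):
    """+1 if the residue suits the environment ('B' buried, 'E' exposed), else -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1


def thread(sequence, template):
    """Slide `sequence` along `template` without gaps; return (best_score, best_offset)."""
    if len(sequence) > len(template):
        raise ValueError("sequence is longer than the template")
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(residue, template[offset + i])
            for i, residue in enumerate(sequence)
        )
        # Keep the first offset that reaches the highest score.
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def rank_templates(sequence, library):
    """Thread the sequence onto every template; return (score, offset, name), best first."""
    results = [thread(sequence, env) + (name,) for name, env in library.items()]
    return sorted(results, reverse=True)


# Hypothetical two-fold library and query sequence.
library = {"fold_a": "EEBBBBEE", "fold_b": "BEBEBEBE"}
ranking = rank_templates("DVILAK", library)
print(ranking)  # [(6, 1, 'fold_a'), (0, 0, 'fold_b')]
```

Real threading programs allow gaps and use much richer scores; the sketch only shows the step-by-step placement of a sequence at each location of each template.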

Page 20

Protein Threading

Page 21

Generalized Threading Score

• Want to correctly recognize arrangements of residues

• Building a score function
– Potentials of mean force
– From an optimization calculation

• $G(r_{AB}) = -kT \ln\left(\rho_{AB} / \rho_{AB}^{\circ}\right)$
– $G$: free energy
– $k$ and $T$: Boltzmann's constant and temperature, respectively
– $\rho_{AB}$: the observed frequency of AB pairs at distance $r$
– $\rho_{AB}^{\circ}$: the frequency of AB pairs at distance $r$ you would expect to see by chance

• Z-score $= (E_{Nat} - \langle E_{alt} \rangle) / \sigma_{E_{alt}}$
– The energy of the native structure minus the mean energy of all the wrong structures, divided by the standard deviation of those energies
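
Both quantities on this slide are easy to compute once the frequencies and energies are available. The sketch below is a minimal illustration under stated assumptions: it uses the conventional inverse-Boltzmann sign (pairs seen more often than chance get a negative, favourable energy), expresses energies in kcal/mol, and uses the population standard deviation; the function names and example numbers are hypothetical.

```python
import math
from statistics import mean, pstdev

K_B = 0.0019872041  # Boltzmann constant per mole, kcal/(mol*K)


def pair_free_energy(observed_freq, expected_freq, temperature=298.0):
    """Potential of mean force for an AB pair at one distance: G = -kT ln(rho / rho0)."""
    return -K_B * temperature * math.log(observed_freq / expected_freq)


def z_score(native_energy, alternative_energies):
    """(E_native - mean(E_alt)) / std(E_alt), using the population standard deviation."""
    return (native_energy - mean(alternative_energies)) / pstdev(alternative_energies)


# Hypothetical numbers: the pair occurs twice as often as expected by chance.
g = pair_free_energy(observed_freq=0.02, expected_freq=0.01)
print(f"G = {g:.3f} kcal/mol")

# Hypothetical energies: the native structure versus three wrong structures.
z = z_score(-10.0, [-2.0, 0.0, 2.0])
print(f"Z = {z:.2f}")
```

With this sign convention a strongly negative Z-score means the native structure is well separated from the wrong alternatives.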

Page 22

Scoring Different Folds

• Goodness of fit score
– Based on empirical energy function
– Modify to take into account pairwise interactions and solvation terms
– High score means good fit
– Low score means nothing learned

Page 23

Some Threading Programs

• 3D-PSSM (ICNET). Based on sequence profiles, solvation potentials and secondary structure.

• TOPITS (PredictProtein server) (EMBL). Based on coincidence of secondary structure and accessibility.

• UCLA-DOE Structure Prediction Server (UCLA). Executes various threading programs and reports a consensus.

• 123D+. Combines substitution matrix, secondary structure prediction, and contact capacity potentials.

• SAM/HMM (UCSC). Based on Markov models of alignments of crystallized proteins.

• FAS (Burnham Institute). Based on profile-profile matching algorithms of the query sequence with sequences from clustered PDB database.

• PSIPRED-GenThreader (Brunel)

• THREADER2 (Warwick). Based on solvation potentials and contacts obtained from crystallized proteins.

• ProFIT CAME (Salzburg)

Page 24

Process of 3D Structure Prediction by Threading

• Does this protein have sequence similarity to others with a known structure?

• Structure-related information in the databases

• Results from threading programs

• Predicted folding comparison

• Threading on the structure and mapping of the known data

• A comparison between the threading-predicted structure and the actual one

Page 25

Protein Threading Based on Multiple Protein Structure Alignment

Tatsuya Akutsu and Kim Lan Sim
Human Genome Center, Institute of Medical Science, University of Tokyo

• NP-hard if interactions between 2 or more amino acids are included

• Determine multiple structural alignments based on pairwise structure alignments
– Center Star Method

Page 26

Center Star Method

• Let $I_0$ be the maximum number of gap symbols placed before the first residue of $S_0$ in any of the alignments $A(S_0, S_1), \ldots, A(S_0, S_N)$. Let $I_{|S_0|}$ be the maximum number of gaps placed after the last character of $S_0$ in any of the alignments, and let $I_i$ be the maximum number of gaps placed between characters $S_{0,i}$ and $S_{0,i+1}$, where $S_{j,i}$ denotes the $i$-th letter of string $S_j$.

• Create a string $S'_0$ by inserting $I_0$ gaps before $S_0$, $I_{|S_0|}$ gaps after $S_0$, and $I_i$ gaps between $S_{0,i}$ and $S_{0,i+1}$.

• For each $S_j$ ($j > 0$), create a pairwise alignment $A'(S'_0, S_j)$ between $S'_0$ and $S_j$ by inserting gaps into $S_j$ so that deletion of the columns consisting of gaps from $A'(S'_0, S_j)$ results in the same alignment as $A(S_0, S_j)$.

• Simply arrange the $A'(S'_0, S_j)$'s into a single matrix $A$ (note that all $A'(S'_0, S_j)$'s have the same length).
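
The merging steps above can be sketched on plain strings. This is a minimal illustration, assuming the center $S_0$ has already been chosen and that the pairwise alignments are supplied as gapped string pairs; it works on letters only and ignores the structural superposition that produces those alignments. The function names are hypothetical.

```python
GAP = "-"


def gap_slots(aligned_center):
    """Gaps per slot: slot i holds the gaps before center residue i, the last slot the trailing gaps."""
    slots = [0]
    for ch in aligned_center:
        if ch == GAP:
            slots[-1] += 1
        else:
            slots.append(0)
    return slots  # length is len(center) + 1


def center_star_merge(pairwise):
    """Merge pairwise alignments (aligned_center, aligned_other) into one matrix; center row first."""
    center = pairwise[0][0].replace(GAP, "")
    for aligned_center, aligned_other in pairwise:
        if aligned_center.replace(GAP, "") != center:
            raise ValueError("all alignments must share the same center sequence")
        if len(aligned_center) != len(aligned_other):
            raise ValueError("aligned strings must have equal length")

    all_slots = [gap_slots(aligned_center) for aligned_center, _ in pairwise]
    # I_i: the widest gap run seen in each slot over all pairwise alignments.
    max_slots = [max(column) for column in zip(*all_slots)]

    # S'_0: the center padded with I_i gaps in every slot.
    padded_center = "".join(
        GAP * max_slots[i] + center[i] for i in range(len(center))
    ) + GAP * max_slots[-1]
    rows = [padded_center]

    for (_, aligned_other), slots in zip(pairwise, all_slots):
        out, pos = [], 0
        for i, width in enumerate(slots):
            out.append(aligned_other[pos:pos + width])   # letters inserted relative to the center
            out.append(GAP * (max_slots[i] - width))     # pad the slot to the common width
            pos += width
            if i < len(center):
                out.append(aligned_other[pos])           # column aligned with center residue i
                pos += 1
        rows.append("".join(out))
    return rows


# Hypothetical example with center "ACD".
merged = center_star_merge([("AC-D", "ACED"), ("-ACD", "GAC-")])
print("\n".join(merged))
```

For the example this prints the rows `-AC-D`, `-ACED` and `GAC--`, all of the same length, and removing the all-gap columns from any center/other pair recovers the original pairwise alignment.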

Page 27

Simple Threading Algorithm

• Apply simple score function based on structure alignment algorithm
– Let $X = x_1 \ldots x_N$ (input amino acid sequence)
– $C_i$ ($i$-th column in $A$)

• Test and analyze results and/or apply constraints

Page 28

Protein Threading with Constraints

• Assume part of the input sequence $x_i \ldots x_{i+k}$ must correspond to part of the structure alignment $c_j \ldots c_{j+k}$

• Apply constraints

Page 29

Prediction Power

• Entered in CASP3 competition

• 17 predictions made

• 3 targets evaluated as similar to correct folds

• Only team to create a nearly correct model for structure T0043

• Best in competition
– 8 evaluated as similar to correct

Page 30

Next time….

• In depth detail of
– Multiple structural alignment program
  • Multiprospector
– Global Optimum Protein Threading with Gapped Alignment

• Quality measures for protein threading models

• Improvements on threading-based models

Page 31

Gapped Alignment

Page 32

Fragment based method
1 - Predict structure of segment

Trial structures for a local sequence taken from database of segments of known 3D structure.

Page 33

Fragment based method
2 - Construct trial model from segments

Page 34

Fragment based method
3 - Identify good trial structures

1 - Low resolution energy function used in initial search through conformational space

2 - Side chains represented by single "centroid" pseudoatom

3 - Major contributions from
– Hydrophobic burial
– Beta strand pairing
– Steric overlap
– Specific residue pair interactions

4 - Models then refined using explicit rotamer-based side chain representation and potential from design method

Page 35

Fragment-based protein folding

[Figure: observed structure; Cro repressor (1orc).]

Page 36

Computational Requirements

• Methodology performs numerous simulations and looks for clusters

• One simulation takes 3 mins (3 GHz)

• Require 1,000 simulations per protein = 2 days

• Benchmark on 50 proteins = 100 days

Page 37

3D-GENOMICS - proteome annotation

[Figure: workflow diagram with the labels database sequences, database structures, proteome sequences, functional data, annotation procedure, MySQL database, WWW and new research.]

Page 38

Types of annotation

[Figure: example annotations. Enzyme ABC, EC 1.2.3.4: function suggested; E. coli Protein 325: homology but no function; membrane protein; no similar sequence: orphan; structure.]

Page 39

3D-Genomics database - structural and functional annotation

[Figure: stacked bar chart of the fraction of each proteome (% of residues, 0% to 100%) annotated as structure, function, any homology, non globular or orphan. Proteomes, arranged by size: M. genitalium, H. pylori J99, A. aeolicus, M. jannaschii, P. horikoshii, H. influenzae, V. cholerae, M. tuberculosis H37Rv, B. subtilis, E. coli K12, S. cerevisiae, D. melanogaster, C. elegans, H. sapiens.]

Page 40

Computational requirements

• Today there are 800,000 protein sequences.

• Each sequence takes 15 mins to annotate on a 2.5 GHz CPU.

• Time today = 8,000 CPU days = 2.5 months with a 100-processor farm.

• Need to update every 6 months.

• The number of sequences will double in 2-3 years and so will keep pace with the increase in compute power.
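
The annotation budget can be reproduced from the first two bullets. The sketch below assumes perfect parallel scaling across the farm and an average month of about 30.4 days; both are simplifying assumptions, and the variable names are illustrative.

```python
# Values from the slide.
SEQUENCES = 800_000
MINUTES_PER_SEQUENCE = 15     # on one 2.5 GHz CPU
PROCESSORS = 100

# Simplifying assumptions for the conversion.
MINUTES_PER_DAY = 60 * 24
DAYS_PER_MONTH = 365.25 / 12

cpu_days = SEQUENCES * MINUTES_PER_SEQUENCE / MINUTES_PER_DAY
wall_days = cpu_days / PROCESSORS          # assumes perfect parallel scaling
wall_months = wall_days / DAYS_PER_MONTH

print(f"{cpu_days:.0f} CPU days")                        # 8333
print(f"{wall_months:.1f} months on {PROCESSORS} CPUs")  # 2.7
```

The exact figures (about 8,300 CPU days and 2.7 months) agree with the slide's rounded "8,000 CPU days = 2.5 months".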

Page 41

Modelling protein-protein docking

Page 42

Modelling protein-protein docking

Page 43

Protein-protein docking

[Figure: docking workflow. Coordinates of mol 1 and coordinates of mol 2; rigid body search; list of possible complexes; evaluate association energy; flexibility to refine; list of complexes. Experimental information is a further input.]

Page 44

Step 1 - Generating Complexes

Page 45

Shape complementarity

[Figure: the two molecules on grids A(i,j,k) and B(l,m,n), with cell values +1 and -15.]

• C = A(i,j,k) x B(l,m,n)

• Match: +1 x +1

• Overlap: +1 x -15
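
The grid product on this slide can be illustrated with a brute-force 2D toy. The sketch assumes, as is common in grid-based docking, that -15 marks the core of molecule A, +1 its surface layer and +1 every occupied cell of molecule B; the slide only gives the values, so this assignment, the 2D grids and the function names are assumptions. Real implementations work in 3D, usually with Fourier transforms, and also scan rotations.

```python
def correlation(grid_a, grid_b, shift):
    """Sum of A(i+di, j+dj) * B(i, j) over the cells of B that land on A's grid."""
    di, dj = shift
    total = 0
    for i, row in enumerate(grid_b):
        for j, b in enumerate(row):
            ai, aj = i + di, j + dj
            if b and 0 <= ai < len(grid_a) and 0 <= aj < len(grid_a[0]):
                total += grid_a[ai][aj] * b
    return total


def best_shift(grid_a, grid_b, max_shift):
    """Try every translation of B within +/- max_shift cells; return the best (shift, score)."""
    shifts = [
        (di, dj)
        for di in range(-max_shift, max_shift + 1)
        for dj in range(-max_shift, max_shift + 1)
    ]
    best = max(shifts, key=lambda s: correlation(grid_a, grid_b, s))
    return best, correlation(grid_a, grid_b, best)


# Molecule A: one core cell (-15) surrounded by a surface layer (+1); 0 is empty space.
A = [
    [0, 0,   0, 0, 0],
    [0, 1,   1, 1, 0],
    [0, 1, -15, 1, 0],
    [0, 1,   1, 1, 0],
    [0, 0,   0, 0, 0],
]
# Molecule B: two occupied cells (+1).
B = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

overlap_score = correlation(A, B, (0, 0))   # B sits on A's core: 1 * -15 + 1 * 1
shift, match_score = best_shift(A, B, max_shift=2)
print(overlap_score, shift, match_score)
```

Placing B on A's core is heavily penalised (score -14), while the best translations lay both of B's cells on A's surface layer (score 2), which is the match/overlap distinction the slide draws.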

Page 46

Electrostatic complementarity

[Figure: grids carrying values of +1 and -1.]

• Charge in 1 = Q(i,j,k); potential outside 2 = V(l,m,n)

• E = Q(i,j,k) x V(l,m,n)

Page 47

Step 2 - Modelling residue-residue interactions

[Figure: residues labelled V, I and E.]

Page 48

Step 2 - Modelling residue-residue interactions

[Figure: residues labelled V, I and E.]

Page 49

Empirical residue pair potentials

• Analyse residues packing across 90 hetero-protein interfaces

• A pair of residues a, b pack if one atom-atom contact < distance cut-off (4.5 Å)

• $\mathrm{Score}(a,b) = \log_{10}\left(\dfrac{\text{Observed no. of } a/b \text{ pairs}}{\text{Expected no. of } a/b \text{ pairs}}\right)$
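
A log-odds score of this kind can be computed from a list of packed residue pairs. The slide does not say how the expected count is obtained, so the sketch below assumes random pairing according to the residue composition of the interface contacts; that model, the function name and the example contacts are all hypothetical.

```python
import math
from collections import Counter


def pair_scores(contacts):
    """Return {(a, b): log10(observed / expected)} for unordered residue-type pairs.

    `contacts` is a list of (residue_a, residue_b) pairs packed across interfaces.
    Expected counts assume random pairing by residue composition (an assumption).
    """
    observed = Counter(tuple(sorted(pair)) for pair in contacts)
    residue_counts = Counter(res for pair in contacts for res in pair)
    total_residues = sum(residue_counts.values())
    total_pairs = len(contacts)

    scores = {}
    for (a, b), n_observed in observed.items():
        freq_a = residue_counts[a] / total_residues
        freq_b = residue_counts[b] / total_residues
        # Unordered pairs of two different types can arise in two orders.
        probability = freq_a * freq_b if a == b else 2 * freq_a * freq_b
        expected = probability * total_pairs
        scores[(a, b)] = math.log10(n_observed / expected)
    return scores


# Hypothetical contacts between one-letter residue types.
contacts = [("L", "V"), ("V", "L"), ("L", "L"), ("K", "E")]
scores = pair_scores(contacts)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Positive scores mark pairs that pack across interfaces more often than the composition alone would predict; a score of zero means the pair occurs exactly as often as expected.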

Page 50

Step 3 - Including information about functional residues

[Figure: residue E marked as functional, from literature.]

Page 51

Step 3 - Including information about functional residues

[Figure: residue E marked as functional, from literature.]

Page 52

Step 4 - Refinement by multicopy

• Search for optimal combination of side-chain rotamers by energy calculation

• + Limited rigid-body shifts

Page 53

CAPRI - blind test of docking

[Figure: unbound amylase; bound Ab from X-ray and bound Ab as predicted.]

Prediction / actual: difference = 0.6 Å

Page 54

Computational Requirements

• 1 run of the procedure takes 2 days on one 3 GHz processor

• Development tested on 30 protein complexes takes 60 days for one parameter set

• Applications
– Extension to predict which protein interacts with another requires 1000s of docking simulations

Page 55

Application area

• Protein structure prediction
– Fold recognition
– Simulation

• Proteome annotation

• Protein-protein docking

Page 56

Computing cost

• A modelling algorithm on one protein takes 10 mins to 2 days on one 3 GHz CPU

• But algorithm development requires consideration of several structures (50-100) with different parameter sets.

• Hence years of CPU time are required

Page 57

Structure prediction & sequence space

[Figure: blocks of placeholder sequence text of different lengths illustrating sequence space.]

Page 58

Multiple sequence alignments aid comparative protein modeling

• 1 in 3 sequences are recognizably related to at least one protein structure.

• A significant fraction of the remaining 2/3 have solved structural homologues, but they are not recognized through sequence similarity searching techniques.

• Marti-Renom et al. (2000)

• Multiple sequence alignments greatly improve the efficacy and accuracy of almost all phases of comparative modeling.

• Venclovas (2001)

Page 59

Computational protein design

[Figure: native structure, iterative refinement, new sequence.]

Page 60

Large scale sequence generation

"Reverse BLAST" study:

Total structures: 264
Total backbone variants: 26,400
Total sequences generated: 200,000
Processors available: 4,000
Total time of data collection: 80 days

Page 61

"Reverse BLAST": finding templates for comparative modeling

Larson SM, Garg A, Desjarlais JR, Pande VS. (2003) Proteins: Structure, Function, and Genetics

Page 62

Experiment: Sequence quality

[Figure: designed sequences searched with BLAST at E < 0.01.]

Page 63

Results: Sequence quality

[Figure: x-axis: designed sequence profile (ranked by E-value), 0 to 225; left axis: E-value of best PDB hit, log scale from 1E-17 to 10; right axis: average identity to native sequence (%), 0 to 30.]

Page 64

Method: "Reverse BLAST"

[Figure: three columns labelled designed sequences, hypothetical proteins and structural templates, linked by BLAST at E < 0.01.]

Page 65

Do the designed sequences help?

[Figure: correctly identified structural templates. x-axis: E-value threshold (-log(E)), 2 to 10; left axis: hits with sequence alignment : hits without, 0 to 5; right axis: total unique hits, 0 to 160. Series: fold-increase in # of templates, fold-increase in # of genes, total hits.]

Page 66

Remote homology detection

Page 67

Optimizing structural diversity

[Figure: x-axis: RMSD of structural ensemble (Angstroms), 0 to 2; left axis: (%), 0 to 80; right axis: sequence entropy, 0 to 6. Series: sequence entropy, prediction accuracy, prediction coverage, mean pairwise %ID, mean native %ID.]

