Introduction to Protein Structure folding and Dynamics: What can we ...

Post on 14-Feb-2017


Introduction to Protein Structure folding and Dynamics: What can we learn from Simulations?

Saraswathi Vishveshwara

Molecular Biophysics Unit

Indian Institute of Science

Bangalore

JNC, Nov 6, 2007

Dream of Structural Biologists

To generate the three-dimensional structures of proteins, given their sequences

Grand Challenge to Simulation

Outline

• Protein structure-function relationship

• Classification scheme of protein structures

• Protein folding problem-How is the uniqueness achieved?

• Methods of investigation: Knowledge based versus ab-initio methods

• Dynamics:

• Equilibrium dynamics: flexible and rigid regions

protein-ligand interactions

(with examples)

• non-equilibrium dynamics: Protein folding

examples of Unfolding/folding simulations

Future challenges

Proteins perform a myriad of functions

The function depends crucially upon the folded states of the proteins

Biological Functions of Proteins

Protein Structure to Function

[Diagram: 3D STRUCTURE at the centre, linked to]

• Fold, evolutionary relationships

• Biological multimeric states

• Disease states, mutations

• Active sites, enzyme clefts

• Antigenic sites

• Surface properties

• Protein-Ligand Interactions

Sequence -----> Structure ------> Function

How far have we reached?

Classification of Protein Architecture

Primary structure Secondary structure

Tertiary structure Quaternary structure

Classification of Protein Architecture

Primary structure

Describes the chemical sequence (amino acid)

Properties of an amino acid:

• Chirality is always L.

• “Length” of an amino acid unit is typically 3.6 Å.

Properties of a polypeptide:

• Peptide bond is stabilized by resonance

• Typically in planar, trans configuration

Secondary Structures in Proteins

Alpha Helix

Beta Structures

After the alpha helix, the second most regular and identifiable conformation in polypeptides is the beta sheet.

The basic unit of the beta sheet is the beta strand, which consists of a fully extended polypeptide.

The beta strand is not a stable structure (no interactions between atoms that are not close in the covalent structure). The beta strand is only stable when incorporated into a beta sheet.

Hydrogen bonds between peptide groups on adjacent beta strands stabilize structure.

The Beta Sheet

Classification of Protein Architecture:Hierarchical levels

Classification of Protein Architecture:Summary

Primary structure: sequence

Secondary structure: helices, sheets

Super-secondary: beta hairpin

Domain: 3 domains α1, α2, α3

Tertiary structure: α1-3 is a single folded unit

Quaternary structure: α1-3 and β2M are different molecules that associate

Proteins are HETERO polymer chains made up of:

An alphabet of 20 AMINO ACIDS

The amino acids have different sizes and shapes and possess properties such as:

Acidic

Basic

Polar

Aromatic

Hydrophobic

Where does chemistry come into the picture?

Amino acids in different

sizes

shapes

chemical properties

Hydrogen bonding capabilities of side chains

Types of interactions :

• Covalent

• Non-covalent

• Hydrogen bond

[Diagram: polypeptide backbone, H2N-CαH(R1)-C'(=O)-NH-CαH(R2)-C'(=O)- ... -NH-CαH(Rn)-C'(=O)-OH]

Types of Neighbours :

• Sequence

• Spatial

Levels of Interactions in Proteins

Primary

Secondary

Tertiary

Quaternary

Can we understand the rules of protein structure, folding and function?

Protein Folding Problem

Amino acid sequence –coded by genetic code

Folding of Proteins to unique three dimensional

structure-Code?

Incorporated in the sequence:

Landmarks in protein Folding :

Folded structure of a protein is determined by thermodynamics - Anfinsen (1973)

Levinthal paradox - How is the speed of folding reconciled with the enormous conformational search involved? (1968)

Nature selects sequences which have the ability to fold rapidly - Levinthal (1987)
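The size of the conformational search behind the Levinthal paradox can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the chain length, the number of conformations per residue and the sampling rate are assumed round numbers, not values from the talk.

```python
# Back-of-the-envelope Levinthal estimate.
# All three inputs are assumed, illustrative values (not from the talk).
N_RESIDUES = 100          # assumed chain length
STATES_PER_RESIDUE = 3    # assumed backbone conformations per residue
SAMPLING_RATE = 1e13      # assumed conformations sampled per second

SECONDS_PER_YEAR = 3600 * 24 * 365


def levinthal_search_time_years(n_residues, states_per_residue, rate_per_s):
    """Time needed to visit every conformation once, in years."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / rate_per_s / SECONDS_PER_YEAR


years = levinthal_search_time_years(N_RESIDUES, STATES_PER_RESIDUE, SAMPLING_RATE)
print(f"exhaustive search would take about {years:.1e} years")
```

Even with these generous assumptions the exhaustive search takes vastly longer than the age of the universe, whereas real proteins fold in microseconds to seconds. A random search is therefore ruled out.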

Folding of the protein

Chain from random

to the native state

Energy Landscape View

Wolynes and coworkers

Protein Folding Problem

The information for folding of protein to unique three

dimensional structure is encoded in its sequence -Anfinsen

Geometry Optimization View: The super-secondary structures of proteins are optimally packed

Maritan, Banavar, and co-workers

Development of Theoretical/Computational Methods

•Geometrical concepts G.N. Ramachandran

….and his Φ-ψ Map

•Energy considerations: Force field parameters, optimization,

Molecular Mechanics

Molecular Dynamics

•Difficulties in evaluating conformational entropy, Free-Energy

•Understanding the rules of structure and folding through

Protein-structure data analysis

Optimization involves a wide range of interaction energy scales

Types of interactions | Order of energy (kcal/mole)
Covalent | 100-150
Electrostatics | 20-40
Hydrogen Bond | 3-20
Van der Waals | 0.1-2
Hydrophobic Interaction | (conformational entropy)

Geometry of Side Chains:

Wide variation in Shapes and Sizes

Close packing should involve a subtle balance of shape complementarity and the energy of interactions

First principles to Protein structure

OR

Derive rules from the observed data

A few references

Conformational Analysis of proteins

1963, Ramachandran G.N., Ramakrishnan C. and Sasisekharan V., J. Mol. Biol. 7, 95-99

1969, Scheraga, Calculation of polypeptide conformation, Harvey Lect. 63, 99-138

Protein Structure Prediction

1974, Chou and Fasman, Prediction of protein conformation, Biochemistry 13, 222-245

1975, Levitt and Warshel, Computer simulation of protein folding, Nature 253, 694

1977, McCammon .. Karplus, Dynamics of folded proteins, Nature 267, 585

1999, Liwo & Scheraga, Protein structure prediction by global optimization, PNAS 96, 5482-5485

2001, Hassinen & Peräkylä, New energy terms for reduced protein models, J. Comput. Chem.

Protein potentials of mean force

1971, Pohl, Nature 234, 277

1976, Tanaka and Scheraga, Macromolecules 9, 142-159

1978, Warme and Morgan, J. Mol. Biol. 118, 273-287

1985, Miyazawa & Jernigan, Macromolecules 18, 534

1990, Sippl, J. Mol. Biol. 213, 819

1992, Jones et al

Books

1989, Fasman "Prediction of Protein Structure"

1994, Merz and le Grand "Protein Folding Problem & Tertiary Structure Prediction"

1996, Sternberg "Protein structure prediction: A practical approach"

2000, Webster "Protein structure prediction: Methods"

2001, Friesner "Protein Folding Problem & Tertiary Structure Prediction"

2002, Tsigelny "Protein structure prediction: A bioinformatic ..."

G. N. Ramachandran

….and his Φ-ψ Map

Geometrical Concept-reduction in conformational space
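The φ-ψ map reduces each residue's backbone conformation to two torsion angles: φ (C'-N-Cα-C') and ψ (N-Cα-C'-N). As a minimal sketch of how one such angle is obtained from four atomic positions, the helper below uses plain Python. The function name and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range [-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal to the plane p1-p2-p3
    n2 = _cross(b2, b3)   # normal to the plane p2-p3-p4
    # The atan2 form stays well behaved near 0 and 180 degrees.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): a cis and a trans arrangement.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(cis, trans)
```

Applying the same function to the C'(i-1), N, Cα, C' atoms of residue i gives φ, and to N, Cα, C', N(i+1) gives ψ. Plotting the pairs for all residues produces the map.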

molecular mechanics force field

bonded interactions

$v_b(r) = \tfrac{1}{2} k_b (r - r_0)^2$

$v_a(\theta) = \tfrac{1}{2} k_a (\theta - \theta_0)^2$

$v_d(\phi) = k_d \left(1 + \cos(n\phi - \phi_0)\right)$

non-bonded interactions

$v_{lj}(r_{ij}) = \dfrac{C^{(12)}_{ij}}{r_{ij}^{12}} - \dfrac{C^{(6)}_{ij}}{r_{ij}^{6}}$

$v_c(r_{ij}) = \dfrac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$
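The force-field terms on this slide translate directly into code. The sketch below evaluates each term for a single interaction. The function names are hypothetical, and the unit system (nm, kJ/mol, elementary charges, with an approximate Coulomb prefactor) is an assumption made for illustration rather than something stated in the talk.

```python
import math

# Approximate value of 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (assumed unit system).
COULOMB_PREFACTOR = 138.935


def bond_energy(r, k_b, r0):
    """Harmonic bond stretching: 1/2 k_b (r - r0)^2."""
    return 0.5 * k_b * (r - r0) ** 2


def angle_energy(theta, k_a, theta0):
    """Harmonic angle bending: 1/2 k_a (theta - theta0)^2, angles in radians."""
    return 0.5 * k_a * (theta - theta0) ** 2


def dihedral_energy(phi, k_d, n, phi0):
    """Periodic torsion: k_d (1 + cos(n*phi - phi0)), angles in radians."""
    return k_d * (1.0 + math.cos(n * phi - phi0))


def lennard_jones(r_ij, c12, c6):
    """Lennard-Jones 12-6 term: C12/r^12 - C6/r^6."""
    return c12 / r_ij ** 12 - c6 / r_ij ** 6


def coulomb(r_ij, q_i, q_j):
    """Coulomb term: q_i q_j / (4 pi eps0 r_ij)."""
    return COULOMB_PREFACTOR * q_i * q_j / r_ij


# Illustrative parameters only; they are not taken from any real force field.
print(bond_energy(0.11, k_b=1000.0, r0=0.10))
print(lennard_jones(1.0, c12=1.0, c6=2.0))
print(coulomb(1.0, q_i=1.0, q_j=-1.0))
```

The total potential energy is the sum of these terms over all bonds, angles, torsions and non-bonded atom pairs. In practice the parameters come from a published force field.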

Molecular Dynamics Simulations

• Starting point: The coordinates of all the atoms

• Define the classical equations of motion

• Set the parameters using force fields

• Raise the system to the selected temperature

• Time dependent integration

• Length of simulation

• Generation of the equilibrium trajectories (coordinates of the final state)

• Analysis of the trajectory for properties
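The "time dependent integration" step can be illustrated on the smallest possible system. The sketch below integrates a single particle on a harmonic spring with the velocity Verlet scheme, a common MD integrator. The talk does not say which integrator was used, so this choice and all numerical values are assumptions.

```python
def force(x, k=1.0):
    """Force from a harmonic potential 1/2 k x^2 (a stand-in for a force field)."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, k=1.0):
    """Integrate Newton's equations and return the trajectory as (x, v) pairs."""
    trajectory = [(x, v)]
    f = force(x, k)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x, k)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Assumed toy parameters in reduced units.
traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0], mass=1.0)
e_end = total_energy(*traj[-1], mass=1.0)
print(f"energy drift over the run: {abs(e_end - e_start):.2e}")
```

A protein simulation follows the same loop, with the force evaluated from the full force field for every atom and a thermostat added to hold the selected temperature.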

Equilibrium Simulations of proteins

Analysis of parameters

Time averaged structure and fluctuations

Root Mean square deviation (RMSD)

Residue-wise RMSD

Backbone, Side chains, Secondary structures, Surface residues, ….

Time dependent motions

(in terms of internal parameters- phi, psi, chi torsional angles)

Global motions

Loop movements

Inter-domain movements

Hinge bending

Correlated fluctuations

Between loops, between residues

Interaction with ligands, solvents

Protein-ligand Interactions:

Hydrogen bonds

Hydrophobic pockets

Interactions mediated through solvent molecules (water)

Biologically relevant information from simulations :

A case study of Ribonuclease family proteins

Functional Diversity of Ribonuclease-A Superfamily

(Common function: Cleavage of RNA)

Protein | Specificity | Other Functions
Ribonuclease A (Pancreatic) | Single stranded RNA |
BS-RNase A (Seminal fluid) | Single and Double Stranded RNA, DNA:RNA Hybrid | Aspermatogenicity, Immunosuppression, Antiviral, Antitumor
Angiogenin (Plasma, Tumor cells) | Weaker than RNase A | Angiogenesis
Eosinophil Proteins: Eosinophil-Derived Neurotoxin (EDN) & Eosinophil Cationic Protein (ECP) | Weaker than RNase A (not active on small substrates) | Helminthotoxicity (toxic to parasites), Neurotoxicity, Cytotoxicity

Similarities among the Rnase-A Family Proteins

1. 3-D Structure

2. Biological Function: Ribonuclease activity: cleaves 3’-5’ phosphodiester bond

3. Sequence Homology:

1DYT_A RPPQFTRAQWFAIQHISLN------PPRCTIAMRAINNYRWRCKNQNTFLRTTFANVVNVCGN 57

1HI2_A KPPQFTWAQWFETQHINMT------SQQCTNAMQVINNYQRRCKNQNTFLLTTFANVVNVCGN 58

7RSA__ ---KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ 60

2ANG_A --QDNSRYTHFLTQHYDAKPQGR-DDRYCESIMRRRGLTSP-CKDINTFIHGNKRSIKAICEN 59

. : * .** . * *: . ** ***: . .: :* :

1DYT_A QSIRCPHNRTLNNCHRSRFRVPLLHCDLINPGAQNISNCRYADRPGRRFYVVACDNRDPR-DSPRYPVVPVHLDTTI---- 133

1HI2_A PNMTCPSNKTRKNCHHSGSQVPLIHCNLTTPSPQNISNCRYAQTPANMFYIVACDNRDQRRDPPQYPVVPVHLDRII---- 135

7RSA__ KNVACKNGQT--NCYQSYSTMSITDCRETGSS--KYPNCAYKTTQANKHIIVACEG-------- NPYVPVHFDASV---- 124

2ANG_A K----NGNPHRENLRISKSSFQVTTCKLHGGS--PWPPCQYRATAGFRNVVVACENG--------- -LPVHLDQSIFRRP 123

. * * . : * . . * * . :***:. . :***:* :

1DYT- Eosinophil cationic Protein

1HI2- Eosinophil Derived Neurotoxin

7RSA- Ribonuclease A

2ANG- Angiogenin

Simulations and Analysis

RMSD Trajectories

RMSD as a function of simulation time

continuous line: w.r.t <MD>

broken line: w.r.t crystal structure

RMSD as a function of residue number
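The two RMSD plots (against time and against residue number) rest on the same elementary calculation. The sketch below assumes the frames have already been superimposed on the reference, which a real analysis would do first. The function names and the two-atom toy trajectory are hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSD between two coordinate sets (lists of (x, y, z)), no fitting done."""
    sq = sum((a - b) ** 2
             for p, q in zip(coords, reference)
             for a, b in zip(p, q))
    return math.sqrt(sq / len(coords))


def average_structure(frames):
    """Time-averaged coordinates, i.e. the <MD> structure."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    return [tuple(sum(f[i][d] for f in frames) / n_frames for d in range(3))
            for i in range(n_atoms)]


def per_atom_rmsd(frames, reference):
    """RMSD of each atom over the whole trajectory (one value per atom)."""
    n_frames = len(frames)
    result = []
    for i, ref_atom in enumerate(reference):
        sq = sum((f[i][d] - ref_atom[d]) ** 2 for f in frames for d in range(3))
        result.append(math.sqrt(sq / n_frames))
    return result


# Toy data: a "crystal structure" and a two-frame "trajectory" of two atoms.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]
md_average = average_structure(frames)

rmsd_vs_crystal = [rmsd(f, crystal) for f in frames]   # broken line in the plot
rmsd_vs_average = [rmsd(f, md_average) for f in frames]  # continuous line
print(rmsd_vs_crystal, rmsd_vs_average, per_atom_rmsd(frames, crystal))
```

Grouping the per-atom values by residue gives the residue-wise profile. Restricting the atom list to the backbone, side chains or surface residues gives the other variants listed earlier.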

Features of Hydrogen Bonds

•Donor/Acceptor

•Secondary Structures

•Non-secondary structural hydrogen bonds

•Side-chain hydrogen bonds

•Dynamically stable

•Rearrangement during dynamics

Mapping of Intra-protein dynamically stable hydrogen bonds

d(D…A) < 3.2 Å

Angle (H-D…A) between 0° and ±60°

Sanjeev, Vishveshwara, JBSD, 2005
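The geometric criterion above can be applied frame by frame to find hydrogen bonds. The sketch below is one possible reading of it: the donor-acceptor distance must be below 3.2 Å and the angle at the donor between the D-H and D…A directions must be at most 60°. The function name and test coordinates are hypothetical.

```python
import math

MAX_DA_DISTANCE = 3.2   # angstrom, donor...acceptor cutoff from the slide
MAX_HDA_ANGLE = 60.0    # degrees, H-D...A angle cutoff from the slide


def is_hydrogen_bond(donor, hydrogen, acceptor):
    """Return True if the D-H...A geometry satisfies both cutoffs."""
    da = [a - d for a, d in zip(acceptor, donor)]   # donor -> acceptor vector
    dh = [h - d for h, d in zip(hydrogen, donor)]   # donor -> hydrogen vector
    da_len = math.sqrt(sum(c * c for c in da))
    dh_len = math.sqrt(sum(c * c for c in dh))
    if da_len >= MAX_DA_DISTANCE:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (da_len * dh_len)
    cos_angle = max(-1.0, min(1.0, cos_angle))      # guard against rounding
    angle = math.degrees(math.acos(cos_angle))      # unsigned, 0 to 180 degrees
    return angle <= MAX_HDA_ANGLE


# Toy coordinates in angstrom (not taken from any structure).
print(is_hydrogen_bond((0, 0, 0), (1, 0, 0), (2.9, 0, 0)))   # linear geometry
print(is_hydrogen_bond((0, 0, 0), (1, 0, 0), (0, 2.9, 0)))   # 90 degree geometry
```

A bond could then be called dynamically stable if this test holds in a large fraction of the trajectory frames. The threshold for that fraction is not given on the slide.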

MRT as a function of Residue number

Maximum Residence Time (MRT)

A measure of the capacity to interact with water molecules
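The slide does not spell out how MRT is computed. One plausible reading, assumed here, is the longest uninterrupted stretch of trajectory during which a given water molecule stays in contact with a residue. The sketch below implements that reading; the function name, the contact flags and the frame spacing are hypothetical.

```python
def maximum_residence_time(in_contact, dt=1.0):
    """Longest continuous run of True flags, multiplied by the frame spacing dt.

    in_contact: one boolean per trajectory frame, True when the water
    molecule is within the chosen contact distance of the residue.
    """
    longest = 0
    current = 0
    for flag in in_contact:
        current = current + 1 if flag else 0   # extend or reset the run
        longest = max(longest, current)
    return longest * dt


# Toy contact record for one water molecule over seven frames.
contacts = [True, True, False, True, True, True, False]
print(maximum_residence_time(contacts))
```

Taking the maximum of this quantity over all water molecules that visit a residue gives one value per residue, which can then be plotted against residue number.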

Bridging Water

Internal Water

Active-site Water

Different Types of Protein-Water Interactions

( From ECP Simulation)

OPEN FORM / CLOSED FORM

Eosinophil Cationic Protein (ECP)

Alternate conformations found during the simulation. Different

patterns of Water molecules (grey) are seen

Comparison of invariant water positions in Rnase-A Family Proteins

[Figure panels: Angiogenin, Ribonuclease-A, EDN, ECP-A, ECP-B]

• backbone N-H

• backbone =O

Side chain on the cartoon

MRT > 1000

A schematic representation of the network of hydrogen bonds (including the protein-ligand atoms and the water molecule) obtained from the simulation of Eosinophil Cationic Protein-CpG complex.

Simulations to understand protein folding and unfolding processes

Unfolding simulations are less time consuming than the folding ones

Unfolding simulations

The measured parameters in protein unfolding studies are

♦ RMSD of intermediate states from the native state

♦ transition states

♦ Ф values

♦ the structure of final state.

Identification of the unfolding transition state of CI 2 by MD simulation

• Chymotrypsin inhibitor 2 (CI 2) is an 83-residue protein

• Extensively studied

• A pseudo-wild-type protein (pwt) with E14→A and E15→A mutations, at 1.7 Å resolution

Aijun Li and V. Daggett ,JMB 257 (1996) 412-429

Five simulations

♦ 298K (crystal structure)

♦ 498K - (crystal structure) (2.2ns)

- (NMR-derived solution structures (3)) (>1ns)

Analysis:

♦ RMSD value vs. simulation time

♦ Relative accessible surface area vs. simulation time

♦ Percentage of secondary and tertiary contacts vs. simulation time

♦ The H-bonds vs. simulation time

Identification of transition state by these plots

Simulation | Starting structure | Simulation time (ns) | TS (ps) | RMSD (Å) | % Native H-bonds
MD1 (TS1) | XTAL | 2.2 | 220-225 | 3.35 | 43
MD2 (TS2) | NMR1 | 1.0 | 330-335 | 3.74 | 41
MD3 (TS3) | NMR3 | 1.2 | 95-100 | 4.24 | 23
MD4 (TS4) | NMR3 | 1.0 | 65-70 | 3.81 | 41

Summary of the four different unfolding simulations at 498K and their transition states

● The transition states were identified by analysing these curves together with the observed changes in the structure of the protein

● The transition state is partially structured

● α-helix is weakened but partially intact and the β-sheet is totally disrupted in the transition state

RMSD Plots

The percentage of secondary and tertiary structure as a function of simulation time at 498K

Ф value Analysis

The experimental parameter for identifying the transition state is Ф value

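$\Phi_F = \dfrac{\Delta G_{T-U} - \Delta G'_{T-U}}{\Delta G_{F-U} - \Delta G'_{F-U}} = \dfrac{\Delta\Delta G_{T-U}}{\Delta\Delta G_{F-U}}$

where $\Delta G_{T-U}$ is the free energy difference between the transition and unfolded states,

$\Delta G_{F-U}$ is the free energy difference between the native and unfolded states,

and the primed quantities $\Delta G'$ refer to the mutated protein.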

The Ф value by the MD simulation is given by:

$\Phi_{MD} = \dfrac{N_{TS,wt} - N_{TS,mut}}{N_{N,wt} - N_{N,mut}} = \dfrac{\Delta N_{TS}}{\Delta N_{N}}$

where $N_{TS,wt}$ = no. of van der Waals contacts in the transition state of the wild type

and $N_{TS,mut}$ = no. of van der Waals contacts in the transition state of the mutated protein;

similarly $N_N$ represents the native state
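Both definitions of the Ф value are simple ratios, which the sketch below makes explicit. The function names and the contact counts in the example are hypothetical and are not data from the CI2 simulations.

```python
def phi_experimental(ddg_ts_u, ddg_f_u):
    """Experimental phi value: ddG(T-U) / ddG(F-U), both in the same units."""
    if ddg_f_u == 0:
        raise ValueError("mutation does not change the folding free energy")
    return ddg_ts_u / ddg_f_u


def phi_md(n_ts_wt, n_ts_mut, n_n_wt, n_n_mut):
    """MD phi value from van der Waals contact counts at the mutated residue.

    n_ts_*: contacts in the transition state (wild type / mutant)
    n_n_*:  contacts in the native state (wild type / mutant)
    """
    delta_n_native = n_n_wt - n_n_mut
    if delta_n_native == 0:
        raise ValueError("mutation does not change the native contacts")
    return (n_ts_wt - n_ts_mut) / delta_n_native


# Hypothetical counts: the mutation removes 10 native contacts,
# of which 3 were already formed in the transition state.
print(phi_md(n_ts_wt=12, n_ts_mut=9, n_n_wt=20, n_n_mut=10))
```

A value near 1 means the residue's contacts are as fully formed in the transition state as in the native state. A value near 0 means the residue is still as unstructured there as in the unfolded state.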

To Summarize………

• The transition states in all four simulations are similar

• The Ф values calculated by MD agree with the experimentally measured Ф values (R = 0.94)

• The disruption of the hydrophobic core and associated secondary structure is the rate-determining step of the unfolding process

V. Daggett et al., JMB 257 (1996) 430-440

The effect of temperature on the pathway of unfolding of CI2

• Seven simulations at different temperatures and of different durations were carried out

(1) 298K (50ns), (2) 348K (80ns), (3) 373K (94ns), (4) 398K (40ns), (5) 448K (40ns), (6) 473K (20ns), (7) 498K (20ns)

Ryan Day et al., JMB 322 (2002) 189-203

The change in Cα RMSD (Å) values with simulation time

Lessons from multiple temperature simulations

• The unfolding pathway of CI2 is independent of temperature

• The global unfolding events are the same in all simulations

• The average number of tertiary contacts in the unfolding transition state remains the same (~172)

• The thermal denaturation of proteins is an activated process taking place on an energy landscape that is not grossly changed by elevated temperature.

A. Fersht & V. Daggett, Mol Cell Bio. 4 (2003) 497-502

The protein folding problem can be viewed as three different problems:

1. defining the thermodynamic folding code

2. devising a good computational structure prediction algorithm

3. folding speed (Levinthal’s paradox): the kinetic question of how a protein can fold so fast

The protein folding problem: when will it be solved? Ken A Dill, S Banu Ozkan, Thomas R Weikl, John D Chodera and Vincent A Voelz, Current Opinion in Structural Biology 2007, 17:342–346

2. Computational protein structure prediction:

Bioinformatics based (Knowledge based): more successful

Physics based: only a few attempts

1. The first milestone: Duan Y, Kollman PA: Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282:740-744.

2. IBM Blue Gene group folded the 20-residue Trp-cage peptide to within 1 Å using 92 ns of molecular dynamics.

Pitera JW, Swope W: Understanding folding and design: replica-exchange simulations of "Trp-cage" miniproteins. Proc Natl Acad Sci USA 2003, 100:7587-7592.

3. Folding@Home, a distributed grid computing system, folded the protein villin

Zagrovic B, Snow CD, Shirts MR, Pande VS: Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing. J Mol Biol 2002, 323:927-937

Details of folding simulations of the designed mini-protein BBA5 -

Vijay Pande et al.

• System: BBA5, a 23-residue designed, soluble, monomeric mini-protein

• Ac-YRVPSYDFSRSDELAKLLRQHAG-NH2

• It contains all the three elementary units of secondary structure

• It folds in the absence of disulphide bonds or metal-binding centers

• Two mutants: F8→W and V3→Y

Vijay Pande et al., Nature 420 (2002): 102-106

Structure of 23 residues designed mini protein BBA5

Simulation details

• Starting structure: fully extended conformer (φ = -135° and ψ = 135°)

• Simulation time: a total of 15,000 single-mutant folding simulations (20 ns) at 278 K and 298 K

• Similarly, 9000 and 8500 double-mutant folding simulations at 278 K (20 ns) and 338 K (10 ns)

• Computers used: 30,000 volunteer computers around the world for several months (10^6 CPU days)

• The folding rate constant is given by: $k = \dfrac{N_{folded}}{t \, N_{total}}$

where $N_{folded}$ is the number of simulations that reached the folded state in time $t$, out of $N_{total}$

Findings from the folding simulations

• In total 32,500 folding trajectories: β hairpin was observed in 1100 and α-helix in 21,000

• Of 9000 double-mutant folding trajectories at 278K, 16 had folded after the 20ns simulation

• The two-state assumption is supported by thermodynamic data

• The following experimental data are in agreement with simulation:

♦ The helical structure in the unfolded state

♦ Fragment secondary structure propensity

♦ Rate of formation of helix and hairpin

♦ Rate of folding
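The rate expression given with the simulation details can be applied to the counts reported here: 16 folded trajectories out of 9000, each 20 ns long. The sketch below does that arithmetic. It is an illustration of the formula, not a figure quoted in the talk, and it relies on the two-state assumption with only a small fraction of trajectories folded.

```python
def folding_rate(n_folded, n_total, t_seconds):
    """Two-state estimate k = N_folded / (t * N_total), in 1/s.

    Valid when the folded fraction is small, so that 1 - exp(-k t) ~ k t.
    """
    return n_folded / (t_seconds * n_total)


# Counts and trajectory length taken from the slides (double mutant, 278 K).
k = folding_rate(n_folded=16, n_total=9000, t_seconds=20e-9)
folding_time_us = 1e6 / k   # mean folding time in microseconds

print(f"k ~ {k:.2e} per second, folding time ~ {folding_time_us:.2f} microseconds")
```

The point of the ensemble approach is visible in the numbers: no single trajectory needs to be as long as the folding time, because many short trajectories together sample the rare folding events.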

3. Folding speed and mechanism:

Robert L Baldwin: "understanding the mechanism of protein folding might lead to fast computational algorithms for predicting native structures from their amino acid sequences"

This has been a central challenge. To instruct a computer program to find a native state more efficiently than Monte Carlo or molecular dynamics, we need more. We need to know the microscopic folding routes.

To summarize the status of protein Simulations..

Problems addressed

Reliable force fields

Simulation time: good enough for equilibrium properties of small proteins

Demonstration of folding in a few small peptides/proteins

Challenges ahead

Comfortable simulation of large proteins, assemblies like the ribosome, membrane proteins

Ab-initio folding simulations on a routine basis

Better understanding of the basic principles, which will enable reliable folding simulations

Combined QM/MD studies to investigate the processes involving changes in the covalent states

Comparison between the calculated Ф values for the transition state by MD simulations (for 3 different native states) and experimentally measured ФF values for 11 mutated residues