Introduction to Protein Structure folding and Dynamics: What can we learn from Simulations?
Saraswathi Vishveshwara
Molecular Biophysics Unit
Indian Institute of Science
Bangalore
JNC, Nov 6, 2007
Dream of Structural Biologists
To Generate the three dimensional structures of proteins, given their sequences
Grand Challenge to Simulation
Outline
• Protein structure-function relationship
• Classification scheme of protein structures
• Protein folding problem-How is the uniqueness achieved?
• Methods of investigation: Knowledge based versus ab-initio methods
• Dynamics:
• Equilibrium dynamics: flexible and rigid regions
protein-ligand interactions
(with examples)
• non-equilibrium dynamics: Protein folding
examples of Unfolding/folding simulations
Future challenges
Proteins perform a myriad of functions
The function depends crucially upon the folded states of the proteins
Biological Functions of Proteins
Protein Structure to Function
3D STRUCTURE is linked to:
• Fold
• Evolutionary relationships
• Biological multimeric states
• Disease states, mutations
• Active sites, enzyme clefts
• Antigenic sites
• Surface properties
• Protein-ligand interactions
Sequence -----> Structure ------> Function
How far have we reached?
Classification of Protein Architecture
Primary structure Secondary structure
Tertiary structure Quaternary structure
Classification of Protein Architecture
Primary structure
Describes the chemical sequence (amino acid)
Properties of an amino acid:
• Chirality is always L (glycine, which is achiral, is the exception).
• “Length” of an amino acid unit is typically 3.6 Å.
Properties of a polypeptide:
• Peptide bond is stabilized by resonance
• Typically in planar, trans configuration
Secondary Structures in Proteins
Alpha Helix
Beta Structures
After the alpha helix, the second most regular and identifiable conformation in polypeptides is the beta sheet.
The basic unit of the beta sheet is the beta strand, which consists of a fully extended polypeptide.
The beta strand is not a stable structure on its own (there are no interactions between atoms that are not close in the covalent structure). The beta strand is only stable when incorporated into a beta sheet.
Hydrogen bonds between peptide groups on adjacent beta strands stabilize structure.
The Beta Sheet
Classification of Protein Architecture:Hierarchical levels
Classification of Protein Architecture:Summary
Primary structure: sequence
Secondary structure: helices, sheets
Super-secondary: beta hairpin
Domain: 3 domains α1, α2, α3
Tertiary structure: α1-3 is a single folded unit
Quaternary structure: α1-3 and β2M are different molecules that associate
Proteins are HETERO polymer chains made up of:
An alphabet of 20 AMINO ACIDS
The amino acids have different sizes and shapes and possess properties such as:
• Acidic
• Basic
• Polar
• Aromatic
• Hydrophobic
Where does chemistry come into picture?
Amino acids in different
sizes
shapes
chemical properties
Hydrogen bonding capabilities of side chains
Types of interactions :
• Covalent
• Non-covalent
• Hydrogen bond
[Figure: chemical structure of a polypeptide backbone, H2N-Cα(R1)-C'(=O)-N(H)-Cα(R2)-C'(=O)- ... -N(H)-Cα(Rn)-C'(=O)-OH]
Types of Neighbours :
• Sequence
• Spatial
Levels of Interactions in Proteins
• Primary
• Secondary
• Tertiary
• Quaternary
Can we understand the rules of protein structure, folding and function?
Protein Folding Problem
Amino acid sequence –coded by genetic code
Folding of Proteins to unique three dimensional
structure-Code?
Incorporated in the sequence:
Landmarks in protein folding:
• The folded structure of a protein is determined by thermodynamics (Anfinsen, 1973)
• Levinthal paradox: how is the speed of folding reconciled with the enormous number of conformations to be searched? (1968)
• Nature selects sequences which have the ability to fold rapidly (Levinthal, 1987)
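The scale of the search behind the Levinthal paradox can be illustrated with a back-of-the-envelope estimate. The inputs below (3 conformations per residue, a 100-residue chain, 10^13 conformations sampled per second) are illustrative assumptions often used for this argument, not numbers taken from the talk.

```python
# Back-of-the-envelope Levinthal estimate (all inputs are illustrative assumptions).
SECONDS_PER_YEAR = 3600 * 24 * 365

def exhaustive_search_years(n_residues, states_per_residue=3, rate_per_second=1e13):
    """Years needed to visit every backbone conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / rate_per_second / SECONDS_PER_YEAR

years = exhaustive_search_years(100)
print(f"{years:.1e} years")
```

Under these assumptions the exhaustive search takes on the order of 10^27 years, whereas real proteins fold in microseconds to seconds; this gap is the paradox.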
Folding of the protein
Chain from random
to the native state
Energy Landscape View
Wolynes and coworkers
Protein Folding Problem
The information for folding of protein to unique three
dimensional structure is encoded in its sequence -Anfinsen
Geometry Optimization View: The super-secondary structures of proteins are optimally packed
Maritan, Banavar, and co-workers
Development of Theoretical/Computational Methods
•Geometrical concepts G.N. Ramachandran
….and his Φ-ψ Map
•Energy considerations: Force field parameters, optimization,
Molecular Mechanics
Molecular Dynamics
•Difficulties in evaluating conformational entropy, Free-Energy
•Understanding the rules of structure and folding through
Protein-structure data analysis
Optimization involves a wide range of interaction energy scales

Type of interaction        Order of energy (kcal/mol)
Covalent                   100-150
Electrostatics             20-40
Hydrogen bond              3-20
Van der Waals              0.1-2
Hydrophobic interaction    (conformational entropy)
Geometry of Side Chains:
Wide variation in Shapes and Sizes
Close packing should involve a subtle balance of shape complementarity and the energy of interactions
First principles to Protein structure
OR
Derive rules from the observed data
A few references
Conformational Analysis of proteins
1963, Ramachandran G.N., Ramakrishnan C. and Sasisekharan V., J. Mol. Biol. 7, 95-99
1969, Scheraga, Calculation of polypeptide conformation, Harvey Lect. 63, 99-138
Protein Structure Prediction
1974, Chou and Fasman, Prediction of protein conformation, Biochemistry 13, 222-245
1975, Levitt and Warshel, Computer simulation of protein folding, Nature 253, 694
1977, McCammon ... Karplus, Dynamics of folded proteins, Nature 267, 585
1999, Liwo & Scheraga, Protein structure prediction by global optimization, PNAS 96, 5482-5485
2001, Hassinen & Peräkylä, New energy terms for reduced protein models, J. Comput. Chem.
Protein potentials of mean force
1971, Pohl, Nature 234, 277
1976, Tanaka and Scheraga, Macromolecules 9, 142-159
1978, Warme and Morgan, J. Mol. Biol. 118, 273-287
1985, Miyazawa & Jernigan, Macromolecules 18, 534
1990, Sippl, J. Mol. Biol. 213, 819
1992, Jones et al
Books
1989, Fasman, "Prediction of Protein Structure"
1994, Merz and le Grand, "Protein Folding Problem & Tertiary Structure Prediction"
1996, Sternberg, "Protein structure prediction: A practical approach"
2000, Webster, "Protein structure prediction: Methods"
2001, Friesner, "Protein Folding Problem & Tertiary Structure Prediction"
2002, Tsigelny, "Protein structure prediction: A bioinformatic ..."
G. N. Ramachandran
….and his Φ-ψ Map
Geometrical Concept-reduction in conformational space
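To make the Φ-ψ description concrete, here is a minimal sketch of how a backbone torsion angle can be computed from four atomic positions. The helper names are hypothetical; the atom choice follows the standard definitions (φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1)). Describing each residue by two such angles instead of all Cartesian coordinates is the reduction in conformational space referred to here.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (0 = cis, 180 = trans)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)   # normals of the two planes
    # atan2 form keeps the sign and stays stable near 0 and 180 degrees
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# phi for residue i would be dihedral_deg(C_prev, N, CA, C);
# psi would be dihedral_deg(N, CA, C, N_next).
```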
Molecular mechanics force field

Bonded interactions:
$$v_b(r) = \tfrac{1}{2} k_b (r - r_0)^2$$
$$v_a(\theta) = \tfrac{1}{2} k_a (\theta - \theta_0)^2$$
$$v_d(\phi) = k_d \left(1 + \cos(n\phi - \phi_0)\right)$$

Non-bonded interactions:
$$v_{lj}(r_{ij}) = \frac{C^{(12)}_{ij}}{r_{ij}^{12}} - \frac{C^{(6)}_{ij}}{r_{ij}^{6}}$$
$$v_c(r_{ij}) = \frac{q_i q_j}{4\pi\varepsilon_0 \varepsilon_r r_{ij}}$$
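The bonded and non-bonded terms of this slide translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical, parameter values must come from an actual force field, and the Coulomb prefactor assumes GROMACS-style units (kJ/mol, nm, elementary charges).

```python
import math

def bond_energy(r, k_b, r0):
    """Harmonic bond stretching: 1/2 k_b (r - r0)^2."""
    return 0.5 * k_b * (r - r0) ** 2

def angle_energy(theta, k_a, theta0):
    """Harmonic angle bending: 1/2 k_a (theta - theta0)^2, angles in radians."""
    return 0.5 * k_a * (theta - theta0) ** 2

def dihedral_energy(phi, k_d, n, phi0):
    """Periodic torsion: k_d (1 + cos(n phi - phi0)), angles in radians."""
    return k_d * (1.0 + math.cos(n * phi - phi0))

def lennard_jones(r_ij, c12, c6):
    """Lennard-Jones: C12 / r^12 - C6 / r^6."""
    return c12 / r_ij ** 12 - c6 / r_ij ** 6

# 1 / (4 pi eps0) in kJ mol^-1 nm e^-2 (assumed unit system, see lead-in)
COULOMB_CONST = 138.935458

def coulomb(r_ij, q_i, q_j, eps_r=1.0):
    """Coulomb interaction between point charges q_i and q_j."""
    return COULOMB_CONST * q_i * q_j / (eps_r * r_ij)
```

A total potential energy is the sum of these terms over all bonds, angles, torsions and non-bonded atom pairs; geometry optimization and molecular dynamics both work on that sum.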
Molecular Dynamics Simulations
• Starting point: the coordinates of all the atoms
• Define the classical equations of motion
• Set the parameters using force fields
• Raise the system to the selected temperature
• Time-dependent integration
• Length of simulation
• Generation of the equilibrium trajectories (coordinates of the final state)
• Analysis of the trajectory for properties
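The time-dependent integration step can be sketched with the velocity Verlet scheme. This is an assumption for illustration: the talk does not name an integrator, and the one-dimensional harmonic "bond" with unit mass and force constant is a toy stand-in for a full force field.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion.

    Returns (trajectory, final_velocity), where trajectory is the list of
    positions at every step, including the starting point.
    """
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)
    return trajectory, v

# Toy system with hypothetical parameters: k = 1, m = 1, start at x = 1 at rest
k = 1.0
traj, v_final = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x,
                                mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v_final ** 2 + 0.5 * k * traj[-1] ** 2
```

The total energy stays close to its initial value of 0.5 over the run, which is the usual sanity check before a trajectory is analysed for properties.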
Equilibrium Simulations of proteins
Analysis of parameters
Time averaged structure and fluctuations
Root Mean square deviation (RMSD)
Residue-wise RMSD
Backbone, Side chains, Secondary structures, Surface residues, ….
Time dependent motions
(in terms of internal parameters- phi, psi, chi torsional angles)
Global motions
Loop movements
Inter-domain movements
Hinge bending
Correlated fluctuations
Between loops, between residues
Interaction with ligands, solvents
Protein-ligand Interactions:
Hydrogen bonds
Hydrophobic pockets
Interactions mediated through solvent molecules (water)
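Two of the quantities listed above, the RMSD between structures and the per-atom fluctuation about the time-averaged structure, can be sketched as follows. The function names are hypothetical, and the sketch assumes the frames have already been superimposed on a common reference; no fitting is performed here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples, without fitting."""
    sq = sum((a - b) ** 2
             for pa, pb in zip(coords_a, coords_b)
             for a, b in zip(pa, pb))
    return math.sqrt(sq / len(coords_a))

def average_structure(trajectory):
    """Time-averaged position of every atom over a list of frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    return [tuple(sum(frame[i][d] for frame in trajectory) / n_frames
                  for d in range(3))
            for i in range(n_atoms)]

def per_atom_fluctuation(trajectory):
    """Root-mean-square deviation of each atom from its time-averaged position."""
    mean = average_structure(trajectory)
    n_frames = len(trajectory)
    return [math.sqrt(sum(sum((frame[i][d] - mean[i][d]) ** 2 for d in range(3))
                          for frame in trajectory) / n_frames)
            for i in range(len(mean))]
```

Averaging the per-atom values over the atoms of each residue gives a residue-wise profile, in which flexible loops stand out from rigid secondary structures.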
Biologically relevant information from simulations :
A case study of Ribonuclease family proteins
Functional Diversity of Ribonuclease-A Superfamily
(Common function: Cleavage of RNA)

Protein (source)                          Specificity                          Other functions
Ribonuclease A (pancreatic)               Single-stranded RNA
BS-RNase A (seminal fluid)                Single- and double-stranded RNA,     Aspermatogenicity, immunosuppression,
                                          DNA:RNA hybrid                       antiviral, antitumor
Angiogenin (plasma, tumor cells)          Weaker than RNase A                  Angiogenesis
Eosinophil proteins:
Eosinophil-Derived Neurotoxin (EDN) &     Weaker than RNase A                  Helminthotoxicity (toxic to parasites),
Eosinophil Cationic Protein (ECP)         (not active on small substrates)     neurotoxicity, cytotoxicity
Similarities among the Rnase-A Family Proteins
1.Sequence Homology:
2. 3-D Structure 3. Biological Function: Ribonuclease activity:
cleaves 3’-5’ Phosphodiester bond
1DYT_A RPPQFTRAQWFAIQHISLN------PPRCTIAMRAINNYRWRCKNQNTFLRTTFANVVNVCGN 57
1HI2_A KPPQFTWAQWFETQHINMT------SQQCTNAMQVINNYQRRCKNQNTFLLTTFANVVNVCGN 58
7RSA__ ---KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ 60
2ANG_A --QDNSRYTHFLTQHYDAKPQGR-DDRYCESIMRRRGLTSP-CKDINTFIHGNKRSIKAICEN 59
. : * .** . * *: . ** ***: . .: :* :
1DYT_A QSIRCPHNRTLNNCHRSRFRVPLLHCDLINPGAQNISNCRYADRPGRRFYVVACDNRDPR-DSPRYPVVPVHLDTTI---- 133
1HI2_A PNMTCPSNKTRKNCHHSGSQVPLIHCNLTTPSPQNISNCRYAQTPANMFYIVACDNRDQRRDPPQYPVVPVHLDRII---- 135
7RSA__ KNVACKNGQT--NCYQSYSTMSITDCRETGSS--KYPNCAYKTTQANKHIIVACEG-------- NPYVPVHFDASV---- 124
2ANG_A K----NGNPHRENLRISKSSFQVTTCKLHGGS--PWPPCQYRATAGFRNVVVACENG--------- -LPVHLDQSIFRRP 123
. * * . : * . . * * . :***:. . :***:* :
1DYT- Eosinophil cationic Protein
1HI2- Eosinophil Derived Neurotoxin
7RSA- Ribonuclease A
2ANG- Angiogenin
Simulations and Analysis
RMSD Trajectories
RMSD as a function of simulation time
continuous line: w.r.t <MD>
broken line: w.r.t crystal structure
RMSD as a function of residue number
Features of Hydrogen Bonds
•Donor/Acceptor
•Secondary Structures
•Non-secondary structural hydrogen bonds
•Side-chain hydrogen bonds
•Dynamically stable
•Rearrangement during dynamics
Mapping of Intra-protein dynamically stable hydrogen bonds
d(D…A) < 3.2 Å
Angle (H-D…A) between 0° and ±60°
Sanjeev, Vishveshwara, JBSD, 2005
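The geometric criterion above can be applied frame by frame to every donor-acceptor pair. A minimal sketch, with a hypothetical function name and coordinates given in Å as (x, y, z) tuples; the angle is measured at the donor, between the D-H and D…A directions:

```python
import math

def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=3.2, max_angle_deg=60.0):
    """True if d(D...A) < max_dist and the H-D...A angle is within max_angle_deg."""
    da = [a - d for a, d in zip(acceptor, donor)]
    dh = [h - d for h, d in zip(hydrogen, donor)]
    dist_da = math.sqrt(sum(c * c for c in da))
    dist_dh = math.sqrt(sum(c * c for c in dh))
    if dist_da >= max_dist:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist_da * dist_dh)
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding errors
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg
```

Counting the fraction of frames in which a given pair passes this test is one simple way to separate dynamically stable hydrogen bonds from those that rearrange during the simulation.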
MRT as a function of Residue number
Maximum Residence Time (MRT)
A measure of the capacity to interact with water molecules
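One plausible way to compute an MRT is sketched below. It assumes that the MRT of a site is the longest uninterrupted stretch of frames during which the same water molecule stays in contact with it; the exact definition used in the study may differ, and the function name and the boolean per-frame input are hypothetical.

```python
def maximum_residence_time(present, dt=1.0):
    """Longest uninterrupted run of True values in `present`, multiplied by
    the time between frames, dt."""
    longest = current = 0
    for flag in present:
        current = current + 1 if flag else 0   # extend the run or reset it
        longest = max(longest, current)
    return longest * dt

# Water present in frames 0-1 and 3-5: the longest stay is 3 frames
mrt = maximum_residence_time([True, True, False, True, True, True, False], dt=2.0)
```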
Bridging Water
Internal Water
Active-site Water
Different Types of Protein-Water Interactions
( From ECP Simulation)
OPEN FORM / CLOSED FORM
Eosinophil Cationic Protein (ECP)
Alternate conformations found during the simulation. Different
patterns of Water molecules (grey) are seen
Comparison of invariant water positions in Rnase-A Family Proteins
Panels: Angiogenin, Ribonuclease-A, EDN, ECP-A, ECP-B
• backbone N-H
• backbone =O
Side chain on the cartoon
MRT > 1000
A schematic representation of the network of hydrogen bonds (including the protein-ligand atoms and the water molecule) obtained from the simulation of Eosinophil Cationic Protein-CpG complex.
Simulations to understand protein folding and unfolding processes
Unfolding simulations are less time consuming than the folding ones
Unfolding simulations
The measured parameters in protein unfolding studies are
♦ RMSD of intermediate states from the native state
♦ transition states
♦ Φ values
♦ the structure of final state.
Identification of the unfolding transition state of CI2 by MD simulation
• Chymotrypsin inhibitor 2 (CI2) is an 83-residue protein
• Extensively studied
• A pseudo-wild-type protein (pwt) with the E14→A and E15→A substitutions; structure at 1.7 Å resolution
Aijun Li and V. Daggett, JMB 257 (1996) 412-429
Five simulations
♦ 298K (crystal structure)
♦ 498K:
  - crystal structure (2.2ns)
  - NMR-derived solution structures (3) (>1ns)
Analysis:
♦ RMSD value vs. simulation time
♦ Relative accessible surface area vs. simulation time
♦ Percentage of secondary and tertiary contacts vs. simulation time
♦ The H-bonds vs. simulation time
Identification of the transition state from these plots
Summary of the four unfolding simulations at 498K and their transition states

Simulation   Starting structure   Simulation time (ns)   TS (ps)   Cα RMSD (Å)   % Native H-bonds
MD1 (TS1)    XTAL                 2.2                    220-225   3.35          43
MD2 (TS2)    NMR1                 1.0                    330-335   3.74          41
MD3 (TS3)    NMR3                 1.2                    95-100    4.24          23
MD4 (TS4)    NMR3                 1.0                    65-70     3.81          41
● The transition states were identified by analysing these curves together with the observed changes in the structure of the protein
● The transition state is partially structured
● The α-helix is weakened but partially intact, and the β-sheet is totally disrupted in the transition state
RMSD Plots
The percentage of secondary and tertiary structure as a function of simulation time at 498K
Φ value Analysis
The experimental parameter for identifying the transition state is the Φ value
$$\Phi_F = \frac{\Delta G_{T-U} - \Delta G'_{T-U}}{\Delta G_{F-U} - \Delta G'_{F-U}} = \frac{\Delta\Delta G_{T-U}}{\Delta\Delta G_{F-U}}$$
where ΔG_{T-U} is the free energy difference between the transition and unfolded states,
ΔG_{F-U} is the free energy difference between the native and unfolded states,
and the prime (ΔG') denotes the mutated protein.
The Φ value from the MD simulation is given by:
$$\Phi_{MD} = \frac{N_{TS,wt} - N_{TS,mut}}{N_{N,wt} - N_{N,mut}} = \frac{\Delta N_{TS}}{\Delta N_{N}}$$
where N_{TS,wt} = no. of van der Waals contacts in the transition state of the wild type
and N_{TS,mut} = no. of van der Waals contacts in the transition state of the mutated protein;
similarly, N_N represents the native state.
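Both definitions are ratios of a change seen in the transition state to the same change in the native state, which a short sketch makes explicit. The function names and the contact counts in the example are hypothetical.

```python
def phi_experimental(dG_TU_wt, dG_TU_mut, dG_FU_wt, dG_FU_mut):
    """Phi_F = ddG(T-U) / ddG(F-U), from wild-type and mutant free energy differences."""
    ddG_TU = dG_TU_wt - dG_TU_mut
    ddG_FU = dG_FU_wt - dG_FU_mut
    return ddG_TU / ddG_FU

def phi_md(n_ts_wt, n_ts_mut, n_n_wt, n_n_mut):
    """Phi_MD = dN_TS / dN_N, from van der Waals contact counts."""
    return (n_ts_wt - n_ts_mut) / (n_n_wt - n_n_mut)

# Hypothetical mutation: removes 6 contacts in the native state, 3 in the transition state
phi = phi_md(n_ts_wt=6, n_ts_mut=3, n_n_wt=12, n_n_mut=6)
```

A value near 1 indicates that the mutated residue's interactions are as fully formed in the transition state as in the native state; a value near 0 indicates that they are as absent as in the unfolded state. The hypothetical example gives 0.5, an intermediate case.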
To Summarize………
• The transition states in all four simulations are similar
• The Φ values calculated by MD agree with the experimentally measured Φ values (R = 0.94)
• The disruption of the hydrophobic core and associated secondary structure is the rate-determining step of the unfolding process
V. Daggett et al., JMB 257 (1996) 430-440
The effect of temperature on the pathway of unfolding of CI2
• Seven simulations were carried out at different temperatures and for different durations:
(1) 298K (50ns), (2) 348K (80ns), (3) 373K (94ns), (4) 398K (40ns), (5) 448K (40ns), (6) 473K (20ns), (7) 498K (20ns)
Ryan Day et al., JMB 322 (2002) 189-203
The change in Cα RMSD (Å) values with simulation time
Lessons from multiple temperature simulations
• The unfolding pathway of CI2 is independent of temperature
• The global unfolding events are the same in all simulations
• The average number of tertiary contacts in the unfolding transition state remains the same (~172)
• The thermal denaturation of proteins is an activated process taking place on an energy landscape that is not grossly changed by elevated temperature.
A. Fersht & V. Daggett, Mol Cell Bio. 4 (2003) 497-502
The protein folding problem can be viewed as three different problems:
1. defining the thermodynamic folding code
2. devising a good computational structure prediction algorithm
3. folding speed (Levinthal’s paradox): the kinetic question of how a protein can fold so fast
The protein folding problem: when will it be solved? Ken A Dill, S Banu Ozkan, Thomas R Weikl, John D Chodera and Vincent A Voelz, Current Opinion in Structural Biology 2007, 17:342–346
2. Computational protein structure prediction:
Bioinformatics based (knowledge based): more successful
Physics based: only a few attempts
1. The first milestone: Duan Y, Kollman PA: Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282:740-744.
2. The IBM Blue Gene group folded the 20-residue Trp-cage peptide to within 1 Å using 92 ns of molecular dynamics.
Pitera JW, Swope W: Understanding folding and design: replica-exchange simulations of ‘‘Trp-cage’’ miniproteins. Proc Natl Acad Sci USA 2003, 100:7587-7592.
3. Folding@Home, a distributed grid computing system, folded the protein villin.
Zagrovic B, Snow CD, Shirts MR, Pande VS: Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing. J Mol Biol 2002, 323:927-937.
Details of folding simulations of the designed mini-protein BBA5 (Vijay Pande et al.)
• System: BBA5, a 23-residue designed, soluble, monomeric mini-protein
• Ac-YRVPSYDFSRSDELAKLLRQHAG-NH2
• It contains all three elementary units of secondary structure
• It folds in the absence of disulphide bonds or metal-binding centers
• Two mutants: F8→W and V3→Y
Vijay Pande et al., Nature 420 (2002) 102-106
Structure of 23 residues designed mini protein BBA5
Simulation details
• Starting structure: fully extended conformer (φ = -135° and ψ = 135°)
• Simulation time: a total of 15,000 single-mutant folding simulations (20ns each) at 278K and 298K
• Similarly, 9000 and 8500 double-mutant folding simulations at 278K (20ns) and 338K (10ns)
• Computers used: 30,000 volunteer computers around the world for several months (10^6 CPU days)
• The folding rate constant is given by: k = Nfolded / (t × Ntotal)
where Nfolded is the number of simulations that reached the folded state in time t, out of Ntotal
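As a worked example of this rate formula, the sketch below uses the counts reported in this deck for the double mutant at 278K (16 of 9000 trajectories folded within 20 ns). The function name is hypothetical. The formula is the short-time limit of first-order kinetics, where the folded fraction 1 - exp(-kt) is approximately kt, so it applies only while a small fraction of trajectories has folded.

```python
def folding_rate(n_folded, n_total, t):
    """k = N_folded / (t * N_total); assumes k * t << 1 (few trajectories folded)."""
    return n_folded / (t * n_total)

# 16 of 9000 trajectories folded within t = 20 ns (t given in seconds)
k = folding_rate(n_folded=16, n_total=9000, t=20e-9)
folding_time_us = 1e6 / k   # mean folding time 1/k, in microseconds

print(f"k = {k:.3g} per second, folding time = {folding_time_us:.2f} microseconds")
```

This corresponds to a rate of roughly 9 × 10^4 per second, or a folding time of about 11 microseconds, estimated from many short trajectories rather than from one long simulation.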
Findings from the folding simulations
• In a total of 32,500 folding trajectories, the β hairpin was observed in 1100 and the α-helix in 21,000
• Of the 9000 double-mutant folding trajectories at 278K, 16 were folded after 20ns of simulation
• The two-state assumption is supported by the thermodynamic data
• The following experimental data are in agreement with simulation:
♦ The helical structure in the unfolded state
♦ Fragment secondary structure propensity
♦ Rate of formation of helix and hairpin
♦ Rate of folding
3. Folding speed and mechanism:
Robert L Baldwin: “Understanding the mechanism of protein folding might lead to fast computational algorithms for predicting native structures from their amino acid sequences.”
This has been a central challenge. To instruct a computer program to find a native state more efficiently than Monte Carlo or molecular dynamics, we need more. We need to know the microscopic folding routes.
To summarize the status of protein simulations
Problems addressed
• Reliable force fields
• Simulation time: good enough for equilibrium properties of small proteins
• Demonstration of folding in a few small peptides/proteins
Challenges ahead
• Comfortable simulation of large proteins, assemblies like the ribosome, and membrane proteins
• Ab-initio folding simulations on a routine basis
• Better understanding of the basic principles, which will enable reliable folding simulations
• Combined QM/MD studies to investigate processes involving changes in the covalent states
Comparison between the Φ values calculated for the transition state by MD simulations (for 3 different native states) and the experimentally measured ΦF values for 11 mutated residues