How Fast-Folding Proteins Fold

Kresten Lindorff-Larsen et al., Science 334, 517 (2011); DOI: 10.1126/science.1208351


The following resources related to this article are available online at www.sciencemag.org (this information is current as of July 23, 2012):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/334/6055/517.full.html

Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2011/10/27/334.6055.517.DC1.html

A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/334/6055/517.full.html#related

This article cites 57 articles, 9 of which can be accessed free: http://www.sciencemag.org/content/334/6055/517.full.html#ref-list-1

This article has been cited by 4 articles hosted by HighWire Press; see: http://www.sciencemag.org/content/334/6055/517.full.html#related-urls

This article appears in the following subject collections: Biochemistry, http://www.sciencemag.org/cgi/collection/biochem

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2011 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.

Kresten Lindorff-Larsen,1*† Stefano Piana,1*† Ron O. Dror,1 David E. Shaw1,2†

An outstanding challenge in the field of molecular biology has been to understand the process by which proteins fold into their characteristic three-dimensional structures. Here, we report the results of atomic-level molecular dynamics simulations, over periods ranging between 100 μs and 1 ms, that reveal a set of common principles underlying the folding of 12 structurally diverse proteins. In simulations conducted with a single physics-based energy function, the proteins, representing all three major structural classes, spontaneously and repeatedly fold to their experimentally determined native structures. Early in the folding process, the protein backbone adopts a nativelike topology while certain secondary structure elements and a small number of nonlocal contacts form. In most cases, folding follows a single dominant route in which elements of the native structure appear in an order highly correlated with their propensity to form in the unfolded state.

Protein folding is a process of molecular self-assembly during which a disordered polypeptide chain collapses to form a compact and well-defined three-dimensional structure. Hundreds of studies have been devoted to understanding the mechanisms underlying this process, but experimentally characterizing the full folding pathway for even a single protein, let alone for many proteins differing in size, topology, and stability, has proven extremely difficult. Similarly, simulating the folding of a small protein at an atomic level of detail is a daunting task. Both experimental and computational studies have thus generally focused on one protein at a time, with such studies each performed under different conditions or with different techniques. Possibly because of the resulting heterogeneity of the available data, numerous theories have been proposed to describe protein folding and no consensus has been reached on which of these theories, if any, is correct (1).

Our research group has developed a specialized supercomputer, called Anton, which greatly accelerates the execution of atomistic molecular dynamics (MD) simulations (2, 3). In addition, we recently modified the CHARMM force field in an effort to make it more easily transferable among different protein classes (4). Here, we have combined these advances to study the folding process of fast-folding proteins through equilibrium MD simulations (2). We studied 12 protein domains (5) that range in size from 10 to 80 amino acid residues, contain no disulfide bonds or prosthetic groups, and include members of all three major structural classes (α-helical, β sheet, and mixed α/β). Of these 12 protein domains, 9 represent the nine folds considered in a review of fast-folding proteins (6). As most of these nine proteins contain only α helices, we also included two additional α/β proteins and a stable β hairpin to increase the structural diversity of the set of proteins examined.

In our simulations, all of which used a single force field (4) and included explicitly represented solvent molecules, 11 of the 12 proteins folded spontaneously to structures matching their experimentally determined native structures to atomic resolution (Fig. 1). The native state of the 12th protein, the Engrailed homeodomain, proved unstable in simulation. We were, however, able to fold a different homeodomain (7) with the same overall structure; the results reported below pertain to this variant, rather than the Engrailed homeodomain.

For all 12 proteins that folded in simulation, we were also able to perform simulations near the melting temperature, at which both folding and unfolding could be observed repeatedly in a single, long equilibrium MD simulation. For each of the 12 proteins, we performed between one and four simulations, each between 100 μs and 1 ms long, and observed a total of at least 10 folding and 10 unfolding events. In total, we collected ~8 ms of simulation, containing more than 400 folding or unfolding events. For 8 of the 12 proteins, the most representative structure of the folded state fell within 2 Å root mean square deviation (RMSD) of the experimental structure (Fig. 1). This is particularly notable given that the RMSD calculations included the flexible tail residues and that, in some cases, there was no experimental structure available

1 D. E. Shaw Research, New York, NY 10036, USA. 2 Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA.

*These authors contributed equally to the manuscript.
†To whom correspondence should be addressed. E-mail: [email protected] (D.E.S.); [email protected] (K.L.-L.); [email protected] (S.P.)

Fig. 1. Representative structures of the folded state observed in reversible folding simulations of 12 proteins. For each protein, we show the folded structure obtained from simulation (blue) superimposed on the experimentally determined structure (red), along with the total simulation time, the PDB entry of the experimental structure, the Cα-RMSD (over all residues) between the two structures, and the folding time (obtained as the average lifetime in the unfolded state observed in the simulations). Each protein is labeled with a commonly used name, although in several cases, we studied mutants of the parent sequence [amino acid sequences of the 12 proteins and simulation details are presented in (5)]. PDB entries in italics indicate that the structure has not been determined for the simulated sequence and that, instead, we compare it with the structure of the closest homolog in the PDB. The calculated structure was obtained by clustering the simulations (26) to avoid bias toward the experimentally determined structure.

www.sciencemag.org SCIENCE VOL 334 28 OCTOBER 2011 517

REPORTS


for the exact sequence that we simulated (we instead calculated the RMSD to the structure of the protein with the most similar sequence). The proteins exhibiting the largest deviations from their experimental structures (BBL, protein B, and the homeodomain) are all three-helix bundles; this finding hints at a minor residual force-field deficiency. It has been argued, however, that the native state for at least one of these three proteins may depend on experimental conditions (8); it is thus possible that these deviations might reflect genuine differences between the protein’s structure at the simulated temperature and at the lower temperatures used for experimental structure determination. Overall, comparison with available experimental data indicates that the force field provides a reasonable description of the structure, thermodynamics, and kinetics of the 12 proteins [see (5) for a more detailed comparison], which affords some confidence in the accuracy of the folding mechanisms observed in simulation.
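The RMSD comparison described above can be illustrated with a minimal sketch. It assumes that the simulated and experimental structures are given as matched arrays of Cα coordinates and superimposes them with the standard Kabsch algorithm before measuring the deviation. The function name `kabsch_rmsd` is hypothetical, and the text does not state which superposition procedure was actually used.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both structures on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix between the two point sets and its SVD.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual deviation.
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

With coordinates in ångströms, a return value below 2.0 would correspond to the 2 Å criterion quoted in the text.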

Among the many analyses that can be performed on this data set, we focus here on elucidating the general principles that underlie protein folding and do not discuss in detail the properties of each individual system. In particular, we have used this data set to examine several important and unresolved general questions (1): (i) What is the general nature and order of events that lead to folding? (ii) What role, if any, is played by the residual structure in the unfolded state? (iii) How many distinct folding pathways are present, and how different are they from one another? and (iv) Is there a free-energy barrier for folding, and what is its magnitude?

As a first step, we partitioned all trajectories into folded, unfolded, and transition-path segments (5). Unfolding transitions were analyzed in reverse so that all transitions could be treated as folding events. For each folding and unfolding event, we quantified the formation of the native topology (9, 10), secondary structure, and nonlocal native contacts along the transition path (Fig. 2). We found that the formation of a native topology and secondary structure begins earlier than the formation of most nonlocal contacts. Whereas most contacts are formed late, a few specific key contacts are formed relatively early in the transition paths (5). In most cases, formation of secondary structure appears to decrease the solvent-exposed area of the protein (fig. S2), in line with experimental observations (1).
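The partitioning procedure itself is described in the supporting material (5) and is not reproduced here. As an illustration only, the sketch below assumes a single progress coordinate q per frame and two hypothetical cutoffs, `low` and `high`: frames at or below `low` are unfolded, frames at or above `high` are folded, and an excursion between the cutoffs counts as a transition path only if it connects the two different basins. The helper `mean_lifetime` then averages the lengths of the runs spent in one state, in the spirit of the folding time defined in the Fig. 1 caption. All names and cutoff values are assumptions.

```python
from itertools import groupby

def assign_states(q, low=0.2, high=0.8):
    """Label frames 'U' (unfolded), 'F' (folded), or 'T' (transition path)."""
    basin = ["U" if x <= low else "F" if x >= high else "" for x in q]
    if not any(basin):
        raise ValueError("trajectory never reaches either basin")
    labels = [""] * len(basin)
    last, start = "", 0  # last basin visited; first frame of the current excursion
    for i, b in enumerate(basin):
        if not b:
            continue
        if start < i:
            # Frames start..i-1 lie between the cutoffs: a transition path only
            # if the excursion left one basin and arrived in the other.
            label = "T" if last and last != b else b
            labels[start:i] = [label] * (i - start)
        labels[i] = b
        last, start = b, i + 1
    # Trailing frames that never reach a basin stay with the last basin visited.
    labels[start:] = [last] * (len(basin) - start)
    return "".join(labels)

def mean_lifetime(labels, state, dt):
    """Average duration of contiguous runs of `state`, given the frame spacing dt."""
    runs = [sum(1 for _ in group) for key, group in groupby(labels) if key == state]
    return dt * sum(runs) / len(runs)
```

Unfolding transitions identified this way can simply be reversed in time before further analysis, as the text describes. Runs cut off by the start or end of the trajectory are counted as complete here, which biases the lifetime estimate downward for short trajectories.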

Analysis of the unfolded state observed in the simulations reveals the presence of both native and non-native secondary structure elements. On average, the 12 proteins contain 16% helical and 5% sheet structure in the unfolded state (5). These secondary structure elements form transiently (partially or completely), but are typically only marginally stable in the absence of the stabilizing tertiary interactions, and they persist for tens to hundreds of nanoseconds in the unfolded state (Fig. 3 and fig. S6). The propensity to form local nativelike structure in the unfolded state correlates strongly with the order of formation of local nativelike structure along the transition path (Fig. 3). In particular, initiation sites for folding are preferentially located in regions that have a high propensity to form native structure in the unfolded state (11). In helical proteins, these regions often correspond to individual helices, and we find that the helices with the highest stability in the unfolded state generally form first during folding (Fig. 3). These observations support a mechanism for protein folding in which the formation of a subset of key long-range native contacts early

Fig. 2. Formation of topology, native contacts, and secondary structure during protein folding. (A) The three panels show the accumulation of native secondary structure, nonlocal native contacts, and native topology during a single folding event for α3D. Each of the three quantities was normalized such that the average value in the unfolded state was zero, and the average value in the folded state was one. Above the three panels we show seven representative structures from this transition path, with the corresponding time points shown with arrows. This analysis was repeated for each of the 24 folding and unfolding events observed for this protein, and for each of these transitions, the relative orders of formation of secondary structure, contacts, and topology were quantified by integration of these time series (with the resulting integrals, corresponding to the area under the curves, here represented by the area of the red shading). High values of this integral thus correspond to early formation of the corresponding quantity during a folding event. (B) The 24 transitions of α3D in a scatter plot are represented, with each of the black points corresponding to the time series integral for a single folding event (unfolding events were analyzed in reverse). The red point corresponds to the folding event shown in (A), and the green point represents the average of the time series integrals over all 24 transitions (error bars represent SEM). (C) We repeated this analysis for 11 of the 12 proteins (chignolin was omitted because of its small size). Each point shows the average value over all folding and unfolding events observed for one protein [as described above for the green point in (B)]. Each point is labeled with the PDB code of the relevant protein (see also Fig. 1). Most proteins fall below the diagonal in these plots, showing that topology and secondary structure develop earlier than the full set of native contacts.
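The normalization and integration described in this caption can be sketched as follows. The sketch assumes evenly spaced frames along one transition path and rescales the path duration to the interval [0, 1], so that the integral lies near 1 for a quantity that forms early and near 0 for one that forms late. The function name and the time rescaling are assumptions made for illustration; the caption does not specify how time was scaled.

```python
import numpy as np

def formation_integral(x, unfolded_mean, folded_mean):
    """Area under the normalized time series of one quantity along a transition path."""
    x = np.asarray(x, dtype=float)
    # Normalize so the unfolded-state average maps to 0 and the folded-state average to 1.
    norm = (x - unfolded_mean) / (folded_mean - unfolded_mean)
    # Rescale time so the transition path runs from 0 to 1 (assumed convention).
    t = np.linspace(0.0, 1.0, len(x))
    # Trapezoidal rule, written out explicitly.
    return float(np.sum(0.5 * (norm[1:] + norm[:-1]) * np.diff(t)))
```

Comparing the integrals of two quantities over the same transition path, for example secondary structure against nonlocal contacts, gives one point of the kind plotted in panel (B).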


in the folding process is sufficient to establish a nativelike topology and to stabilize the native secondary structure elements that are only transiently formed in the unfolded state (9).

To quantify the heterogeneity of the folding process, we examined the order of formation of structural elements in the transition paths of the 12 proteins. Each individual transition path is, of course, different from all the others at a sufficiently detailed level of resolution, but transition paths where structural elements are formed in a similar order are typically defined as belonging to a common “pathway.” Theory and previous simulations suggest that protein folding may be a highly heterogeneous process, with multiple such pathways each accounting for a small fraction of the total flux (1, 12). There is experimental evidence for heterogeneous folding mechanisms in only a few two-state folding proteins (1, 13), but this could be attributed to the difficulty of distinguishing similar pathways in experiments. Indeed, at a coarser level, different pathways may still share a large fraction of structural features, and we define such pathways as belonging to the same folding “route.” For each of the 12 proteins, we address the questions of how many folding pathways are present and how many different routes these pathways represent.

As a first step, we determined for each protein how many folding pathways are traversed that are distinct in the sense that native interactions are formed in different orders and that the pathways do not interconvert on the transition path time scale (allowing individual transition paths to be robustly assigned to a specific pathway) (14, 15). In particular, we defined a metric to calculate the distances between two individual transition paths and used these distances to cluster the folding and unfolding transitions for each protein into structurally distinct pathways (5). We find that for most proteins, the transition paths can be robustly assigned to two or three well-defined clusters (5), which reveals the existence of a small number of parallel pathways.

We then examined whether these parallel pathways arise because of “structural noise” superimposed on a single, well-defined folding route (16) or because the protein does in fact fold along multiple, distinct routes. To distinguish between these two scenarios, we quantified the similarity of the different pathways (as defined in the previous paragraph) by calculating the fraction of the native contacts they share at various intermediate points. We find that in most cases the order of formation of the native structure is similar in the different pathways (Fig. 3); for 9 out of 11 proteins considered, the pathways belonging

Fig. 3. Order of native structure formation along the transition pathway and the average distance from the native conformation in the unfolded state. The colored lines represent a quantity that measures when an amino acid residue adopts a nativelike structure (with a small value indicating early formation); the different colors represent the results for the different folding pathways that we obtained, as described in the main text. The average fraction of native structure in the unfolded state is shown by the black lines. The positions of helices (red) and sheets (blue) in the native state are shown above each graph together with the location of proline residues (green circles). Note that proline residues are often located at initiation sites; we speculate that this observation can be explained by the fact that proline has a restricted conformational space available and thus facilitates the local ordering of the polypeptide backbone.


to different clusters share more than 60% of the native contacts formed at any given time during the folding process (fig. S5) (chignolin was excluded from this analysis because of its small number of contacts). Thus, for each of these nine proteins, the distinct pathways appear to share a largely common folding nucleus, and we suggest that they are best considered to be variations of a single folding route. In these cases, we expect that it would be difficult to distinguish the different pathways using currently available ensemble experiments, as the pathways would be similarly affected by most mutations or changes in temperature or solvent composition. Although the exact number of pathways and routes determined by our analysis is dependent on the detailed criteria used to categorize the folding transitions, the overall picture that emerges is one where folding is usually a relatively homogeneous process in which individual structural elements tend to form in a well-defined order (17).
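The pathway comparison above rests on the fraction of native contacts that two pathways share at matched intermediate points. A minimal sketch of one way to compute such a quantity is given below. It assumes each pathway is summarized as a list of sets of formed native contacts, one set per intermediate point, and it measures sharing as intersection over union. That definition, the function names, and the use of the minimum along the path are assumptions; the exact measure used in the analysis is given in the supporting material (5).

```python
def shared_contact_fraction(contacts_a, contacts_b):
    """Fraction of formed native contacts common to two pathways at one point."""
    a, b = set(contacts_a), set(contacts_b)
    union = a | b
    if not union:
        # No contacts formed in either pathway: treat them as identical.
        return 1.0
    return len(a & b) / len(union)

def min_shared_fraction(path_a, path_b):
    """Smallest shared fraction over matched intermediate points of two pathways."""
    return min(shared_contact_fraction(a, b) for a, b in zip(path_a, path_b))
```

Under these assumptions, two pathways would be grouped into the same route when `min_shared_fraction` stays above a chosen threshold such as the 60% quoted in the text.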

The remaining two simulated proteins, NTL9 and the protein G variant, clearly exhibit two structurally distinct folding routes. Both are α/β proteins of moderate size, and the difference between the routes is a different order of formation of the β strands. (The third α/β protein in our set of proteins, BBA, which is only 20 amino acid residues long and has only two β strands, folds via a single pathway.) In the case of the redesigned variant of protein G that we studied, the principal difference between the two routes is the order in which the two hairpins form. This observation is in line with experimental results on wild-type protein G and its redesigned NuG2 variant, which share the same fold as the protein G variant considered in this study (18). In particular, in wild-type protein G, hairpin 2 folds first, whereas in NuG2, hairpin 1 folds first. The protein G variant that we simulated (5), which is intermediate in sequence between wild-type protein G and NuG2, populates both wild-type protein G–like and NuG2-like pathways. Although most of the proteins we considered fold with a single folding nucleus that is shared among the different pathways, our results for NTL9 and protein G suggest that caution should be exercised in generalizing this finding; larger proteins, particularly those with β-sheet structure, may indeed be characterized by multiple folding nuclei and truly distinct folding routes (12).

Finally, we examined the thermodynamics and kinetics of the folding process, and in particular the existence and size of the free-energy barrier for folding. Some of the proteins we have simulated have been suggested to fold in a downhill fashion, defined by the absence of a distinct free-energy barrier for folding. To explore such issues, we first used a previously established method (2, 19) to project the folding free-energy surfaces along an optimized reaction coordinate. In all 12 cases, the application of this method yielded folding free-energy barriers smaller than 4.5 kBT (where kB is Boltzmann’s constant and T is temperature), consistent with the fact that all these proteins are fast folders. For three proteins (BBL, protein B, and the homeodomain), we were unable to identify a free-energy barrier separating the folded and unfolded states (an observation that proved robust against changes in the parameterization of the reaction coordinate). The lack of a free-energy barrier in these cases may, in principle, be due to an inability to properly separate the folded and unfolded basins by using a single reaction coordinate. The presence of a substantial free-energy barrier, however, would also be expected to give rise to a separation of time scales between the folding process and relaxation within the folded and unfolded states. A calculation of the dynamical content (2, 5) shows that, for all proteins where the calculated barrier is smaller than 1.5 kBT, there is little or no separation of time scales between overall folding and faster local relaxation. This provides support for the notion that the folding rate of these proteins is not determined by a single, well-defined free-energy barrier, at least under experimental conditions corresponding to those used in our simulations. For these proteins, the time scales for the formation of individual structural elements overlap with those for folding, giving rise to more complex kinetic behavior that we do not expect to be satisfactorily described by a single exponential relaxation.
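The barrier heights quoted above come from projecting the free-energy surface onto an optimized reaction coordinate (2, 19). As a rough illustration of the final step only, the sketch below estimates a one-dimensional profile F(q) = -kBT ln P(q) by histogramming an equilibrium time series of some coordinate q, and reads off the highest point between two basins relative to the unfolded basin. The function names, the bin count, and the basin positions are hypothetical, and this plain histogram is not the coordinate optimization of refs. 2 and 19.

```python
import numpy as np

def free_energy_profile(q, bins=50):
    """Bin centres and F(q) = -ln P(q) in units of kBT, shifted so min(F) = 0."""
    counts, edges = np.histogram(q, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Skip empty bins, where the logarithm is undefined.
    mask = counts > 0
    F = -np.log(counts[mask] / counts.sum())
    return centres[mask], F - F.min()

def barrier_height(centres, F, q_unfolded, q_folded):
    """Highest F between the two basins, relative to the unfolded basin (in kBT)."""
    i = int(np.argmin(np.abs(centres - q_unfolded)))
    j = int(np.argmin(np.abs(centres - q_folded)))
    lo, hi = sorted((i, j))
    return float(F[lo:hi + 1].max() - F[i])
```

A value near zero from `barrier_height` would be consistent with the barrierless, downhill-like behavior discussed for some proteins, although the estimate depends on the choice of coordinate and binning.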

In addition to providing information about the height of the free-energy barrier, the analysis in the previous section identifies structures whose formation appears to be rate-limiting for folding. The structures that lie between the unfolded and folded states and have equal probability to fold or unfold are in each case compact and nativelike; they contain 60 to 97% of the native secondary structure, and their contact order is 60 to 100% of that of the native state (5). Earlier work based on combining simulations and experiments found that the transition state ensemble for folding has a contact order that is ~70% that of the native state (20, 21); we speculate that the slightly higher value found here (the average over the 12 proteins is 85%) may be caused by a Hammond shift due to the high temperatures at which the simulations were performed.

The results presented here provide a unified picture of the folding of 12 small proteins. We find that elements of local nativelike structure are transiently formed in the unfolded state; the formation of a few additional key contacts provides further stabilization for these structural elements and initiates the folding transition. In most cases, folding then proceeds along a single, dominant route, where additional structural elements are formed in a well-defined sequence, with “optional” noise (16). For two proteins, however, we find clear evidence for heterogeneous folding mechanisms with differing transition state “classes” (12). The ensemble of structures found on the free-energy barrier has nativelike topologies with partial formation of secondary structure and tertiary contacts, in line with conclusions drawn from experiments (1, 22, 23). Also notable is the fact that a single force field was able to consistently fold a substantial number of proteins, spanning all three of the major structural classes, to their native states. The results of this rather stringent test (24, 25) suggest that current molecular mechanics force fields are sufficiently accurate to make long–time scale MD simulation a powerful tool for characterizing large conformational changes in proteins.

References and Notes

1. T. R. Sosnick, D. Barrick, Curr. Opin. Struct. Biol. 21, 12 (2011).
2. D. E. Shaw et al., Science 330, 341 (2010).
3. D. E. Shaw et al., Millisecond-scale molecular dynamics simulations on Anton. Proceedings of the ACM/IEEE Conference on Supercomputing (SC09), Portland, OR, 14 to 20 November 2009 (ACM, New York, 2009); 10.1145/1654059.1654099.
4. S. Piana, K. Lindorff-Larsen, D. E. Shaw, Biophys. J. 100, L47 (2011).
5. Materials and methods are available as supporting material on Science Online.
6. J. Kubelka, J. Hofrichter, W. A. Eaton, Curr. Opin. Struct. Biol. 14, 76 (2004).
7. P. S. Shah et al., J. Mol. Biol. 372, 1 (2007).
8. G. Settanni, A. R. Fersht, J. Mol. Biol. 387, 993 (2009).
9. K. Lindorff-Larsen, P. Røgen, E. Paci, M. Vendruscolo, C. M. Dobson, Trends Biochem. Sci. 30, 13 (2005).
10. P. Røgen, J. Phys. Condens. Matter 17, S1523 (2005).
11. K. Modig et al., FEBS Lett. 581, 4965 (2007).
12. V. S. Pande, A. Yu. Grosberg, T. Tanaka, D. S. Rokhsar, Curr. Opin. Struct. Biol. 8, 68 (1998).
13. C. F. Wright, K. Lindorff-Larsen, L. G. Randles, J. Clarke, Nat. Struct. Biol. 10, 658 (2003).
14. P. Lenz, S. S. Cho, P. G. Wolynes, Chem. Phys. Lett. 471, 310 (2009).
15. B. C. Gin, J. P. Garrahan, P. L. Geissler, J. Mol. Biol. 392, 1303 (2009).
16. S. W. Englander, L. Mayne, M. M. Krishna, Q. Rev. Biophys. 40, 287 (2007).
17. A. D. Pandit, A. Jha, K. F. Freed, T. R. Sosnick, J. Mol. Biol. 361, 755 (2006).
18. B. Kuhlman, D. Baker, Curr. Opin. Struct. Biol. 14, 89 (2004).
19. R. B. Best, G. Hummer, Proc. Natl. Acad. Sci. U.S.A. 102, 6732 (2005).
20. E. Paci, K. Lindorff-Larsen, C. M. Dobson, M. Karplus, M. Vendruscolo, J. Mol. Biol. 352, 495 (2005).
21. T. R. Sosnick, Protein Sci. 17, 1308 (2008).
22. A. R. Fersht, Proc. Natl. Acad. Sci. U.S.A. 92, 10869 (1995).
23. K. Lindorff-Larsen, M. Vendruscolo, E. Paci, C. M. Dobson, Nat. Struct. Mol. Biol. 11, 443 (2004).
24. P. L. Freddolino, C. B. Harrison, Y. Liu, K. Schulten, Nat. Phys. 6, 751 (2010).
25. J. C. Faver et al., PLoS ONE 6, e18868 (2011).
26. X. Daura et al., Angew. Chem. Int. Ed. 38, 236 (1999).

Acknowledgments: We thank K. Palmo, B. Gregerson, and J. Klepeis for their input during the development of the CHARMM22* force field; P. Røgen for providing us with software to calculate generalized Gauss integrals; J. Salmon and R. Dirks for developing and testing simulation software; and R. Kastleman and M. Kirk for editorial assistance.

Supporting Online Material
www.sciencemag.org/cgi/content/full/334/6055/517/DC1
Materials and Methods
Figs. S1 to S8
Tables S1 to S3
References (27–60)

13 May 2011; accepted 1 September 2011
10.1126/science.1208351
