Physica A 307 (2002) 235–259
www.elsevier.com/locate/physa

Three-body correlations in protein folding: the origin of cooperativity

Ariel Fernández a,b, Andrés Colubri b, R. Stephen Berry a,∗

a James Franck Institute, Department of Chemistry, The University of Chicago, 5735 South Ellis Avenue, Chicago, IL 60637-1403, USA
b Instituto de Matemática, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas, Bahía Blanca 8000, Argentina

Received 21 September 2001

Abstract

The success of the protein folding process requires that the peptide chain find a structure that ensures the survival of its intramolecular H-bonds. In this work, we identify and model how water is hindered from invading and destroying the intramolecular H-bonds: a three-body protective association establishes itself when a hydrophobic residue approaches a pair of residues held by an amide–carbonyl H-bond. This proximity disrupts the water structure surrounding the backbone H-bond, driving water molecules away so they cannot solvate the backbone. These three-body contributions often compensate thermodynamically for concurrent two-body hydrophobic–polar mismatches. A previously developed theoretical method to generate folding pathways is extended to reveal the role of three-residue correlations in stabilizing the collapse-inducing folding nucleus; successful computer runs exhibit their formation and how they protect and scaffold incipient secondary structure. © 2002 Elsevier Science B.V. All rights reserved.

PACS: 87.15.He; 87.10.+e; 87.15.Da

Keywords: Protein folding; Collapse-inducing nucleus; Ubiquitin; Folding topology

∗ Corresponding author. Tel.: +1-773-702-7021; fax: +1-773-834-4049.
E-mail address: [email protected] (R.S. Berry).

0378-4371/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0378-4371(01)00586-6

1. Introduction

The importance of conformation-dependent local environments in shaping intramolecular interactions during protein folding has been emphasized recently [1–6], and such context-dependent environments have been modeled and incorporated in an algorithm to predict folding pathways [7]. Describing the context dependence of the intramolecular nonbonded energy contributions poses a challenge for the theoretical description of the folding process. An intramolecular amide–carbonyl H-bond is effectively more stable when sequestered from water, since water molecules typically win over the peptide backbone itself when they compete to form H-bonds.

Consequently conformation-induced desolvation of an intramolecular H-bond actually
protects the local backbone from H-bonding to water, with a net effect of stabilizing the H-bond [1–5,7]. The adjacency of a hydrophobic residue to a pair of H-bonded residues of the backbone creates a sort of structured bubble in the local solvent structure, raising the free energy barrier for water to solvate the peptide backbone. This picture has also been adopted and validated in theoretical treatments of the exogenous stabilization of helices by addition of trifluoroethanol or other cosolvents [8]. Furthermore, as shown in this work, this effect contributes in part to both cooperativity and the nonadditivity of clustering forces [7]. Thus, a dual role is attributed to hydrophobic residues: initially they are able to act solely as clustering elements, but later on they also become protectors of H-bonds, promoting the development of secondary structure.

This paper provides a coarse-grained, phenomenological approach, non-Hamiltonian
and not phase-volume-preserving, rather than a first-principle, mechanical treatment of these correlations. It seeks to demonstrate the predictive and explanatory power of one kind of ab initio folding simulation, that is, simulation without input from known native structures. This is an extension of a method developed previously that is described briefly in Section 2. This paper incorporates some improvements and refinements that are also described there.

The specific thesis of this paper is that whenever the solvent is treated implicitly,
we must treat the intramolecular energy not merely as a set of two-body contributions but as a set of interactions that incorporate three-body terms arising from three-residue correlations. Such three-residue interactions appear to play an important role in stabilizing hydrophobic–polar (h–p) mismatches. A hydrophobic residue may stabilize a hydrogen bond between a “mismatched” pair by approaching one polar residue in the hydrogen-bonded pair, driving water molecules out, thereby protecting the H-bond engaging that polar residue [7]. Also, the two-body mismatch between a hydrophobic and a polar residue, with the latter engaged in an H-bond, might be thermodynamically overcome by the net stabilizing effect on that H-bond induced by the proximity of the hydrophobic protective residue. Furthermore, these three-body effects will be shown to be essential to the expediency and robustness of the folding process.

Explicit modeling of local water structure and its interplay with different conformations of the chain is currently a major computational challenge, particularly within a first-principle folding algorithm [9–12]. This study is complementary to the explorations of water–protein interactions via explicit inclusion of water molecules. Here, we examine only how the residues of the protein may interact to change the environment that contains the water. A full analysis of this issue would require a complete molecular description of how a change in the proximity of a third, neighboring residue raises the barrier (kinetic, thermodynamic, or both) for water molecules to invade the exposed backbone at the site of an H-bonded pair of residues. We have circumvented the complexity of detailed modeling by the expedient of representing it as a conformation-dependent stabilization of the H-bond. This is done in two extreme ways, one attributing the stabilization to thermodynamics and the other to kinetics. In the thermodynamic formulation, the nonbonded enthalpies in the intramolecular pairwise potential are rescaled when a three-body association forms, to account for the significant lowering of the effective local dielectric [7], which we now rationalize as being induced by three-body associations. In the kinetic formulation, the rate at which the relevant dihedral angles may change is reduced, to model an increase in the barrier for conversion between the structured and unstructured states of the relevant segments.

Fig. 1. Schematic representation of the effective internal energy surface for a three-body correlation contributing to the overall intramolecular energy. Residues are i (polar), j (any) and k (hydrophobic). The figure shows effective potential energy cross sections at different values of the (i, j)-distance $r(i,j)$. As $r(i,j)$ becomes equal to or smaller than the H-bond distance $r^*$, the two-body (k, i) potential becomes attractive because, as k approaches i, residue k desolvates the (i, j)-H-bond, while for $r(i,j) > r^*$, the (k, i) contribution reflects the h–p mismatch and is therefore repulsive.

This situation is illustrated in the cartoon displayed in Fig. 1. In the thermodynamic
approach, which we emphasize in this report, the two-body effective energy contribution $U(k,i)$ involving hydrophobic residue k and polar residue i is conditioned by the presence of a third body, residue j. If the distance $r(i,j)$ is equal to or less than the critical distance $r^*$ at which an amide–carbonyl H-bond between i and j is formed, the interresidue potential $U(i,k)$ contains an attractive contribution, because of the strengthening effect on the H-bond; this is how we represent the way residue k “desolvates” the (i, j)-H-bond. On the other hand, if $r(i,j) > r^*$, then $U(i,k)$ becomes altogether repellent, reflecting the h–p mismatch between k and i. Thus there are two critical distances $r^*$ and $r^{**}$, with $r^{**} = r^{**}(r^*)$, such that when $r(i,k) < r^{**}$ and $r(i,j) < r^*$, the energy drops into an attractive well.

The protection of intramolecular H-bonds by three-body contributions plays a second role: it refines the way this theoretical approach represents how the system tolerates two-body mismatches of hydrophobic residues H-bonded to polar units [7]. Due to the imperfect chain condensation in the search for the “right topology” [13], many two-body h–p mismatches are expected (and have been found) to occur as the nucleus forms. This becomes apparent in our ab initio folding simulations: simulations based solely on pairwise energy contributions are successful at finding and reproducing the native fold only if they incorporate some tolerance to h–p mismatches [14,15]. Tolerance has been a two-body ansatz essential for our model to fold proteins such as BPTI successfully, proteins that exhibit a considerable extent of h–p mismatching in their β-sheet motifs. A cooperative, three-body picture of the kind presented in this work eliminates the need for this “tolerance ansatz”: with the algorithms used in this work, h–p mismatches can become thermodynamically or kinetically favorable as three-body contributions are incorporated.

Thus, the aim of this paper is fourfold:

(a) to use a folding algorithm incorporating context-dependent pairwise interactions to identify the collapse-inducing nucleus in simple proteins exhibiting two-state kinetics and the residues participating in the formation of that nucleus (such residues are predicted to be the “hot-spots” with respect to site-directed mutation);

(b) to compare our findings with independent experimental characterizations of the nuclei obtained from site-directed mutagenesis and sequence homology comparisons;

(c) to identify the hydrophobic residues and three-body correlations responsible for stabilizing the nuclei and overcoming h–p mismatches at the early (imperfect) stages of chain condensation;

(d) to extract from simulations of successful folding pathways those three-body correlations in post-nucleation events responsible for stabilizing the kernels for formation of native secondary structures, especially those whose occurrence could not be predicted solely in terms of local pairwise associations.
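
The conditional (k, i) interaction described above and drawn schematically in Fig. 1 can be illustrated with a toy function. This is only a sketch under stated assumptions: the square-well shape, the numerical values chosen for $r^*$ and $r^{**}$, the well depth, the mismatch penalty, and the zero energy beyond $r^{**}$ are hypothetical placeholders, not parameters taken from the paper. Only the switching logic follows the text.

```python
def effective_ki_energy(r_ij, r_ik, r_star=3.0, r_2star=6.0,
                        well_depth=-1.0, mismatch_penalty=0.5):
    """Toy effective energy U(k, i) for hydrophobic k and polar i.

    r_ij : distance between i and its potential H-bond partner j
    r_ik : distance between the hydrophobic residue k and i
    All default numbers are hypothetical (arbitrary units).
    """
    if r_ik >= r_2star:
        # Assumption: k is outside the desolvation range of i, no effect.
        return 0.0
    if r_ij <= r_star:
        # The (i, j) H-bond exists, so k desolvates it: net attraction.
        return well_depth
    # No H-bond to protect: bare hydrophobic-polar mismatch, repulsive.
    return mismatch_penalty
```

With these placeholder values, `effective_ki_energy(2.5, 4.0)` is attractive while `effective_ki_energy(5.0, 4.0)` is repulsive, which is the qualitative behavior the text attributes to the presence or absence of the (i, j) H-bond.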

2. Methods

Recent research [7,13–16] offers a way to combine local steric constraints and nonbonded interactions to describe the evolution of the (φ, ψ)-torsions of individual residues, to develop a coarse model of the folding process. This method, described in detail in Refs. [7] and [16] (apart from the modifications and extensions added here), provides access to the microsecond and millisecond timescale at the expense of losing some structural resolution, and enables us to make structure predictions and identify folding pathways. The basic tenets of the model are:

(a) The backbone (φ, ψ) torsional motion is described by the evolving occupancies of the basins of attraction in the Ramachandran map topography adopted by each individual residue. Thus, local torsional assignments change as residues perform interbasin hopping. Intrabasin “motion” enters only to define the area and hence the microcanonical entropy of each Ramachandran basin, and to allow one to infer a geometry from a pattern of basin occupancies. Within this framework, two local torsional isomers are “topologically equivalent” if they belong to the same basin.

(b) A particular state of the chain within this description is specified as a pattern of basin occupancies, together with each residue’s character as hydrophobic or polar. Such patterns, called local topology matrices (LTMs) [7,13–16], are subsequently translated into explicit geometries to be interpreted in terms of standard structural motifs.

(c) The algorithm identifies those nucleating residues whose basin occupancies allow them to develop secondary and tertiary structure; in the present context, we focus especially on three-body correlations that protect the H-bonds of the nucleus and thereby enable hydrophobic collapse.

(d) The LTM evolution is determined by the interbasin transitions whose (randomly chosen) rates are determined by the level of structural involvement of each residue at the given time. Thus, the mean interbasin hopping rate of a residue decreases from $10^{11}$ s$^{-1}$ to $10^{7}$ s$^{-1}$ as soon as its place in the LTM is part of a pattern compatible with a structural motif. On the other hand, hopping rates increase if topological patterns are dismantled due to the formation of a 33% out-of-consensus critical bubble in the LTM [7,13–16]. In previous work, the distribution from which we drew the hopping frequency was arbitrarily shifted to a slower (Gaussian) range of rates whenever a group of residues fell into a pattern of basin occupancies consistent with secondary or tertiary structure. Likewise, the frequency distribution was shifted to a higher range if 33% of the pattern disappeared. In this work, the hopping rates were determined by a far less arbitrary method based on a probability distribution of the form $\exp(-\Delta G/kT)$ described below.
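
The state description in tenets (a), (b) and (d) can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `LTM`, the function `hop_step`, the choice of four basins per residue, and the uniform choice of the destination basin are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

N_BASINS = 4  # assumed number of Ramachandran basins per residue


@dataclass
class LTM:
    basins: list  # basin index (1..N_BASINS) occupied by each residue
    hp: str       # 'H' (hydrophobic) or 'P' (polar) for each residue


def hop_step(ltm, hop_prob, rng):
    """One sweep of random interbasin hopping.

    Residue k changes basin with probability hop_prob[k]; the new
    basin is drawn uniformly among the others (an assumption).
    Returns the indices of the residues that hopped.
    """
    hopped = []
    for k, p in enumerate(hop_prob):
        if rng.random() < p:
            others = [b for b in range(1, N_BASINS + 1)
                      if b != ltm.basins[k]]
            ltm.basins[k] = rng.choice(others)
            hopped.append(k)
    return hopped
```

A residue that is part of a recognized pattern would be given a small entry in `hop_prob`, and a free residue an entry of 1, mirroring the slowdown described in tenet (d).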

The pattern recognition is performed and its outcome is recorded periodically (approximately every 70 ps) and displayed as a contact matrix (CM). This recording is based on the topological compatibility of the pattern “read” in the LTM with a specific structural motif. Four or more sequential residues occupying basin 1, for example, correspond to a bit of a β-strand, and four in basin 2, to an α-helix. The determination of secondary and tertiary structure from patterns, and the inference of a stable geometry from the LTM, are described in detail in Refs. [7] and [16]. The iteration of three generic operations then determines the LTM–CM dynamics. The first operation is a random “flipping” of dihedral angles among Ramachandran basins. The second is the pattern recognition operation, and the third is a feedback operation prescribing the next rates of flips, how pattern recognition will next be performed on the LTM, and which residues change their rate of interbasin hopping according to the information encoded in the latest CM.

This algorithm is contingent on a suitable geometric representation of the LTM
needed to recognize the emerging patterns. Here we introduce a change from our previous method. Now a semiempirical intramolecular potential is introduced which governs the interbasin hopping in the following way [7,15,16]. At each given iteration and for each residue k, the program generates a probability for residue k to change its basin, $P(k)$, determined as $P(k) = \exp[\Delta F(k)/RT]$, where $\Delta F(k) \le 0$ gives a thermodynamic measure of the extent of structural involvement of residue k and is given as $\Delta F(k) = \Delta U(k) - T\Delta S(k)$. Here $\Delta U(k)$ is the sum of changes of pairwise nonbonded energy contributions associated with (i, j)-contacts or anticontacts with $i \le k \le j$, whose energy content would be affected if k were to change its Ramachandran basin. The quantity $\Delta S(k)$ is the sum of conformational entropy changes of backbone and side chains associated with forming the pattern whose stability would be changed by a basin change in residue k. A free residue has $\Delta F = 0$ and therefore may change basin with each iteration until its backbone fits into a pattern that achieves the stability of a structural motif, at which point $\Delta F < 0$ (unless the pattern is unstable, in which case $P(k) = 1$). Typical values for $P(k)$ are $10^{-4}$ and $10^{-8}$, occurring, respectively, as k becomes part of secondary or of tertiary structure [7,13–16].

Since the expected number of iterations needed for residue k to change its basin is
estimated as $P(k)^{-1}$, we infer that the more structurally engaged a residue is, the lower is the likelihood for it to change its Ramachandran basin (alter its local topology). Its rate of interbasin hopping is determined at each iteration as $P(k) \times 10^{11}$ s$^{-1}$, where $10^{11}$ s$^{-1}$ is the estimated rate of hopping for a free residue [7,13–16].

The geometric realization of the topology, in turn, must rely on the potential, and thus, representing an LTM is tantamount to finding the lowest-energy conformation within the constraints imposed by confining the backbone torsional coordinates to the assigned Ramachandran basins. Strictly, this is a simplifying assumption that avoids the question of whether or when the folding process passes through structures that are kinetically favored but not the optimal thermodynamic options from the previous structures. This aspect of the method will be addressed in future studies. The potential has strong, short-range repulsions that assure that excluded volume constraints are satisfied.

To carry out this program, we have used two alternative kinds of algorithms. One
invokes thermodynamic stabilization to model the effect of three-body correlations in stabilizing hydrogen-bonded pairs. The other treats the formation of three-body “protective” structures as a kinetic inhibition of the attack by water, and takes this inhibition as the basis of the stabilization. In the former, we use a potential made up of both two-body nonbonded contributions and conformation-dependent [7] terms that incorporate the three-body interactions. The two-body nonbonded contributions should be regarded as zeroth-order terms in relation to the sensitivity to solvent environments. Coulombic, dipole–dipole and H-bonding terms are all sensitive to local solvent organization. This means that we must iteratively rescale each pairwise contribution, $U^{(t)}_{nb}(i,j) \to U^{(t+1)}_{nb}(i,j)$ $(i < j)$, where $U^{(t)}_{nb}(i,j)$ is the (i, j) pair contribution to the nonbonded potential at time t.

This rescaling is based on the extent of desolvation of the pair (i, j) at time t: if a single hydrophobic residue (k) penetrates the desolvation realm of residue i (or j) at time t, the resulting local environment reflects a three-body contribution. For example, a hydrophobic residue k might approach a polar residue i which in turn is H-bonded with residue j, not because the two-body (k, i)-interaction is favorable (it is not), but because, in approaching residue i, residue k stabilizes the H-bond that i is forming with j, desolvating it enough to overcome the repulsive (k, i)-interaction. Analogous definitions hold for n-body terms with $n > 3$.
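
The hopping rule introduced earlier in this section, $P(k) = \exp[\Delta F(k)/RT]$ with $\Delta F(k) = \Delta U(k) - T\Delta S(k)$, can be sketched numerically as follows. The function names and the choice of kcal/mol units are assumptions made for illustration; the free-residue rate of $10^{11}$ s$^{-1}$ and the cap $P(k) = 1$ for free residues or unstable patterns come from the text.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol K); units are an assumption
FREE_RATE = 1e11    # hopping rate of a free residue in s^-1 (from the text)


def hop_probability(delta_u, delta_s, temperature):
    """P(k) = exp(dF/RT) with dF = dU - T*dS, capped at 1."""
    delta_f = delta_u - temperature * delta_s
    if delta_f >= 0:
        return 1.0  # free residue (dF = 0) or unstable pattern
    return math.exp(delta_f / (R_KCAL * temperature))


def hop_rate(p):
    """Interbasin hopping rate in s^-1 for hopping probability p."""
    return p * FREE_RATE


def expected_iterations(p):
    """Expected number of iterations before a basin change, 1/P(k)."""
    return 1.0 / p
```

Under these assumptions, a residue with $P(k) = 10^{-4}$ hops at about $10^{7}$ s$^{-1}$ and needs on average $10^{4}$ iterations to change basin, in line with the typical values quoted for residues engaged in secondary structure.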

These contributions are subsumed in the rescaling of the dielectric-dependent, pairwise terms concurrent with the rules prescribing the transition LTM(t) → LTM(t+1). Denoting by $u^{(t)}_{nb}(i,j)$ a generic nonbonded term for residue pair (i, j) dependent on the extent of local desolvation at time t, and by $u^{(0)}_{nb}(i,j)$ the corresponding nonbonded term in bulk solvent, we get

$$u^{(t)}_{nb}(i,j) = u^{(0)}_{nb}(i,j) \times M(i,j,t), \qquad M(i,j,t) = [b(i,t) \times b(j,t)]^{1/2},$$

where $M(i,j,t)$ quantifies the extent of desolvation for the pair (i, j) at time t, which, due to the multiplicative nature of the nonbonded interactions, is the geometric mean of the local desolvations $b(i,t)$ and $b(j,t)$ of the individual residues i and j at time t. The local desolvations are defined through the number of hydrophobic residues that have penetrated the desolvation realms of i and j at time t. Thus, $b(i,t) - 1$ denotes the number of hydrophobic units n such that $|U^{(t-1)}_{nb}(i,n)| > 2RT$, that is, such that they interact meaningfully with residue i vis-à-vis the nonbonded potential $U^{(t-1)}_{nb}$; an unprotected residue therefore has $b = 1$, and the bulk term is recovered when neither residue is protected. An identical definition applies to $b(j,t)$. This computational ansatz models the enhancement of dielectric-dependent contributions as the extent of local desolvation increases.

In the kinetic approach, the rate of hopping for residue k is slowed down according to the extent of burial within which the nonbonded interaction involving k is taking place. For instance, if a single hydrophobic residue approaches residue k, in turn engaged in an intramolecular H-bond, we rescale the probability as $P'(k) = \varepsilon P(k)$, where $\varepsilon$ is fixed at $10^{-2}$, an assumption that we can justify a posteriori, as it leads to successful ab initio structure predictions, while $P(k)$ is the in-bulk zeroth-order probability of interbasin hopping. This lowers the interbasin hopping rate by two orders of magnitude. The scaling factor decreases geometrically with the extent of burial, so for a residue engaged in a hydrogen bond and protected by two hydrophobic units, we get $P'(k) = \varepsilon^2 P(k)$. Thus the rate of basin hopping for a fully buried residue is $(10^{-4} \times 10^{-4}) \times 10^{11}$ s$^{-1}$ $= 10^{3}$ s$^{-1}$, in agreement with previous treatments based on pattern recognition algorithms [13–16]. At present we are not yet prepared to decide which of the two treatments of the three-body correlation is more realistic or what combination of the two effects would best represent the folding of any particular protein. The two methods yield similar results for the cases we have examined thus far.
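
The two formulations of this section can be sketched side by side. This is a minimal illustration under stated assumptions: the function names, the dictionary used to store pair energies, and the strict inequality in the $2RT$ threshold are hypothetical choices; the geometric-mean factor $M = [b(i,t)\,b(j,t)]^{1/2}$ and the kinetic factor of $10^{-2}$ per protecting hydrophobic unit follow the text.

```python
import math

EPSILON = 1e-2  # kinetic scaling per protecting hydrophobic unit (from the text)


def desolvation_count(i, hydrophobic, u_prev, rt):
    """b(i, t): 1 + number of hydrophobic residues n with |U_nb(i, n)| > 2RT.

    u_prev maps frozenset({i, n}) to the nonbonded energy at time t-1
    (a hypothetical storage layout); missing pairs count as zero.
    """
    close = sum(1 for n in hydrophobic
                if n != i
                and abs(u_prev.get(frozenset((i, n)), 0.0)) > 2 * rt)
    return 1 + close


def rescale_factor(b_i, b_j):
    """Thermodynamic route: M(i, j, t) = sqrt(b(i, t) * b(j, t))."""
    return math.sqrt(b_i * b_j)


def kinetic_hop_probability(p_bulk, n_protectors):
    """Kinetic route: P'(k) = EPSILON**n * P(k)."""
    return (EPSILON ** n_protectors) * p_bulk
```

For example, a residue in secondary structure with bulk probability $10^{-4}$ and two protecting hydrophobic units gets `kinetic_hop_probability(1e-4, 2)`, about $10^{-8}$, which at a free-residue rate of $10^{11}$ s$^{-1}$ reproduces the $10^{3}$ s$^{-1}$ quoted above.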

3. Identifying the residues involved in forming the folding nucleus

A core problem in molecular biophysics is the identification of particular site mutations that significantly affect the overall folding rate of proteins [17,18]. This problem presents a particular challenge when addressed from an ab initio perspective that excludes any input from the native structure, sequence homology or phylogeny. From a theoretical standpoint, such ab initio inferences might be possible if we can identify a nucleus that induces the hydrophobic collapse and forms when the system crosses its most significant, rate-limiting kinetic barrier [19–22]. Under such conditions, site-directed mutations at the nuclear core residues have significant effects on folding rates [23].

This scenario has been portrayed in terms of two-state kinetics attributed to the folding of certain small globular proteins whose nuclei have often been identified as “transition states” [20]. (Note the difference between this usage and that of the chemistry vernacular, in which a persistent form is an “intermediate”, and a transient form at the peak of a reaction path is a “transition state”.) This view is revealing, although the reaction coordinate in such reactions, made up of manifold asynchronous events, is probably ill-defined [19] and the “two-stateness” of the reaction is an ensemble-average picture drawn on a heuristic free energy surface [24,25] with unknown degeneracies.

242 A. Fern*andez et al. / Physica A 307 (2002) 235–259

For systems represented as two-state folders, the so-called Φ-value analysis, measuring the relative changes in the stability of the nucleus, has been used to determine the “hottest” or most sensitive mutation sites [20,23,26]. The Φ-value for a residue is defined as $\Phi = \Delta\Delta G^{\ddagger}/\Delta\Delta G$, where $\Delta\Delta G^{\ddagger}$ represents the free energy change in the transition state ensemble relative to the wild type and associated with making a site-directed mutation, and $\Delta\Delta G$ is the relative free energy change associated with the overall folding process. Residues with high values ($\Phi \approx 1$) are presumed to be responsible for folding rates because their activation energies (for their rate-controlling steps) are sensitive to mutation. Useful as a qualitative index [26,27], the Φ-value analysis has limitations in terms of chemical kinetic interpretation: (a) it offers no way to identify any reaction coordinate [24,25], and (b) it gives no indication whether the mutation may be regarded as a perturbation on certain selected pathways and not on others, or whether it alters the ensemble-average folding pathway altogether.

From a theoretical standpoint, this prompts us to search for a dynamic parameter associated with the role each residue plays in guiding the folding process. The coarse-grained basin view of the torsional evolution of the chain provides an approach to carry this out. Specifically, it exhibits the nucleation kinetics data needed to predict the mutational hot spots and identify the collapse-triggering nucleus.

A quantity accessible from our simulations of the folding process is $\sigma(t)$ = the
number of residues that perform a basin hopping during an interval around time t. This quantity gives us an estimate of the extent of structural fluctuations at any given stage of the folding process [7]. Fig. 2 displays the normalized values, $\sigma(t)/N$, averaged every 256 ps and over 22 reproducible runs at T = 308 K along the most favored coarsely-resolved folding pathways for each of three systems. These are bovine pancreatic trypsin inhibitor (BPTI, PDB accession code 1PIT, N = 58) [14,15], mammalian ubiquitin (Ub, PDB accession code 1UBI, N = 76) [7,19] and chymotrypsin inhibitor (CI2, PDB accession code 1COA, N = 64) [23,28]. In all three cases, the most reproducible folding pathway is a successful pathway leading to the native fold [7,13–16] and its simulation requires $6.11 \times 10^{6}$, $10^{7}$ and $8.0 \times 10^{6}$ pattern-recognition-and-feedback iterations, respectively, for each of the three proteins. The simulated redox solvent conditions for BPTI have been chosen so that disulfide bond formation in BPTI occurs at a rate that keeps the folding within the time window of interest ($10^{-6}$–$10^{-2}$ s). The simulated redox conditions allow disulfide bond reshuffling within about a tenth of a microsecond [13–16].

Fig. 2. Time dependence of the extent of structural fluctuations during the folding of Ub (thick line) and BPTI (thin line), as measured by the normalized number $\sigma(t)/N$, where $\sigma(t)$ is the number of chain units changing basin within their respective Ramachandran maps averaged over a 256 ps interval centered at time t and over 22 reproducible and successful runs, and N is the length of the chain. The other light curve is that of chymotrypsin inhibitor (CI2).

In spite of the timescale compression, we can discern one drastic quenching of
structural fluctuations beginning at approximately $t^* = 1.2 \times 10^{-5}$ s for BPTI and $t^* = 7.4 \times 10^{-5}$ s for Ub. A comparable value of $t^* \approx 3.0 \times 10^{-5}$ s was found for chymotrypsin inhibitor.

These features indicate that in all three of these proteins, a single collapse-triggering nucleus appears, whose formation is concurrent with the surmounting of a large energetic barrier, since the fluctuations are considerable prior to the “quench time” $t^*$ and remain much smaller thereafter. These results are consistent with what has come to be called a two-state picture and nucleation scenario [19,20].

At this point we need to address a central question: what is the structure of the
nucleus? We address this issue by identifying the nucleus as the part of the structure that essentially ceases to fluctuate after $t = t^*$, the point at which a drastic quenching in structural fluctuations is observed. This point is characterized by a drop in $\sigma(t)/N$ of the kind displayed in Fig. 2. To characterize this drop, we introduce a dynamic quantity associated with each individual residue along the chain, which we call its F-value. This quantity, a scaled time, indicates whether a residue is ordered before, concurrently with, or after the formation of the collapse-triggering nucleus.

For residue n, $F(n)$ is defined as $F(n) = t'(n)/t^{\#}$, where $t'(n)$ is the time it takes for residue n to cease performing interbasin hopping within the folding time upper limit of $10^{-1}$ s, and $t^{\#}$ is the time at which $\sigma(t)/N$ settles at its lowest value after the initiation of the quenching process. As revealed by the curves in Fig. 2, $t^{\#} \approx 4.3 \times 10^{-5}$ s for BPTI, $t^{\#} \approx 10^{-4}$ s for Ub, and the corresponding value for CI2 lies between these: $t^{\#} \approx 4.9 \times 10^{-5}$ s. The difference $t^{\#} - t^*$ is essentially the time required for the collapse and formation of the nucleus; this interval is of the order of 10–40 μs for the proteins studied in this work, suggesting substantial structural rearrangements during the process of quenching itself.

The $F(n)$ values for BPTI, CI2 and Ub are displayed for the time interval $[0, t^{\#}]$ in Figs. 3, 4 and 5, respectively. The results displayed in Figs. 2–5 represent averages over 22 successful and reproducible runs. The extents of complete reproducibility of the results are 42% (BPTI), 40% (CI2) and 40% (Ub). By complete reproducibility we mean that the LTMs remain, step by step, within a Hamming distance of 1% from each other for the specified percentage of the total number of runs.

By intersecting the $F(n)$-plot with the threshold value $F^* = t^*/t^{\#}$, we identify the
residues involved in the nucleus formation, which are therefore expected to have large Φ-values. Such residues are those with low F-values, i.e., for which $F(n) \le F^*$: these residues find their stationary local topology before or by the time the quenching of structural fluctuations starts. Such residues must belong to the nucleus since no secondary structure or intramolecular interaction can become stationary in isolation: they fluctuate until tertiary scaffolding comes into place to stabilize them [13–16]. On the other hand, those units remaining disordered at the time the nucleus is formed satisfy the inequality $F(n) > F^*$. For example, Fig. 3 reveals that the disordered residues in BPTI at the time when the nucleus is formed belong to the contour windows 8–13 and 36–44 (thin lines in the chain-contour abscissa axis). Some of them (10–13, 40–44) still fluctuate even after the “lock-in” time $t = t^{\#}$ is reached. It is important to realize that at $t^{\#}$, the system will not, in general, have reached its native structure.

Fig. 3. Averaged F-value as a function of contour residue number $1 \le n \le N$ for BPTI. The residues (thick contour abscissas) whose F-value lies below the sharp dash-line threshold $F^* = t^*/t^{\#}$ have found their correct topology before or by the time when the hydrophobic collapse begins: they are part of the collapse-triggering nucleus.

At this point an observation is in order. The fact that some residues get topologically organized before the time of quenching of structural fluctuations does not imply the existence of a folding intermediate. Our simulations, revealing extensive structural fluctuations until $t = t^*$, do not let us infer the existence of any such structure. In fact, no part of the protein chains studied in this work is structurally stabilized before $t = t^*$. This is possible since each Ramachandran basin encompasses a range of values of the dihedral angles, implying that a residue could be stabilized with regard to its topology and not be structurally stabilized [15,16]. An illustration: a single Ramachandran basin contains the canonical (φ, ψ)-coordinate values for a hairpin β-turn (a zero-pitch turn) as well as those for an α-helix turn (nonzero pitch). Thus, a residue within such a basin is often found oscillating back and forth between one structural motif and another while still remaining topologically unchanging, since its basin assignment remains invariant. This situation is dramatically illustrated by the folding of β-lactoglobulin, a protein whose α-helical content during the early stages of folding is known to be far larger than the extent of helicity in its native fold [15,16]. This protein contains several residues which become topologically invariant while their local structure keeps changing between α-helix turn and β-hairpin (and, actually, random coil as well).

Fig. 4. Characteristics of CI2. (A) Averaged F-value as a function of contour residue number $1 \le n \le N$ for CI2. The mutationally hot critical residues marked by thick contour abscissas represent those residues for which $F \approx F^*$, with $F^*$ defined within an uncertainty determined by the gap between the two dash-line ordinates. By contrast, notice that no such uncertainty arises in the case of BPTI (cf. Fig. 3). As in Fig. 3, residues topologically organized in the nucleus are those for which $F(n) \le F^*$, and they represent mutational hot spots. (B) Predicted Φ-values for CI2. The experimental values (shaded circles) were averaged over different drastic mutations (those causing folding free energy changes > 1 kcal/mol, Refs. [23,28]) at each site. The theoretical values, drawn as a smooth curve but really a residue-by-residue representation, were constructed as described in the text.

Of the residues engaged in the nucleus formation (thick segments of the abscis-

sas in Figs. 3, 4A and 5), those organized at F(n)≈F∗ (thickest line segments inFig. 4A) are predicted to have � ≈ 1, since they are the ones most diVcult to engagein the nucleus formation while participating in triggering the hydrophobic collapse.Also, residues with F�F∗ are expected to have �-values close to unity since theystart the nucleus formation (cf. Fig. 4B).One can make quantitative predictions of the expected �-values from the F-values

as follows. The �-plot was obtained by mapping the F-value data given in Fig. 4Aaccording to the tenets of nucleation–condensation theory. The mapping is given by�=2=[1+exp(−g)]−1=[1− exp(−g)]=[1+exp(−g)], with g=[F |F−F∗|]−1. Theseexpressions were obtained from typical input–output relations from pattern recognitiontheory [28] of critical nucleation with a single “organizational susceptibility” 1=F , andDuctuation-measuring “temperature” |F − F∗|. Readily-ordered residues have small F

246 A. Fern*andez et al. / Physica A 307 (2002) 235–259

Fig. 5. F-value as function of contour residue number (n) for Ub. The thick line contour segments alongthe abscissas indicate residues engaged in the collapse-inducing nucleus. As in Fig. 4, an uncertainty in thedetermination of the critical value F∗ applies.

and high �-values; as F approaches the critical value F ≈ F∗, the �-value approachesits maximal value � ≈ 1. High F-values (relative to F∗) indicate late-organizingresidues, corresponding thus to relatively low �-values.Our predictions for CI2, an extensively studied two-state protein from the perspective

of �-value analysis [23,29] are satisfactorily validated, as shown in Fig. 4B. Residues[15–17], and 49, for which F(n) ≈ F∗, should have very high �-values (� ≈ 1), whilethe 1–14 region containing the �-helix and the 56–64 region, containing the terminalK-strands, should have the lowest. Furthermore, certain core residues within the 21–47region, whose �-values are diVcult to determine [23], are clearly discernible in theF(n)-plot displayed in Figs. 4A and B. Two such residues, 21 and 22, are not engagedin the nucleus since F(21)¿F∗, F(22)¿F∗. Therefore their �-values should beexpected to be low [23,29].
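The threshold test and the F-to-Φ mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy F-values are hypothetical (they are not data from the simulations), and the handling of the singular points F = 0 and F = F*, where g diverges and Φ tends to 1, is a convention chosen here.

```python
import math


def f_threshold(t_star, t_sharp):
    """Threshold F* = t*/t# separating nucleus from late-organizing residues."""
    return t_star / t_sharp


def nucleus_residues(f_values, f_star):
    """Return 1-based residue numbers n with F(n) <= F*."""
    return [n for n, f in enumerate(f_values, start=1) if f <= f_star]


def phi_from_f(f, f_star):
    """Map an F-value to a Phi-value: Phi = 2/(1 + exp(-g)) - 1, g = 1/(F |F - F*|)."""
    denom = f * abs(f - f_star)
    if denom == 0.0:
        # Assumed convention: g diverges at F = 0 or F = F*, so Phi -> 1.
        return 1.0
    g = 1.0 / denom
    return 2.0 / (1.0 + math.exp(-g)) - 1.0


# Hypothetical toy profile for a four-residue chain (not simulation data).
toy_f = [0.10, 0.35, 0.80, 0.25]
toy_f_star = 0.30
print(nucleus_residues(toy_f, toy_f_star))
print([round(phi_from_f(f, toy_f_star), 3) for f in toy_f])
```

Since 2/(1 + exp(−g)) − 1 equals tanh(g/2), the mapping saturates at Φ ≈ 1 both for small F and for F close to F*, and decreases only for F well above F*, in line with the qualitative description in the text.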

4. The role of three-body correlations in stabilizing the folding nucleus

4.1. BPTI

We turn first to BPTI. The procedure just described leads to the backbone conformation schematically displayed in Fig. 6, when a geometric interpretation is given to the topological representation of the folded chain [7,15,16]. As we can see, this snapshot reveals the basic features confirmed by site-directed mutagenesis studies [30,31]; that is, the proto-β-sheet pattern buttressed by a single backbone H-bond (Arg 20–Phe 33) inducing the formation of the native (14, 38) disulfide bond and the organized core residues in the 15–36 region. That our identified core region is directly involved in the fast formation and stabilization of the (14, 38) disulfide bond seems to be an established experimental fact [13,14,30,31], although the origin of the formation of the structural buttressing remains unexplained. On the other hand, the nucleus has disorganized (nonnative) loop regions 9–14 and 37–44, retarding the formation of the two additional native disulfide bonds (30, 51) and (5, 55) [13,14,30,31].

Fig. 6. Characteristics of BPTI. (A) Solid ribbon representation of the optimized BPTI backbone conformation subject to the constraints imposed by the Ramachandran basins chosen at time t = t*. The structure represents the nucleus conformation. (B) Caricature of three-body interactions stabilizing the β-sheet motif in the nucleus of BPTI. Hydrophobic residues are represented as open circles, polar residues as black circles and amphiphilic units as shaded circles. The proximity of two hydrophobic residues, Ile 19 and Val 34, and of Phe 45 (F ≈ F*) stabilizes the Arg 20–Phe 33 H-bonds enough to compensate for the unfavorable two-body hydrophobic–polar mismatches. The solvation hemisphere of a residue has radius ca. 7 Å (cf. Ref. [14]) and is limited by a plane orthogonal to the H-bond and passing through the α-carbon of the residue within the virtual-bond backbone representation.

The native helices formed within the microsecond stages at the extremities of the molecule are able to survive well within the submillisecond timescale through nonnative tertiary scaffolding (cf. Fig. 6). Such tertiary interactions are progressively replaced by the native contacts in a sort of helix-to-β-sheet slippage produced by movements in the 8–13 and 37–44 residues as soon as the helices move to their native positions and formation of the two additional disulfide bonds is allowed. This is reminiscent of the conversion of α-helical segments to native β-sheet in β-lactoglobulin [13,16].

This analysis, together with an examination of the native fold, leads us to infer which significant structural changes occur in the submillisecond range after the nucleus has been formed. They consist of the docking of the terminal 48–56 helix towards the opposite side of the β-sheet in order to form the disulfide bonds with the other end of the molecule and with the hairpin loop of the β-sheet. These changes are concurrent with a minor docking of the 1–8 extremity of the molecule.

Now we can ask a central question: What stabilizes the folding nucleus of BPTI? No convincing answer to this question could be obtained from a computation based only on two-body interactions. While the formation of the disulfide bond (14, 38) could be the trigger for hydrophobic collapse [13,14,30,31], the nuclear (15–36) β-sheet which induces the disulfide bridge formation has four hydrophobic–polar mismatches along its purported backbone H-bonds: Arg 20–Phe 33, Phe 22–Gln 31, Asn 24–Leu 29 and Asn 24–Ala 27. If we adopt the intramolecular potential that allowed us to reproduce expeditious folding pathways for BPTI [13,14], and remove from it the n-body contributions with n ≥ 3 (Methods), we get a free energy for the nucleus which lies 2.80 kcal/mol above that of the random coil!

The many-body contributions, previously accounted for by introducing an "error tolerance" and the implicit treatment of the solvent in the intramolecular potentials, are necessary to stabilize the nucleus. They have been incorporated here in the simulation of BPTI folding by rescaling the nonbonded dielectric-dependent terms according to the extent of desolvation assigned to each residue at each given time [7]. Thus, three-body effects on the β-sheet occur wherever hydrophobic residues desolvate H-bonds. The Arg 20–Phe 33 H-bond, the buttress of the entire proto-β-sheet, is partially, albeit insufficiently, stabilized when it encounters a third body, be it Ile 19 or Val 34. These three-body contributions are shown in Fig. 6B. The hydrophobic units alternating between the H-bonds effectively decrease the local dielectric surrounding the H-bonds in the treatment adopted here: the rescaled nonbonded energy is 3.46 kcal/mol lower than its in-bulk (zeroth-order scaled) counterpart.


Nevertheless, these three-body contributions by themselves are not sufficient to provide wholesale stabilization of the nuclear β-sheet. When they are included and the nonbonded energy is rescaled accordingly, the free energy for the β-sheet lies only 0.66 kcal/mol below that of the random coil. An additional three-body interaction is needed; this is provided by the rapid approach of the nuclear Phe 45 (Φ ≈ 1, cf. Fig. 3). This residue penetrates the solvation hemisphere of Arg 20, driving water molecules out, at just about the critical time F(45) ≈ F*, when the snapshot in Fig. 6 was taken. As caricatured in Fig. 6B, Phe 45 is essential to stabilize and protect the Arg 20–Phe 33 H-bond, whose in-bulk (zeroth-order solvent-scaled) lifetime is almost t*. While the two-body contribution associated with Phe 45 approaching Arg 20 is unfavorable, being an h–p mismatch, the three-body (20, 33, 45)-contribution lowers the free energy of the β-sheet by an additional 1.1 kcal/mol in the way we have represented this three-body correlation as a thermodynamic stabilization.
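The free energy bookkeeping in the two preceding paragraphs can be checked with a short Python sketch. The three numbers are those quoted in the text; treating them as simply additive, and the resulting net value with Phe 45 included, are assumptions made here for illustration (the text states only the individual contributions), and the constant and function names are hypothetical.

```python
# Free energies in kcal/mol, relative to the random coil (values quoted in the text).
TWO_BODY_ONLY = 2.80   # nucleus free energy with all n-body (n >= 3) terms removed
RESCALING_GAIN = 3.46  # lowering from rescaled nonbonded terms (Ile 19, Val 34)
PHE45_GAIN = 1.1       # additional lowering from the (20, 33, 45) correlation


def nucleus_free_energy(include_rescaling=True, include_phe45=True):
    """Net free energy of the BPTI nuclear beta-sheet, assuming additive terms."""
    total = TWO_BODY_ONLY
    if include_rescaling:
        total -= RESCALING_GAIN
    if include_phe45:
        total -= PHE45_GAIN
    return total


for label, args in [
    ("two-body only", (False, False)),
    ("with rescaled nonbonded terms", (True, False)),
    ("with Phe 45 as well", (True, True)),
]:
    print(f"{label}: {nucleus_free_energy(*args):+.2f} kcal/mol")
```

Under this additive reading, the second line reproduces the 0.66 kcal/mol stabilization quoted above, and the third line is simply that value lowered by a further 1.1 kcal/mol.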

4.2. Ubiquitin

Regarding Ub, the structure and H-bond pattern of its nucleus are elusive when probed via kinetic deuterium/hydrogen (D/H) amide isotope effect measurements [19]. While the absence of nuclear protection of amide protons suggests that the molecule forms most of its secondary structure after the rate-limiting step [19,32], an alternative explanation is that the H-bonds occur in β-sheets where they are more solvent-exposed than those on the hydrophobic domains of an amphipathic helix.

By inferring a geometric structure from the occupancies of the Ramachandran basins, we find that the Ub nucleus at t = t* is predominantly β-sheet, as displayed in Fig. 7A. This imperfect β-sheet system has only two H-bonds: one of them (Arg 42–Leu 69) lies in the antiparallel β-sheet system involving residues 41–50 and 63–72. (Notice the nonnative elongated shape of the terminal β-strand.) The other is a flickering one (Gln 2–Glu 16) in the β-sheet involving residues 2–16. Thus, as Figs. 5 and 7A reveal, the nucleus consists of three parts: (a) a loose hydrophobic β-sheet pattern with scant but strong H-bonding (due to the desolvation induced by the proximity of Val 26) and an organized (44–55) looped region; (b) a single α-helix turn kernel involving residues 26–30 with no H-bonds; and (c) tertiary scaffolding contacts between the putative helix region (Val 26) and the β-sheet pattern (Arg 42). The loop residues in the region (57–65) are predicted to be the last to organize, with their ordering taking even longer than t#. This, however, does not preclude the formation of the enthalpically delicate and entropically costly parallel β-sheet engaging the two extremities of the molecule. The loop involved in the docking of the terminal β-strand is long enough to keep the entropic and free energy costs low enough for the parallel sheet to form.

Fig. 7. Characteristics of Ubiquitin (Ub). (A) Solid ribbon representation of the backbone conformation for the predicted collapse-inducing nucleus of Ub. (B) Caricature of a three-body correlation contributing to the stabilization of the H-bond pattern of the Ub nucleus and involving the nonconserved residue Val 26.

Examination of Fig. 5 allows us to infer which residues will be hot spots vis-à-vis site mutations for Ub. The highest Φ-value residues, here identified with the F ≈ F* residues, are predicted to be Ile 3, Leu 15, Val 26, Ile 30, Arg 42, Arg 54 and Leu 67. These residues fit into the nucleus only with difficulty, and yet their organization is required for the hydrophobic collapse. This result may be compared with the sequence-conservation analysis of Michnick and Shakhnovich [33], based on the hypothesis that conserved residues in a Ub superfamily are essential to warrant expeditious folding and thereby must be identified as core nuclear residues or hot mutagenesis spots. Thus, they suggest that the residues involved in the folding nucleus of Ub are the seven residues (3, 5, 15, 17, 30, 67 and 69) conserved among homologs. These two predictions should be tested by site-directed mutagenesis experiments. All we can state now is that four of our seven identified hot spots are in that list of conserved residues (cf. Ref. [33]). They are Ile 3, Leu 15, Ile 30 and Leu 67. On the other hand, all conserved residues belong to our predicted nucleus, as indicated in Fig. 5, and therefore have relatively large Φ-values. The crucial residues to investigate experimentally are thus Val 26, Arg 42 and Arg 54, predicted to be hot spots in our analysis but not in that of Michnick and Shakhnovich. The residue Val 26, a buried residue in the native fold, is responsible for creating the hydrophobic environment needed to strengthen the sole stable H-bond in the folding nucleus, as explained below. Its importance as a stabilizer becomes apparent only when three-body contributions are included.

Our overall prediction of the structure of the Ub nucleus helps us explain the small D/H isotopic effects found for this protein [19]. The collapse-inducing topology, being predominantly a complex β-sheet, does not need to protect its H-bonds to the same extent as an α-helix, with its partially desolvated backbone, must do. Thus, an imperfect condensation of hydrophobic residues seems to occur for Ub in its folding rate-determining step, while its native H-bonding pattern (protected within a desolvated environment) is completed during the subsequent energetically downhill search for the native structure [32,34]. This view agrees with recent experimental work on predominantly β-sheet proteins [35]. Furthermore, it reveals that the formation of the helical region in this protein is not governed by local propensities: it is a context-dependent event requiring the prior formation of the nucleus with tertiary buttressing of incipient secondary structure [7].

The role of the nonconserved residue Val 26 regarding H-bonding in the nucleus deserves special attention. This residue is not engaged in pairwise nonbonded interactions of the nucleus. However, when it moves to its particular location with respect to the Arg 42 ribbon twist (Figs. 7A and B), it desolvates the hitherto flickering H-bond in the central β-sheet motif, thus stabilizing that motif. This three-body correlation is illustrated in Fig. 7B. The proximity of Val 26–Arg 42 (α-carbon distance ≈ 7.2 Å) constitutes a three-body interaction in which the h–p mismatch is thermodynamically compensated by Val 26 desolvating the Arg 42–Leu 69 H-bond (actually raising the kinetic barrier for exposed backbone solvation). The Val 26 next to Arg 42 strengthens the carbonyl–amide H-bond Arg 42–Leu 69, reducing the free energy by 2.8 kcal/mol. Changing the hydrophobic environment induced by Val 26, for example by the mutation Val 26 → Ala 26, would destabilize the sole H-bond which buttresses the nuclear β-sheet with respect to invasion by water, thereby preventing the hydrophobic collapse. This three-body scaffolding effect involving Val 26 is unrecognizable in other approaches to hot spot prediction, such as that introduced in Ref. [36], which singles out hot spots as residues involved in the strongest pairwise contacts with the least entropic cost (i.e., F ≪ F*).
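The comparison between the two hot-spot predictions for Ub amounts to a set intersection and a set difference, which can be verified with a minimal Python sketch. The residue numbers are those listed in the text; the variable names are hypothetical.

```python
# Residue numbers taken from the text; variable names are illustrative only.
predicted_hot_spots = {3, 15, 26, 30, 42, 54, 67}  # F ~ F* residues (this work)
conserved_residues = {3, 5, 15, 17, 30, 67, 69}    # conserved among homologs, Ref. [33]

shared = sorted(predicted_hot_spots & conserved_residues)
only_predicted = sorted(predicted_hot_spots - conserved_residues)
only_conserved = sorted(conserved_residues - predicted_hot_spots)

print("in both lists:", shared)
print("hot spots not conserved:", only_predicted)
print("conserved but not F ~ F* hot spots:", only_conserved)
```

The second list contains exactly the residues singled out above as the decisive experimental targets (Val 26, Arg 42 and Arg 54).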


5. Post-nucleation events in Ubiquitin folding

This section examines post-nucleation events in Ub folding, focusing on three-body correlations responsible for protecting the H-bonds of post-nucleation structure. As noted in Ref. [19], the native (22–35) α-helix is not part of the nucleus. The high exposure of amide protons in the transition state argues against that helix being present at that stage of folding. Our simulations fully concur with this view, as indicated in Section 4. Furthermore, predictions based solely on local propensities and obtained with the AGADIR program give at most 3% probability to the existence of this helix [37] (Fig. 8), while propensities for helix formation in "incorrect" (nonnative) regions are apparent. This clearly suggests a context-dependent propensity induced by the formation of the nucleus. The purpose of this section is to elucidate the origin of this cooperative effect.

Fig. 8. Percentage helicity in the Ub chain based on local propensities, as evaluated using the AGADIR program [37].

In view of this, we ask: How does the nucleus trigger the formation of the helix? To answer this question, we monitored the successful reproducible runs for Ub at times t > t*. The four snapshots displayed in Figs. 9A–D represent contact matrices (CMs), defined so that the pair (i, j) forms a contact if r(i, j) < 8 Å. These were taken at times 84, 92, 100 µs = t# and 1 ms, respectively. The CM representation has proven itself a useful visualization of the three-body correlations. (We introduce a complementary visualization device below.) By the time the structural fluctuations have essentially ceased (t = t#), the helix is fully formed and stabilized. This goal is reached via a considerable distortion of the nucleus which, as shown below, must act as a temporary stabilizer until the native cooperativity is established.

Fig. 9. (A–D) Four contact maps of a successful simulation of Ub along its most favored and reproducible folding pathway, representing post-nucleation configurations obtained respectively at 84, 92, 100 µs and 1 ms.

As we can see from Fig. 9A, the hot-spot residue Ile 3 becomes engaged in a three-body correlation as it approaches the polar residue Asn 25, an unfavorable two-body interaction (the actual distance is 5.4 Å) compensated by the strengthening effect Ile 3 has on the helical H-bond Thr 22–Asn 25. The rationale for this strengthening is, as indicated before, that the solvation of the local exposed backbone becomes kinetically costly because of the solvent organization shaping the cavity around Ile 3. This three-body correlation contributes to the formation of the native helix by protecting the initial turn of the helix, which can subsequently form through the usual kinetic avalanche mechanism [38].

Thus, already by t = 84 µs (cf. Fig. 2), the native (22–35) helix is fully formed, although its protection at that stage requires the gross distortion of what later becomes the native (1–16) β-sheet. This protection becomes even more pronounced at t = 92 µs, with further deformation of the β-sheet (Fig. 9B). Now three hydrophobic residues participate in the correlations that protect the "primer" Thr 22–Asn 25 H-bond: Ile 3, Phe 4 and Val 5. All three approach Asn 25, which would correspond to h–p mismatches if they were viewed as two-body contributions. The α-carbon distances are 5.5, 6.15 and 6.06 Å, respectively. This H-bond protection is further reinforced by the engagement of Leu 67 (Fig. 9B), which joins the three-body correlation Thr 22–Asn 25–Leu 67, again to overcome the h–p mismatch Asn 25–Leu 67 thermodynamically. Within the parametrization and potential-rescaling ansatz adopted in our simulations, this massive desolvation of the single Thr 22–Asn 25 H-bond makes its relative enthalpy change decrease by a factor of 2 with respect to its in-bulk value: ΔΔH ≈ −2.8 kcal/mol.

By the time the fluctuations have decreased to their stabilized lowest value, at t = 100 µs, the original native β-sheet is restored (Fig. 9C) and Met 1 takes on a transient role in the helix stabilization, now involving the three-body correlation Met 1–Thr 22–Asn 25. This structure gets further refined (Fig. 9D), so we see that at 1 ms, Leu 69 acts as a protector of the first helical turn through the three-body correlation Leu 69–Thr 22–Asn 25. Still within the time frame 1 to 10 ms, structural rearrangements occur so that finally the helix becomes extensively protected through native tertiary scaffolding now involving all the hydrophobic residues within the 45–60 region. The Hamming distance between the CM displayed in Fig. 10A (obtained at 10 ms) and that of the native PDB fold is 1.09%, thus indicating a successful simulation of the folding process. A similar level of accuracy (1.06%) is achieved when using the kinetic version of the algorithm, which treats three-body correlations by slowing down the interbasin flipping rates of protected residues (Fig. 10B).

Fig. 10. Contact maps (CMs) of the time evolution of Ubiquitin. (A) CM of Ub along the most successful and reproducible run taken at 10 ms, derived by the thermodynamically-based algorithm, which lowers the free energy of the hydrogen bond when it forms a third-body contact. (B) CM generated at t = 10 ms by the kinetically controlled algorithm, which slows the interbasin hopping induced by the protection of H-bonds brought about by three-body correlations. (C) CM for Ubiquitin at long times, generated by both thermodynamic and kinetic algorithms; this map is indistinguishable from that of the native protein.

In order to visualize the entire folding history of Ub at the topological level with its asynchronous occurrence of folding events, we have produced a coarse-grained representation of the LTM evolution. In Fig. 11A, the folding history is resolved at 840 ns intervals for just the time window that captures the folding nucleation event, from t = 0 to approximately 20 µs. In this figure, the chain topologies are coarse grained by dividing the N = 76 chain into groups of three residues. (We have of course also made such diagrams with all residues described.) Each basin is characterized by a color: blue denotes Basin 1, the basin containing the local conformation associated with the β-strand; red represents Basin 2, the basin required for the α-helix turn; white indicates the basin for the left-handed helix or, in Gly, also the fourth basin absent in other Ramachandran topographies. Black rectangles are those engaged in stabilizing three-body correlations. Each region is colored according to the dominant color within the coarse-grained representation adopted. The final pattern is identical to that of the native PDB structure. The formation of the nucleus and the asynchronous organization of the various regions of the chain recapitulate the full history displayed in Fig. 5. The local propensities towards helicity shown in Fig. 8 are apparent in the full folding picture shown in Fig. 11A. Thus, the early formation of nonnative but locally-dictated helix is apparent. Fig. 11A also reveals the influence of the context in shaping the native (22–35) helix and in dismantling the nonnative helical structure.
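The contact-matrix construction and the percentage Hamming distance used above to compare a simulated map with the native one can be sketched as follows. This is a minimal illustration, not the simulation code: only the 8 Å cutoff comes from the text, whereas the exclusion of self-contacts, the normalization of the Hamming distance by the total number of matrix entries, the function names and the toy coordinates are all assumptions.

```python
import math

CONTACT_CUTOFF = 8.0  # angstroms; pair (i, j) is in contact if r(i, j) < 8 A


def contact_matrix(coords, cutoff=CONTACT_CUTOFF):
    """Build a symmetric 0/1 contact matrix from a list of (x, y, z) positions."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) < cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def hamming_percent(cm_a, cm_b):
    """Percentage of matrix entries that differ (assumed normalization: n * n)."""
    n = len(cm_a)
    differing = sum(
        cm_a[i][j] != cm_b[i][j] for i in range(n) for j in range(n)
    )
    return 100.0 * differing / (n * n)


# Hypothetical three-residue toy chains (coordinates in angstroms).
toy_simulated = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_native = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0)]

cm_sim = contact_matrix(toy_simulated)
cm_nat = contact_matrix(toy_native)
print(cm_sim)
print(cm_nat)
print(f"Hamming distance: {hamming_percent(cm_sim, cm_nat):.2f}%")
```

In the toy example the two chains differ only in the contact between residues 2 and 3, which changes two symmetric entries of the 3 × 3 matrix.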


Fig. 11. Maps of the evolution of folding of Ubiquitin. (A) The simulated folding history of Ub resolved at 840 ns intervals, for the time window that captures folding nucleation. Chain topologies are exhibited by a "modulo Ramachandran basin" representation of the local conformation. The contour variable, designating the residue location along the chain, has been coarse grained in groups of three residues. Blue denotes Basin 1, containing the extended conformation associated with a β-strand; red represents Basin 2, compatible with an α-helix turn; white indicates Basin 3, for a left-handed helix or, in Gly, also the fourth basin, unavailable for other residues. Each region is colored according to the dominant color within the coarse-grained representation adopted. The final topology is identical to that of the native structure deposited in the PDB. The formation of the nucleus and the asynchronous organization of the different regions of the chain can be readily seen. The black regions contain hydrophobic residues engaged in three-body correlations protecting the H-bonds. (B) The time evolution of the Ub chain topology as generated by the kinetically controlled algorithm. The representational conventions of Fig. 11A are followed.


Fig. 11A has been constructed from the thermodynamic model of three-residue protection. A similar history, but based on the kinetic treatment of protection by the slowing of solvent attacks on intramolecular H-bonds, is shown in Fig. 11B. This algorithm yields a somewhat sharper transformation from the unstructured to the nucleated form and a more persistent protection of H-bonds through three-body correlations. The final pattern shown in the top line of Fig. 11B is within a Hamming distance of 1.4% from that of the native structure. Further investigation will allow us to make more specific comparisons of the two algorithms and to find experimental tests to distinguish the roles of both modes of stabilization.
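The coarse-grained coloring used for Fig. 11 (groups of three residues, each colored by its dominant basin) can be illustrated with a short Python sketch. Several details are assumptions rather than the authors' procedure: ties are resolved in favor of the color encountered first in the group, a group is drawn black as soon as any of its residues is flagged as engaged in a three-body correlation, and the function name, the `protected` argument and the toy basin sequence are hypothetical.

```python
from collections import Counter

# Color convention from the text: Basin 1 (beta-strand) blue, Basin 2 (alpha-helix
# turn) red, Basin 3 (left-handed helix) and the Gly-only fourth basin white.
BASIN_COLORS = {1: "blue", 2: "red", 3: "white", 4: "white"}


def coarse_grain(basins, protected=frozenset(), group_size=3):
    """Color consecutive residue groups by their dominant basin color.

    basins    -- basin index (1-4) for each residue, in chain order
    protected -- 1-based residue numbers engaged in three-body correlations
                 (assumption: one such residue turns the whole group black)
    """
    colors = []
    for start in range(0, len(basins), group_size):
        window = basins[start:start + group_size]
        residue_numbers = range(start + 1, start + len(window) + 1)
        if any(n in protected for n in residue_numbers):
            colors.append("black")
            continue
        counts = Counter(BASIN_COLORS[b] for b in window)
        # most_common keeps first-encountered order among equal counts.
        colors.append(counts.most_common(1)[0][0])
    return colors


# Hypothetical ten-residue basin assignment at one time step.
toy_basins = [1, 1, 2, 2, 2, 3, 4, 3, 1, 2]
print(coarse_grain(toy_basins))
print(coarse_grain(toy_basins, protected={4}))
```

For a chain of N = 76 residues this grouping yields 26 regions, the last of which contains a single residue.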

6. Conclusions

This work reveals that a necessary condition for the hydrophobic collapse of a two-state protein is the achievement of a topology that allows for the desolvation, and thereby protection, of its intramolecular H-bonds. Thus, finding the "right topology" represents an endogenous means of changing the local structure of the solvent, creating a protective hole in the water to inhibit the solvation of locally exposed backbone and thus effectively stabilize the intramolecular H-bonds.

Thus, a necessary aspect of stabilization of the collapse-inducing nucleus is the establishment of three-body correlations that put hydrophobic residues into positions that desolvate intramolecular H-bonds. Such effects have been shown to overcome the destabilization produced by h–p mismatches that occur frequently during the phase of imperfect chain condensation and would by themselves preclude the hydrophobic collapse. These interactions play crucial roles not only in the formation of nuclei but also in the subsequent post-nucleation stages of folding.

Acknowledgements

This research was supported in part by a grant from the National Science Foundation.

References

[1] C. Krittanai, W. Curtis Johnson, Proteins 39 (2000) 132.
[2] K. Park, M. Vendruscolo, E. Domany, Proteins 40 (2000) 237.
[3] C. Waldburger, T. Jonsson, R.T. Sauer, Proc. Natl. Acad. Sci. USA 93 (1996) 2629.
[4] R. Baldwin, G. Rose, Trends Biochem. Sci. 24 (1999) 26.
[5] D.L. Minor, P.S. Kim, Nature (London) 371 (1994) 264.
[6] S. Takada, Z.A. Luthey-Schulten, P.G. Wolynes, J. Chem. Phys. 110 (1999) 11616.
[7] A. Fernández, J. Chem. Phys. 114 (2001) 2489.
[8] A. Kentsis, T.R. Sosnick, Biochemistry 37 (1998) 14613.
[9] A. Caflisch, M. Karplus, J. Mol. Biol. 252 (1995) 672.
[10] T. Simonson, C.L. Brooks III, J. Amer. Chem. Soc. 118 (1996) 6452.
[11] A. García, G. Hummer, Proteins 38 (2000) 261.
[12] A. García, K.Y. Sanbonmatsu, Proteins 42 (2001) 345.
[13] A. Fernández, A. Colubri, R.S. Berry, Proc. Natl. Acad. Sci. USA 97 (2000) 14062.
[14] A. Fernández, K. Kostov, R.S. Berry, Proc. Natl. Acad. Sci. USA 96 (1999) 12991.
[15] A. Fernández, R.S. Berry, J. Chem. Phys. 112 (2000) 5212.
[16] A. Fernández, A. Colubri, R.S. Berry, J. Chem. Phys. 114 (2001) 5871.
[17] C.R. Matthews, Methods Enzymol. 154 (1987) 498.
[18] A.R. Fersht, A. Matouschek, L. Serrano, J. Mol. Biol. 224 (1992) 771.
[19] B.A. Krantz, L.B. Moran, A. Kentsis, T.R. Sosnick, Nat. Struct. Biol. 7 (2000) 62.
[20] A.R. Fersht, Proc. Natl. Acad. Sci. USA 97 (2000) 1525.
[21] V.I. Abkevich, A.M. Gutin, E.I. Shakhnovich, Biochemistry 33 (1994) 10026.
[22] Z. Guo, D. Thirumalai, Folding Des. 2 (1997) 377.
[23] V. Muñoz, W.A. Eaton, Proc. Natl. Acad. Sci. USA 96 (1999) 11311.
[24] P.G. Bolhuis, C. Dellago, D. Chandler, Proc. Natl. Acad. Sci. USA 97 (2000) 5877.
[25] J.-E. Shea, J.N. Onuchic, C.L. Brooks III, J. Chem. Phys. 113 (2000) 7663.
[26] J.N. Onuchic, Z. Luthey-Schulten, P.G. Wolynes, Ann. Rev. Phys. Chem. 48 (1997) 545.
[27] A. Fersht, Structure and Mechanism in Protein Science: a Guide to Enzyme Catalysis and Protein Folding, W. Freeman & Co, New York, 1999.
[28] D.E. Rumelhart, J.C. McClelland, and the PDP Group, Parallel Distributed Processing, MIT Press, Cambridge, 1988.
[29] S.E. Jackson, A.R. Fersht, Biochemistry 30 (1991) 10428.
[30] K. Zdanowski, M. Dadlez, J. Mol. Biol. 287 (1999) 433.
[31] M. Dadlez, Biochemistry 36 (1997) 2788.
[32] T.R. Sosnick, L. Mayne, S.W. Englander, Proteins 24 (1996) 413.
[33] S.W. Michnick, E.I. Shakhnovich, Folding Des. 3 (1998) 239.
[34] R. Matheson, H. Scheraga, Macromolecules 11 (1978) 814.
[35] N. Schönbrunner, G. Pappenberger, M. Scharf, J. Engels, T. Kiefhaber, Biochemistry 36 (1997) 9057.
[36] B.A. Shoemaker, J. Wang, P.G. Wolynes, Proc. Natl. Acad. Sci. USA 94 (1997) 777.
[37] E. Lacroix, A.R. Viguera, L. Serrano, J. Mol. Biol. 284 (1998) 173.
[38] A. Fernández, A. Colubri, Phys. Rev. E 60 (1999) 4645.

