
Glycan Recognition

Cello-oligomer-binding dynamics and directionality in family 4 carbohydrate-binding modules

Abhishek A Kognole² and Christina M Payne¹,²

²Department of Chemical and Materials Engineering, University of Kentucky, 177 F. Paul Anderson Tower, Lexington, KY 40506, USA

¹To whom correspondence should be addressed: Tel: +1-859-257-2902; e-mail: [email protected]

Received 18 June 2015; Revised 4 July 2015; Accepted 4 July 2015

Abstract

Carbohydrate-binding modules (CBMs) play significant roles in modulating the function of cellulases, and understanding the protein–carbohydrate recognition mechanisms by which CBMs selectively bind substrate is critical to development of enhanced biomass conversion technology. CBMs exhibit a limited range of specificity and appear to bind polysaccharides in a directional fashion dictated by the position of the ring oxygen relative to the protein fold. The two family 4 CBMs of Cellulomonas fimi Cel9B (CfCBM4) are reported to preferentially bind cellulosic substrates. However, experimental evidence suggests that these CBMs may not exhibit a thermodynamic preference for a particular orientation. We use molecular dynamics (MD) and free energy calculations to investigate protein–carbohydrate recognition mechanisms in CfCBM4-1 and CfCBM4-2 and to elucidate the preferential ligand-binding orientation. We evaluate four cellopentaose orientations, including that of the crystal structure and three others suggested by nuclear magnetic resonance (NMR). These four orientations differ in the position of the ligand reducing end (RE) and the orientation of the pyranose rings relative to the protein core. MD simulations indicate that the plausible orientations reduce to two conformations. Calculated ligand-binding free energies indicate that these two orientations are equally favorable. The calculated free energies are in excellent agreement with isothermal titration calorimetry measurements from the literature. MD simulations further reveal that the approximate structural symmetry of the oligosaccharides relative to the amino acids along the binding cleft plays a role in the promiscuity of ligand binding. A survey of ligand-bound structures suggests this phenomenon may be characteristic of the broader class of proteins belonging to the β-sandwich fold.

Key words: β-sandwich fold, Carbohydrate recognition, Cellulose, Molecular modeling, Thermodynamics

Introduction

Glycoside hydrolases are responsible for the vast majority of natural biomass conversion (Falkowski et al. 2000; Bardgett et al. 2008). As a result, glycoside hydrolases have become a primary focus of industrial protein engineering efforts toward efficient and economical production of second-generation biofuels (Wilson 2009). There are many challenges associated with enzymatic conversion of biomass to soluble, fermentable sugars. However, one of the greatest challenges is posed by nature itself. To protect against both microbial and animal attack, plants have evolved naturally recalcitrant cellulosic cell walls (Himmel et al. 2007). One way nature overcomes this difficulty is by secreting multi-modular glycoside hydrolases, which consist of catalytic domains appended to carbohydrate-binding modules (CBMs) by linker peptides (Lynd et al. 2002). The CBM is a non-catalytic module that serves as the primary biological means of protein–carbohydrate recognition (Boraston et al. 2004). Seemingly endless combinatorial constructs exist, from fungal cellulases with a single catalytic domain and CBM (Shoemaker et al. 1983; Teeri et al. 1987) to bacterial cellulosomes with multiple CBMs and catalytic domains (Bayer and Lamed 1986; Doi and Tamaru 2001; Schwarz 2001). Because glycosidic bond cleavage is a clearly defined function and a straightforward target for improvement, catalytic domains have remained a major focus of biomass conversion protein engineering research for decades (Payne et al. 2015). By contrast, the community has only begun to recognize the potential of harnessing the carbohydrate recognition capabilities of CBMs for more effective biomass conversion or for any of the myriad other biotechnological applications that benefit from specific protein–carbohydrate-binding interactions. As a result, many of the molecular-level mechanisms underlying protein–carbohydrate binding remain elusive. Developing a fundamental understanding of how CBMs recognize and mediate carbohydrate binding will enable engineering of promiscuity and affinity, offering the promise of enhanced biomass conversion through engineered chimeras and cellulosomes (Mingardon et al. 2007; Nakazawa et al. 2013). Furthermore, this knowledge will translate to the development of increasingly common CBM biotechnological applications, including affinity purification techniques, alternatives to monoclonal antibodies and cell wall molecular probes (Tomme et al. 1998; McCartney et al. 2004; Shoseyov et al. 2006).

Glycobiology, 2015, vol. 25, no. 10, 1100–1111, doi: 10.1093/glycob/cwv048

Advance Access Publication Date: 7 July 2015. Original Article

© The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected] 1100

Downloaded from https://academic.oup.com/glycob/article-abstract/25/10/1100/1988633 by guest on 01 March 2018

A CBM’s function and the interactions it maintains with the substrate are largely defined by protein structure. Nomenclature broadly categorizing CBM–substrate interactions has evolved as our understanding of CBM function has developed and is helpful in assessing similarities (Boraston et al. 2004; Gilbert et al. 2013). CBMs are currently classified into both families and types according to fold and functional similarities, respectively, and generally appear to preferentially bind either crystalline or amorphous/non-crystalline substrates non-competitively. Type A CBMs bind crystalline polysaccharide surfaces along flat, aromatic residue-lined faces; the archetypal Trichoderma reesei Family 1 CBM belongs to this type (Mattinen et al. 1997). In contrast, Type B and C CBMs closely associate with single polysaccharide chains, binding the chain along a groove or cleft reminiscent of catalytic domains. The difference between Types B and C was only recently elucidated: Type B CBMs bind polysaccharide chains internally (i.e., endo-type), and Type C CBMs bind chains at the termini (i.e., exo-type) (Gilbert et al. 2013). At the time of writing, 69 different CBM families have been identified, each of which belongs to one of seven unique folds (Lombard et al. 2014). However, familial categorization does not necessarily delineate specificity, suggesting that small substitutions along the binding sites contribute greatly to function. For example, Family 4 CBMs (CBM4s) each display a characteristic β-sandwich fold, which is common to a large number of families and allows a great deal of flexibility in substrate binding (Richardson 1981). Within this family, CBMs have been noted to bind xylan, β-1,3-glucan, β-1,4-β-1,3-glucans, β-1,6-glucans and amorphous cellulose (Coutinho et al. 1992; Fuchs et al. 2003; Hong and Meng 2003; Gullfot et al. 2010).

The multi-modular Cellulomonas fimi endoglucanase Cel9B (formerly CenC) contains tandem Type B, N-terminal CBMs, both of which belong to Family 4 (Coutinho et al. 1991). The two domains, CfCBM4-1 (formerly CBDN1) and CfCBM4-2 (formerly CBDN2), occur sequentially and bind amorphous cellulose additively (Tomme et al. 1996). CfCBM4-1 is of historical significance as the first known soluble substrate-binding CBM; its discovery led to renewed interest in CBM function in general. Both CfCBM4-1 and CfCBM4-2 bind cellotetraose and cellopentaose, with higher affinity for the longer oligomer (Tomme et al. 1996; Brun et al. 2000), suggesting that each binding cleft consists of five pyranose-binding subsites formed by the parallel β-sheets of the β-sandwich (Figure 1); this was later confirmed when Boraston et al. solved the cellopentaose-bound CfCBM4-1 structure (Boraston et al. 2002). The affinity of CfCBM4-1 and CfCBM4-2 for cello-oligomers is roughly the same for a given length, and isothermal titration calorimetry suggests that binding of cello-oligomers to both CBM4s is enthalpically driven (Tomme et al. 1996; Brun et al. 2000). This latter observation is consistent with the large population of potential hydrogen-bonding polar residues lining the binding cleft (Kormos et al. 2000). Despite the apparent similarities in specificity and binding mode, the two modules share only 36% sequence identity, with notable amino acid substitutions along the binding cleft. The binding cleft of CfCBM4-2 is also noticeably wider than that of CfCBM4-1 (Brun et al. 2000; Boraston et al. 2002). We anticipate that direct comparison of the dynamics of cellopentaose-bound CfCBM4-1 and CfCBM4-2 will elucidate the fundamental interactions driving β-1,4-linked glucan specificity in Family 4 CBMs. Furthermore, these findings are likely to have broad applicability to other CBM families with β-sandwich folds.

NMR analysis of nitroxide spin-labeled cello-oligomer derivatives also put forth the intriguing, though somewhat controversial, hypothesis that CfCBM4-1 and CfCBM4-2 are capable of binding a cello-oligomer in a multi-directional fashion (Johnson et al. 1999). Johnson et al. (1999) examined association of 2,2,6,6-tetramethylpiperidine-1-oxyl-4-yl (TEMPO)-labeled cellotriose and cellotetraose with the individual CfCBM4-1 and CfCBM4-2 domains. At the time of that study, no ligand-bound CBM4 structure was available, and NMR techniques offered a complementary approach to understanding ligand binding in lieu of crystallographic evidence. Determination of ¹H and ¹⁵N chemical shifts confirmed that labeling did not significantly affect affinity, and paramagnetic relaxation studies further revealed that the nitroxide label could lie at either end of the binding clefts. However, relative occupancies were not determined, so the data could not identify a “more favorable” conformation. The multi-directional-binding observation is interesting because it is counterintuitive. Polysaccharides exhibit a large dipole along the length of the polymer as a result of several factors, including the parallel orientation of chains, the asymmetric pyranose ring oxygen atom, and the chemical polarity of the individual chains (Sugiyama et al. 1992; Frka-Petesic et al. 2014); for the cellopentaose ligand, the dipole moment is ∼12 D. At first glance, such a dipole would seem to preclude multi-directional binding, as proteins likely evolve to hydrogen bond most effectively with the oligomer in a given direction. Boraston et al. (2002) reached a similar conclusion upon solving the cellopentaose-bound CfCBM4-1 structure. The structure captured the cellopentaose with one hydrophilic edge of the sugar pointed in toward the binding groove and the other edge exposed to solvent. Unambiguous electron density pointed to a single thermodynamically favorable conformation occupying five subsites of the binding cleft. Nevertheless, the authors left open the possibility that serendipitous crystal packing interactions may have resulted in binding the least favorable cellopentaose orientation. Of course, the ability to bind cello-oligomers multi-directionally would be significantly advantageous in engineered cellulase or cellulosomal constructs, allowing the CBMs to target amorphous cellulose from virtually any angle. Thus, determining whether this capability does in fact exist in a thermodynamically equivalent capacity, and how multi-directional CBM4 substrate binding is accomplished, promises to inform future biotechnological development.
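The ∼12 D figure quoted above is the magnitude of the molecular dipole moment, μ = |Σᵢ qᵢ rᵢ|, summed over atomic partial charges qᵢ at positions rᵢ. The sketch below shows only that arithmetic; the charges and coordinates in the usage line are hypothetical placeholders, not cellopentaose force-field parameters, and the conversion factor (1 e·Å ≈ 4.8032 D) is the standard one.

```python
import math

# Conversion from elementary charge x angstrom to debye (1 e·Å ≈ 4.8032 D).
E_ANGSTROM_TO_DEBYE = 4.8032

def dipole_moment_debye(charges, coords):
    """Return |sum_i q_i * r_i| in debye.

    charges: partial charges in units of e. For a neutral molecule they sum
             to ~0, so the result does not depend on the coordinate origin.
    coords:  matching (x, y, z) positions in angstrom.
    """
    if len(charges) != len(coords):
        raise ValueError("charges and coords must have the same length")
    mu = [0.0, 0.0, 0.0]
    for q, (x, y, z) in zip(charges, coords):
        mu[0] += q * x
        mu[1] += q * y
        mu[2] += q * z
    return math.sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE

# Hypothetical two-site example: +0.5 e and -0.5 e separated by 2 Å along x.
print(round(dipole_moment_debye([0.5, -0.5], [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]), 3))  # prints 4.803
```

In practice the charges would come from the force field used for the ligand and the coordinates from a simulation frame; the value then depends on the oligomer conformation.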

Here, we use molecular dynamics (MD) simulations to explicitly examine carbohydrate-binding mechanisms in Family 4 CBMs. Eight systems representing the various ligand configurations of CfCBM4-1 and CfCBM4-2 were each simulated for 0.25 μs: six ligand-bound systems representing possible variations in binding cleft occupation, plus the two unbound proteins (Figure 1). The corresponding case–system abbreviations used throughout this study are illustrated in Figure 1. The CfCBM4-1 systems were constructed from the 1GU3 Protein Data Bank (PDB) structure (Boraston et al. 2002), and the CfCBM4-2 systems were constructed from the 1CX1 PDB structure (Brun et al. 2000). The CfCBM4-1 structure features a bound ligand, which was used here as the basis for investigating ligand dynamics and directionality preference. Four ligand orientations bound to CfCBM4-1 were considered: (i) the structural orientation (CfCBM4-1-RE); (ii) a reversed ligand orientation in which the non-reducing end (NRE) of the cellopentaose occupies the original RE position of the structural conformation and the symmetry of the glucopyranose side chains is maintained (CfCBM4-1-NRE); (iii) a rotation of the structural cellopentaose conformation about the C1–C4 axis so that the opposite hydrophilic edge faces inward toward the protein, effectively locating a C5 hydroxymethyl group where the C3 hydroxyl previously existed (CfCBM4-1-RE’); and (iv) a transverse axis reversal combined with the C1–C4 rotation (CfCBM4-1-NRE’). As the CfCBM4-2 NMR structure does not contain a ligand, the bound CfCBM4-2 systems were prepared by aligning the CfCBM4-2 and CfCBM4-1 protein backbones and docking the cellopentaose into the CfCBM4-2 structure. Two ligand orientations in CfCBM4-2 were considered, representing the orientation of the 1GU3 structure (CfCBM4-2-RE) and the transverse axis transformation (CfCBM4-2-NRE). The unbound CfCBM4-1 and CfCBM4-2 systems were also considered to understand the contributions of ligand binding to protein dynamics. A detailed description of simulation construction is provided in Methods and Supplementary data.

Binding mechanisms of family 4 carbohydrate-binding modules 1101

Fig. 1. CfCBM4-1 and CfCBM4-2 ligand conformations considered in this study. CfCBM4-1 and CfCBM4-2 are shown in cartoon representation with key aromatic residues shown as sticks. The cellopentaose ligand is shown in stick representation, perpendicular to the β-sheets of the protein core. (A) CfCBM4-1-RE represents the ligand orientation of the CfCBM4-1 structure (PDB 1GU3), with the reducing end (RE) running from left to right, and (B) CfCBM4-1-NRE illustrates the reverse, transverse axis transformation, with the ligand oriented so the RE runs from right to left. (C) CfCBM4-1-RE’ represents the structural ligand orientation with the RE from left to right, but the cellopentaose has been rotated 180° about the C1–C4 axis, locating the hydroxymethyl groups out of register. (D) CfCBM4-1-NRE’ represents both the transverse axis rotation and the 180° C1–C4 rotation of the cellopentaose in the binding cleft. (E) CfCBM4-2-RE represents the CfCBM4-1 structural ligand orientation (PDB 1GU3) with the RE of the ligand running from left to right. (F) CfCBM4-2-NRE represents the transverse axis transformation of cellopentaose so the RE runs from right to left. This figure is available in black and white in print and in color at Glycobiology online.

1102 AA Kognole and CM Payne

We quantitatively examined the thermodynamic preference of ligand directionality through computational determination of the absolute binding free energy. An enhanced sampling free energy methodology, free energy perturbation with replica-exchange molecular dynamics (FEP/λ-REMD), was used to calculate the affinity of cellopentaose for CfCBM4-1 (Jiang et al. 2009). We considered two cases representing the structural orientation and the transverse axis rotation, CfCBM4-1-RE and CfCBM4-1-NRE, respectively. The remaining two CfCBM4-1 ligand orientations, rotations about the C1–C4 axis, were excluded from free energy calculations, as the ligands shifted significantly along the length of the binding cleft over the course of the MD equilibration simulations and no longer represented the intended conformational state. These results are described further in Results and discussion.
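The full FEP/λ-REMD protocol is beyond a short example, but its core idea, free energy perturbation along a λ path, can be illustrated with the simplest estimator, Zwanzig's exponential average: for one window, ΔG(λᵢ→λᵢ₊₁) = −k_B T ln⟨exp[−(U(λᵢ₊₁) − U(λᵢ))/k_B T]⟩, averaged over configurations sampled at λᵢ, and the path total is the sum over windows. The sketch below is that textbook estimator only. It omits replica exchange between λ windows and is not the implementation of Jiang et al. (2009), which may post-process its data with a different estimator; the energy differences in the usage line are hypothetical.

```python
import math

# Boltzmann constant in kJ mol^-1 K^-1.
KB_KJ_PER_MOL_K = 0.0083144626

def zwanzig_window(delta_u, temperature):
    """Free energy change (kJ/mol) for one lambda step by exponential averaging.

    delta_u: samples of U(lambda_{i+1}) - U(lambda_i) in kJ/mol, evaluated on
             configurations drawn at lambda_i.
    A log-sum-exp shift keeps large |delta_u| values from overflowing.
    """
    beta = 1.0 / (KB_KJ_PER_MOL_K * temperature)
    x = [-beta * du for du in delta_u]
    x_max = max(x)
    log_mean = x_max + math.log(sum(math.exp(v - x_max) for v in x) / len(x))
    return -log_mean / beta

def zwanzig_path(windows, temperature):
    """Sum the per-window estimates along the whole lambda path."""
    return sum(zwanzig_window(w, temperature) for w in windows)

# Hypothetical energy differences for a two-window path at 300 K.
print(zwanzig_path([[2.0, 2.1, 1.9], [0.0, 5.0]], 300.0))
```

Because the exponential average is dominated by rare low-ΔU samples, windows must be closely spaced for it to converge; exchanging replicas between neighboring λ windows, as in FEP/λ-REMD, is one way to improve that sampling.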

Results and discussion

Symmetry of the cellopentaose is critical to binding

Four possible cellopentaose conformations occupying the CfCBM4-1 binding groove were investigated as potential multi-directional-binding forms. Two of these conformations, CfCBM4-1-RE’ and CfCBM4-1-NRE’, were constructed to test whether the binding groove can accommodate larger carbohydrate side chain groups, such as the hydroxymethyl group, regardless of binding subsite. The nomenclature of the CfCBM4-1 binding subsites is illustrated in Figure 2. These two systems, constructed by rotating the ligand around its longitudinal axis (Figure 1C and D), place the cellopentaose off register by one binding subsite compared with the structurally bound ligand. Accepting these two ligand conformations would require every binding subsite to contain semi-redundant hydrogen-bonding residues.
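The register argument can be made concrete with a deliberately simplified model, offered as an illustration rather than anything computed in this study. Because consecutive glucose residues in a β-1,4-linked chain are rotated by roughly 180° about the chain axis, the edge of the chain toward which each residue's hydroxymethyl group points alternates along the chain; the sketch encodes this as a 0/1 flag per residue. Under the model's assumptions, reversing the chain (RE to NRE) maps position i to n − 1 − i, while a 180° rotation about the C1–C4 axis flips every flag. The helper `register_shift` (a hypothetical name) then asks how far a transformed ligand must slide to match the crystallographic pattern.

```python
def hydroxymethyl_pattern(n, first_edge=0):
    """Toy model: edge (0 or 1) toward which each residue's hydroxymethyl
    group points, alternating along a chain with ~2-fold screw symmetry."""
    return [(first_edge + i) % 2 for i in range(n)]

def reverse_chain(pattern):
    """Transverse-axis reversal (RE <-> NRE): residue i moves to n - 1 - i."""
    return pattern[::-1]

def rotate_about_chain_axis(pattern):
    """180 degree rotation about the C1-C4 axis: every group swaps edges."""
    return [1 - e for e in pattern]

def register_shift(reference, candidate):
    """Smallest slide s (0, then +1, then -1 subsites) for which the candidate
    matches the reference on every overlapping subsite; None if none does.
    In this model the sign of a one-subsite slide is arbitrary."""
    n = len(reference)
    for s in (0, 1, -1):
        overlap = [(i, i + s) for i in range(n) if 0 <= i + s < n]
        if all(reference[i] == candidate[j] for i, j in overlap):
            return s
    return None

ref = hydroxymethyl_pattern(5)  # crystallographic (RE) orientation
cases = {
    "RE": ref,
    "NRE": reverse_chain(ref),
    "RE'": rotate_about_chain_axis(ref),
    "NRE'": reverse_chain(rotate_about_chain_axis(ref)),
}
for name, pattern in cases.items():
    print(name, pattern, "slide needed:", register_shift(ref, pattern))
```

For a pentamer, the reversed pattern matches the crystallographic one with no slide, whereas both rotated patterns match only after sliding one subsite, which leaves one residue outside a five-subsite cleft. This parallels the one-subsite displacement described in this section, although the real binding cleft is of course not reducible to a binary pattern.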

MD simulations indicate that the CfCBM4-1 binding groove will not accept the cellopentaose with the hydroxymethyl group arbitrarily located along the groove. This result is immediately evident from visualization of both the CfCBM4-1-RE’ and CfCBM4-1-NRE’ trajectories (Supplementary data, Movies S1 and S2). In the CfCBM4-1-NRE’ trajectory, the cellopentaose shifts longitudinally along the groove within 2 ns of the 250 ns simulation (Figure 3 and Supplementary data, Movie S1). The displacement exposes one glucopyranose moiety to solvent, outside the binding groove, leaving only four moieties bound in the groove. For the purpose of describing ligand dynamics, we have numbered this external “binding site” as “site 0” (Figure 2). An equivalent shift occurred at 8 ns in the CfCBM4-1-RE’ simulation (Supplementary data, Movie S2). As described in Methods, each of these starting configurations was extensively minimized in a stepwise fashion, significantly reducing the possibility that unfavorable molecular contacts influenced the ability of the cellopentaose to occupy the alternative binding site. Additionally, each of these simulations was independently repeated with a different random number seed, and the same shift of the cellopentaose across the binding groove was observed. In the remaining two cellopentaose conformations, CfCBM4-1-RE and CfCBM4-1-NRE, this displacement was not observed.
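A shift of this kind can be quantified, for example, by projecting the centroids of the pyranose rings onto an axis running along the groove and expressing the mean displacement in units of the subsite spacing. The sketch below is hypothetical: the groove axis and origin would have to be defined from the protein structure, and the default spacing of 5.2 Å is an assumption based on the roughly 10.4 Å cellobiose repeat of a cellulose chain, not a value reported in this study.

```python
import math

def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def axial_positions(centroids, origin, axis):
    """Project each ring centroid (x, y, z) in Å onto the groove axis."""
    a = _unit(axis)
    return [sum((c[k] - origin[k]) * a[k] for k in range(3)) for c in centroids]

def subsite_shift(frame0, frame_t, origin, axis, spacing=5.2):
    """Mean axial displacement of the ligand between two frames.

    Returns (shift in whole subsites, mean displacement in Å).
    spacing: assumed distance between adjacent subsites in Å (hypothetical
    default of ~5.2 Å, half of a ~10.4 Å cellobiose repeat).
    """
    p0 = axial_positions(frame0, origin, axis)
    pt = axial_positions(frame_t, origin, axis)
    mean_disp = sum(b - a for a, b in zip(p0, pt)) / len(p0)
    return round(mean_disp / spacing), mean_disp

# Hypothetical centroids: five rings, then the same rings slid 5.0 Å along x.
start = [(5.2 * i, 0.0, 0.0) for i in range(5)]
later = [(5.2 * i + 5.0, 0.3, 0.0) for i in range(5)]
print(subsite_shift(start, later, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

Applied frame by frame to a trajectory, a jump of the rounded value from 0 to ±1 would mark the time at which the ligand slid into the neighboring register.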

Johnson et al. (1999) suggested that the approximate structural symmetry of oligosaccharides accounts for the ability of the protein to bind the cello-oligomer regardless of directionality. That is, upon reversing the cellopentaose within the binding site, the hydroxymethyl group of the NRE occupies roughly the same position as the corresponding hydroxymethyl group in the RE orientation, which may allow for similar hydrogen bonding (Figure 2). Rotating the cellopentaose, as in the CfCBM4-1-RE’ and CfCBM4-1-NRE’ cases, effectively disrupts this structural symmetry. When the ligand is positioned so that its hydrogen-bonding side chains no longer occupy symmetrically similar locations, the cellopentaose can no longer make the hydrogen bonds necessary to bind within the cleft, as we show below through explicit characterization of hydrogen bonding. Consequently, the ligand was displaced by one glucopyranose moiety as it readjusted its side chains to positions similar to those in CfCBM4-1-NRE or CfCBM4-1-RE. After the cellopentaose reached an equilibrium position, the CfCBM4-1-NRE’ and CfCBM4-1-RE’ cases were approximately equivalent to CfCBM4-1-NRE and CfCBM4-1-RE, respectively.

Fig. 2. Binding-site nomenclature for CfCBM4-1. The CBM binds cellopentaose along five individual binding subsites perpendicular to the β-sheets forming the protein core. These sites are numbered from 1 to 5. Here, we define an additional “binding subsite,” 0, for discussion of MD simulations of CfCBM4-1-RE’ and CfCBM4-1-NRE’. Subsite 0 represents a completely solvent-exposed pyranose ring of the cellopentaose chain. The bottom panel illustrates the symmetry of a cello-oligomer oriented in the opposite direction. The primary hydroxyl groups remain in approximately the same location regardless of direction. This figure is available in black and white in print and in color at Glycobiology online.

Fig. 3. Snapshots from the CfCBM4-1-NRE’ simulation at (A) 0 ns and (B) 2 ns. The protein is shown in surface representation, and the ligand is shown in stick representation. The snapshots illustrate the ligand, initially out of register from the structurally bound position, naturally sliding to the more energetically favorable position, defined by the position of the hydroxymethyl side chain facing out of the binding cleft at subsite 1. This figure is available in black and white in print and in color at Glycobiology online.

In the following sections, we discuss the CfCBM4-1-NRE’ and CfCBM4-1-RE’ results in terms of the equilibrium position of the cellopentaose, with four protein-bound moieties and a single “external” moiety in site 0. Furthermore, as the CfCBM4-1-RE’ and CfCBM4-1-NRE’ cases are approximately equivalent to CfCBM4-1-RE and CfCBM4-1-NRE, respectively, we did not perform free energy calculations on the former two cases.

Thermodynamic preference of cello-oligomer orientation

A primary question we have sought to address in this study is whether CfCBM4-1 has a thermodynamic preference for a particular bound cello-oligomer conformation, given the inconclusive nature of experimental approaches to date. Having narrowed down the putative binding conformations using MD, we used FEP/λ-REMD to calculate the free energy of binding a cellopentaose ligand to the CfCBM4-1 binding groove in two different orientations, CfCBM4-1-RE and CfCBM4-1-NRE. This free energy protocol couples free energy perturbation with Hamiltonian replica-exchange molecular dynamics to enhance Boltzmann sampling (Deng and Roux 2006; Jiang et al. 2009). The calculations were performed by decoupling the potential energy into four separate contributions scaled according to coupling parameters, defined mathematically by Jiang et al. (2009). In short, the contributions to the overall free energy included the repulsive and dispersive components of the shifted Weeks–Chandler–Andersen decomposition, ΔGrepu and ΔGdisp, respectively, and the electrostatic contribution, ΔGelec. Where a restraining potential was applied, its contribution, ΔGrstr, was also included. We used the thermodynamic cycle in Figure 4 to arrive at the free energy of binding cellopentaose to CfCBM4-1. The cycle consisted of two separate sets of calculations: (i) decoupling the bound cellopentaose from the solvated CfCBM4-1 and (ii) decoupling the solvated cellopentaose from solution. The difference between the two values is the standard binding free energy, ΔGb. The restraining potential was used only in the first leg of the cycle, decoupling cellopentaose from CfCBM4-1. Detailed simulation methodology is provided in Methods and Supplementary data.
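Written out, the cycle reduces to a difference of two sums. The sign convention below is our reading of Table I rather than notation taken from Jiang et al. (2009): summing each leg's tabulated components and subtracting the solvation leg reproduces the reported ΔGb, which implies that ΔG1 and ΔG2 are tabulated as the free energies of coupling the ligand into the binding site and into bulk water, respectively (the negatives of the decoupling free energies). The superscripts "site" and "bulk" are labels introduced here for clarity; only the first leg carries a restraint term.

```latex
\begin{aligned}
\Delta G_1 &= \Delta G_{\mathrm{repu}}^{\mathrm{site}} + \Delta G_{\mathrm{disp}}^{\mathrm{site}} + \Delta G_{\mathrm{elec}}^{\mathrm{site}} + \Delta G_{\mathrm{rstr}} \\
\Delta G_2 &= \Delta G_{\mathrm{repu}}^{\mathrm{bulk}} + \Delta G_{\mathrm{disp}}^{\mathrm{bulk}} + \Delta G_{\mathrm{elec}}^{\mathrm{bulk}} \\
\Delta G_b &= \Delta G_1 - \Delta G_2
\end{aligned}
```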

The free energy of binding cellopentaose to CfCBM4-1 was approximately equal in the CfCBM4-1-RE and CfCBM4-1-NRE conformations. As shown in Table I, the binding free energies, −18.9 ± 5.4 and −24.5 ± 6.3 kJ mol⁻¹ for CfCBM4-1-RE and CfCBM4-1-NRE, respectively, were within error of each other. The repulsive, dispersive, electrostatic and restraining potential contributions are provided individually. The free energies of each step in the thermodynamic cycle, ΔG1 and ΔG2, were obtained by summing these contributions. The corresponding error values represent standard deviations (SD) over the final 30 of 40 intervals, i.e., the final 3 ns of 4 ns total. The free energy over the course of the 4 ns calculation, in 100 ps intervals, is given in Supplementary data, Figure S1A. The error of the binding free energy, ΔGb, was obtained by taking the square root of the sum of the squared standard deviations of ΔG1 and ΔG2, the free energy of decoupling cellopentaose from CfCBM4-1 and the cellopentaose solvation free energy, respectively. Error calculations based on statistical correlation of the data for each 100 ps interval are reported in Supplementary data, Figure S1A. We report the standard deviation here, as it is the larger of the two error estimates. Progress toward convergence was assessed by monitoring the time evolution of the free energy calculation (Supplementary data, Figure S1A). The effect of replica-exchange frequency on the sampling and convergence of the binding free energy in the case of CfCBM4-1-RE was also considered (Supplementary data, Figure S1B).
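The error bookkeeping described in this paragraph, a standard deviation over the final 30 of 40 100-ps intervals for each leg followed by combination in quadrature, can be sketched as follows. The helper names are hypothetical, the sample (n − 1) standard deviation is an assumption (the text does not say which form was used), and combining in quadrature assumes the two legs are statistically independent.

```python
import math
import statistics

def tail_sd(interval_values, n_discard=10):
    """SD of the running free energy estimate over the intervals kept after
    discarding the first n_discard (e.g. keep 30 of 40 100-ps intervals,
    i.e. the final 3 ns of a 4 ns calculation). Uses the sample SD."""
    kept = interval_values[n_discard:]
    if len(kept) < 2:
        raise ValueError("need at least two retained intervals")
    return statistics.stdev(kept)

def combine_in_quadrature(*sds):
    """Uncertainty of a sum or difference of independent quantities."""
    return math.sqrt(sum(s * s for s in sds))

# Hypothetical per-interval estimates (kJ/mol) for one leg: 10 equilibration
# intervals followed by 30 production intervals.
leg = [-250.0] * 10 + [-269.0, -271.0] * 15
print(tail_sd(leg), combine_in_quadrature(3.0, 4.0))
```

Note that the ΔGb error is obtained from the SDs of the summed legs, ΔG1 and ΔG2; combining the per-component SDs of Table I in quadrature would ignore correlations between components and need not reproduce the reported totals.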

The calculated binding free energies were in excellent agreement with a previously measured value obtained by isothermal titration calorimetry (ITC) at 35°C (Tomme et al. 1996). The reported value for cellopentaose binding to CfCBM4-1 in pure water at 35°C is −21.9 ± 3.8 kJ mol⁻¹. As ITC does not provide structural-level resolution of ligand binding, the experimental binding free energy likely represents the ensemble of both putative binding conformations. Considering the accuracy of both ITC and free energy calculations (Wang et al. 2006; Baranauskiene et al. 2009), the difference between the calculated and measured values is small relative to their uncertainties.
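One simple way to make "within error" and "excellent agreement" concrete is to compare each difference with the combined uncertainty of the two values being compared. The sketch below uses the values quoted in this section and treats each ± value as an independent one-SD uncertainty, which is an assumption: the ITC error may not be defined the same way as the simulation SD.

```python
import math
from itertools import combinations

def within_combined_error(a, sd_a, b, sd_b, k=1.0):
    """True if |a - b| <= k * sqrt(sd_a**2 + sd_b**2), treating the two
    uncertainties as independent one-SD errors."""
    return abs(a - b) <= k * math.hypot(sd_a, sd_b)

# Binding free energies (kJ/mol) and quoted uncertainties.
VALUES = {
    "CfCBM4-1-RE": (-18.9, 5.4),
    "CfCBM4-1-NRE": (-24.5, 6.3),
    "ITC, Tomme et al. (1996)": (-21.9, 3.8),
}

for (name_a, (a, sa)), (name_b, (b, sb)) in combinations(VALUES.items(), 2):
    print(f"{name_a} vs {name_b}: |difference| = {abs(a - b):.1f}, "
          f"combined SD = {math.hypot(sa, sb):.1f}, "
          f"within 1 SD: {within_combined_error(a, sa, b, sb)}")
```

All three pairwise differences (5.6, 3.0 and 2.6 kJ mol⁻¹) fall below the corresponding combined uncertainties, which is what the qualitative statements in the text assert.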

Calculated free energies of binding cellopentaose to CfCBM4-1support the hypothesis that CfCBM4-1 possesses the ability to bi-directionally bind cello-oligomers. Our findings of approximatethermodynamic equality are in line with the original Johnson et al.study using TEMPO-labeled cello-oligomers coupled with NMR toobserve ligand binding (Johnson et al. 1999). The crystallographicstructure, later captured by Boraston et al., temptingly suggests thatCfCBM4-1 binds cellopentaose in a single, thermodynamically favor-able orientation relative to the binding cleft (Boraston et al. 2002).Boraston et al. describe how the distance-dependent nature of the

Fig. 4. Thermodynamic cycle used to determine ligand-binding free energy

from FEP/λ-REMD. In this case, “CBM” is CfCBM4-1 and “ligand” is

cellopentaose. The subscripts “solv” and “vac” refer to the solvated and

vacuum (or decoupled) systems, respectively.

Table I. Binding free energies of cellopentaose to CfCBM4-1 in two ligand orientations representing bi-directional binding

ΔGb (kJ mol−1) ΔGrepu (kJ mol−1) ΔGdisp (kJ mol−1) ΔGelec (kJ mol−1) ΔGrstr (kJ mol−1)

Cellopentaose – 284.7 ± 1.6 −258.5 ± 0.5 −277.2 ± 1.4 –CfCBM4-1-RE −18.9 ± 5.4 308.9 ± 4.6 −330.0 ± 0.8 −247.6 ± 2.3 −1.2CfCBM4-1-NRE −24.5 ± 6.3 310.5 ± 4.8 −329.9 ± 1.2 −256.3 ± 2.5 0.3CfCBM4-1 experimentala −21.9 ± 3.8 – – – –

The solvation free energy of cellopentaose, ΔG2, is also tabulated as its three contributions—repulsion, dispersion, and electrostatics.aTomme et al. (1996).

1104 AA Kognole and CM Payne

Downloaded from https://academic.oup.com/glycob/article-abstract/25/10/1100/1988633by gueston 01 March 2018

Page 6: Cello-oligomer-binding dynamics and directionality in family 4 ...

NMR spin-labeling analysis prohibits calculation of the relative occupancy of each ligand-binding conformation. On this basis, they dismiss the possibility that bi-directional binding represents anything more than a low-occupancy state. While free energy calculations likewise cannot directly capture the statistical likelihood of a given orientation, the equality, within statistical error, of the free energies of binding cellopentaose in the CfCBM4-1-RE and CfCBM4-1-NRE conformations suggests that each state is equally likely to be occupied and that the orientation captured in the crystal structure was a matter of circumstance.
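The phrase “equal within statistical error” can be made concrete with the Table I values. The following is an illustrative check rather than part of the original analysis; it assumes the two ± values are independent standard deviations and combines them in quadrature.

```python
import math


def delta_delta_g(dg_a, sd_a, dg_b, sd_b):
    """Return (ddG, sigma) for dG_b - dG_a, combining independent SDs in quadrature."""
    return dg_b - dg_a, math.hypot(sd_a, sd_b)


# Binding free energies (kJ/mol) from Table I: NRE relative to RE.
ddg, sigma = delta_delta_g(-18.9, 5.4, -24.5, 6.3)
print(f"ddG = {ddg:.1f} +/- {sigma:.1f} kJ/mol")
print("difference resolved at 1 sigma:", abs(ddg) > sigma)
```

Because |ΔΔG| ≈ 5.6 kJ mol−1 is smaller than its combined uncertainty of ≈ 8.3 kJ mol−1, the calculations do not distinguish the two orientations energetically.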

CfCBM4-1 hydrogen bonding

The cellopentaose ligand formed approximately the same number of hydrogen bonds in each binding subsite regardless of the direction of the ligand. VMD was used to determine the average number of hydrogen bonds formed between each pyranose ring (including its side chain) and the surrounding protein (Figure 5A) (Humphrey et al. 1996). A hydrogen bond was defined by a 3.0 Å donor–acceptor distance cutoff and a 20° angle cutoff. The number of hydrogen bonds formed by each ring was determined for each frame of the trajectory and averaged over the 250-ns simulation. Hydrogen bonding occurred primarily in subsites 1 through 3, where Arg75, Gln124, Gln128, Asn50 and Asn81 were the primary participating residues.
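The geometric hydrogen-bond criterion and per-frame averaging described above can be sketched as follows. This is a minimal stand-alone illustration, not VMD's implementation; it assumes the 20° cutoff is the deviation of the donor–hydrogen–acceptor angle from linearity (180°), and the coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def is_hbond(donor, hydrogen, acceptor, max_dist=3.0, max_angle_dev=20.0):
    """Geometric H-bond test (coordinates in angstroms).

    Requires a donor-acceptor distance <= max_dist and a donor-H-acceptor
    angle within max_angle_dev degrees of linear (assumed convention).
    """
    if _norm(_sub(donor, acceptor)) > max_dist:
        return False
    h_to_d = _sub(donor, hydrogen)
    h_to_a = _sub(acceptor, hydrogen)
    cos_angle = sum(x * y for x, y in zip(h_to_d, h_to_a)) / (_norm(h_to_d) * _norm(h_to_a))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return 180.0 - angle <= max_angle_dev


def mean_hbonds(frames):
    """Average H-bond count per frame; each frame is a list of (D, H, A) triples."""
    counts = [sum(is_hbond(d, h, a) for d, h, a in frame) for frame in frames]
    return sum(counts) / len(counts)


# Hypothetical two-frame example: one H-bond in frame 1, none in frame 2.
frames = [
    [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.0, 0.0))],  # linear, 2.8 A apart
    [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.2, 0.0, 0.0))],  # beyond 3.0 A
]
print(mean_hbonds(frames))
```

In a real analysis, the candidate triples for each frame would come from the ring hydroxyls and nearby protein donors and acceptors.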

Detailed analysis of hydrogen bonding over the course of the MD simulations identified the primary hydrogen-bonding partners in the CfCBM4-1 binding subsites responsible for accommodating a bi-directionally bound cellopentaose (Supplementary data, Figure S2). In subsite 1, Arg75 and Gln128 hydrogen bond with the secondary hydroxyl groups of the pyranose ring. Gln124 generally bonds with the primary hydroxyl group of the pyranose ring in subsite 2, while occasionally hydrogen bonding with a secondary hydroxyl group of the subsite 3 pyranose. Asn50 and Asn81 hydrogen bond with the secondary hydroxyl groups of the subsite 3 pyranose. The protein surrounding subsite 4 rarely participated in hydrogen bonding with the pyranose ring; when a hydrogen bond did form, Ala18 was the partner. Because of the symmetry of the ligands (Figure 2), this specificity for primary and secondary hydroxyl groups can be satisfied only by the cellopentaose orientations in CfCBM4-1-RE and CfCBM4-1-NRE. When cellopentaose occupied the binding site as initialized in CfCBM4-1-RE’ and CfCBM4-1-NRE’, these hydrogen-bonding partners were inaccessible, and the ligand consequently shifted by one binding subsite, which allowed it to form hydrogen bonds with the protein. The binding subsites of CfCBM4-1 do not appear to have redundant hydrogen-bonding partners that would allow binding of the CfCBM4-1-RE’ and CfCBM4-1-NRE’ ligand conformations.

Fig. 5. CfCBM4-1 hydrogen-bonding behavior and protein–ligand dynamics from 250-ns MD simulations. (A) Average number of hydrogen bonds (H-bonds) formed between each pyranose ring and the surrounding protein of the CfCBM4-1 binding site. Error bars represent 1 standard deviation (SD). (B) RMSD of the CfCBM4-1 protein backbone over the 250-ns simulation. The RMSD reference structure is the last frame of the 1-ns equilibration simulation, which is why the RMSD does not start at 0 Å at 0 ns. (C) RMSF of the cellopentaose ligand on a per-binding-site basis. Error bars were determined from block averaging over 2.5-ns blocks of data. (D) Average total interaction energy (sum of van der Waals and electrostatic contributions) of each pyranose ring with the surrounding protein. Error bars represent 1 SD. This figure is available in black and white in print and in color at Glycobiology online.

Binding mechanisms of family 4 carbohydrate-binding modules 1105


CfCBM4-1 dynamics

Molecular dynamics simulations further support the feasibility of bi-directional ligand binding. The five CfCBM4-1 and three CfCBM4-2 molecular dynamics simulations described above reveal remarkably similar dynamic behavior between the CfCBM4-1-RE and CfCBM4-1-NRE conformations and between the CfCBM4-2-RE and CfCBM4-2-NRE conformations. Furthermore, after translocation, the dynamics of the CfCBM4-1-RE’ and CfCBM4-1-NRE’ conformations corresponded to those of CfCBM4-1-RE and CfCBM4-1-NRE, respectively. To evaluate dynamic similarity, we applied a suite of trajectory analyses: protein and ligand flexibility, measured through the root mean square deviation (RMSD) and root mean square fluctuation (RMSF); non-bonded interaction energies; and the degree of ligand solvation.

Evaluation of the RMSD of the protein backbone over the course of the 250-ns simulation illustrates the relative stability of the CfCBM4-1-RE and CfCBM4-1-NRE conformations (Figure 5B). The RMSD was calculated for each of the five CfCBM4-1 simulations, using the final coordinates of the 100-ps equilibration simulation as the reference. The RMSD of the protein backbone in the CfCBM4-1-RE and CfCBM4-1-NRE simulations was extraordinarily well behaved, deviating little over the course of the simulation. This result suggests that the opposite ligand conformation did not adversely affect the protein structure and that the binding site was capable of accommodating the ligand without significant structural rearrangement. When the ligand was rotated about its longitudinal axis, as in CfCBM4-1-RE’ and CfCBM4-1-NRE’, the dynamics of the protein backbone reflected the translocation of the ligand to the equivalent CfCBM4-1-RE and CfCBM4-1-NRE positions. The RMSD deviated significantly from that of the initial structure as the protein rearranged the ligand, and the last glucose binding site remained unoccupied. The CfCBM4-1-RE’ and CfCBM4-1-NRE’ simulations eventually reached an equilibrium similar to that of the ligand-free CfCBM4-1.
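The backbone RMSD in Figure 5B compares each frame with one fixed reference structure. A minimal sketch follows; it assumes the frames have already been superposed onto the reference (for example, by a least-squares fit), a preprocessing step the text does not detail, and the coordinates are hypothetical.

```python
import math


def rmsd(coords, ref):
    """RMSD (angstroms) between two equally ordered lists of (x, y, z) positions.

    Assumes coords were already superposed onto ref; no fitting is done here.
    """
    if len(coords) != len(ref) or not coords:
        raise ValueError("coordinate sets must be non-empty and the same length")
    sq = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
             for a, b in zip(coords, ref))
    return math.sqrt(sq / len(coords))


def rmsd_trace(frames, ref):
    """RMSD of every frame against the same reference (e.g., end of equilibration)."""
    return [rmsd(frame, ref) for frame in frames]


# Hypothetical two-atom system: the second frame moves one atom by 2 A.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
frames = [ref, [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0)]]
print(rmsd_trace(frames, ref))
```

Displacing one of two atoms by 2 Å gives √2 ≈ 1.41 Å, since the squared deviation is averaged over all atoms before the square root is taken.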

Similarly, the RMSF of the protein backbones indicates that the average protein structure of each ligand-bound CfCBM4-1 was generally unaffected by the ligand conformation (Supplementary data, Figure S3A). In fact, the absence of ligand appeared to affect the protein more than any of the ligand conformations. Supplementary data, Figure S3A illustrates that the aromatic residues along the binding cleft, Tyr43 and Tyr85, and the key hydrogen-bonding residues, Asn50 and Asn81, fluctuated significantly in the absence of a ligand. These fluctuations potentially contributed to the increase in RMSD of the apo structure, and the backbone of the apo CBM gradually became more flexible. This may be a mechanism by which the CBM makes the binding site more accessible to ligands. In general, RMSF values in the N- and C-terminal regions were significantly higher than in the core of the protein domain. While high terminal fluctuation is expected in nearly all proteins, we note it here to add the caveat that CfCBM4-1 was simulated without a bound calcium ion. The 1GU3 structure does not contain a resolved metal ion (Boraston et al. 2002), but Johnson et al. (1998) reported that CfCBM4-1 coordinates calcium through residues Thr8, Gly30 and Asp142, where Thr8 and Asp142 belong to the N- and C-terminal regions, respectively. The absence of these coordinating interactions tying the termini together leads to higher relative fluctuation, as can be seen in the RMSD of CfCBM4-1-apo at 70 ns (Figure 5B). However, the calcium ion and coordinating residues are located on the surface of CfCBM4-1, directly opposite the binding cleft, and the lack of a calcium ion has no effect on binding affinity (Johnson et al. 1998). Thus, we chose to simulate the protein without the calcium ion, in accordance with the structure.

The flexibility of the CfCBM4-1-RE and CfCBM4-1-NRE ligands, as measured by the RMSF of the pyranose ring atoms, was equivalent within error. The RMSF of the ligand measures the fluctuation of the ring atoms about their average positions over the course of the entire simulation and is delineated on a per-binding-site basis (Figure 5C). As previously described, the CfCBM4-1-RE’ and CfCBM4-1-NRE’ ligands shifted out of register very early in the MD simulation to positions approximating the side-chain and ring positioning of the CfCBM4-1-RE and CfCBM4-1-NRE ligands, respectively. The “0” binding site represents a solvent-exposed pyranose ring external to the cleft. Otherwise, the RMSF as a function of binding site (Figure 5C) illustrates equivalent positions along the cleft, where CfCBM4-1-RE has the same ring and side-chain orientation as CfCBM4-1-RE’. Along the entire cleft, the CfCBM4-1-RE and CfCBM4-1-NRE pyranose rings fluctuated within error of each other, suggesting that the cleft accommodates each ligand equally well. Though equivalent in position at binding sites 1–4, the CfCBM4-1-RE’ and CfCBM4-1-NRE’ ligands fluctuated more than the fully bound ligands. The solvent-exposed pyranose rings had a much larger range of motion (Supplementary data, Movies S1 and S2), uninhibited by the protein cleft, and this translated into increased fluctuation along the entirety of the four bound pyranose rings.
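Per-binding-site RMSF and block-averaged error bars of the kind shown in Figure 5C can be sketched as below. This is an illustrative reconstruction under stated assumptions: a site's RMSF is taken as the mean RMSF of its ring atoms, the trajectory is assumed to be already superposed, and the error bar is taken as the standard error of RMSF values computed independently within consecutive blocks (the text does not say whether an SD or a standard error was reported). With 25,000 frames over 250 ns (10 ps per frame), a 2.5-ns block would hold 250 frames.

```python
import math
import statistics


def atom_rmsf(positions):
    """RMSF (angstroms) of one atom about its mean position over the given frames."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    msd = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions) / n
    return math.sqrt(msd)


def site_rmsf(ring_trajs):
    """Mean RMSF over the ring atoms of one site; one position list per atom."""
    return statistics.mean(atom_rmsf(traj) for traj in ring_trajs)


def block_average(ring_trajs, frames_per_block):
    """Site RMSF computed separately in consecutive blocks -> (mean, standard error).

    Each block's RMSF is taken about that block's own mean positions;
    trailing frames that do not fill a whole block are dropped.
    """
    n_blocks = len(ring_trajs[0]) // frames_per_block
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    values = []
    for b in range(n_blocks):
        lo, hi = b * frames_per_block, (b + 1) * frames_per_block
        values.append(site_rmsf([traj[lo:hi] for traj in ring_trajs]))
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n_blocks)


# Hypothetical one-atom "ring" sampled for four frames, blocks of two frames.
traj = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(block_average([traj], frames_per_block=2))
```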

The degree of solvation within the binding cleft was unaffected by the ligand conformation (Supplementary data, Figure S4A). For each trajectory frame, the number of water molecules within 3.5 Å of the pyranose ring in each binding site was determined. This value was averaged for each binding site over the entire 250-ns trajectory; the average is a numerical estimate of the degree to which a given pyranose ring is exposed to solvent. For every binding site, the degree of solvation in each ligand conformation was within 1 SD of that in the others. This is consistent with CfCBM4-1 being capable of binding the cellopentaose ligand in both the CfCBM4-1-RE and CfCBM4-1-NRE directions.
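The per-site solvation count and the “within 1 SD” comparison can be sketched as follows. The sketch is illustrative: it assumes a water is counted when its oxygen lies within 3.5 Å of any ring atom, ignores periodic images, uses hypothetical coordinates, and the comparison helper is one assumed way of coding such a test.

```python
import statistics


def _dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2


def waters_near_ring(ring_atoms, water_oxygens, cutoff=3.5):
    """Number of waters whose oxygen lies within cutoff (angstroms) of any ring atom."""
    c2 = cutoff * cutoff
    return sum(1 for w in water_oxygens if any(_dist2(w, a) <= c2 for a in ring_atoms))


def site_solvation(frames, cutoff=3.5):
    """Mean and SD of the per-frame count; frames is a list of (ring_atoms, water_oxygens)."""
    counts = [waters_near_ring(ring, waters, cutoff) for ring, waters in frames]
    return statistics.mean(counts), statistics.stdev(counts)


def within_one_sd(mean_a, sd_a, mean_b):
    """True if mean_b lies within one SD of mean_a."""
    return abs(mean_b - mean_a) <= sd_a


# Hypothetical single-atom ring and two frames of water oxygens.
ring = [(0.0, 0.0, 0.0)]
frame1 = (ring, [(3.0, 0.0, 0.0), (0.0, 3.4, 0.0), (4.0, 0.0, 0.0)])  # 2 within 3.5 A
frame2 = (ring, [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)])                    # 1 within 3.5 A
mean, sd = site_solvation([frame1, frame2])
print(mean, sd, within_one_sd(mean, sd, 2.0))
```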

The total interaction energy of the protein with the pyranose rings of cellopentaose reveals that aromatic stacking interactions were maintained with both faces of the pyranose rings along the cleft. Electrostatic and van der Waals components of the non-bonded interactions were calculated over the 250-ns MD simulations, applying the same non-bonded interaction cutoffs used to produce the simulations. For computational efficiency, the interaction energy analysis was conducted on a culled dataset of 2500 equally spaced frames rather than all 25,000 frames collected. The total interaction energy, the sum of the two components, was most favorable in binding sites 1 through 3, reflecting the availability of hydrogen-bonding partners and aromatic stacking interactions relative to sites 4 and 5. As with the other dynamic analyses, the total interaction energy was generally unaffected by the direction of the bound ligand (Figure 5D). The residues along the binding cleft maintained a similar degree of contact with the pyranose rings and were equally capable of maintaining stacking interactions with either face of the pyranose ring. From perturbations of 1H chemical shifts upon cellotetraose binding, Johnson et al. reported that, despite the multitude of aromatic residues lining the binding cleft, only Tyr19 and Tyr85 were directly involved in aromatic stacking interactions (Johnson et al. 1996a). On the basis of these interaction energies, we endorse adding Tyr43 to the list of stacking aromatic residues (Boraston et al. 2002), as the interaction energy of Tyr43 with the ligand is of similar magnitude to that of Tyr85.
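The culled interaction-energy analysis combines two simple steps: choosing equally spaced frames (25,000 / 2500 gives every 10th frame) and summing van der Waals and electrostatic pair terms. The sketch uses a CHARMM-style 12-6 form with Lorentz–Berthelot combining rules and a plain cutoff. Switching functions, 1–4 scaling and periodic images are omitted, the cutoff is left as a parameter because its value is not stated here, and all parameter values are hypothetical. It is not the analysis code used in the study.

```python
import math

COULOMB = 332.06  # kcal*A/(mol*e^2), approximate; MD codes differ in later digits


def culled_frames(n_total, n_keep):
    """Indices of n_keep equally spaced frames out of n_total."""
    stride = n_total // n_keep
    return list(range(0, stride * n_keep, stride))


def pair_energy(r, qi, qj, eps_i, eps_j, rmin_half_i, rmin_half_j):
    """(E_vdW, E_elec) in kcal/mol for one atom pair at distance r (angstroms).

    12-6 form eps*[(Rmin/r)^12 - 2(Rmin/r)^6]; well depths given as positive numbers.
    """
    eps = math.sqrt(eps_i * eps_j)          # geometric-mean well depth
    rmin = rmin_half_i + rmin_half_j        # additive Rmin/2 radii
    x6 = (rmin / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6), COULOMB * qi * qj / r


def ring_protein_energy(pairs, cutoff):
    """Total (vdW + elec) energy over (r, params) pairs closer than cutoff."""
    total = 0.0
    for r, params in pairs:
        if r < cutoff:
            e_vdw, e_elec = pair_energy(r, *params)
            total += e_vdw + e_elec
    return total


idx = culled_frames(25000, 2500)
print(len(idx), idx[:3])  # 2500 [0, 10, 20]
```

At r equal to the combined Rmin, the van der Waals term equals the negative well depth, which is a convenient sanity check for the combining rules.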

CfCBM4-2 dynamics

Johnson et al. (1999) also made the case that CfCBM4-2 was capable of binding TEMPO-labeled cello-oligomers with the label at either end of the cleft. To date, a ligand-bound structure of this homologous


CBM4 structure has not been reported, though the similarity in fold enabled docking of the 1GU3 cellopentaose and the modeled CfCBM4-1-NRE cellopentaose to the CfCBM4-2 structure. MD simulations of the two conformations were performed to elucidate the molecular interactions governing ligand binding and the possibility of bi-directional binding.

CfCBM4-2 differs structurally from CfCBM4-1. The cleft of CfCBM4-2 appears to be wider, and several ligand-binding residues are substituted (e.g., CfCBM4-1 residues Asn81, Tyr43, Gln128 and Val17 appear as Gly87, Trp49, Gln128 and Ser23, respectively, in CfCBM4-2; Figure 6A; Brun et al. 2000). However, MD simulations suggest that the widened cleft of CfCBM4-2 may be an artifact of the structural study conditions. When CfCBM4-2 was docked with cellopentaose in the binding groove, the cleft width decreased, approximately matching that of CfCBM4-1 (Supplementary data, Figure S5). The reduction in cleft width occurred quickly, during equilibration, and the protein remained in close contact with the ligand over the remainder of the simulation. The NMR structure captured CfCBM4-2 in its ligand-free state (Brun et al. 2000), and thus the absence of ligand interactions is the likely reason for the larger binding-groove width relative to that of CfCBM4-1. The RMSD of the CfCBM4-2 protein backbone reflects the protein rearrangement that occurs when the binding cleft closes around the bound ligand, eventually equilibrating at ∼3.5 Å (Figure 6B). The ligand-free CfCBM4-2 exhibited a great deal more flexibility in the loops surrounding the cleft (Supplementary data, Figure S3B), suggesting that cleft flexibility serves as a ligand-acquisition mechanism. The N-terminus of the CfCBM4-2-RE structure underwent a conformational change at ∼220 ns, as indicated by the change in RMSD, but this did not affect ligand binding.

As with CfCBM4-1, dynamic measures associated with ligand binding and hydrogen-bond formation suggest that CfCBM4-2 is capable of bi-directional binding. Again, as a function of ligand direction in the CfCBM4-2 cleft, we compared the average number of hydrogen bonds formed between the protein and the pyranose ring of a given binding site, the RMSF of the ligand along the cleft, and the total interaction energy of the ligand with the protein on a per-binding-site basis. Equal values for the two conformations would indicate that both ligands are equally stable in the CfCBM4-2 cleft. Despite the significant sequence variation between the two clefts, the number of hydrogen bonds formed between a given pyranose ring and the CfCBM4-2 binding site did not vary significantly upon reorientation of the ligand (Figure 7A). In general, each binding site formed one intermittent hydrogen bond with the protein over the course of the simulation. This is similar to the behavior of CfCBM4-1 (Figure 5A), implying that the substituted residues play equivalent roles in ligand binding. The RMSF of the ligand was approximately the same in each binding site irrespective of where the RE resided (Figure 7B). The CfCBM4-2-NRE pyranose in binding site 5 was unable to maintain stable interactions with any surrounding protein residues, accounting for the slight deviation from the CfCBM4-2-RE ligand behavior. However, the total interaction energy of the pyranose ring in a given CfCBM4-2 binding site was the same regardless of ligand direction (Figure 7C). These dynamic measures support the hypothesis that CfCBM4-2 can bind cello-oligomers in at least two different conformations. As we describe below, we further posit that the ability to bind carbohydrate oligomers bi-directionally may be common to the β-sandwich protein fold.

Evidence of bi-directional binding beyond C. fimi CBM4s

Bi-directional cello-oligomer binding is likely a phenomenon common to CBM4s and the broader class of β-sandwich CBMs. While structures resolving cello-oligomers in two different orientations within the same CBM4 binding cleft do not currently exist, our computational results combined with the NMR studies of Johnson et al. strongly suggest that both CfCBM4-1 and CfCBM4-2 are capable of bi-directional binding, despite significant differences in sequence. As further evidence of bi-directional binding in CBM4s, a computational docking study of a Clostridium thermocellum CBM4, part of the cellulosomal cellobiohydrolase A construct, found that this CBM4 was likely to bind cellohexaose in the direction opposite to that of the cellobiose bound in the reported crystal structure (PDB ID 3K4Z) (Alahuhta et al. 2010).

CfCBM4-1 adopts a characteristic β-jelly roll fold, which belongs to the larger family of β-sandwich structures (Boraston et al. 2004). As CBMs are classified in the CAZy database (Carbohydrate Active Enzyme Database; http://www.cazy.org) according to sequence and structural similarity, all CBM4s share the β-sandwich protein fold (Lombard et al. 2014). Further, the β-sandwich fold is common among other CBM families and is noted for its broad specificity (Boraston et al. 2004). At the time of writing, 29 of the 69 CBM families documented in CAZy exhibit a form of the β-sandwich fold, with remarkable sequence diversity. Accordingly, we hypothesized that bi-directional binding had previously been observed in these structurally related CBMs but perhaps had not been recognized as such. In such a comparison, one

Fig. 6. (A) Comparison of the binding sites of CfCBM4-1 and CfCBM4-2 illustrating substitutions of residues involved in cello-oligomer binding. The binding subsites are numbered. (B) RMSD of the CfCBM4-2 protein backbone over the 250-ns simulation. The RMSD reference structure is the last frame of the 100-ps (0.1-ns) equilibration simulation. This figure is available in black and white in print and in color at Glycobiology online.


must be cognizant that β-sandwiches can have two binding sites, one on the face of the β-sheets and the other on the edge of the β-sheets (Boraston et al. 2004). Of the 29 β-sandwich CBM families, 10 had deposited structures with a glycan bound at the same binding site as that of CfCBM4-1-RE (i.e., on the face of the β-sheets). A total of 34 glycan-bound CBM structures, representing these 10 families, were available for examination (Supplementary data, Table S1). Using the Dali Web Server (http://ekhidna.biocenter.helsinki.fi/dali_lite/start) (Hasegawa and Holm 2009) to structurally align the 34 structures with CfCBM4-1 (PDB code 1GU3), we examined the conformation of the ligands within the CBM-binding clefts.

Visualization of the glycan-bound β-sandwich CBM structures reveals apparent promiscuity in binding. The examined CBMs bind not only C6 sugars but also C5 sugars, often joined through a variety of glycosidic linkages. Further, multi-directional binding along the binding cleft appeared often across the observed β-sandwich CBM structures. Of the 34 structures examined, 22 displayed a ligand in the same bound conformation as the 1GU3 structure (i.e., CfCBM4-1-RE); the other 12 ligands appeared in the opposite conformation, corresponding to the modeled CfCBM4-1-NRE conformation. As an example of the latter, Figure 8 shows a family 15 CBM from Pseudomonas cellulosa xylanase Xyn10C (PDB code 1GNY) (Szabo et al. 2001) aligned with CfCBM4-1 (PDB code 1GU3). Family 6 CBMs exhibit bi-directional ligand binding within the same family. Glycan binding in β-sandwich CBMs makes use of the standard aromatic stacking interactions common among carbohydrate-binding proteins (Wimmerova et al. 2012; Luis Asensio et al. 2013), but we anticipate that bi-directional binding is a consequence of the evolutionary diversity of the protein fold (Richardson 1981), which results in conveniently spaced hydrogen-bonding partners along the cleft. While 34 structures are too small a sample to draw conclusions about the frequency of conformational occupancy, this evaluation suggests that bi-directional binding occurs more frequently than acknowledged and offers new possibilities in the development of cellulosic biotechnology.

MD simulations and free energy calculations have enabled us to investigate the molecular-level contributions to cellopentaose binding in protein–carbohydrate systems that have eluded structural resolution techniques. Overall, our results support the original hypothesis of Johnson et al. that C. fimi CBM4s are capable of binding cello-oligomers

Fig. 7. CfCBM4-2 ligand dynamic measurements. (A) Average number of hydrogen bonds formed between each of the five pyranose rings and side chains in each binding site and the surrounding protein. (B) RMSF of the ligand on a per-binding-site basis for the CfCBM4-2 systems. Error bars were calculated using block averaging over 2.5 ns. (C) Average total interaction energy of each pyranose with the surrounding protein. Error bars represent 1 SD. This figure is available in black and white in print and in color at Glycobiology online.

Fig. 8. Family 15 CBM derived from Pseudomonas cellulosa xylanase Xyn10C, PcCBM15, bound to xylopentaose and aligned with CfCBM4-1-RE bound to cellopentaose. This figure is available in black and white in print and in color at Glycobiology online.


with the RE of the pyranose at either end of the binding cleft. The free energy calculations compare remarkably well with the experimental ITC measurement and go beyond experiment in enabling delineation between binding conformations. MD simulations reveal abundant hydrogen-bonding partners, in near 1 : 1 parity, along the binding cleft, so that, regardless of direction, the primary and secondary hydroxyl groups of the pyranose rings can maintain hydrogen bonds with relevant partners in the interior of the cleft. MD simulations of CfCBM4-2 extend these observations to loosely related (36% sequence similarity) members of the family. The dynamic markers indicative of a stably bound ligand again suggest that CfCBM4-2 can bind cellopentaose in a bi-directional fashion. This observation does not appear to be limited to CBM4s; rather, many carbohydrate-binding proteins bearing the β-sandwich fold, which currently spans 29 CBM families, may bind pyranose rings irrespective of direction.

Methods

All CfCBM4-1 and CfCBM4-2 simulations were constructed from experimentally determined structures, manually docking the cellopentaose ligands as necessary. The CfCBM4-1 simulations were constructed from the 1GU3 PDB crystal structure, in which CfCBM4-1 binds cellopentaose in the binding cleft. The nomenclature used in this study reflects the orientation of the 1GU3 ligand (Boraston et al. 2002); we have defined this as the “RE” conformation of the bound ligand (CfCBM4-1-RE), numbering the pyranose moieties from 1 to 5 accordingly (Figure 2). The “NRE” conformation (CfCBM4-1-NRE) was prepared from this same structure. To reverse the ligand direction, the coordinates of the heavy ring atoms were retained from the 1GU3 structure, and atom types were reassigned so as to locate the pyranose ring oxygen at the opposite end of the cleft (Figure 1). CHARMM was used to reconstruct the remaining hydrogens and hydroxyl groups from internal coordinate tables (Brooks et al. 2009). The RE and NRE cellopentaose conformations bound to CfCBM4-2, CfCBM4-2-RE and CfCBM4-2-NRE, respectively, were constructed by docking the CfCBM4-1-RE and CfCBM4-1-NRE ligands through structural alignment with the 1CX1 PDB structure, an NMR structure (Brun et al. 2000). To prepare the CfCBM4-1-RE’ and CfCBM4-1-NRE’ conformations, the pyranose ring heavy atoms were again renamed such that the ligand was rotated about its longitudinal axis relative to CfCBM4-1-RE and CfCBM4-1-NRE, respectively. CHARMM was used to reconstruct the remaining hydrogens and hydroxyl groups. Protonation states of the titratable residues were determined using H++ and manual inspection of the protein environment (i.e., possible salt-bridge formation) (Gordon et al. 2005; Myers et al. 2006; Anandakrishnan et al. 2012; Onufriev et al.). PyMOL and VMD were used for structural alignment and visualization (Humphrey et al. 1996; Schrodinger 2010).

All constructed systems were minimized in vacuum, solvated with water, neutralized with sodium ions, and minimized again. The minimized systems were then heated to 300 K and density-equilibrated in CHARMM. After equilibration, the systems were simulated for 250 ns at 300 K in the NVT ensemble using NAMD (Phillips et al. 2005). All simulations used the CHARMM36 force field with CMAP correction for proteins (MacKerell et al. 1998, 2004; Brooks et al. 2009) and the CHARMM36 carbohydrate force field for the cellopentaose ligands (Guvench et al. 2008, 2009, 2011). The modified TIP3P water model was applied to water molecules (Jorgensen et al. 1983; Durell et al. 1994). Analysis of the 250-ns MD simulations included determination of the RMSD and RMSF of the protein backbones, the RMSF of cellopentaose on a per-binding-site basis, the hydrogen bonding and interaction energies of each glucose residue with the protein, and the average solvation of the ligand on a per-binding-site basis. Additional details can be found in the Supplementary data.

The free energy of binding cellopentaose to CfCBM4-1 was calculated using the FEP/λ-REMD protocol (Jiang et al. 2009). The thermodynamic paths used to determine the binding free energy are shown in Figure 4, and the FEP/λ-REMD methodology has been described above to add context to the results. The free energy calculations were constructed from 4-ns snapshots of the explicitly solvated CfCBM4-1-RE and CfCBM4-1-NRE MD simulations. The absolute binding free energy was determined from 40 consecutive 0.1-ns calculations, of which the first 1 ns was discarded as equilibration (Supplementary data, Figure S1). The simulations used a set of 128 replicas (72 repulsive, 24 dispersive and 32 electrostatic) with an exchange frequency of 1/100 steps (every 0.1 ps). The CfCBM4-1/cellopentaose systems included a positional restraint defined by the distance between the center of mass of the ligand and the center of mass of the protein. The free energy contribution of this restraint during the decoupling of cellopentaose from the solvated complex to vacuum was determined by numerical integration with Simpson's rule (Deng and Roux 2006). The output energies collected during simulation were post-processed using the multistate Bennett acceptance ratio to calculate the free energies and statistical uncertainties of the individual repulsive, dispersive and electrostatic contributions (Shirts and Chodera 2008). Finally, summing the contributions (repulsive, dispersive, electrostatic and, for the complex, restraint) gives the total free energy change for each leg of the thermodynamic cycle (i.e., ΔG1 and ΔG2). The binding free energy of cellopentaose to CfCBM4-1 is the difference between the free energy of decoupling solvated cellopentaose from solution and the free energy of decoupling bound cellopentaose from CfCBM4-1 (i.e., ΔGb = ΔG2 − ΔG1; Figure 4). The standard deviations of these values over the 3-ns data collection period, combined using error propagation rules, are reported as the final binding free energy error.
Convergence was determined by monitoring the time evolution of the free energy calculations. Additional details are provided in the Supplementary data.
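Two of the post-processing steps above are simple enough to sketch: composite Simpson's rule integration and the equilibration cut applied to the 40 consecutive 0.1-ns estimates (discarding the first 10 leaves 3 ns). The sketch is illustrative only. It assumes the restraint integrand is tabulated on an odd number of evenly spaced λ points, the inputs are synthetic, and it does not reproduce the multistate Bennett acceptance ratio step, which would normally be done with a dedicated package such as pymbar.

```python
import statistics


def simpson(y, h):
    """Composite Simpson's rule for evenly spaced samples y (odd count) with spacing h."""
    n = len(y)
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number (>= 3) of samples")
    odd = sum(y[1:-1:2])    # interior points weighted 4
    even = sum(y[2:-1:2])   # interior points weighted 2
    return h / 3.0 * (y[0] + 4.0 * odd + 2.0 * even + y[-1])


def production_stats(window_values, window_ns=0.1, equil_ns=1.0):
    """Drop equilibration windows, then return (mean, SD, count) of the remainder."""
    n_drop = round(equil_ns / window_ns)
    prod = window_values[n_drop:]
    return statistics.mean(prod), statistics.stdev(prod), len(prod)


def running_mean(values):
    """Cumulative mean, useful for monitoring convergence over time."""
    out, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out


# Synthetic check: the integral of lambda^2 over [0, 1] from 5 points is exactly 1/3.
lam = [i / 4 for i in range(5)]
print(simpson([x * x for x in lam], h=0.25))

# Synthetic 40-window series: 10 equilibration windows, then 30 production windows.
windows = [100.0] * 10 + [1.0, 3.0] * 15
print(production_stats(windows))
```

Simpson's rule is exact for polynomials up to cubic order, which is why the λ² check recovers 1/3 exactly.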

Supplementary data

Supplementary data for this article are available online at http://glycob.oxfordjournals.org/.

Funding

The material is based upon work supported by the National Science Foundation under Grant No. CHE-1404849.

Acknowledgements

Computational time for this research was provided by the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al. 2014), which is supported by National Science Foundation grant number ACI-1053575. The work was conducted on the National Institute of Computational Science Kraken cluster under allocation MCB090159. Additional computational resources for testing and data analysis were made available by the Center for Computational Sciences DLX cluster at the University of Kentucky. Financial support was provided by the National Science Foundation (CHE-1404849 to C.M.P.).

Conflict of interest statement

None declared.


Abbreviations

CBM, carbohydrate-binding module; CfCBM4-1 and CfCBM4-2, family 4 carbohydrate-binding modules 1 and 2, respectively, from Cellulomonas fimi CenC; FEP/λ-REMD, free energy perturbation with replica-exchange molecular dynamics; MD, molecular dynamics; NMR, nuclear magnetic resonance; NRE, non-reducing end; PDB, Protein Data Bank; RE, reducing end; RMSD, root mean square deviation; RMSF, root mean square fluctuation; SD, standard deviation.

References

Alahuhta M, XuQ, Bomble YJ, Brunecky R, AdneyWS, Ding SY, Himmel ME,Lunin VV. 2010. The unique binding mode of cellulosomal CBM4 fromClostridium thermocellum cellobiohydrolase A. J Mol Biol. 402:374–387.

Anandakrishnan R, Aguilar B, Onufriev AV. 2012. H++ 3.0: Automating pKprediction and the preparation of biomolecular structures for atomistic mo-lecular modeling and simulations. Nucleic Acids Res. 40:W537–W541.

Baranauskiene L, Petrikaite V,Matuliene J, Matulis D. 2009. Titration calorim-etry standards and the precision of isothermal titration calorimetry data. IntJ Mol Sci. 10:2752–2762.

Bardgett RD, Freeman C, Ostle NJ. 2008. Microbial contributions to climatechange through carbon cycle feedbacks. ISME J. 2:805–814.

Bayer EA, Lamed R. 1986. Ultrastructure of the cell-surface cellulosome of Clostrid-ium thermocellum and its interaction with cellulose. J Bacteriol. 167:828–836.

Boraston AB, Bolam DN, Gilbert HJ, Davies GJ. 2004. Carbohydrate-bindingmodules: Fine-tuning polysaccharide recognition. Biochem J. 382:769–781.

Boraston AB, Nurizzo D, Notenboom V, Ducros V, Rose DR, Kilburn DG,Davies GJ. 2002. Differential oligosaccharide recognition by evolutionarily-related β-1,4 and β-1,3 glucan-binding modules. J Mol Biol. 319:1143–1156.

Brooks BR, Brooks CL, MacKerell AD, Nilsson L, Petrella RJ, Roux B, Won Y,Archontis G, Bartels C, Boresch S, et al. 2009. CHARMM: The biomolecu-lar simulation program. J Comp Chem. 30:1545–1614.

Brun E, Johnson PE, Creagh AL, Tomme P, Webster P, Haynes CA,McIntosh LP. 2000. Structure and binding specificity of the secondN-terminal cellulose-binding domain from Cellulomonas fimi endogluca-nase C. Biochemistry. 39:2445–2458.

Coutinho JB, Gilkes NR,Warren RAJ, Kilburn DG,Miller RC. 1992. The bind-ing of Cellulomonas fimi endoglucanase-C (CenC) to cellulose and sepha-dex is mediated by the N-terminal repeats. Mol Microbiol. 6:1243–1252.

Coutinho JB,Moser B, Kilburn DG,Warren RAJ,Miller RC. 1991. Nucleotide-sequence of the endoglucanase-C gene (CenC) of Cellulomonas fimi, itshigh-level expression in Escherichia coli, and characterization of its pro-ducts. Mol Microbiol. 5:1221–1233.

Deng YQ, Roux B. 2006. Calculation of standard binding free energies: Aro-matic molecules in the T4 lysozyme L99A mutant. J Chem TheoryComput. 2:1255–1273.

Doi RH, Tamaru Y. 2001. The Clostridium cellulovorans cellulosome: An en-zyme complex with plant cell wall degrading activity. Chem Rec. 1:24–32.

Durell SR, Brooks BR, Ben-Naim A. 1994. Solvent-induced forces between 2hydrophilic groups. J Phys Chem. 98:2198–2202.

Falkowski P, Scholes RJ, Boyle E, Canadell J, Canfield D, Elser J, Gruber N,Hibbard K, Hogberg P, Linder S, et al. 2000. The global carbon cycle: Atest of our knowledge of Earth as a system. Science. 290:291–296.

Frka-Petesic B, Jean B, Heux L. 2014. First experimental evidence of a giant permanent electric-dipole moment in cellulose nanocrystals. EPL-Europhys Lett. 107:28006-p1–28006-p5.

Fuchs KP, Zverlov VV, Velikodvorskaya GA, Lottspeich F, Schwarz WH. 2003. Lic16A of Clostridium thermocellum, a non-cellulosomal, highly complex endo-beta-1,3-glucanase bound to the outer cell surface. Microbiology. 149:1021–1031.

Gilbert HJ, Knox JP, Boraston AB. 2013. Advances in understanding the molecular basis of plant cell wall polysaccharide recognition by carbohydrate-binding modules. Curr Opin Struct Biol. 23:669–677.

Gordon JC, Myers JB, Folta T, Shoja V, Heath LS, Onufriev A. 2005. H++: A server for estimating pK(a)s and adding missing hydrogens to macromolecules. Nucleic Acids Res. 33:W368–W371.

Gullfot F, Tan TC, von Schantz L, Karlsson EN, Ohlin M, Brumer H, Divne C. 2010. The crystal structure of XG-34, an evolved xyloglucan-specific carbohydrate-binding module. Proteins. 78:785–789.

Guvench O, Greene SN, Kamath G, Brady JW, Venable RM, Pastor RW, MacKerell AD. 2008. Additive empirical force field for hexopyranose monosaccharides. J Comp Chem. 29:2543–2564.

Guvench O, Hatcher E, Venable RM, Pastor RW, MacKerell AD. 2009. CHARMM additive all-atom force field for glycosidic linkages between hexopyranoses. J Chem Theory Comput. 5:2353–2370.

Guvench O, Mallajosyula SS, Raman EP, Hatcher E, Vanommeslaeghe K, Foster TJ, Jamison FW, MacKerell AD. 2011. CHARMM additive all-atom force field for carbohydrate derivatives and its utility in polysaccharide and carbohydrate-protein modeling. J Chem Theory Comput. 7:3162–3180.

Hasegawa H, Holm L. 2009. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol. 19:341–348.

Himmel ME, Ding SY, Johnson DK, Adney WS, Nimlos MR, Brady JW, Foust TD. 2007. Biomass recalcitrance: Engineering plants and enzymes for biofuels production. Science. 315:804–807.

Hong TY, Meng M. 2003. Biochemical characterization and antifungal activity of an endo-1,3-beta-glucanase of Paenibacillus sp isolated from garden soil. Appl Microbiol Biot. 61:472–478.

Humphrey W, Dalke A, Schulten K. 1996. VMD – Visual molecular dynamics. J Mol Graphics. 14:33–38.

Jiang W, Hodoscek M, Roux B. 2009. Computation of absolute hydration and binding free energy with free energy perturbation distributed replica-exchange molecular dynamics. J Chem Theory Comput. 5:2583–2588.

Johnson PE, Brun E, MacKenzie LF, Withers SG, McIntosh LP. 1999. The cellulose-binding domains from Cellulomonas fimi beta-1,4-glucanase CenC bind nitroxide spin-labeled cellooligosaccharides in multiple orientations. J Mol Biol. 287:609–625.

Johnson PE, Creagh AL, Brun E, Joe K, Tomme P, Haynes CA, McIntosh LP. 1998. Calcium binding by the N-terminal cellulose-binding domain from Cellulomonas fimi beta-1,4-glucanase CenC. Biochemistry. 37:12772–12781.

Johnson PE, Tomme P, Joshi MD, McIntosh LP. 1996a. Interaction of soluble cellooligosaccharides with the N-terminal cellulose-binding domain of Cellulomonas fimi CenC. 2. NMR and ultraviolet absorption spectroscopy. Biochemistry. 35:13895–13906.

Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. 1983. Comparison of simple potential functions for simulating liquid water. J Chem Phys. 79:926–935.

Kormos J, Johnson PE, Brun E, Tomme P, McIntosh LP, Haynes CA, Kilburn DG. 2000. Binding site analysis of cellulose binding domain CBDN1 from endoglucanse C of Cellulomonas fimi by site-directed mutagenesis. Biochemistry. 39:8844–8852.

Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42:D490–D495.

Luis Asensio J, Arda A, Javier Canada F, Jimenez-Barbero J. 2013. Carbohydrate-aromatic interactions. Acc Chem Res. 46:946–954.

Lynd LR, Weimer PJ, van Zyl WH, Pretorius IS. 2002. Microbial cellulose utilization: Fundamentals and biotechnology. Microbiol Mol Biol Rev. 66:506–577.

MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, et al. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B. 102:3586–3616.

MacKerell AD, Feig M, Brooks CL. 2004. Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J Comp Chem. 25:1400–1415.

Mattinen ML, Kontteli M, Kerovuo J, Linder M, Annila A, Lindeberg G, Reinikainen T, Drakenberg T. 1997. Three-dimensional structures of three engineered cellulose-binding domains of cellobiohydrolase I from Trichoderma reesei. Protein Sci. 6:294–303.

McCartney L, Gilbert HJ, Bolam DN, Boraston AB, Knox JP. 2004. Glycoside hydrolase carbohydrate-binding modules as molecular probes for the analysis of plant cell wall polymers. Anal Biochem. 326:49–54.

Mingardon F, Chanal A, Tardif C, Bayer EA, Fierobe HP. 2007. Exploration of new geometries in cellulosome-like chimeras. Appl Environ Microbiol. 73:7138–7149.

Myers J, Grothaus G, Narayanan S, Onufriev A. 2006. A simple clustering algorithm can be accurate enough for use in calculations of pKs in macromolecules. Proteins. 63:928–938.

Nakazawa H, Kim DM, Matsuyama T, Ishida N, Ikeuchi A, Ishigaki Y, Kumagai I, Umetsu M. 2013. Hybrid nanocellulosome design from cellulase modules on nanoparticles: Synergistic effect of catalytically divergent cellulase modules on cellulose degradation activity. ACS Catal. 3:1342–1348.

Onufriev A, Anandakrishnan R, Aguilar B, Gordon J, Myers J, Folta T, Shoja V, Heath L, Shaffer C, Back G, et al. H++, v. 3.1. http://biophysics.cs.vt.edu/H++ (15 July 2015, date last accessed). Virginia Tech, 2013.

Payne CM, Knott BC, Mayes HB, Hansson H, Himmel ME, Sandgren M, Ståhlberg J, Beckham GT. 2015. Fungal cellulases. Chem Rev. 115:1308–1448.

Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K. 2005. Scalable molecular dynamics with NAMD. J Comp Chem. 26:1781–1802.

Richardson JS. 1981. The anatomy and taxonomy of protein structure. Adv Protein Chem. 34:167–339.

Schrodinger L. 2010. The PyMOL molecular graphics system, version 1.1r1. http://sourceforge.net/projects/pymol/ (15 July 2015, date last accessed).

Schwarz WH. 2001. The cellulosome and cellulose degradation by anaerobic bacteria. Appl Microbiol Biot. 56:634–649.

Shirts MR, Chodera JD. 2008. Statistically optimal analysis of samples from multiple equilibrium states. J Chem Phys. 129:1–10.

Shoemaker S, Schweickart V, Ladner M, Gelfand D, Kwok S, Myambo K, Innis M. 1983. Molecular-cloning of exo-cellobiohydrolase-I derived from Trichoderma reesei strain-L27. Nat Biotechnol. 1:691–696.

Shoseyov O, Shani Z, Levy I. 2006. Carbohydrate binding modules: Biochemical properties and novel applications. Microbiol Mol Biol Rev. 70:283–295.

Sugiyama J, Chanzy H, Maret G. 1992. Orientation of cellulose microcrystals by strong magnetic-fields. Macromolecules. 25:4232–4234.

Szabo L, Jamal S, Xie H, Charnock SJ, Bolam DN, Gilbert HJ, Davies GJ. 2001. Structure of a family 15 carbohydrate-binding module in complex with xylopentaose. Evidence that xylan binds in an approximate 3-fold helical conformation. J Biol Chem. 276:49061–49065.

Teeri TT, Lehtovaara P, Kauppinen S, Salovuori I, Knowles J. 1987. Homologous domains in Trichoderma reesei cellulolytic enzymes – Gene sequence and expression of cellobiohydrolase-II. Gene. 51:43–52.

Tomme P, Boraston A, McLean B, Kormos J, Creagh AL, Sturch K, Gilkes NR, Haynes CA, Warren RAJ, Kilburn DG. 1998. Characterization and affinity applications of cellulose-binding domains. J Chromatogr B. 715:283–296.

Tomme P, Creagh AL, Kilburn DG, Haynes CA. 1996. Interaction of polysaccharides with the N-terminal cellulose-binding domain of Cellulomonas fimi CenC. 1. Binding specificity and calorimetric analysis. Biochemistry. 35:13885–13894.

Towns J, Cockerill T, Dahan M, Foster I, Gaither K, Grimshaw A, Hazlewood V, Lathrop S, Lifka D, Peterson GD, et al. 2014. XSEDE: Accelerating scientific discovery. Comput Sci Eng. 16:62–74.

Wang J, Deng Y, Roux B. 2006. Absolute binding free energy calculations using molecular dynamics simulations with restraining potentials. Biophys J. 91:2798–2814.

Wilson DB. 2009. Cellulases and biofuels. Curr Opin Biotech. 20:295–299.

Wimmerova M, Kozmon S, Necasova I, Mishra SK, Komarek J, Koca J. 2012. Stacking interactions between carbohydrate and protein quantified by combination of theoretical and experimental methods. PLoS ONE. 7(10):e46032.
