
Computational design of a leucine-rich repeat protein with a predefined geometry

Sebastian Rämisch^a, Ulrich Weininger^b, Jonas Martinsson^a, Mikael Akke^b, and Ingemar André^a,1

Departments of ^aBiochemistry and Structural Biology and ^bBiophysical Chemistry, Center for Molecular Protein Science, Lund University, SE-221 00 Lund, Sweden

Edited by David Baker, University of Washington, Seattle, WA, and approved October 30, 2014 (received for review July 17, 2014)

Structure-based protein design offers a possibility of optimizing the overall shape of engineered binding scaffolds to match their targets better. We developed a computational approach for the structure-based design of repeat proteins that allows for adjustment of geometrical features like length, curvature, and helical twist. By combining sequence optimization of existing repeats and de novo design of capping structures, we designed leucine-rich repeats (LRRs) from the ribonuclease inhibitor (RI) family that assemble into structures with a predefined geometry. The repeat proteins were built from self-compatible LRRs that are designed to interact to form highly curved and planar assemblies. We validated the geometrical design approach by engineering a ring structure constructed from 10 self-compatible repeats. Protein design can also be used to increase our structural understanding of repeat proteins. We use our design constructs to demonstrate that buried Cys play a central role for stability and folding cooperativity in RI-type LRR proteins. The computational procedure presented here may be used to develop repeat proteins with various geometrical shapes for applications where greater control of the interface geometry is desired.

binding scaffold | Rosetta | buried cysteines | computational protein design | geometrical design

Engineered protein-binding scaffolds are increasingly used as therapeutics, diagnostic probes, intracellular reporter molecules, or fusion domains in protein crystallization (1). Nature provides a large variety of protein recognition scaffolds from which engineered systems could be built. Repeat proteins are used in a wide range of biological processes, including the immune response and regulatory cascades (2–5). They consist of simple, structurally similar building blocks, called repeats, that assemble into elongated tandem arrays (6). Their extended shapes result in proteins with extraordinarily large binding surfaces, which makes them ideal scaffolds for protein binding. Analogous to antibodies, repeat proteins can be divided into framework residues, which encode stability and structure, and variable positions, which are responsible for protein recognition (7). A striking difference from antibodies is that the global structure can vary considerably between repeat proteins, even within a family. This structural variability suggests that not only the directly interacting residues but also the overall shapes of these proteins are optimized for binding target molecules.

Engineered repeat proteins have typically been developed by consensus sequence design, a method where highly conserved sequence positions are identified and scaffolds are built from identical repeats containing the most common residues at those positions (8). Consensus design has been successfully applied to create stable scaffolds from several repeat protein classes (9–12). However, this approach does not enable the design of binders with predefined shapes. Because the geometrical shape of an assembly is encoded by subtle structural differences between repeats and interrepeat interfaces, a structure-based design approach is required to design the assembly shape rationally. Optimizing the shape complementarity to a target molecule would enable development of scaffolds that are custom-made for their target proteins and could yield enhanced binding properties, such as simultaneous binding to multiple functional sites in a single protein, binding site targeting, or specific recognition of protein oligomers.

Leucine-rich repeat (LRR) proteins display a significant variation in shape (13, 14). The repeats in this protein class are typically composed of 20–30 residues, and they form helically twisted, solenoid-like structures with a continuous parallel β-sheet on the concave side; they can be elongated or highly curved (2). It has been shown that N- and C-terminal capping structures are crucial for folding and stability of LRR proteins (15, 16). The combination of a stable core of framework positions and variability in overall assembly structure makes LRRs ideal building blocks for the development of repeat proteins with rationally designed shapes. The feasibility of engineering LRR proteins has been demonstrated with consensus design applied to repeats from the ribonuclease inhibitor (RI) family (17), the nucleotide-binding oligomerization domain family (18), and the variable lymphocyte receptors (VLRs). The latter yielded a protein-binding scaffold named “repebodies” (19).

In this work, we developed a computational approach for the structure-based design of repeat proteins that allows for adjustment of geometrical features like length, curvature, and helical twist. We used the method to design a self-compatible LRR that assembles into highly curved and planar repeat proteins. The repeats were designed to allow assembly into a closed-ring structure. Variants with five double repeats and added capping structures produced stable proteins. In the absence of caps, the same repeat variants can dimerize to form complete circles (full-ring structures), thereby verifying the correctly designed geometry. The results demonstrate that stable proteins with predefined shapes can be built with limited sequence redesign if the conformation of the self-compatible repeat is selected carefully. Additionally, the results highlight the stabilizing effect of buried Cys in the core of RI-type LRR proteins and address the role of capping motifs in the folding of repeat proteins.

Significance

Repeat proteins are used in nature to bind to proteins and peptides. The shape of their binding surfaces can vary substantially, even for proteins within the same family. This variability likely arose because they evolved to match the proteins they interact with geometrically. Repeat proteins are often engineered to develop binders specific to new target proteins. It would be highly beneficial to design repeat proteins with predefined geometrical shapes because such a method would enable development of engineered repeat proteins that are shape-optimized to their targets. Here, we demonstrate that repeat proteins with a predefined shape can be designed using a computational design method. The approach is exemplified by the design of a protein that forms a ring structure not seen in nature.

Author contributions: S.R. and I.A. designed research; S.R., U.W., J.M., M.A., and I.A. performed research; S.R., U.W., M.A., and I.A. analyzed data; and S.R., U.W., M.A., and I.A. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

^1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1413638111/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1413638111 PNAS | December 16, 2014 | vol. 111 | no. 50 | 17875–17880

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Results

To develop an approach for the design of repeat proteins with a defined geometry, we started from the following considerations: Repeat proteins can be described as arrays of repeating structural units. However, there are subtle but important differences between the conformations of individual repeats that encode the compatibility with neighboring repeats as well as the overall shape of the protein. Features like buried hydrogen bonds (e.g., in Asn and Cys ladders, in structural water molecules, in contacts formed between interacting loop segments) serve as specificity elements that define the relative orientation of neighboring repeats. Thus, the central question when designing shape-optimized repeat proteins is how to engineer these specificity elements accurately. Because nature has already developed a variety of specific interaction geometries, we attempt to adopt these detailed features from repeats with a known structure. In structures of known LRR proteins, the curvature-defining angles between neighboring repeats range from −0.5° to 37.2° and helical twist angles are between −11.2° and 10.5°. The space of available conformations for repeats within a certain family is quite limited and can be sampled from the structures of intact repeat proteins. Because each protein contains many repeats, a large number of possible backbone conformations can be assembled from a small set of protein structures.

We developed a computational design method where repeat proteins with predefined shapes are assembled from structurally compatible building blocks taken from crystal structures. Subsequently, sequence redesign is used to optimize self-compatibility. A summary of the procedure is shown in Fig. 1A; it consists of the following steps:

i) The desired geometry of the protein is defined.

ii) A library of structures of individual repeats is compiled from crystal structures of selected repeat proteins.

iii) Repeats that are most likely to assemble with the predefined geometry are selected from the structural library. In this study, we limit ourselves to symmetrical assembly of single self-compatible repeats.

iv) Cycles of rigid body docking and computational sequence design are used to optimize the interface between the repeats.

v) The repeats are covalently connected via loops between consecutive repeats.

vi) In most cases, capping repeats have to be added to the N- and C-terminal sides of the repeat protein, either by adopting existing caps or using de novo design.

As a proof of principle, we applied the design protocol to create a protein built from a single type of self-compatible LRR that assembles to form a planar structure with high curvature. We set a challenging design goal by requiring that the interaction geometry between consecutive repeats would allow for assembly into a closed circular molecule. Once self-compatible repeats are designed that meet those criteria, planar repeat proteins with various numbers of repeats could be made. These strict geometrical constraints make it possible to address a number of issues. First, the ring form is not dependent on capping repeats. Thus, the question of whether caps are strictly required for folding can be addressed. Second, a structure composed of identical repeating units that does not require caps would be an excellent system to deconvolute intra- vs. interrepeat energies in LRR proteins, as previously done using a capped consensus ankyrin repeat protein (20). Third, formation of a ring is a stringent test of our ability to control the interaction geometry between repeats precisely. If we could first design a stable capped half-ring, then only a fully planar half-ring with the designed curvature would dimerize to form a full ring. To the best of our knowledge, a donut-shaped structure has not been observed among LRR proteins.
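The closed-ring requirement can be illustrated with a small geometric sketch. It is not the docking protocol used in the study: it only places n copies of one unit by rotation about a common axis (Cn symmetry) and reports the distance between the last point of each copy and the first point of the next, which stands in for the N-to-C terminus distance between consecutive repeats. The function names and the three-point toy unit are hypothetical and carry no structural meaning.

```python
import math

def rotate_z(point, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def cyclic_copies(unit_xyz, n_units):
    """Place n_units copies of a unit around the z-axis (Cn symmetry)."""
    ring = []
    for k in range(n_units):
        theta = 2.0 * math.pi * k / n_units  # 360/n degrees per step
        ring.append([rotate_z(p, theta) for p in unit_xyz])
    return ring

def terminus_gaps(ring):
    """Distance from the last point of each copy to the first point of the next."""
    n_units = len(ring)
    return [
        math.dist(ring[k][-1], ring[(k + 1) % n_units][0])
        for k in range(n_units)
    ]

# Hypothetical toy unit: first point = "N terminus", last point = "C terminus".
toy_unit = [(30.0, -8.0, 0.0), (32.0, 0.0, 0.0), (30.0, 8.0, 0.0)]
ring = cyclic_copies(toy_unit, 10)
gaps = terminus_gaps(ring)
```

Because every copy is generated by the same rotation, all gaps are equal and every point keeps its z coordinate, so the assembly is planar by construction. In a real design, the unit conformation decides whether such a symmetric arrangement is also well packed.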

Selection of an Optimal Building Block. For creating a small structural library of repeats, we focused on the RI family because the members of this family possess a relatively high curvature and comparably little twist (21). Proteins from the RI family are assembled from two alternating 28-aa and 29-aa repeats, referred to as A-type and B-type repeats. The different sequence lengths result in subtle conformational differences. A-type and B-type repeats always come in pairs, which suggests that they are not self-compatible. We initially attempted to design repeat proteins exclusively from either A or B repeats, but the resulting models were poorly packed and had little shape complementarity between repeats. Consequently, we used A + B double repeats as structural building blocks and compiled a library of seven members from the porcine RI (22) (Fig. 1B).

Using symmetrical docking simulations (23, 24), we then identified the best candidate double repeat and the optimal unit number for assembling a ring-shaped structure. By applying different rotational symmetries (C8, C9, C10, C11, and C12), closed planar ring-shaped structures with varying numbers of repeating units, and hence curvatures, were generated (Fig. 1B). The quality of an assembly was evaluated by the degree of shape complementarity, the strength of hydrogen bonds between β-strands, the amount of void volume, and the binding energy between repeats, as well as the distance between the N and C termini of consecutive repeats, which needed to be covalently connected.

Fig. 1. Overview of the design procedure. (A) Illustration of the general workflow for designing repeat proteins (gray) with a predefined shape, optimized for a target molecule (rose). (B) Summary of the design as done in this study: Native A + B double repeats were extracted from the RI crystal structure. Symmetrical docking of each unit (units 1–7) was performed using five different symmetries (C8–C12). The thick black line indicates the best combination. Sequences of both surface residues (light blue) and core residues (green) were optimized; charged residues are shown in red (negative) and blue (positive). Symmetrical docking verified the planar assembly of 10 identically redesigned double repeats.

Best results were obtained for 10 repeating units (C10) with double repeat number 6 (residues 310–366) as the building block. The crystal structure of the horseshoe-shaped porcine RI has an accumulated helical twist of 16° and a curvature of 238° over seven double repeats. If extended to a complete ring, the two end repeats would sterically overlap by more than 13 Å and the rise along the helical axis would be 11 Å. Thus, the geometry of the design model, with a helical twist of 0° and a total curvature of 360° (252° over seven double repeats), differs considerably from the native RI protein from which the repeat was adopted.
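The curvature figures in this paragraph follow from simple division, which the short sketch below reproduces. It only restates the arithmetic on the numbers given in the text; the helper name `per_unit_angle` is made up for illustration.

```python
def per_unit_angle(total_degrees, n_units):
    """Average angle contributed by each unit of an assembly."""
    return total_degrees / n_units

# Native porcine RI: 238 degrees of curvature and 16 degrees of twist
# accumulated over seven double repeats.
native_curvature = per_unit_angle(238.0, 7)   # 34.0 degrees per double repeat
native_twist = per_unit_angle(16.0, 7)        # roughly 2.3 degrees per double repeat

# Design model: a closed C10 ring, so 360 degrees over 10 double repeats.
design_curvature = per_unit_angle(360.0, 10)  # 36.0 degrees per double repeat
design_over_seven = 7 * design_curvature      # 252.0 degrees, as quoted in the text

# Per-unit curvature required by each symmetry that was sampled.
ring_angles = {n: per_unit_angle(360.0, n) for n in range(8, 13)}
```

The design therefore asks each double repeat for about 2° more curvature than the native average while removing the roughly 2.3° of twist per double repeat.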

Sequence Optimization to Increase Self-Complementarity. To minimize the energy of the assembly, we performed cycles of sequence optimization using Rosetta’s fixed-backbone design algorithm (25) and symmetrical docking. To minimize the risk of aggregation, we enforced a net charge of −3 per double repeat (Fig. 1B). At the concave side, a repeating electrostatic interaction pattern was designed by selecting Lys, Asn, and Ser in the β-strand of repeat A and Glu, Ser, and Ser in repeat B. Because the model with the native core sequence already looked promising, and to avoid the risk of removing critical interactions, we tried to minimize sequence changes in the core. Fixed-backbone design calculations suggested that introduction of a Cys residue at position 56 would be greatly stabilizing. Cys is highly conserved at position 56 in RI-type proteins, yet it is missing from the native sequence of the sixth RI double repeat. In the RI protein, five of the seven double repeats have a Cys at this position. In our most conservative design, called geometrically designed LRR-1 (gdLRR-1), we adopted the core residues from the sixth RI double repeat, except for position 56, where Thr was replaced by Cys. In both the crystal structure of RI and our design model, Cys56 mediates buried polar contacts. In the final models, the N and C termini of consecutive repeats ended up favorably positioned, so that consecutive repeats could be covalently connected by reintroducing Thr, which had been removed for docking of unconnected repeats.

We constructed a second self-compatible repeat, gdLRR-2, based on the core sequence of the previously published RI consensus design (17). This core resulted in good packing with low energy. The consensus sequence lacks Cys56, as well as an additional Cys at position 9, which had been removed for experimental reasons. Cys9 is highly conserved in RI repeats, more so than Cys56. Based on the high degree of conservation and favorable energetics, we introduced Cys9 and Cys56 into the consensus core of gdLRR-2. All designed repeat sequences are shown in Fig. 2.
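A net-charge constraint such as the −3 per double repeat mentioned above can be checked with a simple residue count. The sketch below is an assumption about how one might do this outside Rosetta, not the authors' implementation: it counts Lys and Arg as +1 and Asp and Glu as −1, ignoring His and the chain termini. The example string is an invented 10-residue peptide, not a sequence from the paper.

```python
POSITIVE = {"K", "R"}  # Lys, Arg counted as +1
NEGATIVE = {"D", "E"}  # Asp, Glu counted as -1

def net_charge(sequence):
    """Approximate net charge from a one-letter amino acid sequence.

    Simplification: His and the N/C termini are ignored.
    """
    seq = sequence.upper()
    positives = sum(1 for aa in seq if aa in POSITIVE)
    negatives = sum(1 for aa in seq if aa in NEGATIVE)
    return positives - negatives

def meets_charge_target(sequence, target=-3):
    """True if the sequence has exactly the requested net charge."""
    return net_charge(sequence) == target

# Invented toy peptide (one Lys, two Glu, two Asp), for illustration only.
toy_sequence = "AEDLSKLEGD"
toy_charge = net_charge(toy_sequence)
```

In a design loop, a check like this would typically act as a filter on candidate sequences after each round of sequence optimization.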

Addition of Capping Motifs. To be able to produce stable and soluble repeat proteins with fewer repeats than needed for ring formation, capping structures at the N- and C-terminal ends of the molecule are required (15, 16, 18). We analyzed the complementarity of native caps from RI-type LRR proteins to the designed LRRs by rigid-body docking in Rosetta. The C-terminal cap of porcine RI can dock with an almost identical interface as the one found in native RI, and with similar association energy. Thus, we chose to add the C-terminal 32 aa of porcine RI without further modifications. On the N-terminal side, we found little compatibility between the designed proteins and N-caps from all RI-type LRR proteins of known structure, something that could not be corrected by redesigning the interface residues.

We chose to design N-caps de novo. Using a combination of interface design using Protein Data Bank ID code 1IO0, and simultaneously optimizing interface residues within both the cap and the first internal repeat (Fig. 3A), we obtained a model with interface energies similar to those found in native LRRs. Because the first repeat could be either an A-type or B-type repeat, two N-caps were designed: one coupled to an A-type repeat and one coupled to a B-type repeat. Ab initio folding simulations of the designed sequences attached to internal repeats revealed a strong convergence toward the designed conformations, as shown in Fig. 3B. For experimental characterization, we combined five self-compatible double repeats, based on gdLRR-1 or gdLRR-2, with the different caps (Fig. 3D); the four constructs are referred to as gdLRR-1-A-cap, gdLRR-2-A-cap, gdLRR-1-B-cap, and gdLRR-2-B-cap. For proteins with B-caps, we inserted an additional redesigned N-terminal B-type repeat (Fig. 3D). A structural model for one of them, gdLRR-1-A-cap, is shown in Fig. 3C.

Biophysical Characterization. All four variants could be overexpressed in Escherichia coli with yields of 10–40 mg of pure protein from the soluble fraction of 1 L of culture volume (Fig. 4A). The purified proteins are highly soluble, can be kept in solution at room temperature for several weeks, and can be lyophilized without triggering aggregation. CD spectroscopy showed a broad negative absorption band with a minimum at 221 nm, indicating mixed α/β secondary structure content (Fig. 4C and Fig. S1). Temperature denaturation followed by CD indicates higher degrees of cooperativity for the variants with A-caps (Fig. 4C and Fig. S1). The temperature denaturation midpoints (Tm) were 41 °C for gdLRR-1-A-cap, 45 °C for gdLRR-1-B-cap, 56 °C for gdLRR-2-A-cap, and 58 °C for gdLRR-2-B-cap.

Fig. 2. Alignment of internal repeat sequences. Highlighted positions indicate differences between variants; core positions are highlighted in orange, and surface positions are highlighted in green. Positions that were left at their native identity are marked with an asterisk.

Fig. 3. De novo design of N-terminal caps. (A) Designed interface between the first internal A-repeat (gray) and the designed N-terminal cap (red). (B) Rosetta energy landscape from ab initio structure prediction of the designed N-terminal sequence. The y axis shows energy of 50,000 models, and the x axis indicates the Cα rmsd to the designed backbone conformation. (C) Model of the complete gdLRR-1-A-cap (blue), docked C-terminal cap from Protein Data Bank ID code 1IO0 (red), and de novo designed N-terminal capping motif. (D) Architecture of the final constructs; residues in the first internal repeat were redesigned together with the N-terminal capping motif, as indicated by red dots. Gray boxes represent A + B double repeats.

More detailed analysis was carried out for one of the constructs, gdLRR-2-A-cap. FTIR spectroscopy was used to study secondary structure formation. The second-derivative FTIR spectrum showed characteristic bands in the amide-I region that indicate the presence of both β-strands (1,621 cm⁻¹ and 1,631 cm⁻¹) and the α-helix (1,655 cm⁻¹). As a reference system, we used a highly soluble but unfolded design variant, gdLRR-3-A-cap, which is described in more detail below. The second-derivative FTIR spectrum of this protein showed a strong signal at 1,648 cm⁻¹, which is indicative of a random coil (28) (Fig. 4B). Thus, FTIR analysis demonstrates that gdLRR-2-A-cap is a well-folded protein of mixed α/β secondary structure.

To characterize the tertiary structure of the folded state further, we studied the protein with heteronuclear ¹H–¹³C NMR spectroscopy. The dispersion of methyl groups in NMR spectra is an excellent probe for the presence of tertiary structure. Without structure formation, all methyl groups of the same type cluster at chemical shifts that are characteristic of a random coil. In a folded protein, a distinct chemical environment of individual methyl groups leads to different chemical shifts resulting in a dispersion of signals. We expressed gdLRR-2-A-cap and, as a reference, the unfolded gdLRR-3-A-cap (see below) with [1-¹³C]-glucose as the sole carbon source. Expression results in specific labeling of isolated sites, including methyl groups (29). In agreement with the FTIR results, the 2D ¹H–¹³C NMR spectrum of gdLRR-3-A-cap is characteristic of a random coil, with no more than 5% folded conformation (Fig. 4D, blue contours), whereas gdLRR-2-A-cap shows methyl signals that are well dispersed, indicating an intact tertiary structure for at least 99% of the ensemble (Fig. 4D, green contours). We further compared the experimental ¹H–¹³C spectrum of gdLRR-2-A-cap with a theoretical spectrum, calculated for the Rosetta model using SHIFTX2 (30) (Fig. S2). In general, the experimental chemical shift dispersion compares favorably with the predicted spectrum, although a limited number of well-dispersed predicted signals from residues in, or near, the cap regions are missing in the experimental spectrum. By contrast, the spectrum predicted for an intrinsically disordered protein of the same sequence, calculated using the ncIDP program (31), shows no similarity to the experimental spectrum (Fig. S2 B–D). Taken together, these results firmly support the conclusion that gdLRR-2-A-cap forms a well-defined structure in solution.

To investigate the extent of structural flexibility of gdLRR-2-A-cap, we carried out NMR relaxation dispersion experiments, which probe conformational exchange on the millisecond time scale. The results reveal that only a few methyl groups display a significant dispersion in their relaxation profiles, whereas the large majority of residues have flat profiles (Fig. S3). The absence of relaxation dispersion implies that the particular residue does not exchange between alternative conformations, or that the chemical shifts are identical in the different states. Conformational exchange is seen for two Leu residues and two of four Tyr residues in the protein (Fig. S3). Two Tyr residues are located in each of the caps; thus, at least one of the caps samples an alternative, high-energy state with a relative population of ∼3%. For most methyl groups that do not show exchange, the chemical shift is clearly indicative of a well-structured environment. Therefore, we conclude that the core of the protein is not in exchange with an unfolded species on the millisecond time scale.

Dimerization of gdLRR-2. After developing a stable assembly of self-compatible repeats with caps, we wanted to determine whether the interaction geometry of the repeat was correctly encoded. In the absence of both caps, only a structure with no twist and the correct curvature is expected to form a dimeric complex of two five-double repeat half-rings. We created a construct consisting exclusively of five gdLRR-2 double repeats for experimental testing. CD and FTIR spectra of this protein were similar to those of the capped variants (gdLRR-2-A-cap and gdLRR-2-B-cap) (Fig. 5 A and B). Dynamic light scattering (DLS) experiments suggested a diameter of the purified protein of ∼80 Å, which is very close to the longest interatomic distance within the model of the ring (77 Å). Analytical ultracentrifugation showed dimers, but no trace of monomers in solution (Fig. S4). To validate the structural model further, we studied the dimers with negative-stain EM. As shown in Fig. 5C, the micrographs demonstrate that the dimers form donut-shaped structures with an outer diameter of ∼80 Å and an inner diameter of 20–25 Å. These dimensions are in agreement with the results from DLS and the structural model (Fig. 5D).

Temperature denaturation followed by CD revealed an apparent two-state behavior with a high degree of unfolding cooperativity (Fig. 5A). The dimer refolds upon cooling (Fig. S5) and has a Tm of 86 °C, which is significantly higher than the Tm of the capped variants. This result suggests that disassembly and unfolding happen simultaneously; dissociation into stable monomers below 86 °C is unlikely because the capped monomer starts to unfold above 60 °C and uncapped monomers are expected to be even less stable. The successful formation of a planar ring structure with the designed curvature does not exclude the possibility that the capped variants adopt a different geometry. However, because the capped monomers have a well-defined structure and the closed-ring structure of the noncapped dimer is very stable, we believe that energetically costly large-scale rearrangements upon dimerization are unlikely.
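To make the notion of an apparent two-state transition concrete, the following minimal sketch computes the CD signal expected from a simple two-state model, in which the unfolding free energy is approximated as ΔG(T) = ΔHm(1 − T/Tm). This is a generic textbook form and not necessarily the fitting function used for Fig. 5A. The function names, the neglect of the heat capacity change, the flat baselines, and the enthalpy of 400 kJ/mol are all assumptions chosen for illustration; only the Tm of 86 °C comes from the text.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_unfolded(temp_c, tm_c, dh_m):
    """Fraction of unfolded protein at temp_c for a two-state transition.

    dh_m is the van't Hoff enthalpy at the midpoint (kJ/mol). The heat
    capacity change on unfolding is neglected (an assumption of this sketch).
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    dg_unfold = dh_m * (1.0 - t / tm)       # zero at T = Tm
    k_unfold = math.exp(-dg_unfold / (R * t))
    return k_unfold / (1.0 + k_unfold)


def ellipticity(temp_c, tm_c, dh_m, folded_signal, unfolded_signal):
    """Observed signal as a population-weighted mix of flat baselines."""
    fu = fraction_unfolded(temp_c, tm_c, dh_m)
    return folded_signal + (unfolded_signal - folded_signal) * fu


# Illustrative melting curve with Tm = 86 °C; the enthalpy is hypothetical.
for temp in (60, 80, 86, 90, 95):
    print(temp, round(fraction_unfolded(temp, 86.0, 400.0), 3))
```

In such a model, a larger ΔHm produces a steeper transition around Tm, which is the sense in which a sharp melting curve is read as high unfolding cooperativity.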

Cys56 Stabilizes the Folded State of a Designed LRR. The successful designs were created by introduction of Cys into a conservatively redesigned core (Fig. 6A). However, we wanted to investigate

Fig. 4. Biophysical characterization of gdLRR-2-A-cap. (A) Overexpression of gdLRR-4-A-cap. P, pellet of insoluble fraction; S, soluble fraction. (B) Second-derivative FTIR spectra of gdLRR-2-A-cap (green) and gdLRR-3-A-cap (blue). Greek letters indicate the corresponding secondary structure. (C, Top) Far-UV CD spectrum of gdLRR-2-A-cap. (C, Bottom) Temperature denaturation monitored by ellipticity at 222 nm. Dots indicate measured values, and the solid line shows a fit to a two-state model. (D) Methyl regions of the 1H–13C heteronuclear single quantum coherence spectra of gdLRR-2-A-cap (green) and gdLRR-3-A-cap (blue).

17878 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111 Rämisch et al.


whether a stable and folded repeat protein could be designed without buried Cys. To this end, we carried out fixed-backbone design of the core, while disallowing Cys. Because the core of the RI protein is poorly packed, we attempted to stabilize the designed repeats by adding more hydrophobic residues. We constructed two new sequence variants (gdLRR-3 and gdLRR-4; Fig. 2) with more favorable energies compared with gdLRR-1 and gdLRR-2. In both constructs, the core Cys9 and Cys56 were replaced by Thr and Ile, respectively. The hydrophobic Ile fits very well into the cleft between the descending loops of two adjacent A-type and B-type repeats and forms favorable contacts to other core residues. For experimental testing, we constructed four capped constructs with five self-compatible double repeats, named gdLRR-3-A-cap, gdLRR-3-B-cap, gdLRR-4-A-cap, and gdLRR-4-B-cap.

All four constructs could be expressed and purified to high yields. Analysis by CD spectroscopy indicated that all constructs were unfolded (Fig. S6). FTIR and NMR spectra of gdLRR-3-A-cap confirmed a lack of secondary and tertiary structure (Fig. 3 B and D). The gdLRR-2 and gdLRR-3 repeats differ only at three core positions, two of which are positions 9 and 56. We speculated that the constructs based on gdLRR-3 were unfolded due to the absence of either one or both core Cys. To test this hypothesis, we mutated Ile56 to Cys in gdLRR-3-A-cap. The resulting protein, named gdLRR-3-A-cap-I56C, has a CD spectrum similar to the CD spectra of the folded variants (Fig. 6B). Thus, a single mutation in each repeat seems to induce secondary structure formation, which demonstrates the stabilizing effect of Cys56.

Discussion

A key for designing shape-optimized repeat proteins is to control accurately the subtle structural features that encode the overall shape of these molecules. Our approach is to adopt these detailed features from known repeat structures. The benefit of this conservative approach to shape-controlled engineering of repeat proteins is that it limits the need for precise design of backbone conformations. With limited redesign of the core, the backbone conformation of the designed repeat is most likely close to the template, and hence the design model. The experimental analysis demonstrates that changing a single amino acid in the core sequence of the template repeat was sufficient to obtain a folded and stable protein, gdLRR-1. Moreover, limited redesign significantly increased stability and folding cooperativity (gdLRR-2). Both results highlight that the crucial step in designing RI repeat proteins was the selection of an optimal building block, rather than introducing new interactions between building blocks by extensive redesign of the core.

At some positions, there is more tolerance for introducing amino acids that are rare within a particular LRR family, like Val20 and Ile44 in gdLRR-1 or Ala11 in gdLRR-3-C56. However, highly conserved positions are likely to be more sensitive to substitutions, for instance, Cys9. The structural role of Cys at position 9 of LRR proteins (C10 in the most common numbering scheme) has been extensively described (32, 33). It is part of an Asn/Cys ladder, which is located immediately following the β-strand, where it connects neighboring repeats via side-chain–to–main-chain hydrogen bonds. When comparing gdLRR-3-A-cap-C56, which lacks Cys9, and gdLRR-2-A-cap, which does have Cys9, there is a considerable difference in unfolding cooperativity between these constructs. The increased cooperativity of gdLRR-2-A-cap gives some experimental evidence that the Asn/Cys ladder may be important for encoding folding cooperativity in RI-type LRR proteins.

Less attention has been paid to Cys on the descending loops of A-type repeats, where a cleft to the preceding B-type repeat is found. In RI, as well as in our design models, Cys56 appears to stabilize the conformation of the descending loop by forming hydrogen bonds either to the backbone nitrogen at position i + 2 or to carbonyl oxygen in the neighboring repeat’s helix cap (Fig. 6A). Polar interactions at this site are likely to be important, because Cys56 is substituted by Thr or Ser in some repeats. We observed a significant increase in secondary structure content and unfolding cooperativity by mutating Ile56 to Cys in gdLRR-3. This observation is an experimental indication that polar interactions at this site are essential for stability. It has been suggested that destabilization of single repeats within an assembly is essential for the function of repeat proteins by introducing flexibility into the protein (34–36). Perhaps this need for selective instability explains the absence of highly conserved Cys in some native repeats like the template repeat used in this study. Selective mutations of core Cys may provide a means for selectively destabilizing specific repeats in designed binder proteins.

Two previously reported consensus sequence designs of LRR proteins were based on VLR (19) and RI-type repeats (17). VLRs consist only of 24 residues, which does not allow for formation of canonical α-helices on the convex side of the

Fig. 5. Structure analysis of the cap-deficient gdLRR-2. (A) Thermal unfolding monitored by ellipticity at 222 nm. Dots represent the data, and the solid line shows the fitted function. (Inset) Far-UV CD spectra during the thermal melting experiment; colors indicate the temperature from dark blue (15 °C) to dark red (95 °C). (B) Second-derivative FTIR spectrum. Greek letters indicate secondary structure. (C) Negative-stain EM images at a magnification of 55,000×. (D) Model of the assembled gdLRR-2 dimer; one monomer is colored in blue, and one monomer is colored in red.

Fig. 6. Roles of buried Cys at positions 9 and 56. (A) Structural representation of Cys9 (Top) and Cys56 (Bottom) in the model of gdLRR-2. Yellow dashed lines indicate N···HO and N···H-N H-bonds; pink dashed lines indicate Cys H-bonds. (B) Comparison of CD spectra (Left) and temperature denaturation curves followed by ellipticity at 222 nm (Right) of gdLRR-3-A-cap (blue) and gdLRR-3-A-cap-I56C (green).

Rämisch et al. PNAS | December 16, 2014 | vol. 111 | no. 50 | 17879



horseshoe. The shorter repeat sequence results in a protein with less void volume, which may explain the higher thermal stability of the designed VLR compared with RI repeats. The RI consensus design unfolds with low cooperativity (17). There is a strong similarity between the published CD spectra of the five-repeat consensus design and the nonfolded gdLRR-3 and gdLRR-4 variants. The folded gdLRR-2 variants differ from the consensus sequence design at two core positions, Cys9 and Cys56. Our results suggest that introduction of these residues into the consensus sequence design may improve stability and folding cooperativity. However, the different capping repeats may also play a role in explaining the different biophysical properties compared to our constructs.

It has been shown that the LRR protein internalin-B has a polarized folding pathway, starting from the N-terminal capping motif (15). A capless ring could be used to investigate whether caps are crucial for initiating folding or merely for stabilizing the folded state. Here, we show that a dimeric full-ring construct is stably folded without caps. Although shielding the hydrophobic core is almost certainly crucial for stability, specific capping motifs may not be required for initiating folding of LRR proteins.

Here, we present a general design approach for development of geometrically optimized repeat proteins built from self-compatible repeats. The proteins developed in this study were designed to assemble with a repeat-repeat interface geometry defined by cyclical symmetry. However, a wide variety of shapes can be designed by applying helical symmetry, which can readily be implemented into the design process. The design method could also be extended to include multiple different building blocks for cases where single self-compatible repeats are not sufficient to produce a desired shape. However, not all shapes could be designed by this approach because of the finite variability in repeat conformations. Furthermore, repeats with a low amount of well-defined secondary structure elements are not amenable to fixed-backbone design, because sequence changes are likely to cause structural changes in loop regions. Nevertheless, there should be a sufficient number of structured repeats to enable the design of a wide range of shape-optimized protein binders and biomaterials.
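The distinction between cyclical and helical symmetry drawn above can be sketched numerically. In the minimal example below, each repeat is reduced to a single reference point, and successive repeats are related by a fixed rotation about a common axis plus an optional rise along that axis; a closed planar ring corresponds to zero rise and a total rotation of 360°. This is not the Rosetta symmetry machinery used in the study: the function names are hypothetical, the 40 Å radius is simply half of the ∼80 Å outer diameter reported for the dimer, and the helical parameters are arbitrary illustrative values.

```python
import math


def repeat_positions(n_repeats, radius, angle_deg, rise=0.0):
    """Reference points (x, y, z) for repeats related by a screw operation.

    Each step rotates by angle_deg about the z axis and translates by
    `rise` along it. rise = 0 gives cyclical (planar) placement.
    """
    points = []
    for i in range(n_repeats):
        phi = math.radians(angle_deg * i)
        points.append((radius * math.cos(phi), radius * math.sin(phi), rise * i))
    return points


def closes_into_ring(n_repeats, angle_deg, rise, tol=1e-6):
    """True if n_repeats steps return exactly to the starting position."""
    return abs(rise) < tol and abs(n_repeats * angle_deg - 360.0) < tol


# Ten double repeats in a planar ring: 36 degrees per repeat, no rise.
ring = repeat_positions(10, radius=40.0, angle_deg=36.0)
# Hypothetical helical arrangement: smaller rotation plus a rise per repeat.
helix = repeat_positions(10, radius=40.0, angle_deg=30.0, rise=2.0)

print(closes_into_ring(10, 36.0, 0.0))
print(closes_into_ring(10, 30.0, 2.0))
```

In this picture, curvature sets the rotation per repeat and twist sets the rise, so a ring of ten double repeats requires exactly 36° per double repeat and no twist, whereas any nonzero rise yields an open, helical assembly.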

Materials and Methods

All genes were codon-optimized for expression in E. coli and synthesized by Genscript. Proteins were expressed in LB for 4 h at 37 °C and purified by His-tag affinity chromatography, His-tag removal by tobacco etch virus protease cleavage, and gel filtration. Isotopically labeled proteins were expressed in M9 minimal medium supplemented with [1-13C]-glucose. Fixed-backbone design and symmetrical docking simulations were performed using Rosetta++ and Rosetta3. N-caps were designed using RosettaScripts and RosettaRemodel, and convergence of the designed sequence toward the design model was tested by ab initio structure prediction in Rosetta3. Curvatures and helical twists were calculated using Angulator (37). Detailed descriptions of all methods and all sequences are given in SI Materials and Methods.

ACKNOWLEDGMENTS. We thank Andreas Barth (Stockholm University) for help with FTIR, the Lund Protein Production Platform (Lund University, www.lu.se/lp3) for protein production, Eva-Christina Ahlgren and Gunnel Karlsson for help with EM, and Sarel Fleischman for valuable comments on the manuscript. The work was supported by the Swedish Research Council (Vetenskapsrådet), Crafoord Foundation, and Defense Advanced Research Projects Agency (Subcontract 747458).

1. Boersma YL, Plückthun A (2011) DARPins and other repeat protein scaffolds: Advances in engineering and applications. Curr Opin Biotechnol 22(6):849–857.

2. Kobe B, Kajava AV (2001) The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol 11(6):725–732.

3. D’Andrea LD, Regan L (2003) TPR proteins: The versatile helix. Trends Biochem Sci 28(12):655–662.

4. Li J, Mahajan A, Tsai M-D (2006) Ankyrin repeat: A unique motif mediating protein-protein interactions. Biochemistry 45(51):15168–15178.

5. Al-Khodor S, Price CT, Kalia A, Abu Kwaik Y (2010) Functional diversity of ankyrin repeats in microbial proteins. Trends Microbiol 18(3):132–139.

6. Kajava AV (2012) Tandem repeats in proteins: From sequence to structure. J Struct Biol 179(3):279–288.

7. Main ERG, Phillips JJ, Millership C (2013) Repeat protein engineering: Creating functional nanostructures/biomaterials from modular building blocks. Biochem Soc Trans 41(5):1152–1158.

8. Forrer P, Stumpp MT, Binz HK, Plückthun A (2003) A novel strategy to design binding molecules harnessing the modular nature of repeat proteins. FEBS Lett 539(1-3):2–6.

9. Main ERG, Xiong Y, Cocco MJ, D’Andrea L, Regan L (2003) Design of stable α-helical arrays from an idealized TPR motif. Structure 11(5):497–508.

10. Kohl A, et al. (2003) Designed to be stable: Crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci USA 100(4):1700–1705.

11. Binz HK, Stumpp MT, Forrer P, Amstutz P, Plückthun A (2003) Designing repeat proteins: Well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J Mol Biol 332(2):489–503.

12. Urvoas A, et al. (2010) Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats. J Mol Biol 404(2):307–327.

13. Enkhbayar P, Kamiya M, Osaki M, Matsumoto T, Matsushima N (2004) Structural principles of leucine-rich repeat (LRR) proteins. Proteins 54(3):394–403.

14. Kajava AV (1998) Structural diversity of leucine-rich repeat proteins. J Mol Biol 277(3):519–527.

15. Courtemanche N, Barrick D (2008) The leucine-rich repeat domain of Internalin B folds along a polarized N-terminal pathway. Structure 16(5):705–714.

16. Dao TP, Majumdar A, Barrick D (2014) Capping motifs stabilize the leucine-rich repeat protein PP32 and rigidify adjacent repeats. Protein Sci 23(6):801–811.

17. Stumpp MT, Forrer P, Binz HK, Plückthun A (2003) Designing repeat proteins: Modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J Mol Biol 332(2):471–487.

18. Parker R, Mercedes-Camacho A, Grove TZ (2014) Consensus design of a NOD receptor leucine rich repeat domain with binding affinity for a muramyl dipeptide, a bacterial cell wall fragment. Protein Sci 23(6):790–800.

19. Lee S-C, et al. (2012) Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering. Proc Natl Acad Sci USA 109(9):3299–3304.

20. Aksel T, Majumdar A, Barrick D (2011) The contribution of entropy, enthalpy, and hydrophobic desolvation to cooperativity in repeat-protein folding. Structure 19(3):349–360.

21. Kobe B, Deisenhofer J (1993) Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Nature 366(6457):751–756.

22. Kobe B, Deisenhofer J (1996) Mechanism of ribonuclease inhibition by ribonuclease inhibitor protein based on the crystal structure of its complex with ribonuclease A. J Mol Biol 264(5):1028–1043.

23. André I, Bradley P, Wang C, Baker D (2007) Prediction of the structure of symmetrical protein assemblies. Proc Natl Acad Sci USA 104(45):17656–17661.

24. DiMaio F, Leaver-Fay A, Bradley P, Baker D, André I (2011) Modeling symmetric macromolecular structures in Rosetta3. PLoS ONE 6(6):e20450.

25. Das R, Baker D (2008) Macromolecular modeling with rosetta. Annu Rev Biochem 77:363–382.

26. Fleishman SJ, et al. (2011) RosettaScripts: A scripting language interface to the Rosetta macromolecular modeling suite. PLoS ONE 6(6):e20161.

27. Huang P-S, et al. (2011) RosettaRemodel: A generalized framework for flexible backbone protein design. PLoS ONE 6(8):e24109.

28. Kong J, Yu S (2007) Fourier transform infrared spectroscopic analysis of protein secondary structures. Acta Biochim Biophys Sin (Shanghai) 39(8):549–559.

29. Lundström P, et al. (2007) Fractional 13C enrichment of isolated carbons using [1-13C]- or [2-13C]-glucose facilitates the accurate measurement of dynamics at backbone Calpha and side-chain methyl positions in proteins. J Biomol NMR 38(3):199–212.

30. Han B, Liu Y, Ginzinger SW, Wishart DS (2011) SHIFTX2: Significantly improved protein chemical shift prediction. J Biomol NMR 50(1):43–57.

31. Tamiola K, Acar B, Mulder FAA (2010) Sequence-specific random coil chemical shifts of intrinsically disordered proteins. J Am Chem Soc 132(51):18000–18003.

32. Kobe B, Deisenhofer J (1995) Proteins with leucine-rich repeats. Curr Opin Struct Biol 5(3):409–416.

33. Buchanan SGSC, Gay NJ (1996) Structural and functional diversity in the leucine-rich repeat family of proteins. Prog Biophys Mol Biol 65(1-2):1–44.

34. Truhlar SME, Torpey JW, Komives EA (2006) Regions of IkappaBalpha that are critical for its inhibition of NF-kappaB.DNA interaction fold upon binding to NF-kappaB. Proc Natl Acad Sci USA 103(50):18951–18956.

35. Croy CH, Bergqvist S, Huxford T, Ghosh G, Komives EA (2004) Biophysical characterization of the free IkappaBalpha ankyrin repeat domain in solution. Protein Sci 13(7):1767–1777.

36. Kloss E, Courtemanche N, Barrick D (2008) Repeat-protein folding: New insights into origins of cooperativity, stability, and topology. Arch Biochem Biophys 469(1):83–99.

37. Bublitz M, et al. (2008) Crystal structure and standardized geometric analysis of InlJ, a listerial virulence factor and leucine-rich repeat protein with a novel cysteine ladder. J Mol Biol 378(1):87–96.

