Protein-DNA interactions: a structural analysis

Susan Jones¹*, Paul van Heyningen¹, Helen M. Berman² and Janet M. Thornton¹,³

¹ Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT, England
² Department of Chemistry, Rutgers, The State University, Piscataway, NJ 08855-0939, USA
³ Department of Crystallography, Birkbeck College, Malet Street, London WC1 7HX, England

*Corresponding author. E-mail address of the corresponding author: [email protected]

J. Mol. Biol. (1999) 287, 877–896. Article No. jmbi.1999.2659, available online at http://www.idealibrary.com. 0022-2836/99/150877–20 $30.00/0 © 1999 Academic Press

A detailed analysis of the DNA-binding sites of 26 proteins is presented using data from the Nucleic Acid Database (NDB) and the Protein Data Bank (PDB). Chemical and physical properties of the protein-DNA interface, such as polarity, size, shape, and packing, were analysed. The DNA-binding sites shared common features, comprising many discontinuous sequence segments forming hydrophilic surfaces capable of direct and water-mediated hydrogen bonds. These interface sites were compared to those of protein-protein binding sites, revealing them to be more polar, with many more intermolecular hydrogen bonds and buried water molecules than the protein-protein interface sites. By looking at the number and positioning of protein residue-DNA base interactions in a series of interaction footprints, three modes of DNA binding were identified (single-headed, double-headed and enveloping). Six of the eight enzymes in the data set bound in the enveloping mode, with the protein presenting a large interface area effectively wrapped around the DNA.

A comparison of structural parameters of the DNA revealed that some values for the bound DNA (including twist, slide and roll) were intermediate between those observed for the unbound B-DNA and A-DNA. The distortion of bound DNA was evaluated by calculating a root-mean-square deviation on fitting to a canonical B-DNA structure. Major distortions were commonly caused by specific kinks in the DNA sequence, some resulting in the overall bending of the helix. The helix bending affected the dimensions of the grooves in the DNA, allowing the binding of protein elements that would otherwise be unable to make contact. From this structural analysis a preliminary set of rules that govern the bending of the DNA in protein-DNA complexes is proposed.

Keywords: Protein-DNA complex; motif; binding modes; interface; DNA distortion

Abbreviations used: DNA-BP, DNA-binding protein; ASA, accessible surface area; HTH, helix-turn-helix; HLH, helix-loop-helix.

Introduction

Since the first structure of a DNA molecule from a single crystal was solved in 1980 (Drew et al., 1980), the structures of over 220 DNA molecules complexed to proteins have been determined. The protein-DNA interactions in these complexes have been extensively documented for individual structures and for specific DNA-binding motifs (for a review, see Harrison, 1991). The energetics and mode of interactions have been analysed, highlighting the importance of hydrogen bonds and non-polar interactions (for a review, see Larson & Verdine, 1996). Distorted DNA structures have also been the subject of extensive research since they were first identified by their slow rate of migration during polyacrylamide gel electrophoresis. DNA can be curved due to local helix structure without the application of external forces (Trifonov, 1985, 1991; Lilley, 1986; Hagerman, 1990; Dickerson et al., 1996). However, this intrinsic curvature is distinct from DNA bending in which the DNA structure is forcibly distorted by the binding of a protein structure (Olson, 1996; Olson & Zhurkin, 1996; Olson et al., 1988). Recent work by Dickerson (1998) comprehensively examined DNA bending in a large data set of protein-DNA complexes using normal vector analysis. Concentrating exclusively upon the DNA structure, this detailed analysis confirmed previous studies


(Hagerman, 1990; Suzuki & Yagi, 1995; Young et al., 1995; Olson et al., 1998) in finding that forced bending of DNA by proteins commonly occurs through specific kinks of the DNA helix, generally at pyrimidine-purine base steps. Hence there is an element of sequence dependence to the forced bending of DNA by proteins. The presence of kinks means that base-pairs are unstacked, and often this is accompanied by an unwinding of the helix. The values of twist, slide and roll for the bound DNA are intermediate between those for unbound B-DNA and A-DNA. Such differences have been observed previously (Travers, 1992; Nekludova & Pabo, 1994; Shakked et al., 1994; Olson & Zhurkin, 1996).
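Since kinks are reported to occur generally at pyrimidine-purine base steps, candidate kink sites can be listed directly from a sequence. The following is a minimal sketch, not part of the original analysis; the function name and the example sequence are hypothetical, and the sequence is assumed to be one strand read 5' to 3'.

```python
# Illustrative sketch: locate pyrimidine-purine (YR) dinucleotide steps,
# the step type at which protein-induced kinks are reported to occur.
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def pyrimidine_purine_steps(sequence):
    """Return (index, step) pairs for each YR step in a 5'->3' strand.

    `index` is the 0-based position of the pyrimidine in the sequence.
    """
    seq = sequence.upper()
    return [
        (i, seq[i:i + 2])
        for i in range(len(seq) - 1)
        if seq[i] in PYRIMIDINES and seq[i + 1] in PURINES
    ]


# Hypothetical sequence containing a TATA-like element.
print(pyrimidine_purine_steps("GCTATAAAAGGG"))
```

Only four dinucleotides (TA, TG, CA and CG) qualify as pyrimidine-purine steps, so the scan is a simple filter over adjacent bases.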

The current work brings together analysis of DNA-binding sites on proteins with a survey of the distortions observed in the DNA structures they bind. To achieve this, data stored in the Nucleic Acid Database (NDB; Berman et al., 1992) is combined with that in the Brookhaven Protein Data Bank (PDB; Bernstein et al., 1977). A detailed computational analysis of the DNA-binding sites of proteins is presented, including their chemical and physical properties such as polarity, size, shape, and packing. These properties are compared to those observed previously in protein-protein binding sites (Jones & Thornton, 1996). The DNA-binding sites are categorised for symmetry, secondary structure binding motif, and functional class to examine and classify different modes of binding. By looking at the gross architecture of the DNA-binding proteins, new modes of interactions are identified and these are studied in relation to the distortions observed in the DNA structures. The degree of gross distortion observed in DNA structures bound to proteins is quantified by the calculation of root-mean-squared deviations from a canonical B-DNA structure and compared with those observed in unbound DNA helices. Common features between binding modes and the distortions they induce are summarised.
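For reference, the root-mean-squared deviation on fitting to a canonical structure is conventionally written as below. This is the standard textbook definition, given here as an assumption; the symbols (N equivalent atoms, bound-DNA coordinates x_i, canonical B-DNA coordinates y_i, rotation R and translation t) are introduced for illustration, and which atoms enter the fit is not specified in this part of the text.

```latex
\mathrm{RMSD} \;=\; \min_{R,\,\mathbf{t}}
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \bigl\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \bigr\rVert^{2}}
```

A value of zero would indicate that the bound DNA superposes exactly on the canonical helix; larger values indicate greater gross distortion.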

Results

The analysis presented is based on two data sets selected from the NDB: a data set of 26 protein-DNA complexes (data set A, Table 1) and a data set of 21 unbound and unmodified B-DNA structures (data set B, see Materials and Methods).

DNA-binding proteins

A series of protein-DNA interface parameters calculated for the DNA-binding proteins (DNA-BPs) are shown in Figure 1 and summarised in Table 2. The DNA-BPs buried between 618 Å² and 2832.8 Å² of their accessible surface area (ASA) in the interaction. The interface ASA as a percentage of the total protein surface ranged from 5 % to 33 %. The large range reflects the fact that some structures included in the data set are single domains from multi-domain proteins and that some bind as monomers, whilst others bind as dimers, pseudo-dimers or tetramers with two recognition sites. In general, those proteins that bind the DNA as a dimeric structure have a larger interface contact area than those that bind as monomers. The interfaces comprised between two and 16 discontinuous sequence segments, with the dimeric protein interfaces being more segmented than those of the monomeric proteins. The DNA-BPs form between 0.9 and 2.4 intermolecular hydrogen bonds per 100 Å² of interface ASA. The number of bridging water molecules observed showed a similarly large variation, from zero to 1.94 per 100 Å² of interface ASA. The gap volume indices varied from 0.8 to 4.3, with the monomeric proteins having a more tightly packed protein-DNA interface than the dimeric proteins. Hence the monomeric DNA-BPs have small and well-packed interaction sites, whilst the dimeric DNA-BPs present larger interaction sites that are less well packed. A number of differences were observed between those proteins that function as enzymes and those that function as transcription factors. The enzymes tended to have a larger and more highly segmented protein-DNA interface than the transcription factors. However, there was little difference observed between the intermolecular hydrogen bonding, gap volume index and the number of bridging water molecules.
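The sequence segmentation count quoted above follows the definition in the legend to Table 2: interface residues separated by more than five residues in sequence fall into different segments. The sketch below is one possible reading of that rule, not the authors' code. It assumes a single chain with integer residue numbers and interprets "separated by more than five residues" as a difference in residue number greater than five; the function name and example positions are hypothetical.

```python
# Illustrative sketch of the sequence-segmentation count (Table 2, footnote c).
def count_sequence_segments(interface_positions, max_gap=5):
    """Count interface segments along one chain.

    Two consecutive interface residues belong to different segments when
    their residue numbers differ by more than `max_gap` (assumed reading
    of "separated by more than five residues in sequence").
    """
    positions = sorted(set(interface_positions))
    if not positions:
        return 0
    segments = 1
    for previous, current in zip(positions, positions[1:]):
        if current - previous > max_gap:
            segments += 1
    return segments


# Hypothetical interface residue numbers forming three segments.
print(count_sequence_segments([10, 11, 13, 18, 30, 31, 40]))
```

For a multi-chain complex, the same count would be applied per chain and summed, which is again an assumption rather than a stated procedure.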

The analysis of the protein-DNA interfaces in the current work has been conducted on a non-homologous set of proteins, but it is also of interest to see how the protein interfaces vary within homologous families. Of the 26 non-homologous protein structures in the data set, nine are representatives from protein families with more than one member. The interface parameters for members of each family were calculated, and the results can be viewed at http://www.biochem.ucl.ac.uk/bsm/DNA/server/family.html. In general, there was very little variation observed in the protein interfaces between members of the same family. Five of the nine families (TATA box binding proteins, repressors, catabolic activator proteins, nuclear factors and methyltransferases) show only small variations between family members. The remaining four families (homeodomains, zinc fingers, endonucleases, glucocorticoid receptors) showed wider variations, especially in the size of the interfaces. The endonuclease and the glucocorticoid receptor families show wider variations as they both include complexes in which proteins bind non-cognate sequences of DNA. Specific and non-specific recognition in the endonucleases is discussed in more detail later. The homeodomain family includes the structure of Hin recombinase, which has a fold intermediate between a prototypical helix-turn-helix and a eukaryotic homeodomain. This protein has a six amino acid residue carboxyl-terminal peptide that makes contacts in the minor groove of the DNA at the edge of the recombination site. This additional motif (not present in the other homeodomains) explains the larger interface ASA

Table 1. Data set of 26 protein-DNA complexes selected from the NDB (01/05/1997)

| PDB ID | NDB ID | Name | Resol. (Å) | No. base-pairs | Function | Oligomeric state | Dominant DNA-binding motif | DNA-groove bound | Protein symmetry | Binding mode | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1aay | PDT039 | Zif268 | 1.60 | 11 | TF | 1 | Zn coordinating | M | 1-fold | E | Elrod-Erickson et al. (1996) |
| 1ber | PDR023 | Catabolic gene activator protein | 2.50 | 32 | TF | 2 | HTH | M | 2-pseudo | DH | Parkinson et al. (1996) |
| 1bhm | PDE020 | BamHI endonuclease | 2.20 | 12 | Enzyme | 2 | Loops/other | M+m | 2-pseudo | E | Newman et al. (1995) |
| 1cma | PDR008 | Met repressor | 2.80 | 10 | TF | 4 | β-ribbons | M | 2-fold | DH | Somers & Phillips (1992) |
| 1d66 | PDT003 | GAL4 | 2.70 | 19 | TF | 2 | Zn coordinating | M | 2-fold | DH | Marmorstein et al. (1992) |
| 1eri | PDE001 | EcoRI restriction endonuclease | 2.70 | 14 | Enzyme | 2 | Loops/other | M | 2-fold | E | McClarin et al. (1986) |
| 1gdt | PDE0115 | Gamma delta resolvase | 3.00 | 35 | Enzyme | 2 | HTH | M+m | 2-pseudo | DH | Yang & Steitz (1995) |
| 1hcr | PDE009 | Hin recombinase | 1.80 | 29 | Enzyme | 1 | HTH | M+m | 1-fold | E | Feng et al. (1994) |
| 1ign | PDT035 | Yeast RAP1 | 2.25 | 19 | TF | 1 | HTH | M+m | 1-fold | DH | König et al. (1996) |
| 1ihf | PDT040 | Integration host factor | 2.50 | 35 | TF | 2 | β-ribbons | m | 2-pseudo | DH | Rice et al. (1996) |
| 1glu | PDRC01 | Glucocorticoid receptor | 2.90 | 19 | TF | 1 | Zn coordinating | M | 2-fold | DH | Luisi et al. (1991) |
| 1lmb | PDR010 | Lambda repressor | 1.80 | 21 | TF | 2 | HTH | M | 2-pseudo | DH | Beamer & Pabo (1992) |
| 1mdy | PDT016 | MyoD BHLH | 2.80 | 14 | TF | 2 | HLH | M | 2-pseudo | DH | Ma et al. (1994) |
| 1nfk | PDT015 | NF-κB | 2.30 | 12 | TF | 2 | Loops/other | M | 2-pseudo | E | Ghosh et al. (1995) |
| 1par | PDR012 | Arc repressor | 2.60 | 23 | TF | 4 | β-ribbons | M | 2-pseudo | DH | Raumann et al. (1994b) |
| 1pdn | PDR018 | Paired domain | 2.50 | 17 | TF | 1 | HTH | M+m | 1-fold | SH | Xu et al. (1995) |
| 1pnr | PDR020 | Purine repressor | 2.70 | 18 | TF | 2 | HTH | M | 2-fold | DH | Schumacher et al. (1994) |
| 1pue | PDT033 | PU.1 ETS domain | 2.10 | 16 | TF | 1 | HTH | M+m | 1-fold | SH | Kodandapani et al. (1996) |
| 1pvi | PDE017 | PvuII endonuclease | 2.80 | 14 | Enzyme | 2 | β-ribbons | M | 2-pseudo | E | Cheng et al. (1994) |
| 1rva | PDE014 | EcoRV endonuclease | 2.00 | 12 | Enzyme | 2 | Loops/other | M+m | 2-pseudo | E | Kostrewa & Winkler (1995) |
| 1tsr | PDR022 | P53 tumour suppressor | 2.00 | 22 | TF | 1 | Loops/other | M+m | 1-fold | SH | Cho et al. (1994) |
| 1vas | PDE022 | T4 endonuclease V | 2.75 | 14 | Enzyme | 1 | Loops/other | m | 1-fold | SH | Vassylyev et al. (1995) |
| 1ytb | PDT012 | TATA box binding protein | 1.80 | 14 | TF | 1 | β-ribbons | m | 2-pseudo internal | E | Kim et al. (1993) |
| 2bop | PDV001 | Papillomavirus-1 E2 DNA-binding domain | 1.70 | 18 | TF | 2 | Loops/other | M | 2-fold | DH | Hegde et al. (1992) |
| 2dgc | PDT029 | GCN4 | 3.00 | 18 | TF | 2 | Leu zipper | M | 2-fold | DH | Kellor et al. (1995) |
| 3mht | PDE0121 | HhaI methyltransferase | 2.70 | 13 | Enzyme | 1 | Loops/other | M | 1-fold | E | O'Gara et al. (1996) |

The data set is non-homologous by protein, and the resolution, function, oligomeric state and details of DNA-binding are listed for each. TF, transcription factor; HTH, helix-turn-helix; HLH, helix-loop-helix; M, major groove; m, minor groove; M+m, both grooves; E, enveloping; DH, double-headed; SH, single-headed.
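Two counts quoted in the text (six of the eight enzymes bind in the enveloping mode; 15, 8 and 3 structures bind in the major groove, both grooves and the minor groove, respectively) can be recomputed from Table 1. The sketch below transcribes only the function, groove and binding-mode columns; the tuple layout and the label "M+m" for structures contacting both grooves are choices made here for illustration.

```python
from collections import Counter

# (PDB ID, function, groove bound, binding mode), transcribed from Table 1.
# "M+m" is used here to mark structures that contact both grooves.
TABLE_1 = [
    ("1aay", "TF", "M", "E"), ("1ber", "TF", "M", "DH"),
    ("1bhm", "Enzyme", "M+m", "E"), ("1cma", "TF", "M", "DH"),
    ("1d66", "TF", "M", "DH"), ("1eri", "Enzyme", "M", "E"),
    ("1gdt", "Enzyme", "M+m", "DH"), ("1hcr", "Enzyme", "M+m", "E"),
    ("1ign", "TF", "M+m", "DH"), ("1ihf", "TF", "m", "DH"),
    ("1glu", "TF", "M", "DH"), ("1lmb", "TF", "M", "DH"),
    ("1mdy", "TF", "M", "DH"), ("1nfk", "TF", "M", "E"),
    ("1par", "TF", "M", "DH"), ("1pdn", "TF", "M+m", "SH"),
    ("1pnr", "TF", "M", "DH"), ("1pue", "TF", "M+m", "SH"),
    ("1pvi", "Enzyme", "M", "E"), ("1rva", "Enzyme", "M+m", "E"),
    ("1tsr", "TF", "M+m", "SH"), ("1vas", "Enzyme", "m", "SH"),
    ("1ytb", "TF", "m", "E"), ("2bop", "TF", "M", "DH"),
    ("2dgc", "TF", "M", "DH"), ("3mht", "Enzyme", "M", "E"),
]

# Enzymes, and the subset of enzymes that bind in the enveloping mode.
enzymes = [row for row in TABLE_1 if row[1] == "Enzyme"]
enveloping_enzymes = [row for row in enzymes if row[3] == "E"]

# Number of structures per groove category.
groove_counts = Counter(row[2] for row in TABLE_1)

print(len(enveloping_enzymes), "of", len(enzymes), "enzymes envelop the DNA")
print(dict(groove_counts))
```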


Figure 1. Frequency distributions of interface parameters for protein-DNA complexes compared to protein-protein complexes. Each parameter is defined in the legend to Table 2.


Table 2. Protein interface properties for a data set of 26 protein-DNA complexes

| Parameter | Protein-DNA complexes: monomeric | Protein-DNA complexes: dimeric | Protein-protein complexes: permanent | Protein-protein complexes: non-obligate |
|---|---|---|---|---|
| Number of examples | 9 | 17 | 36 | 23 |
| ΔASA (Å²)ᵃ | 1340.3 (406.1) | 1716.5 (505.2) | 1722 (1085.0) | 804 (147.1) |
| %ΔASAᵇ | 16.5 (7.7) | 11.8 (4.2) | 15.9 (8.6) | 12.1 (7.1) |
| Sequence segmentationᶜ | 5.3 (3.0) | 8.4 (3.4) | 5.3 (2.8) | 4.8 (2.2) |
| Gap volume indexᵈ | 1.9 (1.0) | 2.5 (0.7) | 2.1 (0.9) | 2.7 (0.9) |
| H-bonds (/100 Å² ΔASA)ᵉ | 1.3 (0.4) | 1.4 (0.4) | 0.7 (0.5) | 1.1 (0.4) |
| Bridging water molecules (/100 Å² ΔASA)ᶠ | 0.7 (0.6) | 0.7 (0.6) | 0.3 (0.4) | 0.4 (0.4) |
| % Polarityᵍ | 47.4 (5.5) | 46.0 (6.6) | 35.1 (6.1) | 41.2 (7.7) |

The interface properties for two types of protein-protein complex (Jones & Thornton, 1996) are shown for comparison. Permanent complexes are those in which the components only function together, whilst non-obligate complexes are those in which the components can exist alone as well as in the complex (Jones & Thornton, 1996).

ᵃ ΔASA: for the protein-DNA complexes this is the ASA of the protein that is buried on complexation with the DNA. For the protein-protein complexes this is the ASA of one protomer that is buried on complex formation. For hetero-complexes the mean ASA buried by each protomer was calculated. The ASA values were calculated using an implementation of the Lee & Richards (1971) algorithm developed by Hubbard (1992).

ᵇ %ΔASA: for the protein-DNA complexes this is calculated as:

$$\%\Delta\mathrm{ASA} = \frac{\Delta\mathrm{ASA}(P)}{\mathrm{ASA}(P)} \times 100$$

where ΔASA(P) is the ASA of protein buried on complexation with DNA and ASA(P) is the ASA of uncomplexed protein. For the protein-protein complexes this was calculated as:

$$\%\Delta\mathrm{ASA} = \frac{\Delta\mathrm{ASA}(p_1) + \Delta\mathrm{ASA}(p_2)}{\mathrm{ASA}(p_1) + \mathrm{ASA}(p_2)} \times 100$$

where ΔASA(p1) is the ASA of the first protomer buried on complexation, ΔASA(p2) is the ASA of the second protomer buried on complexation, ASA(p1) is the ASA of the first protomer uncomplexed, and ASA(p2) is the ASA of the second protomer uncomplexed.

ᶜ Sequence segmentation: the number of sequence segments in the protein interface, defined such that interface residues separated by more than five residues in sequence were defined in different segments.

ᵈ The gap volume between protein and DNA, or two protein protomers, was calculated using the algorithm SURFNET (Laskowski, 1991). The index is defined as:

$$\text{Gap index (Å)} = \frac{\text{gap volume between molecules (Å}^3\text{)}}{\text{interface ASA (Å}^2\text{) (per complex)}}$$

ᵉ Hydrogen bonding: the number of intermolecular hydrogen bonds per 100 Å² ΔASA was calculated using HBPLUS (McDonald & Thornton, 1994), in which hydrogen bonds are defined according to standard geometric criteria.

ᶠ Bridging water molecules: the number of water molecules that form hydrogen bonds with both parts of a complex was calculated using HBPLUS (McDonald & Thornton, 1994).

ᵍ % Polarity: this is defined as:

$$\%\,\text{Polarity} = \frac{\Delta\mathrm{ASA}(\mathrm{polar})}{\Delta\mathrm{ASA}(P)} \times 100$$

where ΔASA(polar) is the ASA of polar atoms of protein buried on complexation and ΔASA(P) is the ASA of protein buried on complexation with DNA.
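The ratio-type parameters defined in the legend to Table 2 reduce to a few lines of arithmetic once the buried ASA, gap volume and hydrogen-bond counts are known. The sketch below assumes those inputs have already been obtained from external programs (the legend names SURFNET and HBPLUS for the gap volume and hydrogen bonds); the function names and all numeric inputs are hypothetical and do not come from the data set.

```python
# Illustrative sketch of the ratio-type interface parameters in Table 2.
def percent_buried_asa(delta_asa, asa_uncomplexed):
    """%ΔASA for a protein-DNA complex (footnote b, first formula)."""
    return 100.0 * delta_asa / asa_uncomplexed


def percent_buried_asa_pair(delta_asa_p1, delta_asa_p2, asa_p1, asa_p2):
    """%ΔASA for a protein-protein complex (footnote b, second formula)."""
    return 100.0 * (delta_asa_p1 + delta_asa_p2) / (asa_p1 + asa_p2)


def gap_volume_index(gap_volume, interface_asa):
    """Gap index in Å: gap volume (Å^3) over interface ASA (Å^2), footnote d."""
    return gap_volume / interface_asa


def per_100_square_angstroms(count, delta_asa):
    """Hydrogen bonds or bridging waters per 100 Å^2 of ΔASA (footnotes e, f)."""
    return 100.0 * count / delta_asa


def percent_polarity(delta_asa_polar, delta_asa):
    """Share of the buried ASA contributed by polar atoms (footnote g)."""
    return 100.0 * delta_asa_polar / delta_asa


# Hypothetical complex: 1500 Å^2 buried out of 10000 Å^2, 21 hydrogen bonds,
# 3000 Å^3 gap volume and 705 Å^2 of buried polar-atom ASA.
print(percent_buried_asa(1500.0, 10000.0))
print(per_100_square_angstroms(21, 1500.0))
print(gap_volume_index(3000.0, 1500.0))
print(percent_polarity(705.0, 1500.0))
```

Normalising hydrogen bonds and bridging waters by ΔASA is what allows interfaces of very different size to be compared in one table.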


observed in this structure compared to the other members of this family. The family of zinc fingers shows a wide variation in interface size as two of the complexes (the Zif268 zinc finger and the YY1 zinc finger) have three motifs bound to the DNA, and one complex (the Tramtrack protein) has only two motifs bound.

Comparison with protein-protein interactions

The values for the interface parameters calculated for the DNA-BP have previously been calculated for two classes of protein-protein complex (Jones & Thornton, 1996; Table 2). The first, termed permanent complexes, are those in which the components only exist within the complex. The second, termed non-obligate complexes, are those in which the components can exist both in the complex and in isolation, e.g. enzyme inhibitor complexes (Jones & Thornton, 1996). In general, permanent protein-protein complexes have larger interfaces that are more hydrophobic and more complementary. Clearly all the protein-DNA complexes are non-obligate, as both the protein and DNA exist in isolation as well as in the complex.

The two protomers of the dimeric DNA-BPs combine to give an interface ASA with the DNA that is comparable to that observed in a single protomer in permanent protein-protein complexes. The single protomers of the monomeric DNA-BPs have an interface ASA between that observed for single protomers in permanent complexes and those in non-obligate complexes. The percentage ASA of the dimeric DNA-BPs is comparable to that of the non-obligate protein-protein complexes, and that of the monomeric DNA-BPs to that of the permanent protein-protein complexes (Table 2). The mean gap volume index of the monomeric DNA-BPs is comparable to that of the permanent protein-protein complexes, whilst the dimeric DNA-BPs have a mean index comparable to the non-obligate protein-protein complexes.
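These pairings can be checked against the means in Table 2 by asking, for each DNA-BP class, which protein-protein class has the closer mean. The sketch below uses only the published means for %ΔASA and the gap volume index; the dictionary layout and function name are hypothetical, and closeness of means is used here as a simple stand-in for "comparable" (standard deviations are ignored).

```python
# Means transcribed from Table 2 for two of the interface parameters.
TABLE_2_MEANS = {
    "percent_delta_asa": {
        "monomeric": 16.5, "dimeric": 11.8,
        "permanent": 15.9, "non-obligate": 12.1,
    },
    "gap_volume_index": {
        "monomeric": 1.9, "dimeric": 2.5,
        "permanent": 2.1, "non-obligate": 2.7,
    },
}


def closest_protein_protein_class(parameter, dna_bp_class):
    """Return the protein-protein class whose mean is nearest to the DNA-BP mean."""
    means = TABLE_2_MEANS[parameter]
    return min(
        ("permanent", "non-obligate"),
        key=lambda cls: abs(means[cls] - means[dna_bp_class]),
    )


for parameter in TABLE_2_MEANS:
    for dna_bp_class in ("monomeric", "dimeric"):
        print(parameter, dna_bp_class,
              closest_protein_protein_class(parameter, dna_bp_class))
```

With these means, the monomeric DNA-BPs pair with the permanent complexes and the dimeric DNA-BPs with the non-obligate complexes for both parameters.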

The most significant differences between protein-DNA complexes and protein-protein complexes were observed in the parameters that relate to the polarity of the interactions. The ASA contribution of polar atoms to the interface clearly indicates that the DNA-BPs have binding sites that are far more polar than those of protein-protein complexes (Figure 1(e)). In addition, a far greater number of intermolecular hydrogen bonds are observed in the protein-DNA complexes than in both the permanent and the non-obligate complexes (1.4 per 100 Å² compared to 0.7 and 1.1, respectively; Figure 1(f)). The protein-DNA complexes also had up to twice the number of bridging water molecules found in either of the two types of protein-protein complex (Figure 1(d)). These parameters give an indication of the overall polar nature of the protein-DNA interface. This is further exemplified in the interface residue propensities calculated for the protein-DNA complexes and compared to those in the permanent protein-protein complexes (Figure 2). The positively charged arginine residue has the highest propensity for the protein-DNA interface, followed by the polar threonine and asparagine residues, and the positively charged lysine residue. This contrasts with the propensities calculated for the permanent protein-protein interfaces, which predominantly feature the more hydrophobic residues (Jones & Thornton, 1996). The polar nature of the protein interface was expected as it must complement the negative charge on the surface of the DNA molecule. The negatively charged aspartic and glutamic acid residues are very rarely observed in the DNA binding sites, giving low resultant propensities. In our previous analysis (Jones & Thornton, 1996) the non-obligate protein-protein interfaces were more polar than the permanent protein-protein complexes, and the DNA-BPs investigated here present an even more polar surface for interaction.

Figure 2. Histogram of the interface residue propensities calculated for the protein-DNA complexes and compared to those for permanent protein-protein complexes (Jones & Thornton, 1996). A propensity of >1 indicates that a residue occurs more frequently in the interface than on the protein surface. The amino acid residues have been ordered using the Fauchere & Pliska (1983) hydrophobicity scale, with the most hydrophilic residues on the left-hand side and the most hydrophobic on the right-hand side of the graph.

DNA-binding modes, motifs and symmetry

Three modes of binding were identified in the proteins after creating interaction footprints (Figure 3). Proteins were classified as (i) single-headed, (ii) double-headed or (iii) enveloping, depending upon the pattern of DNA base and backbone contacts. The single-headed proteins were defined as those with footprints with a single cluster of base-contacting residues within a single cluster of both sugar-phosphate backbone-contacting residues (Figure 3(a)). The double-headed proteins were defined as those with footprints with two distinct clusters of base-contacting residues, or two distinct clusters of backbone-contacting residues within which were a number of unclustered base-contacting residues (Figure 3(b)). The enveloping proteins formed a distinct class of protein in which the footprints showed a cleft lined by backbone-contacting residues. Within this lining, base-contacting residues were present, usually forming a number of distinct sites (Figure 3(c)).

Models created to depict the dominant secondary structures of the binding motif, protein symmetry and the type and relative position of the DNA groove bound are shown in Figure 4. These simple models effectively summarise the gross anatomy of structures and aid classification. These models reveal that proteins bind in the major and minor grooves, and some structures make contacts in both grooves simultaneously. All types of protein secondary structures are used to make contacts, including beta sheets, alpha helices and loops.

Figure 3. Protein and DNA footprints. The protein footprints differentiate between the residues contacting the sugar-phosphate backbone of the DNA and those contacting the bases. Protein residues that make no contacts with the DNA are coloured blue, those contacting the sugar-phosphate backbone are coloured red, and those making base contacts are coloured yellow. Each protein is shown from an angle that maximises the view of the protein-DNA interface and each is labelled with its PDB code. (a) Proteins with a single binding head: T4 endonuclease V (1vas), PU.1 ETS domain (1pue). (b) Proteins with a double binding head: lambda repressor (1lmb), papillomavirus-1 E2 DNA-binding domain (2bop). (c) Proteins with an enveloping mode of binding: NF-κB (1nfk), EcoRI restriction endonuclease (1eri). The DNA footprints are coloured such that those DNA base atoms contacted by protein are yellow, backbone atoms contacted by protein red and those atoms not contacted by protein purple. It should be noted that the scales are not comparable.
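The interface residue propensities plotted in Figure 2 compare how often a residue type occurs in the interface with how often it occurs on the protein surface, so that a value above 1 marks enrichment in the interface. The sketch below is one simple way to express that ratio and is an assumption, not the published procedure: the inputs may be residue counts or summed ASA per residue type, and the function name and all numbers are hypothetical.

```python
# Illustrative sketch of an interface residue propensity.
def interface_propensities(interface, surface):
    """Ratio of interface share to surface share for each residue type.

    `interface` and `surface` map residue names to a contribution
    (a count or a summed ASA); residue types absent from the surface
    are skipped to avoid division by zero.
    """
    interface_total = sum(interface.values())
    surface_total = sum(surface.values())
    return {
        residue: (interface.get(residue, 0) / interface_total)
        / (amount / surface_total)
        for residue, amount in surface.items()
        if amount > 0
    }


# Hypothetical contributions for three residue types.
interface = {"ARG": 30, "ASP": 2, "LEU": 8}
surface = {"ARG": 50, "ASP": 50, "LEU": 100}
print(interface_propensities(interface, surface))
```

In this made-up example arginine is enriched in the interface while aspartate and leucine are depleted, which mirrors the qualitative trend described for the DNA-binding sites.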

It was observed that the proteins that bind DNA and function as enzymes predominantly interact in the enveloping mode, using a large surface area of interaction to surround the DNA. The purpose of the envelope is to create the cleft that surrounds the relatively large polynucleotide substrate. This is analogous to the catalytic pockets that house small molecule substrates. However, there are two exceptions to this observation: the enzyme γδ-resolvase (1gdt) makes contacts with two binding heads, and T4 endonuclease V (1vas) makes contacts with a single binding head. The transcription factors predominantly use either one or two binding heads, contacting the DNA at specific sites. However, again there are exceptions, with NF-κB (1nfk) and the TATA box binding protein (1ytb) using the envelope mode of binding. The NF-κB (1nfk) structure uses a series of well-defined loops to make contact with the DNA, similar to those observed in some endonuclease structures. The TATA box binding protein (1ytb) envelops the DNA using a ten-stranded antiparallel β-sheet.

Another interesting point to note is that all those proteins that have a β-sheet DNA-binding motif have 2-fold or pseudo 2-fold symmetry (Table 1). In two cases, the Met repressor (1cma) and the Arc repressor (1par), the symmetry serves to bring together strands from separate protomers to form a β-ribbon that is the DNA recognition motif. Two symmetrically related antiparallel β-ribbons were also observed in the structure of the integration host factor (1ihf). In the case of the TATA box binding protein (1ytb), two sub-domains related by a pseudo 2-fold symmetry combine to form a ten-stranded antiparallel β-sheet which binds in the minor groove (Kim et al., 1993). In PvuII endonuclease two antiparallel β-strands, one from each symmetrically related protomer, form the recognition elements in the major groove. In all these proteins the DNA recognition β-sheet is antiparallel. The characteristic twist of a pair of antiparallel strands has been suggested as the reason for this being the favoured β-sheet formation (Phillips, 1994).

Figure 4. Simple model diagrams of protein-DNA complexes divided into (a) single-headed binding proteins, (b) double-headed binding proteins, and (c) enveloping proteins. The diagrams give an indication of the predominant secondary structure of the binding motif, protein symmetry and the type and relative position of the DNA groove bound. The secondary structures of the predominant binding motifs are indicated using different symbols analogous to those used in TOPS diagrams (Westhead & Thornton, 1998). Only one symbol of each type is indicated in any one groove, hence both a single sheet and two sheets are indicated by a single coloured triangle. The symmetry of each protein is indicated by using a different colour for each symmetry (or pseudo symmetry) related element. A single symbol shaded in two colours indicates that there are secondary structures of this type contributed by more than one symmetry related element.

Analysis of the grooves in which different binding motifs and binding modes make contact with DNA reveals that proteins often bind in the major groove (15 structures in the data set of 26; Table 1). Eight other structures bind in both the major and minor groove simultaneously, whilst only three structures in the data set bind solely in the minor groove. The minor groove features in very few complexes as it is too narrow in normal circumstances to accommodate the common binding motifs such as HTH. The three structures which bind solely in the minor groove (TATA box binding protein (1ytb), integration host factor (1ihf) and T4 endonuclease V (1vas)) do so only as a result of the DNA structure being severely distorted from the canonical structure of B-DNA. The presence of one or more kinks in the DNA in these complexes effectively bends the DNA double helix and widens the minor groove (see below).

Eight proteins in the current data set use both the minor and major groove to make contact with DNA. All those that bind in this manner feature either a HTH motif or extensive loop interactions. Four structures with HTH motifs which bind in this way are γδ-resolvase (1gdt), Hin recombinase (1hcr), yeast RAP1 (1ign) and the paired domain (1pdn). With one exception, these exhibit the classic HTH binding motif in the major groove and then a polypeptide tail which binds in the minor groove (Figure 4). The exception is γδ-resolvase, which has two classic HTH motifs in the major groove and two helical arms that lie in the minor groove (Figure 4). The helices fit in the minor groove as the DNA is kinked, altering the usual physical parameters of the double helix; in this case the minor groove is widened to 7.6 Å from the average 5.1 Å (Yang & Steitz, 1995).

Table 3. Protein interface properties for three modes of DNA binding within the data set of 26 protein-DNA complexes

| Parameter | Single binding head | Double binding head | Enveloping |
|---|---|---|---|
| Number of examples | 5 | 12 | 9 |
| ΔASA (Å²)ᵃ | 983.9 (276.4) | 1704.5 (519.2) | 1763.3 (296.0) |
| %ΔASAᵇ | 12.1 (5.0) | 13.6 (4.0) | 14.0 (8.5) |
| Sequence segmentationᶜ | 4.8 (2.2) | 7.0 (2.7) | 9.1 (4.3) |
| Gap volume indexᵈ | 2.6 (1.1) | 2.5 (0.8) | 1.8 (0.7) |
| H-bonds (/100 Å² ΔASA)ᵉ | 1.5 (0.5) | 1.4 (0.3) | 1.2 (0.3) |
| Bridging water molecules (/100 Å² ΔASA)ᶠ | 0.6 (0.4) | 0.5 (0.5) | 0.8 (0.6) |
| % Polarityᵍ | 50.3 (13.2) | 47.1 (7.3) | 48.2 (9.7) |

For definitions of each parameter see the legend to Table 2.

There was no preference observed for one binding mode to favour one binding motif; the majority of modes were observed in each motif class. When the interface parameters were calculated for each binding mode (Table 3), it was observed that the envelope binding mode presented the largest, most segmented and most closely packed interface of all three modes. Those proteins with single binding heads were the least well packed, and had an interaction surface with the DNA roughly half the size of that in the other two modes.

Protein-bound DNA structures

DNA structural parameters in the NDB

The distributions of four groove dimensions, six base step local parameters and seven torsion angles in the protein bound DNA (data set A), the unbound B-DNA (data set B), and an additional set of unbound A-DNA structures from the NDB, are given in Table 4. These structural parameters were taken from the NDB where they are calculated using a number of algorithms including Curves 5.1 (Lavery & Sklenar, 1989). The standard deviations for the majority of the parameters are larger in the bound DNA than for the unbound DNA. The means of the seven backbone torsion angles for both the bound and the unbound DNA are within the populated ranges calculated previously for a high-resolution data set of B-DNA structures (Schneider et al., 1997). The means of the twist, tilt, roll and slide for the unbound DNA are similar to the means of a data set of 38 unbound B-DNA crystal structures (Gorin et al., 1995) and to 724 base-pair steps from unbound B-DNA structures analysed by Olson et al. (1998). The means of these parameters for the bound DNA structures in the current data set were also comparable to those derived from 2114 bp steps from 92 protein-DNA crystal complexes analysed by Olson et al. (1998). The main differences between the two data sets in the current study were observed in the distributions of base step local twist, roll and slide. When proteins bind DNA the DNA exhibits lower slide and twist values, and higher roll values, that are more characteristic of A-DNA. The A-DNA-like values of some properties in protein-bound DNA structures have been observed and discussed previously (Nekludova & Pabo, 1994; Shakked et al., 1994). The large roll angles (both positive and negative) are due to the kinks that are frequently observed in DNA structures bound to proteins.

Table 4. DNA structural parameters taken from the NDB for a dataset of 26 protein bound DNA structures (data set A) and a dataset of uncomplexed and unmodified DNA structures (data set B)

| Parameter | Unbound B-DNA, mean | Sd | Protein-bound DNA, mean | Sd | Unbound A-DNA, mean | Sd |
| --- | --- | --- | --- | --- | --- | --- |
| Groove dimensions | | | | | | |
| Major groove width | 13.6 | 0.9 | 11.5 | 2.0 | 5.0 | 3.8 |
| Major groove depth | 5.8 | 1.7 | 5.0 | 2.1 | 9.3 | 2.5 |
| Minor groove width | 9.6 | 1.6 | 7.5 | 2.2 | 9.8 | 0.7 |
| Minor groove depth | 6.7 | 0.5 | 4.1 | 1.8 | 1.0 | 0.8 |
| Base step local | | | | | | |
| Twist | 35.9 | 6.2 | 32.6 | 11.5 | 31.4 | 4.5 |
| Tilt | 0.2 | 3.4 | 0.4 | 5.0 | 0.1 | 2.7 |
| Roll | 0.9 | 2.5 | 4.5 | 11.2 | 6.7 | 5.0 |
| Shift | 0.02 | 0.5 | 0.01 | 0.7 | 0.0 | 0.5 |
| Slide | −0.07 | 0.7 | −0.4 | 0.8 | −1.9 | 0.4 |
| Rise | 3.4 | 0.21 | 3.5 | 0.4 | 3.4 | 0.3 |
| Torsion angles | | | | | | |
| α | 298 | 34 | 304 | 54 | 293 | 17 |
| β | 168 | 23 | 173 | 33 | 174 | 14 |
| γ | 50 | 38 | 59 | 56 | 56 | 14 |
| δ | 129 | 23 | 129 | 23 | 81 | 7 |
| ε | 199 | 36 | 195 | 34 | 203 | 12 |
| ζ | 237 | 50 | 244 | 44 | 289 | 12 |
| χ | 214 | 25 | 247 | 27 | 199 | 8 |

Values for a data set of unbound and unmodified A-DNA structures currently in the NDB are shown for comparison.
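The statement that bound DNA takes values between those of B-DNA and A-DNA can be made concrete with a small calculation on the Table 4 means. The sketch below is illustrative only: the function name `a_like_fraction` and the linear interpolation index are assumptions introduced here, not a measure used in the study. A value of 0 corresponds to the unbound B-DNA mean and 1 to the unbound A-DNA mean.

```python
# Mean base-step values transcribed from Table 4
# (unbound B-DNA, protein-bound DNA, unbound A-DNA).
TABLE4_MEANS = {
    "twist": {"B": 35.9, "bound": 32.6, "A": 31.4},
    "roll": {"B": 0.9, "bound": 4.5, "A": 6.7},
    "slide": {"B": -0.07, "bound": -0.4, "A": -1.9},
}


def a_like_fraction(b_mean, bound_mean, a_mean):
    """Hypothetical index: position of the bound mean on the line from B (0.0) to A (1.0)."""
    return (bound_mean - b_mean) / (a_mean - b_mean)


fractions = {
    name: a_like_fraction(means["B"], means["bound"], means["A"])
    for name, means in TABLE4_MEANS.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

On these means all three parameters fall between the two reference forms, with twist and roll lying closer to the A-DNA mean than slide does. Because the standard deviations in Table 4 are large, such an index describes the data set averages only, not any individual base step.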

Measuring DNA distortion

Specific numerical parameters are available that describe the bending of a DNA structure per base-pair (NDB; Berman et al., 1992). However, for the comparison of multiple DNA structures from different complexes a single parameter, describing the overall distortion of the structure from one end of the DNA double helix to the other, was required. DNA distortion was measured by calculating the root-mean-square deviation (rmsd) when each DNA structure was fitted onto a 40 bp canonical B-DNA structure. Distortion values were calculated for the DNA in data set A and data set B (Figure 5) and it was found that nearly all protein bound DNA structures had higher rmsd values than unbound DNA structures.
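As a reminder of what the rmsd measures, the following minimal sketch computes it for two sets of matched atom coordinates. It assumes the structures have already been superposed and that atoms are paired in the same order; the least-squares fitting step and the choice of atoms used in the study are not reproduced here, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equal-length lists of (x, y, z) positions.

    Assumes the structures are already superposed and atoms are listed in
    matching order; no fitting is performed in this sketch.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates: two reference atoms and their observed counterparts.
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.4)]
observed = [(0.0, 0.0, 0.0), (3.0, 4.0, 3.4)]

print(round(rmsd(reference, observed), 2))
```

A single scalar of this kind summarises distortion along the whole helix, which is why it suits comparison across complexes, at the cost of hiding where along the DNA the deviation arises.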

Figure 5. Frequency distribution of rmsd values calculated from fitting each DNA structure in the protein-bound DNA data set (filled bars) and unbound DNA data set (open bars) to a 40 bp canonical B-DNA.

The distortions measured could be explained by a number of specific structural features that were present in the DNA helix. These are tabulated for the 13 complexes with the highest rmsd per base-pair values (Table 5). Two structures (1vas and 3mht) were highly distorted due to flipped-out bases. A more common feature observed in distorted DNA was the kink. Here a kink is defined as a distortion of the DNA helix such that a base step has a local roll value of >20°, and eight of the structures exhibit one or more kinks (Table 5). One or more kinks in DNA can mean that the helix of the structure is effectively bent. However, if kinks in the helix compensate one another the overall helical axis can remain straight, as in the case of EcoRI restriction endonuclease (1eri; McClarin et al., 1986). In this structure the path of the helix is approximately linear as a central negative roll is compensated at adjacent steps on each side by a positive roll. Of the 18 kinks observed in the protein-bound DNA structures (Table 5), eight occur at pyrimidine-purine (YR) base steps, which are the steps that energy calculations (e.g. Sarai et al., 1989) have shown to be the most flexible. This observation was also made in a series of DNA-transcription factor complexes (Suzuki & Yagi, 1995) and more recently in a data set of 92 protein-DNA complexes (Olson et al., 1998). However, DNA can be bent without any kinks being present; an example of this is seen in the DNA complexed with papillomavirus-1 E2 DNA binding domain (2bop). This DNA helix does not exhibit any obvious kinks but many consecutive base steps have local high roll angles that result in a helix with a continuous curve (Hegde et al., 1992).

Table 5. Summary of DNA deformations in a subset of protein-DNA complexes showing the largest rmsd values compared to canonical B-DNA structures

| PDB ID | Groove bound | Binding mode | No. kinks (roll angle °) | Base step at kink | Groove deformation | Intercalation | Additional structural features of DNA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1ihf | m | Double-headed | 2 (58, −64) | TT/AA, AA/TT | Minor groove compressed | Pro ×2, minor | DNA bent and direction of DNA helix reversed within a short distance (bend angle 60°) |
| 1ber | M | Double-headed | 2 (52, 34) | TG/CA, CA/TG | Minor groove compressed | - | DNA bent towards main body of protein (bend angle 87°) |
| 1gdt | M+m | Double-headed | 2 (22, 40) | TA/TA, TA/TA | Minor groove widened | - | DNA bent away from main body of protein (bend angle 60°) |
| 1ytb | m | Enveloping | 6 (46, 30, 37, 48, 27, 38) | TA/TA, AT/AT, AA/TT, TA/TA, AT/AT, AA/TT | Minor groove widened | Phe ×2, minor | DNA bent away from main body of protein |
| 1vas | m | Single-headed | 1 (37) | TT/AA | Minor groove widened | - | DNA is bent and has flipped-out base (bend angle 60°) |
| 1cma | M | Double-headed | none | - | Minor groove widened | - | DNA bent at two sites at centre of met boxes towards main body of protein |
| 1pnr | M+m | Double-headed | 1 (51) | CG/CG | Minor groove widened | Leu ×2, minor | DNA bent away from main body of protein (bend angle 45°) |
| 1par | M | Double-headed | none | - | Minor groove compressed at centre of sequence | - | DNA bent towards main body of protein (bend angle 50°) |
| 1rva | M+m | Enveloping | 1 (47) | TA/TA | Major groove compressed | - | |
| 1pvi | M | Enveloping | none | - | | - | Distribution of backbone torsion angles is asymmetrical |
| 1pue | M+m | Single-headed | none | | Minor groove widened | - | DNA bent towards main body of protein (bend angle 8°) |
| 2bop | M | Double-headed | none | - | Major and minor groove compressed on concave side | - | DNA has a continuous curvature with max roll angle of 16° (bend angle 45°) |
| 1eri | M | Enveloping | 3 (29, −52, 29) | AA/TT, AT/AT, TT/AA | Minor and major grooves widened | - | Central kink compensated by two symmetrical smaller kinks so that the overall path of helix is linear |

A kink has been defined as a base-pair with a local roll angle of >20°, and the roll angles cited have been extracted from the NDB. The descriptions of additional structural features have been taken from the original references of each structure as cited in the PDB.

Table 6. Distance and number of base-pairs between distal contacts in double-headed DNA binding proteins

| PDB ID | NDB ID | Distance (Å) | Actual no. base-pairs | Theoretical no. base-pairs | Difference | rmsd |
| --- | --- | --- | --- | --- | --- | --- |
| 1pnr | PDR020 | 52.0 | 15 | 15 | 0 | 5.4 |
| 1d66 | PDT003 | 52.3 | 16 | 15 | 1 | 2.4 |
| 1cma | PDR008 | 54.2 | 18 | 16 | 2 | 5.5 |
| 1glu | PDT030 | 46.6 | 15 | 13 | 2 | 3.0 |
| 1ign | PDT035 | 34.7 | 12 | 10 | 2 | 3.2 |
| 1lmb | PDR010 | 13.0 | 17 | 14 | 3 | 3.0 |
| 1mdy | PDT016 | 24.6 | 10 | 7 | 3 | 2.5 |
| 2dgc | PDT029 | 29.1 | 12 | 9 | 3 | 2.9 |
| 2bop | PDV001 | 36.6 | 16 | 11 | 5 | 3.6 |
| 1par | PDR012 | 57.3 | 22 | 17 | 5 | 5.0 |
| 1ber | PDR023 | 42.5 | 21 | 12 | 9 | 9.9 |
| 1gdt | PDE0115 | 58.8 | 27 | 17 | 10 | 9.7 |
| 1ihf | PDT040 | 21.4 | 26 | 7 | 19 | 22.5 |

The theoretical number of base-pairs is based on the assumption of a 3.4 Å rise per base-pair. The difference between the theoretical and actual number of bases is also recorded. The rmsd values of each DNA structure superimposed onto a canonical B-DNA are also shown.

In summary, it can be seen from Table 5 that there are effectively three types of DNA distortion. Firstly, there are specific severe local distortions such as kinks that disrupt the path of the helix resulting in bending, e.g. CAP (1ber). Secondly, there are small local distortions that act cumulatively to disrupt the gross structure of the helix, e.g. the DNA-binding domain of papillomavirus-1 E2 (2bop). Thirdly, there are multiple severe local distortions such as kinks that, when combined, do not disrupt the path of the helix, leaving it almost linear, e.g. EcoRI endonuclease.
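The kink criterion and the pyrimidine-purine count quoted for Table 5 can be checked with a short script. This is a sketch under two stated assumptions: the roll angles and base steps of Table 5 are paired in the order listed, and the magnitude of the roll is compared with the 20° threshold so that the negative rolls listed in the table also count as kinks. The function names are introduced here for illustration.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}
KINK_ROLL_DEGREES = 20.0

# (PDB ID, local roll in degrees, base step), transcribed from Table 5.
# Assumption: roll angles and base steps are paired in the order listed.
TABLE5_STEPS = [
    ("1ihf", 58, "TT/AA"), ("1ihf", -64, "AA/TT"),
    ("1ber", 52, "TG/CA"), ("1ber", 34, "CA/TG"),
    ("1gdt", 22, "TA/TA"), ("1gdt", 40, "TA/TA"),
    ("1ytb", 46, "TA/TA"), ("1ytb", 30, "AT/AT"), ("1ytb", 37, "AA/TT"),
    ("1ytb", 48, "TA/TA"), ("1ytb", 27, "AT/AT"), ("1ytb", 38, "AA/TT"),
    ("1vas", 37, "TT/AA"),
    ("1pnr", 51, "CG/CG"),
    ("1rva", 47, "TA/TA"),
    ("1eri", 29, "AA/TT"), ("1eri", -52, "AT/AT"), ("1eri", 29, "TT/AA"),
]


def is_kink(roll, threshold=KINK_ROLL_DEGREES):
    """Assumption: use the roll magnitude, so large negative rolls also qualify."""
    return abs(roll) > threshold


def is_pyrimidine_purine(step):
    """True when the first strand of a step such as 'TA/TA' reads pyrimidine then purine."""
    first, second = step.split("/")[0]
    return first in PYRIMIDINES and second in PURINES


kinks = [row for row in TABLE5_STEPS if is_kink(row[1])]
yr_kinks = [row for row in kinks if is_pyrimidine_purine(row[2])]
kinked_structures = sorted({pdb_id for pdb_id, _, _ in kinks})

print(len(kinks), len(yr_kinks), len(kinked_structures))
```

Run on the transcribed rows, this gives 18 kinks, eight of them at YR steps, in eight structures, which matches the counts quoted in the text.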

DNA bending was further investigated in those complexes involving double-headed proteins (Figure 3(b)). The spacing of the distal contacts made by the protein to the DNA was measured and the number of bases between these contacts was counted (see Materials and Methods). These results were tabulated and compared with the number of bases that could be fitted between these contact points if the DNA were straight (assuming a rise per base-pair of 3.4 Å; Table 6). This revealed that in all but one structure, the bending of the DNA enables the insertion of more base-pairs between the protein contact points. Hence, by bending the DNA the protein can make contact with specific bases of the DNA that would be too far apart for the protein to reach if the DNA were straight. Consider the most bent structure, integration host factor, as an example. The most distal protein contacts are measured at approximately 21 Å apart. These are the contacts made by the helices at the sides of the protein that effectively clamp the complex together (Rice et al., 1996). If these sites contacted straight DNA, then only seven base-pairs would represent the extent of the contact site on the DNA. Making contact by bending the DNA structure means that there are 26 bp spanning the contact site of the DNA, although there are only direct protein contacts to six base-pairs within this region (NUCPLOT; Luscombe et al., 1997).
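The straight-DNA comparison amounts to dividing the contact spacing by the 3.4 Å rise and subtracting the result from the observed count. The sketch below uses three rows transcribed from Table 6; the function names are illustrative, and the quotient is left unrounded because the rounding or counting convention behind the published integer values is not stated here.

```python
RISE_PER_BASE_PAIR = 3.4  # Å, the straight B-DNA rise assumed in Table 6

# (PDB ID, distance in Å, actual base-pairs, theoretical base-pairs as published),
# three rows transcribed from Table 6.
TABLE6_ROWS = [
    ("1pnr", 52.0, 15, 15),
    ("1ber", 42.5, 21, 12),
    ("1ihf", 21.4, 26, 7),
]


def straight_dna_span(distance, rise=RISE_PER_BASE_PAIR):
    """Unrounded number of base-pair rises that fit into a straight-line distance."""
    return distance / rise


def extra_base_pairs(actual, theoretical):
    """Base-pairs accommodated by bending beyond the straight-DNA estimate."""
    return actual - theoretical


for pdb_id, distance, actual, theoretical in TABLE6_ROWS:
    span = straight_dna_span(distance)
    print(pdb_id, round(span, 1), extra_base_pairs(actual, theoretical))
```

For integration host factor the 21.4 Å spacing corresponds to roughly six to seven rises of straight DNA, whereas 26 bp lie between the contacts in the complex, giving the difference of 19 recorded in Table 6.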

These results show that DNA distortion is required in some complexes for essential protein-base contacts to be made. But this is not the only picture; some proteins bind specifically to DNA without the need for distortion of DNA. In contrast to EcoRI restriction endonuclease (1eri) and EcoRV endonuclease (1rva), the structure of BamHI endonuclease (1bhm) shows that severe DNA distortion is not required for specific complex formation. In this structure, specific base-pair contacts occur principally in the major groove with most of the protein residues being located at the NH2 terminus of a four-helix bundle (Newman et al., 1995). Hence, specific recognition requires DNA distortion in some circumstances but not others.

Models of complexes that have bent DNA show that the bending can take one of two forms: (i) bending towards the major groove of the DNA, resulting in the compression of this groove and the widening of the opposing minor groove (Figure 6(a)), or (ii) bending towards the minor groove, resulting in compression of the minor groove but widening of the opposing major groove (Figure 6(b)). In (i) structures are seen in which the DNA is bent away from the main body of the protein and also structures in which the DNA is bent towards the protein. In the γδ-resolvase structure the DNA is kinked in two places, leaving the DNA helix with a bend which is away from the main body of the protein, widening the minor groove on the protein side and allowing minor groove contacts from residues in alpha helices (Yang & Steitz, 1995). In a similar way, the widened minor groove presents a binding site for helices, loops and sheets in 1pnr, 1ytb, 1vas and 1rva. In 1pue the minor groove is only slightly widened opposite the major groove in which the protein is bound and does not contain any major secondary structure elements (Kodandapani et al., 1996). In 1cma the widened minor grooves result from compression of the major grooves at the centre of met boxes, which compress the beta ribbons bound there (Somers & Phillips, 1992). In (ii) structures are only observed in which the DNA is bent towards the main body of the protein. All these structures are double-headed binding proteins with DNA wrapping around the outside of the protein. This can take place with protein secondary structures bound in the minor grooves (1ihf) or in the major grooves (1par, 1ber) or both (2bop).

Figure 6. Simple models of protein-DNA complexes in which the DNA has a large rmsd value when superimposed on canonical B-DNA. The diagrams give an indication of the predominant secondary structure of the binding motif, protein symmetry and the type and relative position of the DNA groove bound (see the legend to Figure 4). The direction of DNA bending is indicated by an angled line above each model and the groove which is compressed or widened is indicated by an asterisk (*). (a) Complexes in which the minor groove is widened. In γδ-resolvase (1gdt), purine repressor (1pnr), TATA box binding protein (1ytb), T4 endonuclease V (1vas) and EcoRV endonuclease (1rva) the DNA is bent away from the main body of the protein. In PU.1 ETS domain (1pue) and Met repressor the DNA is bent towards the main body of the protein. (b) Complexes in which the minor groove is compressed. In all the examples (arc repressor (1par), integration host factor (1ihf), catabolic gene activator protein (1ber) and papillomavirus-1 E2 DNA-binding domain) the DNA is bent towards the main body of the protein.


EcoRV: specific and non-specific recognition

EcoRV endonuclease is one DNA-binding protein whose structure has been solved complexed with cognate and non-cognate sequences of DNA (Winkler et al., 1993; Kostrewa & Winkler, 1995; Horton & Perona, 1998). To make comparisons between the protein-DNA interfaces in these structures, the interface parameters for the protein and the degree of DNA distortion were calculated for three EcoRV structures in the PDB (Table 7). The comparisons reveal that in the cognate complexes the ASA of the interface is >800 Å² larger than in the non-cognate complex. The cognate complexes are tightly packed (gap volume index 1.6 and 1.8) with three times as many inter-molecular hydrogen bonds as the non-cognate complex that is very poorly packed (gap volume index 5.7). The rmsd values per base-pair when the DNA in each complex was fitted to a 40 bp canonical B-DNA structure reveal large differences: 0.29 for the non-cognate complex compared to 0.46 and 0.47 for the cognate complexes. These differences can be explained by a central kink in the cognate DNA complexes (Winkler et al., 1993; Kostrewa & Winkler, 1995). The deformation of the DNA structure is required in order that the specific base contacts are made.

Table 7. Protein interface properties for three structures of EcoRV endonuclease, with cognate and non-cognate DNA bound

| PDB Code | 2rve | 4rve | 1rva |
| --- | --- | --- | --- |
| Resolution (Å) | 3.0 | 3.0 | 2.0 |
| DNA bound | Non-cognate | Cognate | Cognate |
| DNA sequence | CGAGCTCG | GGGATATCCC | AAAGATATCTT |
| Number base-pairs bound | 16 | 10 | 10 |
| Δ ASA (Å²)ᵃ | 1231 | 2083 | 2123 |
| Δ %ASA (Å²)ᵇ | 5.7 | 9.0 | 9.2 |
| Sequence segmentationᶜ | 13 | 16 | 16 |
| Gap volume indexᵈ | 5.1 | 1.8 | 1.7 |
| H bonds (/100 Å² ΔASA)ᵉ | 0.6 | 1.5 | 1.5 |
| Bridging H2O (/100 Å² ΔASA)ᶠ | 1.0 | 0.1 | 1.5 |
| % Polarityᵍ | 58 | 55 | 53 |
| rmsd per base-pair when fitted to 40 bp canonical B-DNA (Å) | 0.29 | 0.46 | 0.47 |

For definitions of each parameter see the legend to Table 2.

Discussion

The data set in this study includes protein structures that represent 26 different protein homology families. A common approach to this type of analysis would be the classification and analysis of their DNA-binding motifs. There have been extensive reviews of individual motifs including helix-turn-helix motifs (Brennan, 1992), homeodomains (Wright, 1994), zinc co-ordinating motifs (Kaptein, 1991; Schmiedeskamp & Klevit, 1994), leucine zippers (Pathak & Sigler, 1992; Hurst, 1995), helix-loop-helix motifs (Littlewood & Evan, 1995), and β-sheet motifs (Phillips, 1994; Raumann et al., 1994a). Recent analytical work has centred on the analysis of helix binding geometry within helix-DNA interactions (Suzuki & Gerstein, 1995; Wintjens & Rooman, 1996) and on the conformation of the protein-bound DNA structures (Suzuki & Yagi, 1995; Young et al., 1995; Dickerson, 1998). Here, we have taken a new perspective and analysed specific structural parameters of all DNA-binding motifs collectively, and classified the protein structures into three new binding modes. The binding sites have been compared to those observed on protein-protein complexes, and studied from a perspective of the distortions they impose upon the DNA helices they bind.

What is clear from this study is that proteins binding DNA present very different binding surfaces to those that bind other proteins. Many proteins bind as dimeric structures with 2-fold or pseudo-2-fold symmetry. The DNA-binding proteins present a relatively hydrophilic surface comprised of many sequence segments. Inter-molecular hydrogen bonds and water-mediated hydrogen bonds feature widely in these interactions. Protein atoms in the interaction surface can make contacts in the minor and major grooves of the DNA, with the sugar-phosphate backbone and specific base-pairs. The binding sites on the proteins present surfaces that share common features, but the framework upon which these surfaces are found includes all types of protein secondary structure (Figure 4). These structures interact in both the major and the minor grooves of the DNA, and in many different combinations. Patterns in their gross architecture are apparent without progressing to the protein motif level of classification. Three sub-groups (single-headed, double-headed and enveloping) were identified by looking at the number and positioning of protein residue-DNA base interactions in a series of interaction footprints (Figure 3). This classification grouped six of the eight enzymes into a single category (enveloping) in which the proteins present a large cleft-like site that envelops the DNA. Four of the five endonucleases in the data set bind in this manner, as do the methyltransferase and the recombinase enzymes. The fifth endonuclease is an exception as it binds DNA with a single binding head. However, this structure recognises damaged DNA which is severely kinked by a flipped-out base-pair (Vassylyev et al., 1995) and hence this requires a very different mode of protein binding.

The majority of proteins binding DNA interact in the major groove due to the physical limitations of fitting secondary structure motifs such as HTH (in which a single helix has a diameter of approximately 4.6 Å; Schulz & Schirmer, 1979) into the narrow minor groove, which on average measures 5.7 Å in width (Blackburn & Gait, 1996). In comparison, the major groove measures on average 11.7 Å in width and 8.8 Å in depth (Blackburn & Gait, 1996). Minor groove binding is only observed in complexes where the DNA structure is significantly distorted, resulting in the widening of the groove to allow entry of β-sheets (e.g. TATA box binding protein) or alpha helices (e.g. γδ-resolvase). If binding is in the major groove, the DNA may or may not be distorted, depending on what contacts have to be made. However, proteins with two binding heads binding at distal sites on the DNA frequently bend the DNA.

An interesting feature of some complexes is simultaneous binding in the major and the minor groove (eight of the 26 structures bound in both grooves). In some structures the binding involves a HTH motif in the major groove and a trailing loop in the minor groove (e.g. Hin recombinase, RAP1 binding domain, paired domain), whilst in others loops and helices interact in both grooves to envelop the DNA (e.g. BamHI endonuclease, EcoRV endonuclease). In the former structures the minor groove interactions are as important for base recognition as the recognition helix of the HTH motif in the major groove. Hin recombinase (1hcr) has an amino-terminal arm that lies in the minor groove and makes base-specific contacts. Two residues in this region, Gly139 and Arg140, are invariant in all invertases: if they are deleted in Hin recombinase the protein loses its sequence-specific binding to hixL recombination sites (Feng et al., 1994). The binding domains of RAP1 (1ign) feature N-terminal arm regions that each make contact to a single base from the minor groove (Konig et al., 1996). The structure of the paired domain (1pdn) features both a trailing loop and a type II β-turn in the minor groove, the turn forming base-specific hydrogen bonds with two bases and a water-mediated contact with a third base (Konig et al., 1996). Hence these structures exemplify the importance of binding in the minor groove, and show that the sequence information of the DNA is available to proteins from this groove, although binding is more commonly achieved through the major groove.

The differences observed between specific and non-specific binding in the complexes of EcoRV endonuclease imply that in some structures, specific binding requires a large degree of DNA distortion (Winkler et al., 1993; Kostrewa & Winkler, 1995; Horton & Perona, 1998). In the three structures analysed here, the two cognate DNA structures had rmsd values (when fitted to canonical B-DNA) substantially higher than that of the non-cognate DNA (0.46 and 0.47 compared to 0.29; Table 7). Modelling studies of other structures have confirmed the theory that distortion of DNA is required in some interactions for specific base contacts to be made. In the structure of Zif268 protein-DNA (1aay), individual zinc fingers were docked to canonical B-DNA and a near normal set of protein-DNA contacts was achieved (Elrod-Erickson, 1996). However, when the finger motifs were docked in this manner it became apparent that the linkers that are present between the fingers were not long enough to span the distance between the individually docked structures. By untwisting the DNA helix and enlarging the major groove (as observed in the complexed DNA), the distance between the fingers was reduced. Hence the three-repeated zinc finger structure could not bind canonical DNA. A similar modelling experiment was conducted on the MetJ repressor-operator complex (Somers & Phillips, 1992). When a single dimer of the Met repressor was docked to a canonical (straight) B-DNA structure in the correct orientation (that observed in the native complex), there was a 208 Å² loss of contact area; in the native complex with the DNA bent, the ASA of contact was 648 Å². Hence the required contacts could not be achieved with canonical B-DNA.

In some interactions where DNA is not distorted on protein binding, the protein itself undergoes considerable changes in conformation to achieve specific base contacts. For example, BamHI endonuclease (1bhm) undergoes a series of conformational changes on binding with DNA, including rigid body motion of protomers, ordering of disordered loops and local side-chain and main-chain rearrangements (Newman et al., 1995). In other structures both the DNA and the protein are distorted. For example, in γδ-resolvase the action of binding a bent DNA structure stabilises the folding of the last 63 amino acid residue segment of the protein, including part of a helix that is thought not to be present in the unbound form of the protein (Yang & Steitz, 1995). However, rmsd values calculated for proteins in the PDB in both the free state and the DNA-bound state reveal relatively small conformational changes. CAP and EcoRV had rmsd values (calculated over all atoms) of only 3.0 Å and 4.2 Å, respectively. It is interesting to note, in this light, that when the structure of the TATA-binding protein was first solved in the free state (before the protein-DNA complex was known) it was proposed that a suitable model for the complex was for the protein to undergo conformational changes to wrap around the DNA following the trajectory of the minor groove (Chasman et al., 1993). In this structure the conformational change of the protein was seen as more likely than the extreme distortion of the DNA that was revealed when the protein-DNA complex was solved (Kim et al., 1993).

Sequence dependence of DNA distortion is evident in both intrinsic curvature of DNA (Hagerman, 1990) and protein-induced DNA bending (Suzuki & Yagi, 1995; Young et al., 1995; Dickerson, 1998; Olson et al., 1998). Tight bending of DNA requires that the stacking interactions of the base-pairs can accommodate the deformations in the groove width. Confirming earlier observations, the kinks observed in the complexes in this study commonly occurred at pyrimidine-purine base steps (Table 5). An important question to address is whether the distorted structures of the protein-bound DNAs also exist in a free state. There are, however, very few examples of DNA for which the crystal structure both bound and unbound to a protein has been determined, and these results are inconclusive. It was seen in the current study (Figure 5) that the distortions in bound DNA were, in general, far more extreme than any seen in unbound DNA structures. Nevertheless, it has been suggested that in some structures the bound form of the DNA is similar to that observed in an unbound state in solution (Travers, 1992). In the Zif268-DNA complex it is proposed that the zinc fingers bind to and further stabilise an already energetically favoured conformation of the DNA. Similarly, in the MetJ repressor-DNA complex it is suggested that the characteristic structure of the DNA with an alternating high and low twist has been observed in a similar unbound DNA sequence.

What is clear is that there is no simple relationship between the features of the protein binding site and the distortion observed in the DNA bound. The extent of DNA distortion is influenced by the forces imposed by the protein and by the energy constraints on the DNA structure imposed by the sequence of bases (Berman, 1997). More structures of bound and unbound proteins and DNA are needed to further understand the relationship between protein binding sites and DNA distortion.

Conclusions

Figure 7. Data flow diagram showing the selection of the protein-DNA complexes from the Nucleic Acid Database (NDB, 01/05/97). The protein-DNA complexes were grouped into structurally related families using SSAP (Taylor & Orengo, 1989).

Three modes of protein binding to DNA have been identified: single-headed, double-headed, and enveloping. The interfaces within these three modes all share common features. They are populated by more polar residues, in contrast to those in protein-protein interfaces, and as a direct consequence they have many direct or water-mediated inter-molecular hydrogen bonds. All three modes contain examples of both bent and unbent DNA. Analyses of these data allowed us to make the following preliminary observations about DNA bending in protein-DNA complexes.

Firstly, when proteins bind DNA the structure of the DNA is always distorted compared to a canonical B-DNA structure. This distortion is nearly always greater than that observed in unbound DNA structures.

Secondly, if the separation between the recognition sites on canonical B-DNA is larger than the separation between the binding heads of the protein, then the DNA will be bent to facilitate binding.

Thirdly, the DNA is usually bent in complexes that have strands or helices positioned in the minor groove of the DNA. The only exception to this in the current data set is the paired domain (PDB code 1pdn).

Finally, the DNA is bent in all the observed double-headed complexes that have strands binding in the major grooves one turn apart on the DNA. In contrast, DNA may or may not be bent by the observed double-headed complexes that have helices positioned in the major groove one turn apart on the DNA.

This analysis has been conducted on a limited data set of 26 protein-DNA complexes. As more structures are determined the validity of these conclusions can be tested.

Materials and Methods

Data set selection

A data set of protein-DNA complexes was selected from the NDB (Berman et al., 1992) on 01/05/1997 (Figure 7). The data set was chosen such that it excluded structurally related proteins, i.e. was non-homologous. To achieve this a structural alignment program, SSAP (Taylor & Orengo, 1989), was used to align each protein chain against all other protein chains. Only those chains that had a SSAP score of <80 were selected for inclusion in the data set; a SSAP score of ≥80 (and a sequence identity of >20%) between a pair of proteins indicates that they are structurally related. Using the SSAP scores the protein structures were organised into structurally related families. A representative complex (with the best resolution, with at least ten base-pairs in the DNA and in which the protein bound the cognate or consensus DNA sequence) was selected from each family to produce a non-homologous set of 26 protein-DNA complexes, referred to as data set A (Table 1).
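The selection procedure can be sketched as two steps: group chains into families using the SSAP threshold, then pick one representative per family. The code below is a minimal illustration under several assumptions that are not taken from the study: families are formed by single linkage on the score threshold alone (sequence identity is not used), "best resolution" is read as the lowest value in Å, and all identifiers, scores and metadata are hypothetical.

```python
SSAP_RELATED = 80.0   # pairs scoring at or above this are treated as related
MIN_BASE_PAIRS = 10   # a representative needs at least ten base-pairs of DNA


def group_families(chains, ssap_scores, threshold=SSAP_RELATED):
    """Single-linkage grouping (an assumption) of chains whose pairwise score meets the threshold."""
    parent = {chain: chain for chain in chains}

    def find(chain):
        # Follow parent links to the family root, shortening the path as we go.
        while parent[chain] != chain:
            parent[chain] = parent[parent[chain]]
            chain = parent[chain]
        return chain

    for (first, second), score in ssap_scores.items():
        if score >= threshold:
            parent[find(first)] = find(second)

    families = {}
    for chain in chains:
        families.setdefault(find(chain), []).append(chain)
    return sorted(sorted(members) for members in families.values())


def pick_representative(members, info):
    """Best-resolution member with enough DNA and a cognate or consensus sequence."""
    eligible = [
        m for m in members
        if info[m]["base_pairs"] >= MIN_BASE_PAIRS and info[m]["cognate"]
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: info[m]["resolution"])


# Hypothetical complexes, scores and metadata, for illustration only.
chains = ["cplxA", "cplxB", "cplxC"]
ssap_scores = {("cplxA", "cplxB"): 85.0, ("cplxA", "cplxC"): 62.0, ("cplxB", "cplxC"): 58.0}
info = {
    "cplxA": {"resolution": 2.5, "base_pairs": 12, "cognate": True},
    "cplxB": {"resolution": 2.0, "base_pairs": 8, "cognate": True},
    "cplxC": {"resolution": 3.0, "base_pairs": 14, "cognate": True},
}

families = group_families(chains, ssap_scores)
representatives = [pick_representative(members, info) for members in families]
print(families, representatives)
```

In this toy input cplxB has the better resolution but too little DNA, so cplxA represents the shared family; the unrelated cplxC forms a family of its own.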

A second data set of uncomplexed and unmodified B-DNA structures was also selected from the NDB (on 01/05/1997; Berman et al., 1992). The resultant data set of 21 uncomplexed B-DNA structures, referred to as data set B, included structures with the following PDB codes: 250d, 1bd1, 5dnb, 1d23, 1d49, 1d56, 1cgc, 126d, 167d, 196d, 252d, 1bna, 1d98, 1dn9, 1d29, 1d65, 119d, 194d, 249d, 287d, and the NDB-coded structure BDJ061. The resolutions of the structures in this data set ranged from 1.4 to 2.5 Å. The DNA structures in this data set were used for comparison with the bound DNA structures in data set A.

Data set classification

For each of the proteins in data set A the DNA-binding motif was identified by looking at the PDB data files using Rasmol (Sayle & Milner-White, 1995). For a relatively simple motif classification, proteins were classed into one of six DNA-binding motifs: (i) helix-turn-helix (HTH), (ii) helix-loop-helix (HLH), (iii) leucine zippers, (iv) zinc co-ordinating, (v) β-ribbons, and (vi) loops/other (Table 1). If a structure contained one of the first five motifs as the dominant DNA-recognition element, then the structure was classified into this motif group; otherwise it was classified as loop/other. The function of each of the proteins was also collated from the literature and classed simply as enzyme or transcription factor (Table 1). The oligomeric state of the DNA-binding proteins and the type of groove location of binding (i.e. major or minor groove on the DNA structure) were also tabulated (Table 1). The symmetry within the protein structures was classified using (i) n-fold, and (ii) pseudo n-fold classes (Table 1). Where there was exact symmetry the proteins were classified into the appropriate n-fold class. Where there were two protomers with non-exact symmetry, or a form of internal repeat within a single protomer, the protein was classified into the appropriate pseudo n-fold class.

Interaction footprints were made by defining the binding sites of the protein structures. The binding site of a protein molecule was defined to include any residue that lost >1 Å² of accessible surface area (ASA) when passing from the uncomplexed to the complexed state. The ASA of the structures was calculated using an implementation of the Lee & Richards (1971) algorithm developed by Hubbard (1992). Each footprint indicates the residues of the protein that interact with the DNA, using a CPK depiction of the protein. The footprints differentiate between those residues interacting with the sugar-phosphate backbone of the DNA and those interacting with the bases. These footprints were used to classify the DNA-binding proteins into three separate classes: (i) single-headed, (ii) double-headed, and (iii) enveloping, examples of which are shown in Figure 3. The single-headed proteins were defined as those with footprints with a single cluster of base-contacting residues within a single cluster of sugar-phosphate backbone-contacting residues (Figure 3(a)). The double-headed proteins were defined as those with footprints with two distinct clusters of base-contacting residues, or two distinct clusters of backbone-contacting residues within which were a number of unclustered base-contacting residues (Figure 3(b)). The enveloping proteins formed a distinct class of protein in which the footprints showed a cleft lined by backbone-contacting residues. Within this lining base-contacting residues were present, usually forming a number of distinct sites (Figure 3(c)). To quantify potential differences between these binding modes, the means and standard deviations for six interface parameters were calculated separately for each mode (Table 3).
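The binding-site criterion can be sketched in a few lines. The example below assumes that per-residue ASA values for the uncomplexed and complexed states have already been computed by some ASA program; the function name, the residue keys and the numbers are hypothetical and serve only to show the >1 Å² cut-off at work.

```python
def interface_residues(asa_free, asa_complexed, cutoff=1.0):
    """Return residues that lose more than `cutoff` Å² of ASA on complexation.

    Both arguments map a residue key, here (chain, number, name), to its
    accessible surface area in Å².
    """
    selected = []
    for residue, free in asa_free.items():
        bound = asa_complexed[residue]
        if free - bound > cutoff:
            selected.append(residue)
    return sorted(selected)


# Hypothetical per-residue ASA values (Å²), not taken from any structure.
asa_free = {
    ("A", 45, "ARG"): 120.4,
    ("A", 46, "GLY"): 30.2,
    ("A", 47, "SER"): 55.0,
    ("A", 90, "LEU"): 12.7,
}
asa_complexed = {
    ("A", 45, "ARG"): 35.9,
    ("A", 46, "GLY"): 29.6,
    ("A", 47, "SER"): 40.5,
    ("A", 90, "LEU"): 12.7,
}
binding_site = interface_residues(asa_free, asa_complexed)
```

Here the glycine loses only about 0.6 Å² and the leucine loses nothing, so neither is counted; the arginine and serine form the binding site.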

Additional models of each protein-DNA complex were also created to aid in the classification of the proteins and for summarising the gross anatomy of structures for discussion. A simple model was constructed for each complex which gave an indication of the predominant secondary structure of the binding motif (α-helix, β-sheet or loop), symmetry of the protein, and type and relative position of grooves bound on the DNA (Figure 4).

Analysis of protein DNA-binding sites

An algorithm was used to calculate a series of parameters for the DNA-binding sites of proteins in the data set. The algorithm was a modified version of one used to calculate similar parameters for protein-protein interfaces (Jones & Thornton, 1995). The parameters calculated for each binding site included size (in terms of ASA), interface sequence segmentation, numbers of intermolecular hydrogen bonds, the gap volume between the protein and the DNA molecule (Jones & Thornton, 1996), and the number of water molecules forming hydrogen-bonding bridges between the protein and the DNA (Table 2). The ASA contribution of polar atoms in the interface was also calculated (Table 2). Frequency distributions for each of the parameters were calculated for all protein-DNA complexes and compared with those obtained for protein-protein interactions (Figure 1). Two different types of protein-protein interactions have been used for comparison. The first, termed permanent complexes, are those in which the components only exist within the complex. The second, termed non-obligate complexes, are those in which the components can exist both in the complex and in isolation, e.g. enzyme-inhibitor complexes (Jones & Thornton, 1996). In addition, the means and standard deviations of the distributions have been tabulated for the monomeric and dimeric DNA-binding proteins and the protein-protein complexes (Table 2).

Residue interface propensities were calculated for the protein-DNA complexes (Figure 2). The residue interface propensities give a measure of the relative importance of different amino acid residues in the DNA-binding site of the protein. This information can only be interpreted if the residue distributions at the interface are compared with those on the protein surface as a whole. Residue interface propensities were calculated for each amino acid type (AAj) as the fraction of ASA that AAj contributed to the interface compared with the fraction of ASA that AAj contributed to the whole surface of the protein:

$$
\text{Interface residue propensity } \mathrm{AA}_j =
\left( \frac{\sum_{i=1}^{N_i} \mathrm{ASA}_{\mathrm{AA}_j}(i)}{\sum_{i=1}^{N_i} \mathrm{ASA}(i)} \right)
\Bigg/
\left( \frac{\sum_{i=1}^{N_s} \mathrm{ASA}_{\mathrm{AA}_j}(s)}{\sum_{i=1}^{N_s} \mathrm{ASA}(s)} \right)
$$

where $\sum \mathrm{ASA}_{\mathrm{AA}_j}(i)$ is the sum of the ASA (in the protein) of the amino acid residues of type j in the interface; $\sum \mathrm{ASA}(i)$ is the sum of the ASA in the protein of all amino acid residues of all types in the interface; $\sum \mathrm{ASA}_{\mathrm{AA}_j}(s)$ is the sum of the ASA (in the protein) of the amino acid residues of type j on the protein surface (the surface being defined as those residues with >5% relative ASA in isolation); $\sum \mathrm{ASA}(s)$ is the sum of the ASA in the protein of all amino acid residues of all types on the protein surface; $N_i$ is the number of residues in the interface; and $N_s$ is the number of residues on the protein surface, excluding the interface residues.

A propensity of >1 indicates that a residue occurs more frequently in the interface than on the protein surface.
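A small worked example may make the propensity calculation concrete. The sketch below takes two lists of (amino acid type, ASA) pairs, one for interface residues and one for surface residues, and returns the ratio of ASA fractions for each amino acid type. The function names and the ASA values are hypothetical; which residues are passed as "surface" (for example, whether interface residues are excluded, and the >5% relative ASA filter) is left to the caller and is not reproduced here.

```python
from collections import defaultdict


def asa_fractions(residues):
    """Fraction of the total ASA contributed by each amino acid type."""
    totals = defaultdict(float)
    for amino_acid, asa in residues:
        totals[amino_acid] += asa
    grand_total = sum(totals.values())
    if grand_total <= 0.0:
        raise ValueError("total ASA must be positive")
    return {aa: value / grand_total for aa, value in totals.items()}


def interface_propensities(interface, surface):
    """Ratio of interface ASA fraction to surface ASA fraction per type."""
    interface_share = asa_fractions(interface)
    surface_share = asa_fractions(surface)
    return {
        aa: interface_share.get(aa, 0.0) / share
        for aa, share in surface_share.items()
        if share > 0.0
    }


# Hypothetical ASA values (Å²) for a toy protein.
interface = [("ARG", 60.0), ("LYS", 40.0)]
surface = [("ARG", 30.0), ("ARG", 20.0), ("LYS", 50.0), ("ASP", 100.0)]
propensities = interface_propensities(interface, surface)
```

In this toy case arginine contributes 60% of the interface ASA but 25% of the surface ASA, giving a propensity of 2.4; aspartate, absent from the interface, has a propensity of 0.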

An internet resource

The protein-DNA interface parameters calculated here can be calculated for any protein-nucleic acid complex using the protein-nucleic acid interaction server (http://www.biochem.ucl.ac.uk/bsm/DNA/server). This server, similar to one developed for protein-protein interactions (http://www.biochem.ucl.ac.uk/bsm/PP/server; Luscombe et al., 1998), allows a user to upload the co-ordinates of any protein-nucleic acid complex and receive back a report on its interface parameters. This useful tool provides a simple and quick means of comparing new complexes with those already known.

Analysis of DNA structure

A series of parameters describing the local and global structure of DNA double helices is stored in the NDB; these are calculated using a number of algorithms (Dickerson et al., 1989), including Curves (Lavery & Sklenar, 1989). The means and standard deviations of the distributions of four groove dimensions, six base step local parameters and the seven torsion angles in the protein-bound DNA (data set A) and the unbound DNA (data set B) were extracted from the NDB (Table 4).

Specific numerical parameters are available that describe the bending of a DNA structure per base-pair (Dickerson, 1989). However, for the comparison of multiple DNA structures from different complexes a single parameter that described the overall distortion of the structure from one end of the DNA double helix to the other was required. When three-dimensional protein structures are compared, root-mean-square deviations (rmsd) are routinely quoted to describe the overall degree of similarity. This method has also been employed for the description of individual DNA structures (e.g. Somers & Phillips, 1992). To compare the 26 DNA-bound structures in our study, rmsd values were calculated for each protein-bound DNA structure fitted to a 40 bp canonical B-DNA structure with Dickerson base-pairing (Figure 5). The fitting was performed using the McLachlan algorithm (McLachlan, 1982) as implemented in the computer program ProFit (A.C.R. Martin, http://www.biochem.ucl.ac.uk/~martin/profit). Fitting was conducted using all the backbone atoms of the DNA strands (denoted P, O1P, O2P, O5*, C5*, C4*, O4*, C3*, O3*, C2*, C1* in PDB files). As the canonical B-DNA used was completely symmetrical, a single superposition was conducted for each DNA structure and an rmsd value calculated. All 21 unbound DNA structures were also fitted to the canonical B-DNA structure (Figure 5).
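The fitting in the study used the McLachlan algorithm as implemented in ProFit. As a rough, self-contained illustration of what an rmsd after optimal superposition involves, the sketch below swaps in a different technique: Horn's quaternion formulation, with the largest eigenvalue found by a shifted power iteration. It is not ProFit's code, the function names are hypothetical, and the coordinates are invented; it assumes the two atom lists are already matched one-to-one.

```python
import math


def centre(points):
    """Translate (x, y, z) points so that their centroid is at the origin."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in points]


def largest_eigenvalue(matrix, iterations=2000):
    """Largest eigenvalue of a symmetric matrix by shifted power iteration."""
    size = len(matrix)
    # Gershgorin-style shift so every eigenvalue of the shifted matrix is >= 0.
    shift = max(sum(abs(value) for value in row) for row in matrix)
    vec = [1.0 / (k + 1) for k in range(size)]
    for _ in range(iterations):
        new = [
            sum(matrix[r][c] * vec[c] for c in range(size)) + shift * vec[r]
            for r in range(size)
        ]
        norm = math.sqrt(sum(value * value for value in new))
        if norm == 0.0:
            return 0.0
        vec = [value / norm for value in new]
    # Rayleigh quotient of the converged unit vector with the unshifted matrix.
    product = [sum(matrix[r][c] * vec[c] for c in range(size)) for r in range(size)]
    return sum(vec[r] * product[r] for r in range(size))


def fitted_rmsd(mobile, reference):
    """rmsd between two matched atom lists after optimal rigid superposition."""
    if not mobile or len(mobile) != len(reference):
        raise ValueError("atom lists must be non-empty and of equal length")
    a, b = centre(mobile), centre(reference)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    best_overlap = largest_eigenvalue(horn)
    total = sum(v * v for p in a for v in p) + sum(v * v for p in b for v in p)
    return math.sqrt(max(0.0, (total - 2.0 * best_overlap) / len(a)))


# Invented coordinates: `mobile` is `reference` rotated 90 degrees about z
# and translated, so the fitted rmsd should be essentially zero.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
             (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
mobile = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in reference]
rmsd_identical = fitted_rmsd(mobile, reference)

# Displacing one atom by 1 Å leaves a non-zero rmsd after fitting.
distorted = [(mobile[0][0], mobile[0][1], mobile[0][2] + 1.0)] + mobile[1:]
rmsd_distorted = fitted_rmsd(distorted, reference)
```

For a real comparison the atom lists would hold the backbone atoms named in the text, paired between the bound DNA and the canonical B-DNA model; a production tool would also use an exact eigen-solver rather than a fixed number of power iterations.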

To find the effect of bending the DNA helix on the contacts made with the protein, the distance between distal binding sites was measured. This was done by finding the Cα atoms of the protein residues contacting the two most distal bases on the DNA helix. The contact could be a hydrogen bond or a non-bonded contact. A list of bonding residues was obtained using NUCPLOT (Luscombe et al., 1997). The distance between these two Cα atoms was measured using RasMol (Sayle & Milner-White, 1995) and the number of base-pairs between the distal contacts was counted. Assuming that in a standard canonical B-DNA structure there is a 3.4 Å rise per base-pair (Blackburn & Gait, 1996), the number of base-pairs that could be fitted between the distal contacts if the DNA was undistorted was calculated (Table 6).
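The arithmetic behind this comparison is simple enough to sketch: the distance between the two Cα atoms is divided by the 3.4 Å rise per base-pair and compared with the number of base-pairs actually counted. The function name, the coordinates and the observed count below are hypothetical.

```python
import math

RISE_PER_BASE_PAIR = 3.4  # Å per base-pair in canonical B-DNA, as assumed in the text


def expected_base_pairs(ca_first, ca_second, rise=RISE_PER_BASE_PAIR):
    """Base-pairs that would span the Cα-Cα distance in undistorted B-DNA."""
    return math.dist(ca_first, ca_second) / rise


# Hypothetical Cα coordinates (Å) of the residues at the two distal contacts.
ca_first = (12.0, 4.0, -3.0)
ca_second = (12.0, 4.0, 31.0)
observed_base_pairs = 12  # hypothetical count between the distal contacts

expected = expected_base_pairs(ca_first, ca_second)
difference = observed_base_pairs - expected
```

With these invented numbers the Cα atoms are 34 Å apart, room for about ten base-pairs of straight B-DNA, whereas twelve are observed; a positive difference of this kind would suggest that the helix is distorted between the two contacts.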

Interface footprints were created for the DNA structures bound to proteins in data set A. Each footprint indicates the atoms of the DNA that interact with the protein, using a CPK depiction of the DNA double helix. A DNA atom that lost >0.1 Å² of ASA on complexation with the protein was defined as being in the DNA-protein interface. The footprints differentiate between sugar-phosphate backbone atoms and base atoms of the DNA. Examples of DNA interface footprints are shown in Figure 4 alongside the complementary footprint of the protein.

Acknowledgements

This work was carried out with funding from the Department of Energy, USA (grant number DE-FG02096ER62166.A000). We acknowledge the support of all those involved in the Nucleic Acid Database (NDB), especially Christine Zardecki and John Westbrook. We also thank Michael Huang for creating the footprint diagrams of the DNA, Nicholas Luscombe for the use of NUCPLOT, and Shri Jian for generating the co-ordinates of the canonical B-DNA structure.

References

Beamer, L. J. & Pabo, C. O. (1992). Refined 1.8 Å crystal structure of the λ repressor-operator complex. J. Mol. Biol. 227, 177-196.

Berman, H. M. (1997). Crystal studies of B-DNA: the answers and the questions. Biopolymers, 44, 23-44.

Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S. H., Srinivasan, A. R. & Schneider, B. (1992). The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751-759.

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file of macromolecular structures. J. Mol. Biol. 112, 535-542.

Blackburn, G. M. & Gait, M. J. (1996). Nucleic Acids in Chemistry and Biology, Oxford University Press, New York.

Brennan, R. G. (1992). DNA recognition by the helix-turn-helix motif. Curr. Opin. Struct. Biol. 2, 100-108.

Chasman, D. I., Flaherty, K. M., Sharp, P. A. & Kornberg, R. D. (1993). Crystal structure of yeast TATA-binding protein and model for interaction with DNA. Proc. Natl Acad. Sci. USA, 90, 8174-8178.

Cheng, X., Balendiran, K., Schildkraut, I. & Anderson, J. E. (1994). Structure of PvuII endonuclease with cognate DNA. EMBO J. 13, 3927-3935.

Cho, Y., Gorina, S., Jeffrey, P. D. & Pavletich, N. P. (1994). Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science, 265, 346-355.

Dickerson, R. E. (1998). DNA bending: the prevalence of kinkiness and the virtues of normality. Nucl. Acids Res. 26, 1906-1926.

Dickerson, R. E., Bansal, M., Calladine, C. R., Diekmann, S., Hunter, W. N., Kennard, O., von Kitzing, E., Lavery, R., Nelson, H. C., Olson, W. K., Saenger, W., Shakked, Z., Sklenar, H., Soumpasis, D. M. & Tung, C. S., et al. (1989). Definitions and nomenclature of nucleic acid structure parameters. EMBO J. 8, 1-4.

Dickerson, R. E., Goodsell, D. & Kopka, M. L. (1996). MPD and DNA bending in crystals and solution. J. Mol. Biol. 256, 108-125.

Drew, H. R., Wing, R. M., Takano, T., Broka, C., Tanaka, S., Itakura, K. & Dickerson, R. E. (1981). Structure of a B-DNA dodecamer - conformation and dynamics. Proc. Natl Acad. Sci. USA, 78, 2179-2183.

Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996). Zif268 protein-DNA complex refined at 1.6 Å: a model system for understanding zinc finger-DNA interactions. Structure, 4, 1171-1180.

Faucher, J. & Pliska, V. (1983). Hydrophobic parameters π of amino acid side-chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem. 18, 369-375.

Feng, J. A., Johnson, R. C. & Dickerson, R. E. (1994). Hin recombinase bound to DNA: the origin of specificity in major and minor groove interactions. Science, 263, 348-355.

Ghosh, G., Duyne, G. V., Ghosh, S. & Sigler, P. B. (1995). Structure of NF-κB p50 homodimer bound to a κB site. Nature, 373, 303-310.

Gorin, A. A., Zhurkin, V. B. & Olson, W. K. (1995). B-DNA twisting correlates with base-pair morphology. J. Mol. Biol. 247, 34-48.

Hagerman, P. J. (1990). Sequence-directed curvature of DNA. Annu. Rev. Biochem. 59, 755-781.

Harrison, S. C. (1991). A structural taxonomy of DNA-binding domains. Nature, 353, 715-719.

Hegde, R. S., Grossman, S. R., Laimins, L. A. & Sigler, P. B. (1992). Crystal structure at 1.7 Å of the bovine papillomavirus-1 E2 DNA-binding domain bound to its DNA target. Nature, 359, 505-512.

Horton, N. C. & Perona, J. J. (1998). Role of protein-induced bending in the specificity of DNA recognition: crystal structure of EcoRV endonuclease complexed with d(AAAGAT) + d(ATCTT). J. Mol. Biol. 277, 779-787.

Hubbard, S. J. (1990). NACCESS. A computer programme in Fortran, University College, London.

Hurst, H. C. (1995). Transcription factors 1: bZIP proteins. Protein Profile, 2, 105-141.

Jones, S. & Thornton, J. M. (1995). Protein-protein interactions: a review of protein dimer structures. Prog. Biophys. Mol. Biol. 63, 31-65.

Jones, S. & Thornton, J. M. (1996). Principles of protein-protein interactions. Proc. Natl Acad. Sci. USA, 93, 13-20.

Kaptein, R. (1991). Zinc-finger structures. Curr. Opin. Struct. Biol. 2, 109-115.

Keller, W., Konig, P. & Richmond, T. J. (1995). Crystal structure of a BZIP/DNA complex at 2.2 angstroms: determinants of DNA specific recognition. J. Mol. Biol. 254, 657-667.

Kim, Y., Geiger, J. H., Hahn, S. & Sigler, P. B. (1993). Crystal structure of a yeast TBP/TATA-box complex. Nature, 365, 512-520.

Kodandapani, R., Pio, F., Ni, C. Z., Piccialli, G., Klemsz, M., Mckercher, S., Maki, R. A. & Ely, K. R. (1996). A new pattern for helix-turn-helix recognition revealed by PU.1 ETS-domain-DNA complex. Nature, 380, 457-460.

Konig, P., Giraldo, R., Chapman, L. & Rhodes, D. (1996). The crystal structure of the DNA-binding domain of yeast RAP1 in complex with telomeric DNA. Cell, 85, 125-136.

Kostrewa, D. & Winkler, F. K. (1995). Mg2+ binding to the active site of EcoRV endonuclease: a crystallographic study of complexes with substrate and product DNA at 2 Å resolution. Biochemistry, 34, 683-696.

Larson, C. L. & Verdine, G. L. (1996). The chemistry of protein-DNA interactions. In Bioorganic Chemistry: Nucleic Acids (Hecht, S. M., ed.), pp. 324-346, Oxford University Press, New York.

Laskowski, R. A. (1995). SURFNET - a program for visualizing molecular-surfaces, cavities, and intermolecular interactions. J. Mol. Graph. 13, 323-330.

Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dynam. 4, 655-667.

Lee, B. & Richards, F. M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379-400.

Lilley, D. (1986). Bent molecules - how and why? Nature, 320, 487-488.

Littlewood, T. D. & Evan, G. I. (1995). Transcription factors 2: helix-loop-helix. Protein Profile, 2, 621-653.

Luisi, B. F., Xu, W. X., Otwinowski, Z., Freedman, L. P., Yamamoto, K. R. & Sigler, P. B. (1991). Crystal analysis of the interaction of glucocorticoid receptor with DNA. Nature, 352, 497-505.

Luscombe, N. M., Laskowski, R. A. & Thornton, J. M. (1997). NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucl. Acids Res. 25, 4940-4945.

Luscombe, N. M., Laskowski, R. A., Westhead, D. R., Milburn, D., Jones, S., Karmirantzou, M. & Thornton, J. M. (1998). New tools and resources for analysing protein structures and their interactions. Acta Crystallog. sect. D, 54, 1132-1138.

Ma, P. C. M., Rould, M. A., Weintraub, H. & Pabo, C. O. (1994). Crystal structure of MyoD bHLH domain-DNA complex: perspectives on DNA recognition and implications for transcriptional activation. Cell, 77, 451-459.

Marmorstein, R., Carey, M., Ptashne, M. & Harrison, S. C. (1992). DNA recognition by GAL4: structure of a protein-DNA complex. Nature, 356, 408-414.

McClarin, J. A., Frederick, C. A., Wang, C., Greene, P., Boyer, H. W., Grable, J. & Rosenberg, J. M. (1986). Structure of the DNA-EcoRI endonuclease recognition complex at 3 Å resolution. Science, 234, 1526-1541.

McDonald, I. K. & Thornton, J. M. (1994). Satisfying hydrogen-bonding potential in proteins. J. Mol. Biol. 238, 777-793.

McLachlan, A. D. (1982). Rapid comparison of protein structures. Acta Crystallog. sect. A, 38, 871-873.

Nekludova, L. & Pabo, C. O. (1994). Distinctive DNA conformation with enlarged major groove is found in Zn-finger-DNA and other protein-DNA complexes. Proc. Natl Acad. Sci. USA, 91, 6948-6952.

Newman, M., Strzelecka, T., Dorner, L. F., Schildkraut, I. & Aggarwal, A. K. (1995). Structure of BamHI endonuclease bound to DNA: partial folding and unfolding on DNA binding. Science, 269, 656-663.

O'Gara, M., Klimasauskas, S., Roberts, R. J. & Cheng, X. (1996). Enzymatic C5-cytosine methylation of DNA: mechanistic implications of new crystal structures for HhaI methyltransferase-DNA-AdoHcy complexes. J. Mol. Biol. 261, 634-645.

Olson, W. K. (1996). Simulating DNA at low resolution. Curr. Opin. Struct. Biol. 6, 242-256.

Olson, W. K. & Zhurkin, V. B. (1996). Twenty years of DNA bending. In Ninth Conversation in Biomolecular Stereodynamics, Adenine Press, Albany, NY.

Olson, W. K., Gorin, A. A., Lu, X. J., Hock, L. M. & Zhurkin, V. B. (1998). DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl Acad. Sci. USA, 95, 11163-11168.

Parkinson, G., Wilson, C., Gunasekera, A., Ebright, Y. W., Ebright, R. E. & Berman, H. M. (1996). Structure of the CAP-DNA complex at 2.5 Å resolution: a complete picture of the protein-DNA interface. J. Mol. Biol. 260, 395-408.

Pathak, D. & Sigler, P. B. (1992). Updating structure-function relationships in the bZip family of transcription factors. Curr. Opin. Struct. Biol. 2, 116-123.

Phillips, S. E. (1994). The β-ribbon DNA recognition motif. Annu. Rev. Biophys. Biomol. Struct. 23, 671-701.

Raumann, B. E., Brown, B. M. & Sauer, R. T. (1994a). Major groove DNA recognition by β-sheets: the ribbon-helix-helix family of gene regulatory proteins. Curr. Opin. Struct. Biol. 4, 36-43.

Raumann, B. E., Rould, M. A., Pabo, C. O. & Sauer, R. T. (1994b). DNA recognition by β-sheets in the Arc repressor-operator crystal structure. Nature, 367, 754-757.

Rice, P. A., Yang, S., Mizuuchi, K. & Nash, H. (1996). Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell, 87, 1295-1306.

Sarai, A., Mazur, J., Nussinov, R. & Jernigan, R. L. (1989). Sequence dependence of DNA conformational flexibility. Biochemistry, 28, 7842-7849.

Sayle, R. A. & Milner-White, E. J. (1995). RASMOL - Biomolecular graphics for all. Trends Biochem. Sci. 20, 374-376.

Schmiedeskamp, M. & Klevit, R. E. (1994). Zinc finger diversity. Curr. Opin. Struct. Biol. 4, 28-35.

Schneider, B., Neidle, S. & Berman, H. M. (1997). Conformations of the sugar-phosphate backbone in helical DNA crystal structures. Biopolymers, 42, 113-124.

Schulz, G. E. & Schirmer, R. H. (1979). Principles of Protein Structure, Springer-Verlag, New York.

Schumacher, M. A., Choi, K. Y., Zalkin, H. & Brennan, R. G. (1994). Crystal structure of LACI family member, PURR, bound to DNA: minor groove binding by alpha helices. Science, 266, 763-770.

Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A. & Sigler, P. B. (1994). Determinants of repressor/operator recognition from the structure of the trp operator binding site. Nature, 368, 469-473.

Somers, W. S. & Phillips, S. E. V. (1992). Crystal structure of the met repressor-operator complex at 2.8 Å resolution reveals DNA recognition by β sheets. Nature, 359, 387-460.

Suzuki, M. & Gerstein, M. (1995). Binding geometry of α-helices that recognize DNA. Proteins: Struct. Funct. Genet. 23, 525-535.

Suzuki, M. & Yagi, N. (1995). Stereochemical analysis of DNA bending by transcription factors. Nucl. Acids Res. 23, 2083-2091.

Taylor, W. R. & Orengo, C. A. (1989). Protein structure alignment. J. Mol. Biol. 208, 1-22.

Travers, A. A. (1992). DNA conformation and configuration in protein-DNA complexes. Curr. Opin. Struct. Biol. 2, 71-77.

Trifonov, E. N. (1985). Curved DNA. CRC Crit. Rev. Biochem. 19, 89-106.

Trifonov, E. N. (1991). DNA in profile. Trends Biochem. Sci. 16, 467-470.

Vassylyev, D. G., Kashiwagi, T., Mikami, Y., Ariyoshi, M., Iwai, S., Ohtsuka, E. & Morikawa, K. (1995). Atomic model of a pyrimidine dimer excision repair enzyme complexed with a DNA substrate: structural basis for damaged DNA recognition. Cell, 83, 773-782.

Westhead, D. R., Hatton, D. C. & Thornton, J. M. (1998). An atlas of protein topology cartoons available on the World-Wide Web. Trends Biochem. Sci. 23, 35-36.

Winkler, F. K., Banner, D. W., Oefner, C., Tsernoglou, D., Brown, R. S., Heathman, S. P., Bryan, R. K., Martin, P. D., Petratos, K. & Wilson, K. S. (1993). The crystal structure of EcoRV endonuclease and its complexes with cognate and non-cognate DNA fragments. EMBO J. 12, 1781-1795.

Wintjens, R. & Rooman, M. (1996). Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. J. Mol. Biol. 262, 294-313.

Wright, P. E. (1994). POU domains and homeodomains. Curr. Opin. Struct. Biol. 4, 22-27.

Xu, W., Rould, M. A., Jun, S., Desplan, C. & Pabo, C. O. (1995). Crystal structure of a paired domain-DNA complex at 2.5 Å resolution reveals structural basis for Pax developmental mutations. Cell, 80, 639-650.

Yang, W. & Steitz, T. A. (1995). Crystal structure of the site-specific recombinase γδ resolvase complexed with a 34 bp cleavage site. Cell, 82, 193-206.

Young, M. A., Ravishanker, G., Beveridge, D. L. & Berman, H. M. (1995). Analysis of local helix bending in crystal structures of DNA oligonucleotides and DNA-protein complexes. Biophys. J. 68, 2454-2468.

Edited by K. Nagai

(Received 28 September 1998; received in revised form 25 February 1999; accepted 2 March 1999)

