Volume 304, number 1, 15-20 FEBS 11086 © 1992 Federation of European Biochemical Societies 00145793/92/$5.00

June 1992

Conservation analysis and structure prediction of the SH2 family of phosphotyrosine binding domains

Robert B. Russell, Jason Breed and Geoffrey J. Barton

University of Oxford, Laboratory of Molecular Biophysics, The Rex Richards Building, South Parks Road, Oxford OX1 3QU, UK

Received 17 March 1992; revised version received 13 April 1992

Src homology 2 (SH2) regions are short (approximately 100 amino acids), non-catalytic domains conserved among a wide variety of proteins involved in cytoplasmic signaling induced by growth factors. It is thought that SH2 domains play an important role in the intracellular response to growth factor stimulation by binding to phosphotyrosine containing proteins. In this paper we apply the techniques of multiple sequence alignment, secondary structure prediction and conservation analysis to 67 SH2 domain amino acid sequences. This combined approach predicts seven core secondary structure regions with the pattern β-α-β-β-β-β-α, identifies those residues most likely to be buried in the hydrophobic core of the native SH2 domain, and highlights patterns of conservation indicative of secondary structural elements. Residues likely to be involved in phosphotyrosine binding are shown, and orientations of the predicted secondary structures are suggested which could enable such residues to cooperate in phosphate binding. We propose a consensus pattern that encapsulates the principal conserved features of the SH2 domains. Comparison of the proposed SH2 domain of akt to this pattern shows only 12/40 matches, suggesting that this domain may not exhibit SH2-like properties.

SH2 domain; Structure prediction; Conservation analysis; Alignment; Tyrosine kinase

1. INTRODUCTION

Several growth factors stimulate cell proliferation by binding to receptors with protein tyrosine kinase activities. Binding induces activation of the receptor and leads to autophosphorylation of the receptor. The induction of receptor activation is associated with tyrosine kinase activity and leads to phosphorylation of various cytoplasmic substrates, many of which are thought to be involved in intracellular signal transduction.

Recently, a conserved family of domains has been shown to mediate the association of cytoplasmic substrates and specific autophosphorylation sites on the receptors. These src homology 2 (SH2) domains have been found in a wide variety of proteins involved in intracellular signal transduction (see Koch et al. [1], or Heldin [2] for reviews). SH2 domains are short (approximately 100 amino acids), non-catalytic regions, which are thought to facilitate recognition by binding to phosphotyrosine containing proteins.

No tertiary structure information on these SH2 domains is available at present. Further, since they show no obvious sequence similarity to any domains of known three-dimensional structure, structural inferences may not be made through homology modelling techniques.

Correspondence address: G.J. Barton, University of Oxford, Laboratory of Molecular Biophysics, The Rex Richards Building, South Parks Road, Oxford OX1 3QU, UK. Fax: (44) (865) 510 454.

Published by Elsevier Science Publishers B.V.

Several sequence alignments of SH2 domains have been proposed [1,3-5]. Koch et al. [1] describe five conserved regions separated by four variable segments. The most highly conserved region is that corresponding to the FLVRES sequence found in src, which has been implicated in phosphotyrosine binding [1,6,7]. Other residues thought to be involved in phosphotyrosine binding include an arginine near the N-terminal end and a histidine near the C-terminal end [1,6].

In this paper we extend these studies of residue conservation by applying the techniques of multiple sequence alignment, secondary structure prediction and conservation analysis to 67 SH2 domain sequences known to bind phosphotyrosine containing proteins. These methods have been previously used to identify accurately the secondary structure and core residues in the annexin domain family [8]. In the present study we highlight previously observed conservation patterns in the SH2 domains, identify residues likely to be buried in the hydrophobic core of the domain, predict the most likely regions of secondary structure (α-helix and β-sheet) and suggest residues likely to be involved in phosphate binding in the light of general studies of phosphate-protein interactions [9]. Our analyses suggest regions of the SH2 domains that are likely to be involved in phosphotyrosine binding and provide tertiary structural interpretations of recent site-directed mutagenesis experiments [6].


2. MATERIALS AND METHODS

SH2 domain sequences were obtained from the PIR [10] version 30 databank by scanning with the SH2 domain of human src using the rigorous Smith and Waterman [11] algorithm. Obvious mismatches were removed, and sequences were only included in our analysis if they (or a close relative) had been shown previously to bind phosphotyrosine containing proteins. A total of 60 sequences were extracted from the databank. Seven additional domains (bovine p85 α and β N- and C-terminal SH2 domains [2]; human phosphotyrosine phosphatase (PTP) 1C N- and C-terminal SH2 domains [3]; avian tensin SH2 domain [5]) were obtained from the literature.
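The databank scan above relies on local alignment scoring. A minimal sketch of the Smith and Waterman recurrence follows, assuming a simple match/mismatch score and a linear gap penalty; these scoring values and the function name are illustrative assumptions, not the parameters used in the study.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions; a real scan would use
    an amino acid substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below zero, which makes the alignment local.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The FLVRES motif embedded in a longer sequence scores as a full local match.
print(smith_waterman_score("FLVRES", "AAFLVRESAA"))
```

Only two rows of the score matrix are kept because the sketch returns the score alone; recovering the aligned residues would require the full matrix and a traceback.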

The alignment of 67 SH2 domains shown in Fig. 1 was generated by following a dendrogram derived from pairwise alignment scores [13]. Poorly conserved regions within the alignment were then adjusted by eye to place a single gap between the highly conserved regions. In the following discussion, positions within the alignment are referred to by the name of the most common amino acid occurring and its location (e.g. arginine-39 refers to the highly conserved arginine within the human src FLVRES sequence).

Secondary structure predictions were obtained by combining the results of five predictive algorithms applied independently to each aligned sequence. Helix and strand predictions were performed by combining the methods of Lim [14], Chou and Fasman [15] and Robson [16], whilst the methods of Rose [17] and Wilmot and Thornton [18] were used to predict turns. The combination of methods was performed as described in Banga et al. [19]. Where ambiguities exist, all predicted structural types are shown.
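One simple way to picture combining several per-residue predictions is a position-wise vote that reports every tied state. The sketch below is only an assumption-laden illustration (the voting rule, the state letters and the function name are hypothetical); it is not the combination procedure of Banga et al. [19].

```python
from collections import Counter


def combine_predictions(predictions):
    """Combine equal-length prediction strings by position-wise voting.

    Each string uses one letter per residue, e.g. 'h' (helix), 'e' (strand)
    and '-' (other). Where the vote is tied, all tied states are reported,
    mirroring the idea of showing every predicted type under ambiguity.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    combined = []
    for column in zip(*predictions):
        votes = Counter(column)
        top = max(votes.values())
        combined.append("".join(sorted(s for s, n in votes.items() if n == top)))
    return combined


# Three hypothetical methods applied to a four-residue segment.
print(combine_predictions(["hhe-", "hee-", "he-e"]))
```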

3. RESULTS AND DISCUSSION

3.1. Secondary structure prediction and structural implications of residue conservation

In a multiple alignment, regions found to contain a large number of gaps, or which are varied in composition across the family of sequences, can be predicted as loop or non-core secondary structure regions with a high degree of confidence [20-22]. On this basis we define seven regions of probable secondary structure between such loops (labelled A through G in Fig. 1). The prediction of turns by the methods of Rose [17] and Wilmot and Thornton [18] reinforces the assignment of regions A-G as secondary structures separated by loops.

In order to satisfy thermodynamic requirements, hydrophobic residues (for example, leucine, valine or methionine) are most often buried within the hydrophobic protein core (e.g. see [23] and refs. therein). Accordingly, a position exhibiting conservation of hydrophobic character across all family members implies it is important to the core of the structure. We therefore suggest that positions 3, 4, 5, 8, 22, 36, 37, 38, 49, 51, 53, 68, 71, 73, 84, 86, 93, 96, 99, 100, 102 and 103 are likely to be buried in the core of the native SH2 three-dimensional structure.
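The idea of flagging alignment columns with conserved hydrophobic character can be sketched as follows. The hydrophobic residue set, the 90% threshold and the treatment of gaps are assumptions made for illustration; the source does not state the exact criteria used.

```python
# Assumed hydrophobic set for illustration; the paper does not list one.
HYDROPHOBIC = set("AVLIMFWC")


def conserved_hydrophobic_positions(alignment, min_fraction=0.9):
    """Return 1-based alignment positions where most sequences are hydrophobic.

    alignment: list of equal-length aligned sequences, '-' marking gaps.
    Gaps count against conservation because the fraction is taken over
    all sequences in the column.
    """
    positions = []
    for index, column in enumerate(zip(*alignment), start=1):
        fraction = sum(r in HYDROPHOBIC for r in column) / len(column)
        if fraction >= min_fraction:
            positions.append(index)
    return positions


# Toy alignment (hypothetical sequences, not taken from Fig. 1).
toy = ["LVKE", "IVRE", "MA-D"]
print(conserved_hydrophobic_positions(toy))
```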

Although current secondary structure prediction techniques achieve only about 60% accuracy when applied to a single sequence, prediction accuracy can be improved by applying several complementary techniques to an aligned family of sequences. In addition, conservation of hydrophobic residues across the family of proteins can be used to reinforce the results of secondary structure prediction where characteristic patterns of conservation for α-helix and β-strand are observed. The results of applying these principles to the SH2 domain alignment are summarised in Table I.

Between positions 96 and 103, region G exhibits a striking pattern of hydrophobic conservation at i (= 96), i + 3, i + 4 and i + 7; were this to be an α-helix, these residues would all lie on one side. The pattern is often seen for a helix which packs against the core of the protein and has been observed to interact with a single hydrophobic residue, perhaps on a sheet [24].
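The claim that positions i, i + 3, i + 4 and i + 7 share one face of a helix can be checked with a helical wheel calculation. The sketch assumes an ideal α-helix with 3.6 residues per turn, i.e. a rotation of 100 degrees per residue; the function names are illustrative.

```python
def helical_wheel_angles(offsets, degrees_per_residue=100.0):
    """Angle of each residue around the helix axis, assuming an ideal
    alpha-helix (3.6 residues per turn, 100 degrees per residue)."""
    return [(offset * degrees_per_residue) % 360.0 for offset in offsets]


def smallest_arc(angles):
    """Width in degrees of the smallest arc that contains all angles."""
    ordered = sorted(angle % 360.0 for angle in angles)
    if len(ordered) < 2:
        return 0.0
    gaps = [
        (ordered[(k + 1) % len(ordered)] - ordered[k]) % 360.0
        for k in range(len(ordered))
    ]
    # The largest empty gap lies outside the arc occupied by the residues.
    return 360.0 - max(gaps)


# Offsets for positions 96, 99, 100 and 103 relative to i = 96.
print(helical_wheel_angles([0, 3, 4, 7]))
print(smallest_arc(helical_wheel_angles([0, 3, 4, 7])))
# Four consecutive residues, by contrast, spread over most of the wheel.
print(smallest_arc(helical_wheel_angles([0, 1, 2, 3])))
```

Under this assumption the four conserved positions fall within a 100 degree arc, whereas four consecutive residues span 260 degrees.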

In a strand or extended structure, the protein backbone adopts a conformation where the sidechains of sequential residues point in opposite directions. Conservation patterns involving alternating hydrophobic and hydrophilic residues are therefore often indicative of surface β-strands. Within those regions predicted as β-strands in our alignment, region D (hydrophobic positions 51, 53; hydrophilic positions 50, 52) and region E (hydrophobic positions 71, 73; hydrophilic positions 70, 72) exhibit this pattern across most of the 67 SH2 domains considered.
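Detecting this alternation is straightforward once each position has been labelled by its conserved character. A minimal sketch follows, assuming a two-letter labelling ('h' for hydrophobic, 'p' for hydrophilic) and a minimum run of three positions; both are illustrative choices.

```python
def is_alternating(classes, min_length=3):
    """True if consecutive labels always differ, e.g. 'hphph'.

    classes: string of 'h' (hydrophobic) and 'p' (hydrophilic) labels,
    one per alignment position. With two labels, adjacent labels that
    always differ imply a strict alternation.
    """
    if len(classes) < min_length:
        return False
    return all(a != b for a, b in zip(classes, classes[1:]))


# Region D, positions 49-53: hydrophobic at 49, 51, 53; polar at 50, 52.
print(is_alternating("hphph"))
# A short run of hydrophobics (as at positions 36-38) does not alternate.
print(is_alternating("hhh"))
```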

Short conserved stretches of hydrophobic residues are often indicative of buried β-strands. Regions A (positions 3, 4, 5), C (positions 36, 37, 38) and F (positions 84, 85, 86) show this pattern. Within region A this observation disagrees with the results of the combined secondary structure prediction. However, the tolerance of helix breaking residues such as glycine at position 6 and proline at positions 7 and 9 is suggestive of a G1 β-bulge [25], and hence supports the prediction of β-strand within region A. In region C, there is a conserved hydrophobic at position 37 and a small residue conserved at position 35. A small sidechain at position i + 2 from the central hydrophobic residue in a strand often accommodates the packing of the hydrophobic face of an α-helix against the sheet [24].


Fig. 1. Multiple alignment of 67 SH2 domains. Sequences are identified by their name and NBRF-PIR databank code (in parentheses) where available. The last digit in the number above the alignment shows alignment position. A 15 residue segment in avian tensin (n, position 67), and a 16 residue segment in avian sarcoma virus crk and gag-crk (n, position 80) were removed for clarity. Boxed amino acids correspond to regions A-G as defined in the text. The numbers between highly conserved regions show the smallest and largest lengths present within the family. The consensus sequence shows residues (top to bottom) which occur at a given position in order of their frequency. Consensus residues are only reported for positions within regions A-G, and only those amino acids which occur more than once are reported. If more than six amino acids occur more than once at a position, the position is considered highly variable and denoted by x in the consensus line. Predicted secondary structure is described as helix (h or H), extended (β-strand) (e or E) and turn (t or T). Capital boxed regions highlight the most strongly predicted secondary structures. The summary secondary structure is as defined in Table I.
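The consensus rule stated in the caption can be written down for a single alignment column. In this sketch the handling of gaps and the alphabetical tie-break between equally frequent residues are assumptions; the caption specifies only the frequency ordering, the more-than-once filter and the six-residue limit.

```python
from collections import Counter


def consensus_column(column, max_residues=6):
    """Consensus entry for one alignment column.

    Residues occurring more than once are listed in order of decreasing
    frequency. If more than max_residues distinct amino acids occur more
    than once, the position is reported as 'x' (highly variable).
    Gaps ('-') are ignored and ties are broken alphabetically; both
    choices are assumptions.
    """
    counts = Counter(residue for residue in column if residue != "-")
    repeated = [(residue, n) for residue, n in counts.items() if n > 1]
    if len(repeated) > max_residues:
        return "x"
    repeated.sort(key=lambda item: (-item[1], item[0]))
    return "".join(residue for residue, _ in repeated)


# Hypothetical columns, not taken from the alignment itself.
print(consensus_column("RRRRKKH"))
print(consensus_column("AACCDDEEFFGGHH"))
```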

[Fig. 1: multiple alignment of the 67 SH2 domain sequences, with consensus line and predicted secondary structure for regions A-G.]

Only region B (positions 15-22) is lacking in a clear conservation pattern indicative of a specific type of secondary structure.

3.2. Mode of phosphotyrosine stabilisation

Within protein structures, phosphate containing compounds or phosphate groups are usually stabilised in the following ways [9]: (i) by positively charged sidechains of arginine or less frequently of lysine or histidine; (ii) by a network of hydrogen bonds involving serine and threonine side chains and main chain NH atoms of small amino acids, such as glycine; or (iii) by the positive (N-terminal) end of a helix dipole [26].

Positively charged residues are present in all of the SH2 domain sequences. The only utterly conserved residue is arginine-39 (Fig. 1), which has been shown recently to be critical for phosphate binding [6]. Arginine-15 is conserved in all but three sequences, and histidine-70 is conserved in all but four sequences. If these residues cooperate in phosphate binding, they must be close in the folded native SH2 domain structure. It is important to note that not all members of the family exhibit conservation at all of these positions. The presence or absence of these residues may account for differing affinities of these domains towards substrates. An interesting observation is that bovine GTPase activating protein (gap) C-terminal SH2 domain has a lysine at position 15 and an arginine at position 70 whilst the N-terminal domain has arginine and histidine at these positions. This may account for the lower affinity of the C-terminal domain towards activated receptors [27].

In addition to the residues at positions 15, 39 and 70, there are positively charged residues conserved within some subfamilies. The most striking example is the near total conservation of arginine at positions 20, 74 and 89 within src-like SH2 domains. Accordingly, substitution of arginine-20, -74 or -89 with lysine, or a non-charged amino acid, may disrupt the specific phosphotyrosine binding properties of the src sub-family of SH2 domains.

Although there is insufficient information available to positively identify residues that may form a hydrogen bond network, it is interesting to note that position 34 is glycine, and position 52 is a serine or threonine in all but two sequences.

3.3. Site-directed mutagenesis

Site-directed mutagenesis (SDM) is a powerful method for inferring the structural importance of individual residues or small regions within proteins. However, it is important to understand whether an introduced mutation affects activity by altering or removing a functional residue, or rather by causing gross conformational changes to the native three-dimensional structure. Moreover, when the precise function of a particular structure (or family of structures) is poorly understood, it is difficult to interpret macroscopic results (such as transformation studies) in structural terms.

Our findings suggest that only those point mutations found within the conserved regions A through G may be interpreted structurally, since there is substantial variation outside of these regions. Furthermore, our analysis suggests that deletion mutation experiments should be restricted to the loops linking regions A through G. Mutations that delete the predicted secondary structure regions are likely to severely disrupt the native SH2 domain conformation.

Recently, Mayer et al. [6] performed substitution mutations in region C (Fig. 1) of abl. They assayed mutants for specific phosphotyrosine binding, and applied 1D-NMR to show that the mutant structures exhibited a similar conformation to the native. Mayer et al. found that mutation of arginine-39 to lysine (R39K) and mutants S41C and S43C had no detectable binding to


Table I

Summary of predicted secondary structure regions within the SH2 family

Region  Positions  Predicted conformation  Comments
A       3-9        Strand                  Weak helix prediction. Glycine at 6 suggests turn, or G1 β-bulge. Hydrophobics at 3, 4, 5 possibly form a buried β-strand.
B       15-22      Helix                   Weak prediction. No clear hydrophobic pattern.
C       34-41      Strand                  Conserved hydrophobics at 36, 37, 38 suggest a buried strand. Polar residues at 39-41 suggest a strand with both sides exposed.
D       49-53      Strand                  Conserved hydrophobics at 49, 51, 53 and polar residues at 50, 52 suggest surface strand.
E       68-73      Strand                  Conserved hydrophobics at 68, 71 and 73 and polar residues at 70 and 72 suggest half-buried strand.
F       84-86      Strand                  Hydrophobic character at 84 and 86 suggests buried strand.
G       93-103     Helix                   Conserved hydrophobics at 96, 99, 100, 103 suggest side packing against the protein core.


phosphotyrosine agarose, whilst mutants V38L, E40Q and E42Q bound to phosphotyrosine agarose with the same affinity as the native protein. Mutations R39K, S41C and S43C lie at j (= 39), j + 2 and j + 4, whilst V38L, E40Q and E42Q lie at j - 1, j + 1 and j + 3. These findings are easily interpreted if the conformation of this region is a β-strand as we predict, since the two sets of residues would point out on opposite sides of the strand, with only one side of the strand stabilising phosphate.
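The parity argument can be made explicit by grouping the mutants reported by Mayer et al. [6] according to whether they sit an even or odd number of residues from arginine-39. The sketch assumes an ideal extended strand in which sidechains strictly alternate between the two faces; the function and variable names are illustrative.

```python
def group_by_strand_face(mutants, reference_position):
    """Split mutants by the strand face they share with a reference residue.

    mutants: list of (name, position, binds) tuples. In an ideal extended
    strand, residues an even number of positions from the reference point
    to the same side, and residues an odd number away point to the other.
    """
    faces = {"same_face": [], "opposite_face": []}
    for name, position, binds in mutants:
        even = (position - reference_position) % 2 == 0
        faces["same_face" if even else "opposite_face"].append((name, binds))
    return faces


# Binding results as described in the text (True = binds phosphotyrosine agarose).
MAYER_MUTANTS = [
    ("V38L", 38, True), ("R39K", 39, False), ("E40Q", 40, True),
    ("S41C", 41, False), ("E42Q", 42, True), ("S43C", 43, False),
]

faces = group_by_strand_face(MAYER_MUTANTS, reference_position=39)
print(faces["same_face"])      # mutants on the face carrying arginine-39
print(faces["opposite_face"])  # mutants on the other face
```

Every mutant on the arginine-39 face abolishes binding and every mutant on the other face retains it, which is the pattern expected if only one face of the strand contacts the phosphate.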

3.4. Constraints on the native SH2 domain fold

Although we are unable to predict the detailed three-dimensional structure of an SH2 domain, the secondary structure prediction, conservation patterns and site-directed mutagenesis data provide clues to the nature of the common fold and phosphotyrosine interaction site of the SH2 domain family.

The conservation at positions 15, 39 (arginine) and 70 (histidine) in nearly all SH2 domains suggests their involvement in phosphotyrosine binding, and that they will be close together in the native fold. Furthermore, the native structure should place positions exhibiting conserved hydrophobic character within the core of the protein.

Several topologies may satisfy these constraints. If we assume that the assignment of secondary structure is broadly correct within regions A-G, then one plausible topology would be a single five-stranded β-sheet with α-helices (regions B and G) packing against each face. However, this arrangement seems unlikely since only two helices are available to bury all the conserved hydrophobics on the single β-sheet. Other plausible topologies include orthogonal and parallel double sheet arrangements, which might better account for the conserved amphipathy of the putative strands.

In any topology, the proximity requirement of positions 15, 39 and 70 should be satisfied. To this end, arginine-39 and histidine-70 may lie on the same face of a β-sheet, or on different sheets facing each other. In either arrangement, arginine-15, predicted to lie at the N-terminal end of a helix, could lie near these residues and further stabilise the phosphate group by virtue of both the positively charged side chain and the helix dipole.

3.5. Putative SH2-like domain of akt

Bellacosa et al. [4] have reported that a retroviral oncogene, akt, encodes a serine-threonine kinase which includes an SH2-like region. Fig. 2 illustrates the alignment of this sequence with the src SH2 domain, with which it has a pairwise identity of 20%. The possibility that this region of akt may share a similar function to the SH2 domains cannot be ruled out by our analysis. However, the SH2-like domain of akt shares only 12 of the 40 consensus residues shown in Fig. 1. akt lacks four of the most common SH2 consensus residues (leucine-22; glycine-34; arginine-39; histidine-70). In particular, the change of the highly conserved arginine-39 to lysine is unlikely to be tolerated in light of the SDM results of Mayer et al. [6]. akt also lacks two patterns of hydrophobic residue conservation in regions D and G, and the insertion placed within region A implies further structural dissimilarity to this family of domains.
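A pairwise identity such as the 20% quoted above can be computed from two aligned sequences as sketched below. The text does not state which denominator was used, so this sketch assumes identity is taken over columns where neither sequence has a gap. The function name and the toy sequences are hypothetical and are not the akt or src sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments (made up for illustration).
print(percent_identity("ACDE", "ACQQ"))
print(percent_identity("AC-D", "ACED"))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, would give different percentages for the same alignment.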

Fig. 2. Sequence alignment of the SH2-like region of akt with human src. The alignment was determined by aligning akt multiply with all 67 SH2 domains. Dots indicate positions where the akt sequence matches the consensus shown in Fig. 1. [Figure: rows for akt, human src, match and consensus, annotated with the predicted regions A-G.]

Our initial search of the PIR databank revealed several other proteins which show sequence similarity to SH2 domains. In particular, a C-terminal region of Tacaribe virus L protein RNA polymerase [28] shares 24/40 consensus residues, including the sequence TFVLRD within region C. The confirmed members of the SH2 domain family share between 31 and 40 of the consensus residues, suggesting that sequences with fewer than 31 matches may not exhibit SH2-like properties.
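The consensus-match criterion can be sketched as follows. The 40-position pattern itself appears in Fig. 1 and is not reproduced in this sketch, so `toy_consensus` is a made-up three-position stand-in. The function names and the representation of the pattern as a mapping from alignment column to accepted residues are assumptions. Only the threshold of 31 matches and the count of 12 for akt come from the text.

```python
MIN_MATCHES = 31  # fewest consensus matches among confirmed SH2 domains (from the text)


def count_consensus_matches(aligned_seq, consensus):
    """Count consensus positions matched by an aligned sequence.

    `consensus` maps a 0-based alignment column to the set of residues
    accepted at that column (a hypothetical representation).
    """
    return sum(
        1
        for column, allowed in consensus.items()
        if column < len(aligned_seq) and aligned_seq[column] in allowed
    )


def looks_sh2_like(n_matches, minimum=MIN_MATCHES):
    """Apply the match-count threshold suggested in the text."""
    return n_matches >= minimum


# Toy pattern for illustration only; not the consensus of Fig. 1.
toy_consensus = {0: {"W"}, 1: {"F", "Y"}, 4: {"R"}}
print(count_consensus_matches("WYGKR", toy_consensus))
print(looks_sh2_like(12))  # akt shares 12 of the 40 consensus residues
print(looks_sh2_like(31))
```

A count-based threshold treats all consensus positions equally. Weighting functionally critical positions such as arginine-39 more heavily would be one possible refinement, though the text does not propose it.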

Acknowledgements: We thank Professor L.N. Johnson for her encouragement and support. R.B.R. is a Commonwealth Scholar, and a member of Keble College, Oxford. J.B. is a member of St. John's College, Oxford. G.J.B. thanks the Royal Society for support. We are grateful to a referee for detailed and constructive comments on the manuscript.

REFERENCES

[1] Koch, C.A., Anderson, D., Moran, M.F., Ellis, C. and Pawson, T. (1991) Science 252, 668-673.

[2] Heldin, C. (1991) Trends Biochem. Sci. 16, 450-452.

[3] Shen, S., Bastien, L., Posner, B.I. and Chretien, P. (1991) Nature 352, 736-739.

[4] Bellacosa, A., Testa, J.R., Staal, S.P. and Tsichlis, P.N. (1991) Science 254, 274-277.

[5] Davis, S., Lu, M.L., Lo, S.H., Lin, S., Butler, J.A., Druker, B.J., Roberts, T.M., An, Q. and Chen, L.B. (1991) Science 252, 712-715.

[6] Mayer, B.J., Jackson, P.K., Van Etten, R.A. and Baltimore, D. (1992) Mol. Cell. Biol. 12, 609-618.

[7] Hidaka, M., Homma, Y. and Takenawa, T. (1991) Biochem. Biophys. Res. Commun. 180, 1490-1497.

[8] Barton, G.J., Freemont, P.F., Newman, R.H. and Crumpton, M.J. (1991) Eur. J. Biochem. 198, 749-760.

[9] Johnson, L.N. (1984) In: Inclusion Compounds: Vol. 3, Physical Properties and Application (Atwood, J.L., Davies, J.E.D. and MacNicol, D.D., Eds.) pp. 507-569, Academic Press, London.

[10] Barker, W.C., George, D.G. and Hunt, L.T. (1990) Methods Enzymol. 183, 31-49.

[11] Smith, T.F. and Waterman, M.S. (1981) J. Mol. Biol. 147, 195-197.

[12] Otsu, M., Hiles, I., Gout, I., Fry, M.J., Ruiz-Larrea, F., Panayotou, G., Thompson, A., Dhand, R., Hsuan, J., Totty, N., Smith, A.D., Morgan, S.J., Courtneidge, S.A., Parker, P.J. and Waterfield, M.D. (1991) Cell 65, 91-104.

[13] Barton, G.J. (1990) Methods Enzymol. 183, 403-428.

[14] Lim, V.I. (1974) J. Mol. Biol. 88, 873.

[15] Chou, P.Y. and Fasman, G.D. (1978) Adv. Enzymol. 47, 45-148.

[16] Garnier, J., Osguthorpe, D.J. and Robson, B. (1978) J. Mol. Biol. 120, 97-120.

[17] Rose, G.D. (1978) Nature 272, 586-590.

[18] Wilmot, A.C.M. and Thornton, J.M. (1988) J. Mol. Biol. 203, 221-232.

[19] Banga, J.P., Mahadevan, D., Barton, G.J., Sutton, B.J., Saldanha, J.W., Odell, E. and McGregor, A.M. (1990) FEBS Lett. 266, 133-141.

[20] Crawford, I.P., Niermann, T. and Kirschner, K. (1987) Proteins: Structure, Function and Genetics 1, 118-129.

[21] Benner, S.A. and Gerloff, D. (1990) Adv. Enz. Regul. 31, 121-181.

[22] Thornton, J.M., Flores, T.P., Jones, D.T. and Swindells, M.B. (1991) Nature 354, 105-106.

[23] Schulz, G.E. and Schirmer, R.H. (1979) Springer-Verlag, New York.

[24] Cohen, F.E., Sternberg, M.J.E. and Taylor, W.R. (1982) J. Mol. Biol. 156, 821-862.

[25] Richardson, J., Getzoff, E. and Richardson, D. (1978) Proc. Natl. Acad. Sci. 75, 2574-2578.

[26] Hol, W.G.J., van Duijnen, P.T. and Berendsen, H.J.C. (1978) Nature 273, 443-446.

[27] Anderson, D., Koch, C.A., Grey, L., Ellis, C., Moran, M.F. and Pawson, T. (1990) Science 250, 979-982.

[28] Iapalucci, S., Lopez, R., Rey, O., Lopez, N., Franze-Fernandez, M.T., Cohen, G.N., Lucero, M., Ochoa, A. and Zakin, M.M. (1989) Virology 170, 40-47.
