  • 8/13/2019 21023893

    1/15

    Marcus Krantz · Evren Becit · Stefan Hohmann

    Comparative genomics of the HOG-signalling system in fungi

    Received: 3 April 2005 / Revised: 9 September 2005 / Accepted: 24 September 2005 / Published online: 9 February 2006 / © Springer-Verlag 2006

    Abstract Signal transduction pathways play crucial roles in cellular adaptation to environmental changes. In this study, we employed comparative genomics to analyse the high osmolarity glycerol pathway in fungi. This system contains several signalling modules that are used throughout eukaryotic evolution, such as a mitogen-activated protein kinase and a phosphorelay module. Here we describe the identification of pathway components in 20 fungal species. Although certain proteins proved difficult to identify due to low sequence conservation, a main limitation was incomplete, low-coverage genomic sequences and fragmentary genome annotation. Still, the pathway was readily reconstructed in each species, and its architecture could be compared. The most striking difference concerned the Sho1 branch, which frequently does not appear to activate the Hog1 MAPK module, although its components are conserved in all but one species. In addition, two species lacked apparent orthologues for the Sln1 osmosensing histidine kinase. All information gathered has been compiled in an MS Excel sheet, which also contains interactive visualisation tools. In addition to primary sequence analysis, we employed analysis of protein size conservation. Protein size appears to be conserved largely independently from primary sequence and thus provides an additional tool for functional analysis and orthologue identification.

    Keywords Yeast · Comparative genomics · HOG · Protein size · Signal transduction · Osmotic stress

    Introduction

    Signal transduction pathways are the information routes with which cells sense changes in their environment and eventually mount an appropriate response. General principles of signal transduction and the architecture of signalling modules are well conserved across eukaryotes. One such example is the mitogen-activated protein (MAP) kinase module, which occurs in all eukaryotes. The principal setup of three tiers of protein kinases (MAPKKK, MAPKK and MAPK) is used in numerous pathways within each organism, generating an intricate signalling network. Together, MAPK pathways orchestrate cell growth, morphogenesis and cell division in response to hormones, stress and other abiotic signals (Widmann et al. 1999).

    As in many areas of cell biology, Saccharomyces cerevisiae is a suitable model organism. Individual signalling pathways have been studied in great detail using the tools of genetics, molecular and cellular biology as well as functional genomics. While a large amount of information has accumulated on S. cerevisiae, a critical question concerns which features of a system are generic and which are species-specific. In principle, a satisfactory answer to this question can only be given if an orthologous system, such as the osmosensing MAPK pathway, is studied in detail in several organisms of different evolutionary relationship. The availability of genome information from a growing number (presently some 20) of fungal species allows addressing, within limitations, generic and species-specific features by combining sequence comparison with the knowledge gained from S. cerevisiae. This work focuses on the high osmolarity glycerol (HOG) pathway, which mediates responses to hyperosmotic shock and probably also to other stresses. In this manuscript we have tried to reconstruct, from genomic information, this pathway in different yeasts and filamentous fungi. Examples are provided on how the information can be presented using standard software. In the accompanying paper (Krantz et al. 2006) we have used sequence comparison to probe previous functional and structural analyses, as a tool to generate hypotheses for functional analysis and to study domain organisation of some selected proteins of the pathway.

    Electronic Supplementary Material: Supplementary material is available for this article at http://dx.doi.org/10.1007/s00294-005-0038-x and is accessible for authorized users.

    Communicated by J. Hasek

    M. Krantz · E. Becit · S. Hohmann (✉), Department for Cell and Molecular Biology, Göteborg University, Box 462, 40530 Göteborg, Sweden. E-mail: [email protected]; Tel.: +46-31-7732595; Fax: +46-31-7732599

    The HOG pathway from S. cerevisiae (Fig. 1) consists of two branches that seem to sense osmotic changes in different ways (de Nadal et al. 2002; Hohmann 2002; O'Rourke et al. 2002; Saito and Tatebayashi 2004; Sheikh-Hamad and Gustin 2004; Westfall et al. 2004). The Sln1 branch consists of the Sln1-Ypd1-Ssk1 phosphorelay system, which is the eukaryotic counterpart of bacterial two-component systems. Sln1 is a sensor histidine kinase, Ypd1 a phosphotransfer protein and Ssk1 a response regulator (Posas et al. 1996; Santos and Shiozaki 2001; Catlett et al. 2003). Hyperosmotic shock deactivates Sln1, leading to enhanced levels of dephospho-Ssk1, which is an activator of the MAPKKKs Ssk2 and Ssk22 (Posas et al. 1996). These two proteins seem to have largely redundant (Maeda et al. 1994) but possibly also different specific functions (Yuzyuk and Amberg 2003). Ssk2 and Ssk22 activate, by phosphorylation, the MAPKK Pbs2, which in turn activates, by phosphorylation, the MAPK Hog1 (Maeda et al. 1994). The Sho1 branch contains two scaffold proteins that are crucial for signalling and specificity: the plasma membrane-localised Sho1 (Raitt et al. 2000; Seet and Pawson 2004) as well as the MAPKK Pbs2 (Posas and Saito 1997; O'Rourke and Herskowitz 1998). Sho1 recruits Pbs2 to the cell surface during signalling (Raitt et al. 2000; Reiser et al. 2000). Both Sho1 and Pbs2 can bind the MAPKKK Ste11 (Zarrinpar et al. 2004). Ste11 is activated by phosphorylation, which is mediated by the Ste20 and Cla4 kinases (Raitt et al. 2000; Reiser et al. 2000) and requires Ste50 (Ramezani-Rad 2003). Ste20 in turn depends on the membrane-bound G-protein Cdc42 (Elion 2000; Reiser et al. 2000; Ramezani-Rad 2003). Hence, it appears that signalling is controlled by recruitment (Ptashne and Gann 2003). The mechanisms or the proteins that stimulate recruitment in the first place and sense osmotic changes are, however, unknown. Numerous proteins appear to interact with components of the Sho1 branch according to genetic or biochemical evidence; additional work is needed to elucidate whether these proteins truly play a role in signal generation or transmission. Sho1, Cdc42, Ste20, Ste50 and Ste11 are also needed for pseudohyphal development and, except Sho1, for mating (Elion 2000; Seet and Pawson 2004). Activated Hog1 has numerous targets in the cell, such as the transcription factors Msn2/Msn4, Hot1, Sko1, Smp1 (Rep et al. 1999, 2000, 2001; Alepuz et al. 2001, 2003; Proft et al. 2001; Proft and Struhl 2002; de Nadal et al. 2003), the histone deacetylase Rpd3 (de Nadal et al. 2004), the protein kinases Rck1 and Rck2 (Bilsland-Marchesan et al. 2000; Bilsland et al. 2004), the metabolic regulatory enzyme Pfk26 (Dihazi et al. 2004) and probably more.

    Fig. 1 The HOG pathway. This study includes the known components (colour) as well as proteins that interact with pathway components and are involved in morphogenesis (grey)

    Fungal genomics is presently an active field. We have considered genomes for which information is readily available (Table 1). Several other genomes have been sequenced but the information is either kept confidential for commercial purposes or is not available in a format allowing straightforward analysis. Likewise, there are limitations to several of the resources used in this study. The species included in this study encompass close relatives of S. cerevisiae, that is, species of the genus Saccharomyces, more distantly related yeasts, including yeasts that have not undergone whole genome duplication, the distantly related fission yeast Schizosaccharomyces pombe, as well as four ascomycetous and one basidiomycetous filamentous fungi.

    In this manuscript we present the catalogue of HOG pathway components from yeasts and filamentous fungi and we discuss pathway conservation. The emphasis of this analysis was on identification of components, the existence of paralogues in different organisms and the degree of sequence conservation of different components. In addition to a comparison of primary sequence, analysis of domain structure and predicted protein size proved useful to assess the quality of annotation and sequencing and for the identification of orthologues versus paralogues.

    Materials and methods

    Data sources used

    Sequences were retrieved from the sources indicated in Table 1. Where possible, sequences were retrieved by BLAST (Altschul et al. 1997), using both the sequence from S. cerevisiae and that of the syntenic homologue in Ashbya gossypii. Where BLAST searches were not possible, the annotated orthologue was used. In either case, only candidate genes that hit the original protein when used to blast the S. cerevisiae genome were considered for further analysis.
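The reciprocal check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as the authors ran it: BLAST results are assumed to have been parsed into plain lists and dictionaries already, and the function name, argument names and gene identifiers (`CAND_A` and so on) are hypothetical.

```python
def reciprocal_candidates(query, forward_hits, reverse_hits):
    """Keep candidates that hit the original query when blasted back.

    query: name of the S. cerevisiae protein used as the search query.
    forward_hits: candidate gene IDs found in the target genome.
    reverse_hits: maps each candidate ID to the S. cerevisiae genes it
        hits when used to search the S. cerevisiae genome (assumed input).
    """
    kept = []
    for candidate in forward_hits:
        # A candidate with no recorded reverse hits is discarded.
        if query in reverse_hits.get(candidate, ()):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    forward = ["CAND_A", "CAND_B", "CAND_C"]  # hypothetical identifiers
    reverse = {"CAND_A": ["HOG1", "FUS3"], "CAND_B": ["KSS1"]}
    print(reciprocal_candidates("HOG1", forward, reverse))
```

In this sketch a candidate passes if the query appears anywhere among its reverse hits; a stricter variant would require the query to be the best reverse hit.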

    Analysis tools used

    Multiple alignments as well as calculations of identity scores were performed with ClustalW and default settings, protein domain definitions were performed via Pfam and FingerPRINTScan, and prediction of transmembrane domains was done with TMHMM. References to analysis software tools are summarised in Table 2.
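To make the notion of an identity score concrete, the following Python sketch computes percent identity for one pair of already aligned sequences. It uses one common convention (identical residues divided by the number of aligned positions without a gap); the exact definition used internally by ClustalW may differ, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted (assumed convention)
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical aligned fragments: 5 ungapped columns, 4 identical.
    print(percent_identity("MKT-LLA", "MRTALL-"))
```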

    Size analysis

    Proteins from yeasts and filamentous fungi were considered as two separate populations. Each protein size was compared to the average and standard deviation based on the rest of its population. Probability was calculated from the resulting t-distribution, and the threshold of significance was set to 0.001. After each such elimination, the test was repeated with the remaining population until no outliers were found. When two outliers lay close to the significance threshold, they were both tested against the remaining population. After elimination of all outliers, yeasts and filamentous fungi were compared with heteroscedastic t-tests, with a threshold of significance of 0.01. Coefficient of variation is the relative standard deviation, that is, calculated as the ratio between standard deviation and the average value. When mentioned in the text, coefficient of variation always refers to size.
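The iterative outlier elimination and the coefficient of variation can be sketched as follows. This is a hedged illustration, not the authors' implementation: the test statistic (deviation from the mean of the remaining proteins divided by their sample standard deviation, with one fewer degree of freedom than the number of remaining proteins) is an assumption, the two-sided t probability is obtained by simple numerical integration, and the special handling of two near-threshold outliers is not reproduced. All function names and example sizes are hypothetical.

```python
import math
import statistics


def two_sided_t_probability(t, df):
    """Two-sided tail probability of Student's t, by Simpson integration."""
    t = abs(t)
    if t == 0:
        return 1.0
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    steps = 2 * max(1000, int(20 * t))  # even number of intervals
    h = t / steps
    total = norm + norm * (1 + t * t / df) ** (-(df + 1) / 2)  # pdf(0) + pdf(t)
    for i in range(1, steps):
        x = i * h
        weight = 4 if i % 2 else 2
        total += weight * norm * (1 + x * x / df) ** (-(df + 1) / 2)
    central = 2 * total * h / 3  # probability mass between -t and +t
    return max(0.0, 1.0 - central)


def coefficient_of_variation(sizes):
    """Relative standard deviation: sample standard deviation over mean."""
    return statistics.stdev(sizes) / statistics.mean(sizes)


def eliminate_outliers(sizes, alpha=0.001):
    """Repeatedly remove the most extreme protein size until none is significant.

    sizes: dict mapping a species name to the protein size in that species.
    Returns (kept, outliers) as two dicts.
    """
    kept = dict(sizes)
    outliers = {}
    while len(kept) > 3:
        probabilities = {}
        for name, size in kept.items():
            rest = [v for k, v in kept.items() if k != name]
            sd = statistics.stdev(rest)
            if sd == 0:
                probabilities[name] = 1.0 if size == rest[0] else 0.0
            else:
                t = (size - statistics.mean(rest)) / sd
                probabilities[name] = two_sided_t_probability(t, len(rest) - 1)
        worst = min(probabilities, key=probabilities.get)
        if probabilities[worst] >= alpha:
            break  # no remaining size is significant at the chosen threshold
        outliers[worst] = kept.pop(worst)
    return kept, outliers


if __name__ == "__main__":
    example = {"a": 500, "b": 510, "c": 495, "d": 505, "e": 498, "f": 1200}
    kept, removed = eliminate_outliers(example)
    print(sorted(kept), sorted(removed))
```

A statistics library would normally supply the t-distribution; the integration is written out here only to keep the sketch free of external dependencies.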

    Tools to present data

    The data is compiled in one Excel workbook and accessible via a number of retrieval and visualisation sheets. These are interactive and allow the user to compare the pathway between any two species, to visualise the alignment results and to retrieve sequences for further analysis. It also contains an overview of the components found in each species as well as a complete table with the gene designations. Instructions for use are enclosed in the workbook.

    Extrapolation of missing identities

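    Each missing identity score was estimated from the surrounding eight (or as many as were present) identity scores via calculation of ratios of nearby genes to nearby species as per:

    $$P(S) = \frac{1}{n}\left[\frac{P_{-1}(S)\,P(S_{-1})}{P_{-1}(S_{-1})} + \frac{P_{+1}(S)\,P(S_{-1})}{P_{+1}(S_{-1})} + \frac{P_{-1}(S)\,P(S_{+1})}{P_{-1}(S_{+1})} + \frac{P_{+1}(S)\,P(S_{+1})}{P_{+1}(S_{+1})}\right]$$

    Here P(S) is the identity score of protein P in species S, and the subscripts -1 and +1 denote the neighbouring protein or species in the identity table. Each term combines one diagonal neighbour with the two scores that lie between it and the missing value; terms for which any of the three scores is unavailable are omitted. The number of estimates, n, ranges from one to four depending on the available numbers surrounding the missing one. The procedure was repeated iteratively until each missing value had been estimated.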
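The extrapolation procedure can be sketched in Python as follows. This is a minimal illustration under assumptions: the identity scores are held in a rectangular list of lists (rows as species, columns as proteins, `None` for a missing score), each estimate multiplies the two scores adjacent to the missing value and divides by the diagonal neighbour they share, and all values estimated in one pass are written back together before the next pass. The function name and the example matrix are hypothetical.

```python
def fill_missing(matrix):
    """Estimate missing identity scores from neighbouring species and proteins.

    matrix[s][p] is the identity score of protein p in species s, or None.
    Returns a new matrix; cells that cannot be estimated remain None.
    """
    grid = [row[:] for row in matrix]
    n_rows, n_cols = len(grid), len(grid[0])
    corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    while True:
        updates = {}
        for s in range(n_rows):
            for p in range(n_cols):
                if grid[s][p] is not None:
                    continue
                estimates = []
                for ds, dp in corners:
                    s2, p2 = s + ds, p + dp
                    if not (0 <= s2 < n_rows and 0 <= p2 < n_cols):
                        continue
                    same_protein = grid[s2][p]   # same protein, neighbouring species
                    same_species = grid[s][p2]   # neighbouring protein, same species
                    diagonal = grid[s2][p2]      # neighbouring protein and species
                    if same_protein is None or same_species is None:
                        continue
                    if diagonal is None or diagonal == 0:
                        continue
                    estimates.append(same_protein * same_species / diagonal)
                if estimates:  # between one and four estimates are averaged
                    updates[(s, p)] = sum(estimates) / len(estimates)
        if not updates:
            break
        for (s, p), value in updates.items():
            grid[s][p] = value
    return grid


if __name__ == "__main__":
    scores = [[80, 40, 60], [40, None, 30], [20, 10, 15]]  # hypothetical scores
    print(fill_missing(scores)[1][1])
```

In the example every row is a scaled copy of the first, so all four corner estimates agree on the same value for the missing cell.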

    Resources made available on the website of the group and as supplementary material.

    Supplementary Table 1: List of all gene designations in all organisms.

    Supplementary Table 2: Excel file with all protein sequences and alignment scores from this study, including visualisation tools.

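    Neighbouring identity scores used to estimate a missing value (columns: proteins P-1, P, P+1; rows: species S-1, S, S+1):

    P-1(S-1) | P(S-1) | P+1(S-1)
    P-1(S) | P(S): missing value for protein P in species S | P+1(S)
    P-1(S+1) | P(S+1) | P+1(S+1)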


    Table 1 Resources used for sequence retrieval

    Organism | Characteristics | Data source | References
    Saccharomyces cerevisiae | Baker's (budding) yeast, widely used in experimental research | http://www.yeastgenome.org/ | Goffeau et al. (1996)
    S. paradoxus | Closely related to S. cerevisiae | http://www.yeastgenome.org/ | Cliften et al. (2003) and Kellis et al. (2003)
    S. bayanus | Closely related to S. cerevisiae | http://www.yeastgenome.org/ | Cliften et al. (2003) and Kellis et al. (2003)
    S. castelii | Closely related to S. cerevisiae | http://www.yeastgenome.org/ | Cliften et al. (2003) and Kellis et al. (2003)
    S. kluyveri | Closely related to S. cerevisiae | http://www.yeastgenome.org/ | Cliften et al. (2003) and Kellis et al. (2003)
    S. kudriavzevii | Closely related to S. cerevisiae | http://www.yeastgenome.org/ | Cliften et al. (2003) and Kellis et al. (2003)
    S. mikatae | Closely related to S. cerevisiae | http://www.yeastgenome.org/ | Cliften et al. (2003) and Kellis et al. (2003)
    Ashbya gossypii | Filamentous yeast, detailed synteny analysis available | http://agd.unibas.ch/ | Dietrich et al. (2004)
    Candida glabrata | Hemiascomycete, pathogen related to S. cerevisiae | http://cbi.labri.fr/Genolevures/ | Dujon et al. (2004)
    Yarrowia lipolytica | Hemiascomycete, obligate aerobe, lipophilic, distant from S. cerevisiae | http://cbi.labri.fr/Genolevures/ | Dujon et al. (2004)
    Debaryomyces hansenii | Hemiascomycete, marine yeast, halotolerant | http://cbi.labri.fr/Genolevures/ | Dujon et al. (2004)
    Kluyveromyces lactis | Hemiascomycete, milk yeast, experimental organism | http://cbi.labri.fr/Genolevures/ | Dujon et al. (2004)
    Candida albicans | Pathogenic, obligate diploid, quite distant from S. cerevisiae, experimental organism | http://www.candidagenome.org/ | Jones et al. (2004)
    Schizosaccharomyces pombe | Fission yeast, very different from S. cerevisiae, common experimental organism, Sty1 pathway, well studied | http://www.sanger.ac.uk/Projects/S_pombe/ | Wood et al. (2002)
    Kluyveromyces waltii | Diverged from S. cerevisiae before genome duplication | http://www.broad.mit.edu/seq/YeastDuplication/ | Kellis et al. (2004)
    Aspergillus nidulans | Filamentous fungus, experimental organism | http://www.broad.mit.edu/resources.html/ | Unpublished
    Fusarium graminearum | Filamentous fungus, crop pathogen, experimental organism | http://www.broad.mit.edu/resources.html/ | Unpublished
    Magnaporthe grisea | Filamentous fungus, rice pathogen, experimental organism | http://www.broad.mit.edu/resources.html/ | Unpublished
    Neurospora crassa | Filamentous fungus, experimental organism | http://www.broad.mit.edu/resources.html/ | Galagan et al. (2003) and Borkovich et al. (2004)
    Ustilago maydis | Filamentous fungus, maize pathogen, experimental organism | http://www.broad.mit.edu/resources.html/ | Unpublished


    Supplementary Table 3: Word files (one for each protein) with protein sequences and multiple alignments.

    Supplementary Table 4: Brief discussion for each protein on the orthologues identified.

    Results

    Scope of the analysis

    Figure 1 shows a sketch of the HOG pathway. The analysis includes components that have previously been shown to be part of the pathway, and these are indicated in colour. In addition, the analysis also encompassed proteins for which evidence has been reported that they genetically or physically interact with components of the pathway. Those are indicated in grey. Except for Spa2, all those interactions concern proteins of the Sho1 branch and they also play roles in morphogenic decisions, such as formation of mating projections. Table 3 summarises the proteins included in the analysis and the evidence that links them to the HOG pathway.

    Identification of orthologues

    To facilitate the identification of orthologues we made use of the well-annotated genome sequence of A. gossypii. A. gossypii (and also Kluyveromyces waltii) did not undergo the whole genome duplication that occurred in Saccharomyces yeasts (Dietrich et al. 2004; Kellis et al. 2004). There is a high degree of synteny between the S. cerevisiae and A. gossypii genomes, which means that gene order is largely conserved (Dietrich et al. 2004). Syntenic homologues derive from a common ancestor and hence are likely true orthologues. We found that protein sequences from filamentous fungi and distantly related yeasts were about equally diverged from S. cerevisiae as they were from A. gossypii (see also Table 5).

    Hence, we performed searches for orthologues starting from the S. cerevisiae proteins in four steps: (1) we first determined the orthologue from A. gossypii; (2) we used the S. cerevisiae sequences to search for likely orthologues in all other genomes; (3) we used the A. gossypii sequences to search for the most likely orthologues in all other genomes; (4) we compared the results. In several cases we could decide on the most likely orthologues based on this comparison. Evaluation of the domain structure was employed to distinguish true orthologues. For instance, many fungi have several histidine kinases (Catlett et al. 2003) but only histidine kinases with two transmembrane domains for osmosensing (Ostrander and Gorman 1999) were regarded as Sln1 orthologues. In addition, we found that conservation of predicted protein size could be used to identify orthologues.
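The comparison of the two searches and the domain-based filter can be sketched as follows. This is a simplified, hypothetical illustration: it assumes that a candidate is retained when both the S. cerevisiae and the A. gossypii queries recover it, and that the number of predicted transmembrane domains per candidate is already available (for instance from a TMHMM run parsed elsewhere). The paper does not state that a strict intersection was used, and all names and identifiers below are invented for the example.

```python
def pick_orthologues(hits_from_scer, hits_from_agos, tm_domains=None, required_tm=None):
    """Return candidates supported by both searches, optionally domain-filtered.

    hits_from_scer / hits_from_agos: candidate IDs found with the
        S. cerevisiae and A. gossypii query sequences, respectively.
    tm_domains: optional dict mapping candidate ID to the number of
        predicted transmembrane domains.
    required_tm: if given, only candidates with exactly this number of
        predicted transmembrane domains are kept.
    """
    supported = set(hits_from_agos)
    shared = [c for c in hits_from_scer if c in supported]
    if required_tm is not None:
        counts = tm_domains or {}
        shared = [c for c in shared if counts.get(c) == required_tm]
    return shared


if __name__ == "__main__":
    # Hypothetical histidine kinase candidates in one genome.
    scer_hits = ["hk1", "hk2", "hk3"]
    agos_hits = ["hk2", "hk1"]
    predicted_tm = {"hk1": 2, "hk2": 0}
    print(pick_orthologues(scer_hits, agos_hits, predicted_tm, required_tm=2))
```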

    Visualisation tools

    Table 4 gives an overview of putative orthologues identified and Supplementary Table 1 lists the gene designation for them in each organism. Failure to identify putative orthologues was due to one of the following three reasons: (1) incomplete genome sequence, such as in several Saccharomyces species; (2) too low sequence similarity to allow identification in more distant species; (3) lack of an orthologue.

    Table 5 lists the overall sequence identities for all orthologues identified, with S. cerevisiae as the reference. As entire protein sequences were compared, this presentation ignores the fact that certain domains, for instance protein kinase domains, show much higher identity values.

    All sequences can be downloaded as Supplementary Tables 2 and 3. Supplementary Table 2 is an Excel file consisting of multiple sheets, several of which function as interactive visualisation tools. Sheet 1 contains instructions. Sheet 2 (HOG pathway overview) is a sketch of the HOG pathway in which pairwise orthologue comparisons between two freely selectable species can be generated (see Fig. 2 for the comparison between S. cerevisiae and A. gossypii as an example). Sheet 3 (alignment overview) allows generating versions of Table 5 (sequence identities) with different reference species. It also contains a graphical presentation of sequence identities (see below). Sheet 4 (protein alignment) provides tables for each protein with all pairwise comparisons between species. Sheet 5 (protein sequence retrieval) allows copying of protein sequences directly for pasting into ClustalW for customised multiple alignments. Sheets 6 (component overview) and 7 (gene designations) contain Table 4 and Supplementary Table 1, respectively.

    Supplementary Table 3 consists of a set of Word files, each containing all sequences of a specific protein for copying and pasting into ClustalW (equivalent to Sheet 5 in Supplementary Table 2). Each Word file also contains one or several multiple alignments for each protein. Finally, Supplementary Table 4 contains brief discussions of the orthologues identified for each protein.

    Table 2 Resources and tools used for sequence analysis

    Resource/analysis tool | Web address | Purpose | References
    ClustalW | http://www.ebi.ac.uk/clustalw/ | Multiple sequence alignment | Thompson et al. (1994)
    Pfam | http://www.sanger.ac.uk/Software/Pfam/ | Protein domain identification | Bateman et al. (2002)
    FingerPRINTScan | http://www.ebi.ac.uk/printsscan/index.html/ | Protein domain identification | Attwood et al. (2000)
    TMHMM | http://www.cbs.dtu.dk/services/TMHMM-2.0/ | Prediction of transmembrane domains | Moller et al. (2001)

    Reconstructing the maps

    The interactive Excel visualisation tool (Supplementary Table 2) allows viewing the HOG pathway and its components for each organism and also displays sequence identities between two freely selectable species. It should be noted that this display (Fig. 2) is distinct from, for example, the KEGG database (http://www.genome.jp/kegg/), as it focuses on fungal proteins, a comprehensive catalogue of pathway components as well as on species comparison. The fact that this presentation uses standard software available on most computers makes it useful for simple dissemination in the research community.

    Table 4 List of components identified

    Overall similarities

    Figure 3 is a graphical presentation of sequence similarity using identity scores between pairs of proteins. Each graph has two halves where S. cerevisiae proteins (upper part) or A. gossypii sequences (lower part) were used as reference. Species as well as proteins are sorted according to their overall degree of divergence. The upper panel (Fig. 3a) was constructed using the raw data and missing proteins are shown as black areas. In the lower panel (Fig. 3b) those gaps were filled by estimating the degree of sequence identity from neighbouring scores, using an algorithm described in the "Materials and methods" section. This algorithm extrapolates the sequence identity for a given protein from the identity scores of neighbouring species and proteins.

    With the exception of the block of sequences from the Saccharomyces sensu stricto species, which are highly similar when compared with each other, the display is symmetric. This indicates that the same proteins are conserved, or divergent, when S. cerevisiae or A. gossypii is used as reference and hence suggests that none of the proteins in the HOG pathway was under specific selective pressure in a subset of the species investigated. This is further illustrated by the fact that for each protein, sequences gradually diverged from the reference with evolutionary distance of the organism.

    In a different representation, using sequence identities with S. cerevisiae proteins as reference, proteins were plotted versus species and again proteins and species were sorted according to increasing overall divergence. Figure 4a shows species curves (i.e. identity on the y-axis and proteins on the x-axis). It appears that curves (and hence species) group into three clusters, largely in accordance with their phylogenetic relationship. The upper cluster with most highly conserved sequences contains the Saccharomyces sensu stricto species; the second cluster contains the yeasts Saccharomyces kluyveri and Saccharomyces castelii, Candida glabrata, Kluyveromyces lactis, K. waltii and A. gossypii. The bottom cluster contains all filamentous fungi as well as the unicellular fungi Candida albicans, Yarrowia lipolytica and Sz. pombe. With more highly divergent proteins the latter two clusters begin to merge.
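The sorting of species and proteins by overall divergence can be sketched as follows. The sketch assumes that "overall divergence" is measured by the mean identity to the reference across the available scores, which the text does not specify; the function name, the species labels and the scores in the example are hypothetical.

```python
def sort_by_divergence(identity):
    """Order species and proteins from most conserved to most divergent.

    identity: dict mapping species -> dict mapping protein -> percent
        identity to the reference (None or absent if missing).
    Returns (species_order, protein_order), both sorted by decreasing
    mean identity, i.e. increasing overall divergence.
    """
    def mean(values):
        present = [v for v in values if v is not None]
        return sum(present) / len(present) if present else 0.0

    proteins = sorted({p for row in identity.values() for p in row})
    species_order = sorted(
        identity, key=lambda s: mean(identity[s].values()), reverse=True
    )
    protein_order = sorted(
        proteins,
        key=lambda p: mean([identity[s].get(p) for s in identity]),
        reverse=True,
    )
    return species_order, protein_order


if __name__ == "__main__":
    scores = {  # hypothetical identity scores
        "sp_far": {"Hog1": 80, "Sho1": 30},
        "sp_near": {"Hog1": 95, "Sho1": 70},
    }
    print(sort_by_divergence(scores))
```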

    The protein curves (Fig. 4b, identity on the y-axis and species on the x-axis) illustrate that two proteins stand out as most highly conserved: Cdc42 and Hog1. Those are highly conserved throughout eukaryotes. Proteins that show a higher degree of conservation have an enzymatic function and the relevant domain is commonly the best conserved part. Transcriptional regulators as well as adaptor or scaffold proteins are more poorly conserved.

    Table 5 Sequence identities

    Fig. 2 The Excel tool. This figure is an interactive, schematic representation of the HOG pathway, which allows comparison between species. It provides both a structural overview in the selected species, and a quantitative comparison to a freely chosen reference species

    Fig. 3 Sequence identity. For each protein and species, sequence homology to Saccharomyces cerevisiae and Ashbya gossypii has been plotted in the upper and lower halves, respectively. The display is symmetrical, except for the exclusion of S. cerevisiae in the upper half and A. gossypii in the lower half. The differences between the halves are mainly limited to the Saccharomyces sensu stricto species, and apart from these the upper and lower halves are virtually mirror images of each other. Panel a displays the raw data and missing sequences are represented by black holes. In panel b, these holes have been filled by extrapolation from adjacent genes and species, as described in the "Materials and methods" section


    Fig. 4 Identity plots. The upper panel (a) plots the identity scores to Saccharomyces cerevisiae as a function of protein for each species. The lower panel (b) plots the identity scores to S. cerevisiae as a function of species for each protein. As seen in panel a, the species divide into three groups. The upper group consists of the Saccharomyces sensu stricto species, the middle group consists of Saccharomyces castelii, Saccharomyces kluyveri, Candida glabrata, Kluyveromyces lactis and Kluyveromyces waltii, and the third group contains Candida albicans, Yarrowia lipolytica, Schizosaccharomyces pombe and the filamentous fungi


    Discussion

    The availability of genome sequences from a group of related species allows analysis of generic (conserved) features of individual proteins as well as entire pathways. This opportunity is just emerging, and recent fungal genome initiatives make it possible to exploit it. A general problem, however, concerns the quality of the sequences generated (often only low coverage) and the fact that annotation is slow, or not conducted at all, for several sequenced genomes. Action is needed to generate complete, high-quality and well-annotated genome sequences. In any case, we believe that the available sequence information should be used far more rigorously in all studies that address gene and protein function. Here we have studied the HOG pathway using fungal genome information.

    The HOG pathway in fungi

    It is possible to reconstruct the HOG-signalling pathway in all fungal species investigated here. Limitations in orthologue identification were generally observed for transcription factors as well as more peripheral components.

    A consensus HOG-signalling system consists of the Sln1 and the Sho1 branch converging on Pbs2, although as discussed later there seem to be significant differences in this design in different groups of species. The Sln1 branch consists of a membrane-inserted osmosensing histidine kinase, which is part of a phosphorelay system with Ypd1 and Ssk1. The role of the MAPKKK in this branch is taken by two largely redundant proteins (Ssk2/Ssk22) in some species, while others have a single protein. The signal is then transferred through Pbs2 to Hog1. The Sho1 branch consists of Sho1, whose exact role [scaffold or more (Krantz et al. 2006)] still remains to be determined. The signal is somehow generated by or through Ste20 and Cla4, which in turn depend on Cdc42 and Cdc24, and transferred to Ste11 with the help of Ste50. There is a network of interacting proteins around this module. Negative regulators are protein phosphatases, which differ in number between species. It appears that Ptp2 and Ptp3 have different functions in some yeasts while in others and in filamentous fungi one protein may take both roles. In addition, a Ptc1 and a Ptc2/Ptc3 protein seem to be responsible for threonine dephosphorylation. Targets of Hog1 are a MAP kinase-activated protein kinase (in Saccharomyces two redundant proteins, Rck1 and Rck2) as well as a series of transcription factors. It is possible that the transcriptional output differs substantially among the fungi investigated since transcription factors are poorly conserved. Those include a zinc-finger protein (two redundant proteins, Msn2 and Msn4, in Saccharomyces), a bZIP (Sko1), two non-redundant relatives (Hot1 and Msn1) and Smp1. However, experimental studies are needed in different yeasts and filamentous fungi to investigate the transcriptional output and the spectrum of transcription factors mediating it.

    The most dramatic differences in design seem to concern pathway input. As is well known, Sz. pombe does not possess a transmembrane-spanning histidine kinase and hence it is still unclear how osmosensing occurs in the Sln1 branch in fission yeast. We could also not identify a membrane-spanning histidine kinase in Ustilago maydis, indicating that basidiomycetes may also employ a different sensing mechanism. It is also known that fission yeast lacks a Sho1 orthologue. Even more remarkably, there is evidence that even in fungi that possess a Sho1 orthologue the HOG pathway is not necessarily stimulated via the Sho1 branch. Filamentous fungi as well as C. albicans appear to lack the proline-rich domain that links Pbs2 to Sho1 (Krantz et al. 2006). It has recently been demonstrated experimentally in Aspergillus nidulans that indeed the Hog1 orthologue in this organism is not stimulated by a Sho1 branch (Furukawa et al. 2005). This indicates that the Sho1 branch does not primarily have a role in osmosensing but rather functions in morphogenic control.

    Analysis and visualisation tools

    A comparative genomics study inevitably generates a large amount of information for which new ways of presentation are needed. In this study, we have strived to visualise the results in different ways using standard Microsoft Office software, and those examples might prove useful for presenting results of similar studies. In essence, the entire information bank is stored in Supplementary Table 2, which also contains several interactive tools that facilitate retrieval and visualisation of the stored data. It allows structural reconstruction of the pathway, although subject to sequence coverage, within each species, as well as comparisons using different species as reference. The idea was to make these tools simple to use and such that they run without any additional software on any PC using recent Office software. As they only rely on the functions intrinsic to MS Excel, they can be altered or extended without any knowledge in programming.

    Fig. 5 Conservation of size versus primary sequence. For each set of orthologues, average identity scores were plotted versus the coefficient of variation of the protein size. The correlation between primary sequence conservation and size conservation is surprisingly low, although highly conserved proteins such as Hog1 and Cdc42 tolerate changes in neither size nor sequence (top left). If these two extreme cases are disregarded, the correlation drops even further (R² = 0.056 instead of 0.13)

    Size conservation and size as an analytic tool

    Comparative genomics does not only allow analysis of the conservation of primary protein sequence but also employs domain structure and variation in size as a tool to evaluate protein function. By definition, orthologous proteins share function and consequently functional domains, which presumably require structures of similar size. Similar to primary sequence variation, variability in size, expressed as the coefficient of variation, differs between proteins. Although, again like primary sequence variation, it depends on which proteins are included in the comparison, it shows a surprisingly weak correlation with sequence conservation (Fig. 5). Thus, it appears that size and primary sequence are conserved by different mechanisms, although the highly conserved proteins tolerate changes in neither sequence nor size.
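The comparison summarised in Fig. 5 can be sketched in a few lines of code. This is only a minimal illustration and not the procedure used in the study, which relied on MS Excel: the orthologue sets, protein sizes and identity scores below are hypothetical placeholder values, the function names are invented for the example, and the use of the sample standard deviation in the coefficient of variation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(sizes):
    """Sample standard deviation of the sizes divided by their mean (assumed definition)."""
    return stdev(sizes) / mean(sizes)


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: set name -> (average identity in %, protein sizes in residues).
orthologue_sets = {
    "set_A": (85.0, [435, 436, 434, 437]),
    "set_B": (40.0, [620, 700, 580, 655]),
    "set_C": (55.0, [1210, 1190, 1225, 1204]),
    "set_D": (30.0, [310, 305, 312, 298]),
}

identities = [identity for identity, _ in orthologue_sets.values()]
size_cvs = [coefficient_of_variation(sizes) for _, sizes in orthologue_sets.values()]

print(f"R^2 between identity and size CV: {r_squared(identities, size_cvs):.3f}")
```

With real data, each entry would hold the average identity score and the sizes of all orthologues identified for one pathway component. A low R² would then correspond to the weak relationship between sequence conservation and size conservation described above.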

    Consequently, size provides an additional handle on protein function. While size is subject to variation, such variation is presumably centred on a mean, and significant deviations from this are indicative of one of two events: that the protein has gained or lost functional domains, or that it is a false orthologue. The first case does not necessarily mean that the protein fails to fulfil the orthologous function, but that the function may include additional features and/or inputs or outputs. This is supported by the fact that significant differences are most often observed in U. maydis, as compared to the filamentous fungi, and in Y. lipolytica and Sz. pombe, as compared to the yeasts. Those species are known to have diverged most. In fact, in all four cases where proteins from Saccharomyces species are implicated, the deviation appears to be due to sequencing and/or annotation errors. Yet, even if this frequency of sequencing and annotation errors is considered representative, the majority of the significant size variations are indicative of biological differences rather than sequencing artefacts.
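One simple way to screen an orthologue set for such deviations is sketched below. The text does not state which significance criterion was applied, so the leave-one-out z-score and the threshold of 3 are assumptions made purely for illustration, and the species labels and sizes are hypothetical.

```python
from statistics import mean, stdev


def flag_size_outliers(sizes_by_species, threshold=3.0):
    """Return {species: z-score} for sizes that deviate strongly from the rest.

    Each size is compared with the mean and sample standard deviation of the
    remaining sizes (leave-one-out), so that one extreme value does not
    inflate the spread it is judged against. The threshold is an assumption.
    """
    flagged = {}
    for species, size in sizes_by_species.items():
        others = [s for name, s in sizes_by_species.items() if name != species]
        if len(others) < 2:
            continue  # too few reference sizes to estimate a spread
        spread = stdev(others)
        if spread == 0:
            continue  # identical reference sizes: z-score is undefined
        z = (size - mean(others)) / spread
        if abs(z) > threshold:
            flagged[species] = round(z, 2)
    return flagged


# Hypothetical sizes (in residues) of one protein across five species.
example_sizes = {
    "species_1": 500,
    "species_2": 510,
    "species_3": 495,
    "species_4": 505,
    "species_5": 900,
}

print(flag_size_outliers(example_sizes))
```

A flagged entry only marks a candidate. Whether it reflects a gained or lost domain, a false orthologue, or a sequencing or annotation error still has to be decided by inspecting the sequence itself.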

    Finally, as will be discussed more thoroughly elsewhere, a majority of the isogene pairs show a significant size difference, suggesting that their functions have diverged. Accordingly, the protein pairs known to have divergent functions are consistently significantly different in size. Thus, we conclude that size analysis is a valuable addition to the analysis of primary sequence.

    Acknowledgements This work was supported by the European Commission (the QUASI project, contract LSHG-CT2003-530203) and the Swedish Research Council (research position to SH and different research grants).

    References

    Alepuz PM, Jovanovic A, Reiser V, Ammerer G (2001) Stress-induced MAP kinase Hog1 is part of transcription activation complexes. Mol Cell 7:767-777

    Alepuz PM, de Nadal E, Zapater M, Ammerer G, Posas F (2003) Osmostress-induced transcription by Hot1 depends on a Hog1-mediated recruitment of the RNA Pol II. EMBO J 22:2433-2442

    Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402

    Attwood TK, Croning MD, Flower DR, Lewis AP, Mabey JE, Scordis P, Selley JN, Wright W (2000) PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res 28:225-227

    Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL (2002) The Pfam protein families database. Nucleic Acids Res 30:276-280

    Bi E, Chiavetta JB, Chen H, Chen GC, Chan CS, Pringle JR (2000) Identification of novel, evolutionarily conserved Cdc42p-interacting proteins and of redundant pathways linking Cdc24p and Cdc42p to actin polarization in yeast. Mol Biol Cell 11:773-793

    Bilsland E, Molin C, Swaminathan S, Ramne A, Sunnerhagen P (2004) Rck1 and Rck2 MAPKAP kinases and the HOG pathway are required for oxidative stress resistance. Mol Microbiol 53:1743-1756

    Bilsland-Marchesan E, Arino J, Saito H, Sunnerhagen P, Posas F (2000) Rck2 kinase is a substrate for the osmotic stress-activated mitogen-activated protein kinase Hog1. Mol Cell Biol 20:3887-3895

    Borkovich KA, Alex LA, Yarden O, Freitag M, Turner GE, Read ND, Seiler S, Bell-Pedersen D, Paietta J, Plesofsky N, Plamann M, Goodrich-Tanrikulu M, Schulte U, Mannhaupt G, Nargang FE, Radford A, Selitrennikoff C, Galagan JE, Dunlap JC, Loros JJ, Catcheside D, Inoue H, Aramayo R, Polymenis M, Selker EU, Sachs MS, Marzluf GA, Paulsen I, Davis R, Ebbole DJ, Zelter A, Kalkman ER, O'Rourke R, Bowring F, Yeadon J, Ishii C, Suzuki K, Sakai W, Pratt R (2004) Lessons from the genome sequence of Neurospora crassa: tracing the path from genomic blueprint to multicellular organism. Microbiol Mol Biol Rev 68:1-108

    Brown JL, Jaquenoud M, Gulli MP, Chant J, Peter M (1997) Novel Cdc42-binding proteins Gic1 and Gic2 control cell polarity in yeast. Genes Dev 11:2972-2982

    Catlett NL, Yoder OC, Turgeon BG (2003) Whole-genome analysis of two-component signal transduction genes in fungal pathogens. Eukaryot Cell 2:1151-1161

    Cliften P, Sudarsanam P, Desikan A, Fulton L, Fulton B, Majors J, Waterston R, Cohen BA, Johnston M (2003) Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301:71-76

    Cullen PJ, Sabbagh W Jr, Graham E, Irick MM, van Olden EK, Neal C, Delrow J, Bardwell L, Sprague GF Jr (2004) A signaling mucin at the head of the Cdc42- and MAPK-dependent filamentous growth pathway in yeast. Genes Dev 18:1695-1708

    De Nadal E, Zapater M, Alepuz PM, Sumoy L, Mas G, Posas F (2004) The MAPK Hog1 recruits Rpd3 histone deacetylase to activate osmoresponsive genes. Nature 427:370-374

    Dietrich FS, Voegeli S, Brachat S, Lerch A, Gates K, Steiner S, Mohr C, Pohlmann R, Luedi P, Choi S, Wing RA, Flavier A, Gaffney TD, Philippsen P (2004) The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304:304-307

    Dihazi H, Kessler R, Eschrich K (2004) HOG-pathway induced phosphorylation and activation of 6-phosphofructo-2-kinase are essential for glycerol accumulation and yeast cell proliferation under hyperosmotic stress. J Biol Chem 279:23961-23968
