Prediction of glycosylation across the human proteome and the
correlation to protein function
Ramneek Gupta and Søren Brunak
Center for Biological Sequence Analysis, Bldg-208, Bio-Centrum
Technical University of Denmark, DK-2800 Lyngby, Denmark.
1 Introduction
The addition of a carbohydrate moiety to the side-chain of a residue in a protein chain influences the physicochemical properties of the protein. Glycosylation is known to alter proteolytic resistance, protein solubility, stability, local structure, lifetime in circulation and immunogenicity [1,2].
Of the various forms of protein glycosylation found in eukaryotic systems, the most important types are N-linked, O-linked GalNAc (mucin-type) and O-β-linked GlcNAc (intracellular/nuclear) glycosylation. N-linked glycosylation is a co-translational process involving the transfer of the precursor oligosaccharide, GlcNAc2Man9Glc3, to asparagine residues in the protein chain. The asparagine usually occurs in a sequon Asn-Xaa-Ser/Thr, where Xaa is not proline. This is, however, not a specific consensus since not all such sequons are modified in the cell. O-linked glycosylation involves the post-translational transfer of an oligosaccharide to a serine or threonine residue. In this case, there is no well-defined motif for the acceptor site other than the near vicinity of proline and valine residues.
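The sequon rule described above can be sketched in a few lines of Python. This is only an illustration of the Asn-Xaa-Ser/Thr pattern (Xaa not proline), not the neural network predictor used in this work; the function name find_sequons and the example sequence are assumptions made for the sketch. As noted above, a matching sequon is a candidate site only, since not all sequons are modified in the cell.

```python
import re

# Lookahead pattern so that overlapping sequons (e.g. "NNTS") are all reported.
# N = Asn, [^P] = any residue except Pro, [ST] = Ser or Thr.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

# Hypothetical example sequence (not a real protein).
print(find_sequons("MKNGSANPTLNNTS"))  # [3, 11, 12]
```

The Asn at position 7 is skipped because it is followed by proline.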
We have developed glycosylation site prediction methods for these three types of glycosylation, using artificial neural networks that examine correlations in the local sequence context and surface accessibility. In this paper, we use glycosylation site information on human proteins to illustrate the contribution of glycosylation to protein function and to assess how widespread this modification is across the human proteome.
2 Methods
2.1 Data set
The analysis shown in this paper was derived from a set of human proteins obtained from the SWISS-PROT (rel. �) database. This consisted of ��� � well annotated proteins. We chose to work with proteins from a single organism, i.e. humans, to restrict the diversity of oligosaccharyltransferase acceptor sites.
Pacific Symposium on Biocomputing 7:310-322 (2002)
Glycosylation in simple organisms, such as yeast, is well studied [3,4], but their glycans are usually high mannosylated structures, and it is not clear how similar their mechanism of glycosylation is to that of humans. Combining data from different organisms would complicate the analysis, with possible "families" of acceptor specificities causing ambiguity in distinguishing acceptor (positive) sites from non-acceptor (negative) sites.
2.2 Functional categories for proteins
Defining protein function is a complicated task, and there are many different ways of describing the roles and functions of a protein in a cell. This is the topic of many on-going ontology projects [5]. Here we chose to use a cellular role descriptor and subcellular location as our categorisations.
�� categories (�� defined, � unknown) reflective of the "cellular role" of the protein in the cell were employed (as shown in Figure 1). The automatic class assignment to sequences was made by an extension of the Euclid system performing a linguistic analysis and clustering of SWISS-PROT keywords [6,7]. Keywords were parsed for the human proteins in SWISS-PROT. For each functional class, the informative weight (Z-score) of each keyword was extracted from a dictionary [8]. Keyword sums gave scores to all categories for a particular sequence. The central point of the Euclid system is the dictionary. The primary version of this dictionary was generated from an initial set of carefully hand-annotated proteins from different organisms spanning every kingdom of life. From this initial set, a first dictionary was defined, which was used to assign all SWISS-PROT proteins, and the process of dictionary definition and assignment was reiterated until convergence. The final dictionary obtained was used to assign functional classes to around ����� human proteins from SWISS-PROT.
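The keyword-sum step can be illustrated with a minimal Python sketch. It is not the Euclid implementation: the dictionary contents, the Z-score values and the function name assign_category are hypothetical, and the sketch assumes the highest-scoring category is the one assigned, with "Unknown" returned when no keyword is found in the dictionary.

```python
from collections import defaultdict

# Hypothetical dictionary: keyword -> {category: Z-score}. All values are made up.
DICTIONARY = {
    "Glycoprotein": {"Transport and binding": 2.0, "Cell envelope": 1.0},
    "Transcription regulation": {"Regulatory functions": 3.0, "Transcription": 2.5},
    "DNA-binding": {"Regulatory functions": 1.5, "Transcription": 1.0},
}

def assign_category(keywords, dictionary=DICTIONARY):
    """Sum keyword Z-scores per category and return (best category, all scores)."""
    scores = defaultdict(float)
    for keyword in keywords:
        for category, z_score in dictionary.get(keyword, {}).items():
            scores[category] += z_score
    if not scores:
        # No informative keyword: the sequence cannot be classified.
        return "Unknown", {}
    best = max(scores, key=scores.get)
    return best, dict(scores)

best, scores = assign_category(["Transcription regulation", "DNA-binding"])
print(best, scores)
```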
The cellular role categories themselves were derived from an earlier proposed scheme for Escherichia coli [9], which was later extended by the TIGR group for other complete genomes. These categories comprise � functional classes which are subsets of three superclasses: Energy, Communication and Information. Proteins which do not fit in the � categories are assigned to "Other" (functionally undefined cluster) or to "Unknown" (sequences which do not contain the relevant keywords needed for classification in the above system).
Subcellular locations of proteins were obtained from SWISS-PROT annotations and PSORT predictions [10] (where no parsable SWISS-PROT annotation was found).
3 Results
3.1 N-Glycosylation
N-linked glycosylation modifies membrane and secreted proteins. This co-translational process occurs in the endoplasmic reticulum and is known to influence protein folding. The modification confers various functional properties on a protein. To examine whether certain categories of proteins were more prone to glycosylation than others, we studied the spread of known glycosylation sites across different categories.
N-glycosylation may also display some positional preferences in the protein chain. Specifically, it has been shown that sites need to be ����� residues away from the N-terminus [11], and that glycosylation efficiency is reduced within �� residues of the C-terminus [12].
In our data set of approximately ����� human proteins, only � proteins (at �� confirmed sites) were annotated in SWISS-PROT as N-glycosylated (not considering proteins with only potential or probable sites). Figure 1 illustrates the spread of human glycosylation sites along the protein chain and across predicted subcellular locations and keyword-based assignment of cellular role categories. Relative positions of sites on proteins were calculated with respect to normalised sequence lengths. The sequence length, divided into tenths, is shown along the x-axis, from the N-terminal start on the left to the C-terminal end on the right.
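The length normalisation can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the figure: site positions are taken to be 1-based, each protein is given as a (length, sites) pair, and the function names position_bin and bin_counts are hypothetical.

```python
def position_bin(site, length, n_bins=10):
    """Map a 1-based site position to a bin index 0..n_bins-1 along the chain."""
    if not 1 <= site <= length:
        raise ValueError("site must lie within the sequence")
    # Integer arithmetic avoids floating-point rounding at bin edges.
    return (site - 1) * n_bins // length

def bin_counts(proteins, n_bins=10):
    """Count sites per relative-position bin over (length, sites) pairs."""
    counts = [0] * n_bins
    for length, sites in proteins:
        for site in sites:
            counts[position_bin(site, length, n_bins)] += 1
    return counts

# Two hypothetical proteins: lengths 100 and 50, with annotated site positions.
print(bin_counts([(100, [5, 35, 40]), (50, [10])]))
# [1, 1, 0, 2, 0, 0, 0, 0, 0, 0]
```

Bin 0 corresponds to the N-terminal tenth of the chain and bin 9 to the C-terminal tenth.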
N-glycosylated proteins appeared to belong almost exclusively to the functional category "Transport and binding". This may not be too surprising considering that this category consists largely of membrane and secreted proteins. Only a few proteins belonged to any other cellular role category, and most of these appeared to be involved in central intermediary metabolism. Subcellularly, extracellular proteins were the most favoured, and others occurred in membrane proteins and in the endoplasmic reticulum or Golgi.
A clear positional preference for glycosylation sites on protein chains was apparent. The terminal ends of proteins seemed unfavourable and most sites seemed to occur N-terminal to the centre of the protein chain (�� to ��� along the length from the N-terminal start). The frequency of sites tapered off smoothly on both ends from this peak, with a longer C-terminal tail. This statistical observation agrees with specific experimental indications of a ����� residue distance from the N-terminal and a �� residue distance from the C-terminal end [11,12]. One peculiar observation from the figure was the C-terminal sites in nuclear proteins. On examination, these turned out to be around �� proteins which were indeed annotated to be N-glycosylated in the C-terminal. However, this seems to be an anomaly of the subcellular prediction by PSORT. For instance, some secreted proteins among these were Vasopressin-Neurophysin 2-Copeptin precursor, Von Willebrand Factor precursor and Immunoglobulin Delta Chain C.
[Figure 1, top panel: N-Glyc site positions across subcellular compartments. x-axis: relative position across protein chain in tenths, from N-term to C-term. Rows: Cytoplasmic, Endoplasmic Reticular/Golgi, Extracellular/Secreted, Lysosomal and Others, Membrane, Mitochondrial, Nuclear. Colour scale: 0 to 27. Bottom panel: N-Glyc site positions across cellular role categories. Same x-axis. Rows: Amino acid biosynthesis, Biosynthesis of cofactors, Cell envelope, Cellular processes, Central intermediary metabolism, Energy metabolism, Fatty acid and phospholipid metabolism, Other categories, Purines and pyrimidines, Regulatory functions, Replication, Transcription, Translation, Transport and binding proteins. Colour scale: 0 to 56.]
Figure 1: Categorical distribution of known N-glycosylation sites across the protein chain. Colour indicates frequency of sites (green to pink in increasing order). Protein chains, normalised in length, are represented across the x-axis from N-terminal to C-terminal. Subcellular locations (top) were predicted using PSORT, and cellular role classification (bottom) by lexical analysis of SWISS-PROT keywords (Alfonso Valencia et al.). Most N-glycosylation sites were clustered in the first half of all protein chains, and mainly occurred in extracellular transport and binding proteins.
Experimental determination of glycosylation sites is difficult to achieve, as large amounts of purified protein are needed for the analysis. In addition, glycosylation can be an organism- and tissue-specific event. Therefore only a few glycoproteins have been characterised so far, as reflected in the low percentage of glycoprotein entries in SWISS-PROT (approx. ��� of human proteins, see also [13]). This motivates the need for developing theoretical means of predicting the glycosylation potential of sequons.
3.2 O-linked GalNAc Glycosylation
The addition of GalNAc linked to serine or threonine residues of secreted and cell surface proteins, and further addition of Gal/GalNAc/GlcNAc residues [2], is also known as mucin-type glycosylation and is catalysed by a family of UDP-N-acetylgalactosamine:polypeptide N-acetylgalactosaminyltransferases (GalNAc-transferases). The modification, a post-translational event, takes place in the cis-Golgi compartment [14] after N-glycosylation and folding of the protein, and affects secreted and membrane-bound proteins.
There is no acceptor motif defined for O-linked glycosylation. The only common characteristic among most O-glycosylation sites is that they occur on serine and threonine residues in close vicinity to proline residues, and that the acceptor site is usually in a beta-conformation. A prediction method [15,16] for this type of glycosylation on mammalian proteins has been built earlier and made available as a web server [a]. A database of O-glycosylated sequences is also available [b] and was used in constructing the O-glycosylation site prediction methods [17].
Figure 2 shows the spread of predicted glycosylation sites (O-GalNAc, mucin-type) across different categories and across the protein chain. To construct this plot, sequence lengths were normalised, and relative position expressed on a percent (0-100) scale. Glycosylation sites were binned (�� bins across each sequence), and their frequency plotted across different categories. Sites tend to cluster towards the C- and N-termini of proteins for some categories. This figure also shows that O-glycosylation acceptor sites occur in a wide range of proteins, though glycosylation patterns (frequency, positions across chain) may differ for different types of proteins.
[a] http://www.cbs.dtu.dk/services/NetOGlyc
[b] http://www.cbs.dtu.dk/databases/OGLYCBASE
[Figure 2: bar plot. Position axis: 0 to 100. Category axis: transport and binding, translation, transcription, replication, regulatory functions, purines and pyrimidines, fatty acid metabolism, energy metabolism, central intermediary metabolism, cellular processes, cell envelope, biosynthesis of cofactors, amino acid biosynthesis. Remaining axis ticks: 2 to 20.]
Figure 2: Positional O-GalNAc glycosylation. O-GalNAc (mucin type) glycosylation displays preference for position across a protein chain which could be significant across different categories. The Position axis reflects normalised protein chain length from N-terminal (0 on the axis) to C-terminal (100). The height of the bars indicates the number of predicted O-GalNAc sites (in � �� � human proteins) for a particular category in a particular position bin.
3.3 O-linked GlcNAc Glycosylation
Glycosylation of cytosolic and nuclear proteins by single N-acetylglucosamine (GlcNAc) monosaccharides is known to be highly dynamic and occurs on proteins with wide-ranging functions and cellular roles �����. N-acetylglucosamine, donated by the nucleotide precursor UDP-N-acetylglucosamine, is attached in a beta-anomeric linkage to the hydroxyl group of serine or threonine residues.
So far, all proteins with O-β-GlcNAc linked residues are also known to be phosphorylated. Evidence suggests that, at least in some cases, these two post-translational modification events may share a reciprocal relationship ����. This peculiar behaviour strongly suggests a regulatory role for this modification. Sites which can be both glycosylated and alternatively phosphorylated are also known as "yin-yang" sites [18].
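Given two sets of predicted positions for one protein, candidate "yin-yang" sites are simply the Ser/Thr positions present in both. The following Python sketch illustrates this under stated assumptions: positions are 1-based, the two input lists stand in for the output of separate glycosylation and phosphorylation predictors, and the function name yin_yang_sites is hypothetical.

```python
def yin_yang_sites(sequence, glcnac_sites, phospho_sites):
    """Return sorted 1-based Ser/Thr positions predicted for both modifications."""
    shared = set(glcnac_sites) & set(phospho_sites)
    # O-GlcNAc attaches to Ser or Thr only, so other residues are discarded.
    return sorted(pos for pos in shared if sequence[pos - 1] in "ST")

# Hypothetical sequence and predictions (not real predictor output).
print(yin_yang_sites("MSTPSAY", [2, 3, 5], [3, 5, 7]))  # [3, 5]
```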
The acceptor site for O-β-GlcNAc glycosylation does not display a definite consensus sequence, nor are there many annotated sites in public databases. However, the fuzzy motif is marked by the close proximity of proline and valine residues, a downstream tract of serines and an absence of leucine and glutamine residues in the near vicinity (data not shown). A prediction method for this type of glycosylation on human proteins has been built and made available [c] as a web server (in preparation).
Out of approximately ����� human sequences from SWISS-PROT (rel. ��), over ����� had at least one predicted O-GlcNAc site. ���� of these proteins had at least one high-scoring O-GlcNAc site prediction (with ���� high-scoring Ser/Thr sites). A number of these were DNA-binding proteins and involved in transcriptional regulation. When ranked according to scores, a large fraction at the top of this list were found to be nuclear proteins (as annotated in SWISS-PROT). The O-GlcNAc transferase itself (P110 subunit) was found to have predicted O-GlcNAc sites.
To study whether the O-β-GlcNAc modification was specific for certain types of proteins, we classified the potentially modified proteins into cellular role categories and subcellular locations. Figure 3 illustrates the spread of proteins with at least one high-scoring O-β-GlcNAc site across different categories. Also shown in this figure is the spread of phosphorylated proteins (as predicted [21] by NetPhos [d]), "yin-yang" proteins, proteins with PEST regions [22], and proteins with O-β-GlcNAc ����� sites which fall within PEST regions.
[c] http://www.cbs.dtu.dk/services/YinOYang
[d] http://www.cbs.dtu.dk/services/NetPhos
[Figure 3: Distribution of sites across categories of (SWISS-PROT) human proteins. Two ring-pie panels, one for subcellular locations and one for cellular role categories, with rings labelled phos, yinyang, glcnac, pest and pest-glcnac. Panel notes: cellular role categories for proteins were predicted using a linguistic approach (on SWISS-PROT keywords) by Valencia et al.; subcellular locations for proteins were obtained from a combination of SWISS-PROT annotations and PSORT predictions.]
Figure 3: Predicted O-β-GlcNAc sites across the human proteome. The two panels (top, bottom) indicate different categorisations of proteins as depicted in the innermost and outermost circles of the pies. Individual rings represent different post-translational modifications and their occurrence in the corresponding category. E.g., phosphorylation occurs widely across all categories of proteins. Potential O-GlcNAc sites occur in half of all nuclear proteins and regulatory proteins. They also occur widely in replication and transcription proteins. Proteins with PEST regions and O-GlcNAc sites are mostly regulatory, although PEST regions themselves also occur in other categories.
While the O-β-GlcNAc modification seems to potentially affect almost all types of proteins, most O-GlcNAcylated proteins were either regulatory proteins or "transport and binding" proteins. A large fraction of unclassified proteins ("unknown" in role categories) were also predicted to contain this modification. Over half of all nuclear proteins contained a high-ranking O-β-GlcNAc modified site. Cytoplasmic proteins, membrane proteins and secreted proteins also contained potential sites.
Phosphorylation is a very widespread modification [23]. This is reflected in our graphs, as phosphorylation sites (� �� potential by NetPhos) appeared well represented in all protein categories. However, yin-yang sites appeared to exist largely in regulatory proteins, transcription-related proteins or "transport and binding" proteins, and were mostly nuclear. O-GlcNAcylated PEST regions were also mostly nuclear, though a large membrane fraction also existed. Around half of all these proteins were involved in regulatory functions.
In an additional study, the number of potential O-β-GlcNAc sites in proteins was studied with respect to function and cellular location. Figure 4 illustrates the number of predicted (high-scoring) sites per 100 Ser/Thr residues (per protein). Proteins with ��� predicted GlcNAc sites (per 100 Ser/Thr) were predominantly nuclear, cytoplasmic or membrane proteins. Nuclear and cytoplasmic proteins carried the highest densities of sites, a few cytoplasmic proteins having as many as �� high-scoring O-GlcNAc sites among 100 Ser/Thr residues. With respect to cellular roles, proteins belonging to the category "Purines, pyrimidines, nucleosides and nucleotides" contained well spaced out sites (only a few sites among 100 Ser/Thr residues). Proteins with a wider distribution of sites included regulatory, transcription, replication, "transport and binding", cell envelope and the "unknown" category proteins. The highest density of sites (����� per 100 Ser/Thr) was found in transcription and regulatory proteins, though some "unknown" proteins had over �� sites (per 100 Ser/Thr). In general, the intracellular O-β-GlcNAc modification does not seem to cluster among close residues or display any characteristic spacing, as was evident for the O-linked GlcNAc modification affecting surface and membrane proteins of Dictyostelium discoideum [24].
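The density measure used here, predicted sites per 100 Ser/Thr residues, can be written as a small Python helper. This is a sketch only: the function name sites_per_100_ser_thr is hypothetical, and returning 0.0 for a sequence without Ser/Thr is a convention chosen for the example.

```python
def sites_per_100_ser_thr(sequence, n_sites):
    """Normalise a predicted site count by the Ser/Thr content of the protein."""
    n_ser_thr = sum(sequence.upper().count(residue) for residue in "ST")
    if n_ser_thr == 0:
        return 0.0  # no possible acceptor residues (convention for this sketch)
    return 100.0 * n_sites / n_ser_thr

# Hypothetical sequence with 6 Ser/Thr residues and 3 predicted sites.
print(sites_per_100_ser_thr("MSTSTAAST", 3))  # 50.0
```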
[Figure 4: two bar plots of frequency against number of O-GlcNAcs per 100 Ser/Thr. Panel A categories (subcellular locations): Nuclear, Mitochondrial, Membrane, Lysosomal and Others, Extracellular/secreted, E.R./Golgi, Cytoplasmic. Panel B categories (cellular roles): Unknown; Transport and binding proteins; Translation; Transcription; Replication; Regulatory functions; Purines, pyrimidines, nucleosides, and nucleotides; Other categories; Fatty acid and phospholipid metabolism; Energy metabolism; Central intermediary metabolism; Cellular processes; Cell envelope; Biosynthesis of cofactors, prosthetic groups, and carriers; Amino acid biosynthesis.]
Figure 4: Number of predicted O-β-GlcNAc sites (per 100 Ser/Thr) in different categories of human proteins. (A) shows proteins in different subcellular locations and (B) indicates cellular role categories. The z-scale (� �� in A or ��� in B) is a frequency count for a particular bin, e.g. � O-GlcNAcs (per 100 Ser/Thr) occur most frequently for nuclear proteins in (A). These modifications usually do not occur in clusters. Although potential acceptor sites are largely found in nuclear/cytoplasmic proteins (usually regulatory), they also surprisingly occur in membrane proteins (mostly transport and binding proteins).
Human proteome-wide scans revealed that the O-β-GlcNAc acceptor pattern occurs across a wide range of functional categories and subcellular compartments. For humans, the most populated functional categories were regulatory proteins and transport and binding proteins. Nuclear and cytoplasmic proteins were prominent, though membrane and secreted proteins were surprisingly also present in high numbers. It is interesting to know that acceptor patterns exist on these proteins too, but the cellular machinery defines protein targeting and consequently influences their modifications. The prediction server guards against this possibility by generating a warning when a potential signal peptide is detected by SignalP [e].
PEST regions, rich in the amino acids proline (P), glutamic acid (E), serine (S) and threonine (T), are hypothesised to be degradative signals for constitutive or conditional protein degradation [22]. Phosphorylation, a common mechanism to activate the PEST-mediated degradation pathway, may be signalled by deglycosylation in the same region. Our scans revealed that a small fraction of O-GlcNAc sites appeared in PEST regions. Such sites were mostly found in proteins involved in regulatory functions.
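Checking which predicted sites fall within PEST regions amounts to an interval test. The Python sketch below assumes that PEST regions have already been identified by a separate tool and are supplied as 1-based inclusive (start, end) pairs; the function name sites_in_regions and the example coordinates are hypothetical.

```python
def sites_in_regions(sites, regions):
    """Return the sorted sites lying inside any 1-based inclusive (start, end) region."""
    return [
        site
        for site in sorted(sites)
        if any(start <= site <= end for start, end in regions)
    ]

# Hypothetical predicted O-GlcNAc sites and PEST region coordinates.
print(sites_in_regions([42, 5, 20], [(10, 25), (40, 41)]))  # [20]
```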
4 Final Remarks
Glycosylation is clearly a modification affecting a wide range of proteins, and is now known to affect both intracellular and secreted proteins. Different types of glycosylation have varying site preferences on proteins, and occur in different patterns across the protein chain.
In a project (in preparation) predicting protein function solely from protein chain global properties (molecular weight, length, etc.) and potential post-translational modifications, glycosylation was one of the most important determinants for functional classification.
Since characterising glycoproteins experimentally is a tedious and time-consuming task, it is worthwhile at this juncture to develop tools for predicting glycosylation sites. This is essential information for deciphering protein function and characterising complete proteomes.
5 Acknowledgements
The Danish National Research Foundation is acknowledged for support.
[e] http://www.cbs.dtu.dk/services/SignalP
6 References
1. H Lis and N Sharon. Protein glycosylation. Structural and functional aspects. Eur. J. Biochem., �������� � �
2. EF Hounsell, MJ Davies and DV Renouf. O-linked protein glycosylation structure and function. Glycoconjugate J., ��� ���� � ��
3. MA Kukuruzinska, ML Bergh and BJ Jackson. Protein glycosylation in yeast. Annu. Rev. Biochem., ��� ��� ��� � ��
4. TR Gemmill and RB Trimble. Overview of N- and O-linked oligosaccharide structures found in various yeast species. Biochim. Biophys. Acta, ������������ � �
5. M Ashburner, CA Ball, JA Blake, D Botstein, H Butler, JM Cherry, AP Davis, K Dolinski, SS Dwight, JT Eppig, MA Harris, DP Hill, L Issel-Tarver, A Kasarskis, S Lewis, JC Matese, JE Richardson, M Ringwald, GM Rubin and G Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., ������� � �����
6. J Tamames, C Ouzounis, G Casari, C Sander and A Valencia. EUCLID: automatic classification of proteins in functional classes by their database annotations. Bioinformatics, ���������� � �
7. C Blaschke, MA Andrade, C Ouzounis and A Valencia. Automatic extraction of biological information from scientific text: protein-protein interactions. In Proc., Intelligent Systems for Molecular Biology, pages ������, Menlo Park, CA, � �. AAAI Press.
8. MA Andrade, C Ouzounis, C Sander, J Tamames and A Valencia. Functional classes in the three domains of life. J. Mol. Evol., � ���������� �
9. M Riley. Functions of the gene products of Escherichia coli. Microbiol. Rev., ������ ��� � ��
10. K Nakai and P Horton. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci., ������� � ��
11. IM Nilsson and G von Heijne. Determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. J. Biol. Chem., ����� ����� � �
12. I Nilsson and G von Heijne. Glycosylation efficiency of Asn-Xaa-Thr sequons depends both on the distance from the C terminus and on the presence of a downstream transmembrane segment. J. Biol. Chem., ������������ �����
13. R Apweiler, H Hermjakob and N Sharon. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta, ������� � ��
14. J Roth, Y Wang, AE Eckhardt and RL Hill. Subcellular localization of the UDP-N-acetyl-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase-mediated O-glycosylation reaction in the submaxillary gland. Proc. Natl. Acad. Sci. USA, �� �� � � ��
15. JE Hansen, O Lund, J Engelbrecht, H Bohr, JO Nielsen, JES Hansen and S Brunak. Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase. Biochem. J., ������� � ��
16. JE Hansen, O Lund, N Tolstrup, AA Gooley, KL Williams and S Brunak. NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J., ���������� � �
17. R Gupta, H Birch, K Rapacki, S Brunak and JE Hansen. O-GLYCBASE version ���: a revised database of O-glycosylated proteins. Nucleic Acids Res., ��������� � �
18. GW Hart, KD Greis, LY Dong, MA Blomberg, TY Chou, MS Jiang, EP Roquemore, DM Snow, LK Kreppel and RN Cole. O-linked N-acetylglucosamine: the "yin-yang" of Ser/Thr phosphorylation. Nuclear and cytoplasmic glycosylation. Adv. Exp. Med. Biol., ����������� ��
19. DM Snow and GW Hart. Nuclear and Cytoplasmic Glycosylation. Int. Rev. Cytol., �������� � ��
20. FI Comer and GW Hart. O-Glycosylation of Nuclear and Cytosolic Proteins. Dynamic Interplay Between O-GlcNAc and O-Phosphate. J. Biol. Chem., ����� �� �� ��� �����
21. N Blom, S Gammeltoft and S Brunak. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol., � ���������� � �
22. M Rechsteiner and SW Rogers. PEST sequences and regulation by proteolysis. Trends Biochem. Sci., ����������� � ��
23. EG Krebs. The growth of research on protein phosphorylation. Trends Biochem. Sci., � �� � � ��
24. R Gupta, E Jung, AA Gooley, KL Williams, S Brunak and J Hansen. Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology, ���� ������ � �