Pacific Symposium on Biocomputing 7:310-322 (2002)

Prediction of glycosylation across the human proteome and the correlation to protein function

Ramneek Gupta and Søren Brunak

Center for Biological Sequence Analysis, Bldg-���, Bio-Centrum, Technical University of Denmark, DK-���� Lyngby, Denmark

1 Introduction

The addition of a carbohydrate moiety to the side-chain of a residue in a protein chain influences the physicochemical properties of the protein. Glycosylation is known to alter proteolytic resistance, protein solubility, stability, local structure, lifetime in circulation and immunogenicity.���

Of the various forms of protein glycosylation found in eukaryotic systems, the most important types are N-linked, O-linked GalNAc (mucin-type) and O-β-linked GlcNAc (intracellular/nuclear) glycosylation. N-linked glycosylation is a co-translational process involving the transfer of the precursor oligosaccharide, GlcNAc2Man9Glc3, to asparagine residues in the protein chain. The asparagine usually occurs in a sequon Asn-Xaa-Ser/Thr, where Xaa is not Proline. This is, however, not a specific consensus since not all such sequons are modified in the cell. O-linked glycosylation involves the post-translational transfer of an oligosaccharide to a serine or threonine residue. In this case, there is no well-defined motif for the acceptor site other than the near vicinity of proline and valine residues.
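As an illustration only, the sequon rule stated above (Asn-Xaa-Ser/Thr with Xaa not Proline) can be sketched in Python. The function name `find_sequons` is hypothetical and not taken from the paper. The sketch merely lists candidate sequons; as noted above, not every sequon is modified in the cell, which is why a trained predictor is needed.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

if __name__ == "__main__":
    # Toy sequence, invented for the example.
    print(find_sequons("MNGSANPTNNTS"))
```

The Asn at position 6 of the toy sequence is skipped because it is followed by Proline.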

We have developed glycosylation site prediction methods for these three types of glycosylation, using artificial neural networks that examine correlations in the local sequence context and surface accessibility. In this paper, we have used glycosylation site information on human proteins to illustrate the contribution of glycosylation to protein function and assess how widespread this modification is across the human proteome.

2 Methods

2.1 Data set

Analysis shown in this paper was derived from a set of human proteins obtained from the SWISS-PROT (rel. �) database. This consisted of ��� � well annotated proteins. We chose to work with proteins from a single organism, i.e. humans, to restrict the diversity of oligosaccharyltransferase acceptor sites. Glycosylation in simple organisms, such as yeast, is well studied ���, but their glycans are usually highly mannosylated structures, and it is not clear how similar their mechanism of glycosylation is to that of humans. Combining data from different organisms would complicate the analysis, with possible "families" of acceptor specificities causing ambiguity in distinguishing acceptor (positive) sites from non-acceptor (negative) sites.

2.2 Functional categories for proteins

Defining protein function is a complicated task, and there are many different ways of describing the roles and functions of a protein in a cell. This is the topic of many on-going ontology projects.� Here we chose to use a cellular role descriptor and subcellular location as our categorisations.

�� categories (�� defined, � unknown) reflective of the "cellular role" of the protein in the cell were employed (as shown in Figure �). The automatic class assignment to sequences was made by an extension of the Euclid system performing a linguistic analysis and clustering of SWISS-PROT keywords.��� Keywords were parsed for the human proteins in SWISS-PROT. For each functional class, the informative weight (Z-score) of each keyword was extracted from a dictionary.� Keyword sums gave scores to all categories for a particular sequence. The central point of the Euclid system is the dictionary. The primary version of this dictionary was generated from an initial set of carefully hand-annotated proteins from different organisms spanning every kingdom of life. From this initial set, a first dictionary was defined which was used to assign all SWISS-PROT proteins, and the process of dictionary definition and assignment was reiterated until convergence. The final dictionary obtained was used to assign functional classes to around ����� human proteins from SWISS-PROT.
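The keyword-sum step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Euclid implementation: the dictionary contents, the Z-score values and the function name `assign_class` are all invented for the example, and the iterative refinement of the dictionary is not shown.

```python
from collections import defaultdict

# Hypothetical dictionary: keyword -> {functional class: Z-score}.
# All keywords and weights below are invented for illustration.
DICTIONARY = {
    "Glycoprotein": {"Transport and binding proteins": 2.1, "Cell envelope": 0.8},
    "Transmembrane": {"Transport and binding proteins": 1.7},
    "Transcription regulation": {"Regulatory functions": 2.5, "Transcription": 1.9},
    "DNA-binding": {"Transcription": 1.4, "Regulatory functions": 1.2},
}

def assign_class(keywords, dictionary=DICTIONARY):
    """Sum keyword Z-scores per class and return (best class, all class scores).

    Sequences with no keyword present in the dictionary fall into "Unknown".
    """
    scores = defaultdict(float)
    for keyword in keywords:
        for cls, z in dictionary.get(keyword, {}).items():
            scores[cls] += z
    if not scores:
        return "Unknown", {}
    best = max(scores, key=scores.get)
    return best, dict(scores)

if __name__ == "__main__":
    print(assign_class(["Glycoprotein", "Transmembrane"]))
    print(assign_class(["Hypothetical protein"]))
```

Taking the single highest-scoring class is a simplification chosen for the sketch; the text above only states that keyword sums give a score to every category.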

The cellular role categories themselves were derived from an earlier proposed scheme for Escherichia coli,� which was later extended by the TIGR group for other complete genomes. These categories comprise � functional classes which are subsets of three superclasses: Energy, Communication and Information. Proteins which do not fit in the � categories are assigned to "Other" (functionally undefined cluster) or to "Unknown" (sequences which do not contain the relevant keywords needed for classification in the above system).

Subcellular locations of proteins were obtained from SWISS-PROT annotations and PSORT predictions � (where no parsable SWISS-PROT annotation was found).


3 Results

3.1 N-Glycosylation

N-linked glycosylation modifies membrane and secreted proteins. This co-translational process occurs in the endoplasmic reticulum and is known to influence protein folding. The modification attributes various functional properties to a protein. To examine if certain categories of proteins were more prone to glycosylation than others, we studied the spread of known glycosylation sites across different categories.

N-glycosylation may also display some positional preferences in the protein chain. Specifically, it has been shown that sites need to be ��-�� residues away from the N-terminus �� and that glycosylation efficiency is reduced within �� residues of the C-terminus.��

In our data set of approximately ����� human proteins, only � proteins (at �� confirmed sites) were annotated in SWISS-PROT as N-glycosylated (not considering proteins with only potential or probable sites). Figure 1 illustrates the spread of human glycosylation sites along the protein chain and across predicted subcellular locations and keyword based assignment of cellular role categories. Relative positions of sites on proteins were calculated with respect to normalised sequence lengths. The sequence length, divided into tenths, is shown along the x-axis, from the N-terminal start on the left to the C-terminal end on the right.
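The normalisation of site positions into tenths of the chain length can be sketched as below. The function names and the toy input are hypothetical, and the exact treatment of bin boundaries in the original analysis is not stated, so the convention used here (1-based positions, half-open bins) is an assumption.

```python
def position_bin(site: int, length: int, n_bins: int = 10) -> int:
    """Map a 1-based residue position to a 0-based bin along the normalised chain."""
    if not 1 <= site <= length:
        raise ValueError("site must lie within the sequence")
    # Integer arithmetic: bin k covers relative positions in [k/n_bins, (k+1)/n_bins).
    return (site - 1) * n_bins // length

def bin_counts(sites_by_protein, n_bins: int = 10) -> list[int]:
    """Count sites per bin over (sequence length, [site positions]) pairs."""
    counts = [0] * n_bins
    for length, sites in sites_by_protein:
        for site in sites:
            counts[position_bin(site, length, n_bins)] += 1
    return counts

if __name__ == "__main__":
    # Two invented proteins of length 100 and 200 with made-up site positions.
    print(bin_counts([(100, [5, 45, 46]), (200, [90, 200])]))
```

Counting per category (subcellular location or cellular role) would amount to keeping one such list of counts for each category.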

N-glycosylated proteins appeared to belong almost exclusively to the functional category "Transport and binding". This may not be too surprising considering that this category consists largely of membrane and secreted proteins. Only a few proteins belonged to any other cellular role category, and most of these appeared to be involved in central intermediary metabolism. Subcellularly, extracellular proteins were the most favoured, and others occurred in membrane proteins and in the endoplasmic reticulum or Golgi.

A clear positional preference for glycosylation sites on protein chains was apparent. The terminal ends of proteins seemed unfavourable and most sites seemed to occur N-terminal to the centre of the protein chain (��% to ��% along the length from the N-terminal start). The frequency of sites smoothly tapered off on both ends from this peak with a longer C-terminal tail. This statistical observation agrees with specific experimental indications of a ��-�� residue distance from the N-terminal and a �� residue distance from the C-terminal end.����� One peculiar observation from the figure was the C-terminal sites in nuclear proteins. On examination, these turned out to be around �� proteins which were indeed annotated to be N-glycosylated in the C-terminal. However, this seems to be an anomaly of the sub-cellular prediction by PSORT. For instance, some secreted proteins among these were Vasopressin-Neurophysin 2-Copeptin precursor, Von Willebrand Factor Precursor and Immunoglobulin Delta Chain C.

[Figure 1, two panels. Top: "N-Glyc site positions across subcellular compartments" (rows: Cytoplasmic, Endoplasmic Reticular/Golgi, Extracellular/Secreted, Lysosomal and Others, Membrane, Mitochondrial, Nuclear; colour scale 0 to 27). Bottom: "N-Glyc site positions across cellular role categories" (rows: Amino acid biosynthesis, Biosynthesis of cofactors, Cell envelope, Cellular processes, Central intermediary metabolism, Energy metabolism, Fatty acid and phospholipid metabolism, Other categories, Purines and pyrimidines, Regulatory functions, Replication, Transcription, Translation, Transport and binding proteins; colour scale 0 to 56). x-axis in both panels: relative position across the protein chain in tenths, from Nterm-10% to 90%-Cterm.]

Figure 1: Categorical distribution of known N-glycosylation sites across the protein chain. Colour indicates frequency of sites (green to pink in increasing order). Protein chains, normalised in length, are represented across the x-axis from N-terminal to C-terminal. Subcellular locations (top) were predicted using PSORT, and cellular role classification (bottom) by lexical analysis of SWISS-PROT keywords (Alfonso Valencia et al.). Most N-glycosylation sites were clustered in the first half of all protein chains, and mainly occurred in extracellular transport and binding proteins.

Experimental determination of glycosylation sites is difficult to achieve as large amounts of purified protein are needed for the analysis of glycosylation sites. In addition, glycosylation can be an organism- and tissue-specific event. Therefore only a few glycoproteins have been characterised so far, as reflected in the low percentage of glycoprotein entries in SWISS-PROT (approx. ��% of human proteins; see also ��). This motivates the need for developing theoretical means of predicting the glycosylation potential of sequons.

3.2 O-linked GalNAc Glycosylation

The addition of GalNAc linked to serine or threonine residues of secreted and cell surface proteins, and further addition of Gal/GalNAc/GlcNAc residues,� is also known as mucin type glycosylation and is catalysed by a family of UDP-N-acetylgalactosamine:polypeptide N-acetylgalactosaminyltransferases (GalNAc-transferases). The modification, a post-translational event, takes place in the cis-Golgi compartment �� after N-glycosylation and folding of the protein, and affects secreted and membrane bound proteins.

There is no acceptor motif defined for O-linked glycosylation. The only common characteristic among most O-glycosylation sites is that they occur on serine and threonine residues in close vicinity to proline residues, and that the acceptor site is usually in a beta-conformation. A prediction method ����� for this type of glycosylation on mammalian proteins has been built earlier and made available as a web server (footnote a). A database of O-glycosylated sequences is also available (footnote b) and was used in constructing the O-glycosylation site prediction methods.��

Figure 2 shows the spread of predicted glycosylation sites (O-GalNAc, mucin-type) across different categories and across the protein chain. To construct this plot, sequence lengths were normalised, and relative position expressed on a percent (�-���) scale. Glycosylation sites were binned (�� bins across each sequence), and their frequency plotted across different categories. Sites tend to cluster towards the C- and N-termini of proteins for some categories. This figure also shows that O-glycosylation acceptor sites occur in a wide range of proteins, though glycosylation patterns (frequency, positions across chain) may differ for different types of proteins.

a http://www.cbs.dtu.dk/services/NetOGlyc
b http://www.cbs.dtu.dk/databases/OGLYCBASE


[Figure 2, bar plot. Axis title: Position. Axis tick labels run from 2 to 20 and from 0 to 100. Categories: transport and binding, translation, transcription, replication, regulatory functions, purines and pyrimidines, fatty acid metabolism, energy metabolism, central intermediary metabolism, cellular processes, cell envelope, biosynthesis of cofactors, amino acid biosynthesis.]

Figure 2: Positional O-GalNAc glycosylation. O-GalNAc (mucin type) glycosylation displays preference for position across a protein chain which could be significant across different categories. The Position axis reflects normalised protein chain length from N-terminal (� on the axis) to C-terminal (�� �). The height of the bars indicates the number of predicted O-GalNAc sites (in � �� � human proteins) for a particular category in a particular position bin.


3.3 O-linked GlcNAc Glycosylation

Glycosylation of cytosolic and nuclear proteins by single N-acetylglucosamine (GlcNAc) monosaccharides is known to be highly dynamic and occurs on proteins with wide-ranging functions and cellular roles.���� N-acetylglucosamine, donated by the nucleotide precursor UDP-N-acetylglucosamine, is attached in a beta-anomeric linkage to the hydroxyl group of serine or threonine residues.

So far, all proteins with O-β-GlcNAc linked residues are also known to be phosphorylated. Evidence suggests that at least in some cases, these two post-translational modification events may share a reciprocal relationship.��� This peculiar behaviour strongly suggests a regulatory role for this modification. Sites which can be both glycosylated and alternatively phosphorylated are also known as "yin-yang" sites.��

The acceptor site for O-β-GlcNAc glycosylation does not display a definite consensus sequence, nor are there many annotated sites in public databases. However, the fuzzy motif is marked by the close proximity of Proline and Valine residues, a downstream tract of Serines and an absence of Leucine and Glutamine residues in the near vicinity (data not shown). A prediction method for this type of glycosylation on human proteins has been built and made available (footnote c) as a web server (in preparation).

Out of approximately ����� human sequences from SWISS-PROT (rel. �), over ����� had at least one predicted O-GlcNAc site. ���� of these proteins had at least one high scoring O-GlcNAc site prediction (with ���� high scoring Ser/Thr sites). A number of these were DNA-binding proteins and involved in transcriptional regulation. When ranked according to scores, a large fraction at the top of this list were found to be nuclear proteins (as annotated in SWISS-PROT). The O-GlcNAc transferase itself (P��� subunit) was found to have predicted O-GlcNAc sites.

To study if the O-β-GlcNAc modification was specific for certain types of proteins, we classified the potentially modified proteins into cellular role categories and subcellular locations. Figure 3 illustrates the spread of proteins with at least one high-scoring O-β-GlcNAc site across different categories. Also shown in this figure is the spread of phosphorylated proteins (as predicted �� by NetPhos, footnote d), "Yin-yang" proteins, proteins with PEST regions,�� and proteins with O-β-GlcNAc ����� sites which fall within PEST regions.
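The site-level bookkeeping behind these categories can be sketched in a few lines. This is an illustration under assumptions rather than the method used by the prediction servers: the inputs are taken to be plain lists of 1-based residue positions and inclusive (start, end) region boundaries, and both function names are hypothetical.

```python
def yin_yang_sites(glcnac_sites, phos_sites):
    """Positions predicted both as O-GlcNAc sites and as phosphorylation sites."""
    return sorted(set(glcnac_sites) & set(phos_sites))

def sites_in_regions(sites, regions):
    """Positions that fall inside any (start, end) region, bounds inclusive."""
    return sorted(
        s for s in set(sites)
        if any(start <= s <= end for start, end in regions)
    )

if __name__ == "__main__":
    # Invented positions for a single protein.
    glcnac = [12, 40, 77]
    phos = [40, 77, 90]
    pest = [(35, 50)]
    print(yin_yang_sites(glcnac, phos))
    print(sites_in_regions(glcnac, pest))
```

A protein would then count as a "Yin-yang" protein if the first list is non-empty, and as carrying an O-GlcNAc site within a PEST region if the second is non-empty.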

c http://www.cbs.dtu.dk/services/YinOYang
d http://www.cbs.dtu.dk/services/NetPhos


[Figure 3, two ring (pie) panels titled "Distribution of sites across categories of (SWISS-PROT) human proteins": Subcellular locations and Cellular role categories. Rings in each panel: pest-glcnac, pest, glcnac, yinyang, phos. Panel notes: cellular role categories for proteins were predicted using a linguistic approach (on SWISS-PROT keywords) by Valencia et al.; subcellular locations for proteins were obtained from a combination of SWISS-PROT annotations and PSORT predictions.]

Figure 3: Predicted O-β-GlcNAc sites across the human proteome. The two panels (top, bottom) indicate different categorisations of proteins as depicted in the innermost and outermost circles of the pies. Individual rings represent different post-translational modifications and their occurrence in the corresponding category. E.g., phosphorylation occurs widely across all categories of proteins. Potential O-GlcNAc sites occur in half of all nuclear proteins and regulatory proteins. They also occur widely in replication and transcription proteins. Proteins with PEST regions and O-GlcNAc sites are mostly regulatory, although PEST regions themselves also occur in other categories.


While the O-β-GlcNAc modification seems to potentially affect almost all types of proteins, most O-GlcNAcylated proteins were either regulatory proteins or "transport and binding" proteins. A large fraction of unclassified proteins ("unknown" in role categories) were also predicted to contain this modification. Over half of all nuclear proteins contained a high ranking O-β-GlcNAc modified site. Cytoplasmic proteins, membrane proteins and secreted proteins also contained potential sites.

Phosphorylation is a very wide-spread modification.�� This is reflected in our graphs as phosphorylation sites (� �� potential by NetPhos) appeared well represented in all protein categories. However, Yin-yang sites appeared to exist largely in regulatory proteins, transcription related proteins or "transport and binding proteins", and were mostly nuclear. O-GlcNAcylated PEST regions were also mostly nuclear, though a large membrane fraction also existed. Around half of all these proteins were involved in regulatory functions.

In an additional study, the number of potential O-β-GlcNAc sites in proteins was studied with respect to function and cellular location. Figure 4 illustrates the number of predicted (high-scoring) sites per 100 Ser/Thr residues (per protein). Proteins with ��� predicted GlcNAc sites (per 100 Ser/Thr) were predominantly nuclear, cytoplasmic or membrane proteins. Nuclear and cytoplasmic proteins carried the highest densities of sites, a few cytoplasmic proteins having as many as �� high-scoring O-GlcNAc sites among 100 Ser/Thr residues. With respect to cellular roles, proteins belonging to the category "Purines, pyrimidines, nucleosides and nucleotides" contained well spaced out sites (only a few sites among 100 Ser/Thr residues). Proteins with a wider distribution of sites included regulatory, transcription, replication, "transport and binding", cell envelope and the "unknown" category proteins. The highest density of sites (��-�� per 100 Ser/Thr) was found in transcription and regulatory proteins, though some "unknown" proteins had over �� sites (per 100 Ser/Thr). In general, the intracellular O-β-GlcNAc modification does not seem to cluster among close residues or display any characteristic spacing as was evident for the O-linked GlcNAc modification affecting surface and membrane proteins of Dictyostelium discoideum.��
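The density measure used here, predicted sites per 100 Ser/Thr residues of a protein, is simple enough to state as code. The function name is hypothetical, and returning 0.0 for a sequence without any Ser or Thr is a convention chosen for this sketch.

```python
def sites_per_100_ser_thr(sequence: str, n_sites: int) -> float:
    """Predicted sites normalised to 100 Ser/Thr residues in the sequence."""
    seq = sequence.upper()
    n_ser_thr = seq.count("S") + seq.count("T")
    if n_ser_thr == 0:
        return 0.0  # no acceptor residues, so no density can be defined
    return 100.0 * n_sites / n_ser_thr

if __name__ == "__main__":
    # Toy sequence with five Ser/Thr residues and one predicted site.
    print(sites_per_100_ser_thr("MSTSTAAS", 1))
```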


[Figure 4, two panels. (A) Subcellular locations: Nuclear, Mitochondrial, Membrane, Lysosomal and Others, Extracellular/secreted, E.R./Golgi, Cytoplasmic. (B) Cellular role categories: Unknown, Transport and binding proteins, Translation, Transcription, Replication, Regulatory functions, Purines, pyrimidines, nucleosides, and nucleotides, Other categories, Fatty acid and phospholipid metabolism, Energy metabolism, Central intermediary metabolism, Cellular processes, Cell envelope, Biosynthesis of cofactors, prosthetic groups, and carriers, Amino acid biosynthesis. Axis title in both panels: # O-GlcNAcs per 100 Ser/Thr.]

Figure 4: Number of predicted O-β-GlcNAc sites (per 100 Ser/Thr) in different categories of human proteins. (A) shows proteins in different subcellular locations and (B) indicates cellular role categories. The z-scale (� �� in A or ��� in B) is a frequency count for a particular bin, e.g. � O-GlcNAcs (per 100 Ser/Thr) occur most frequently for nuclear proteins in (A). These modifications usually do not occur in clusters. Although potential acceptor sites are largely found in nuclear/cytoplasmic proteins (usually regulatory), they also surprisingly occur in membrane proteins (mostly transport and binding proteins).


Human proteome-wide scans revealed that the O-β-GlcNAc acceptor pattern occurs across a wide range of functional categories and subcellular compartments. For humans, the most populated functional categories were regulatory proteins and transport and binding proteins. Nuclear and cytoplasmic proteins were prominent, though membrane and secreted proteins were surprisingly also in high numbers. It is interesting to know that acceptor patterns exist on these proteins too, but the cellular machinery defines protein targeting and consequently influences their modifications. The prediction server guards against this possibility by generating a warning when a potential signal peptide is detected by SignalP (footnote e).

PEST regions, rich in the amino acids Proline (P), Glutamic acid (E), Serine (S) and Threonine (T), are hypothesised to be degradative signals for constitutive or conditional protein degradation.�� Phosphorylation, a common mechanism to activate the PEST-mediated degradation pathway, may be signalled by deglycosylation in the same region. Our scans revealed that a small fraction of O-GlcNAc sites appeared in PEST regions. Such sites were mostly found in proteins involved in regulatory functions.

4 Final Remarks

Glycosylation is clearly a modification affecting a wide range of proteins, and is now known to affect both intracellular and secreted proteins. Different types of glycosylation have varying site preferences on proteins, and occur in different patterns across the protein chain.

In a project (in preparation) predicting protein function solely from protein chain global properties (molecular weight, length, etc.) and potential post-translational modifications, glycosylation was one of the most important determinants for functional classification.

Since characterising glycoproteins experimentally is a tedious and time-consuming task, it is worthwhile at this juncture to develop tools for predicting glycosylation sites. This is essential information for deciphering protein function and characterising complete proteomes.

5 Acknowledgements

The Danish National Research Foundation is acknowledged for support.

e http://www.cbs.dtu.dk/services/SignalP


6 References

1. H Lis and N Sharon. Protein glycosylation. Structural and functional aspects. Eur. J. Biochem., �������� � �

2. EF Hounsell, MJ Davies and DV Renouf. O-linked protein glycosylation structure and function. Glycoconjugate J., ��� ���� � ��

3. MA Kukuruzinska, ML Bergh and BJ Jackson. Protein glycosylation in yeast. Annu. Rev. Biochem., ��� ��� ��� � ��

4. TR Gemmill and RB Trimble. Overview of N- and O-linked oligosaccharide structures found in various yeast species. Biochim. Biophys. Acta, ����������� � �

5. M Ashburner, CA Ball, JA Blake, D Botstein, H Butler, JM Cherry, AP Davis, K Dolinski, SS Dwight, JT Eppig, MA Harris, DP Hill, L Issel-Tarver, A Kasarskis, S Lewis, JC Matese, JE Richardson, M Ringwald, GM Rubin and G Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., ������� � �����

6. J Tamames, C Ouzounis, G Casari, C Sander and A Valencia. EUCLID: automatic classification of proteins in functional classes by their database annotations. Bioinformatics, ���������� � �

7. C Blaschke, MA Andrade, C Ouzounis and A Valencia. Automatic extraction of biological information from scientific text: protein-protein interactions. In Proc., Intelligent Systems for Molecular Biology, pages �����, Menlo Park, CA, � � AAAI Press.

8. MA Andrade, C Ouzounis, C Sander, J Tamames and A Valencia. Functional classes in the three domains of life. J. Mol. Evol., � ���������� �

9. M Riley. Functions of the gene products of Escherichia coli. Microbiol. Rev., ������ ��� � �

10. K Nakai and P Horton. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci., ������� � �

11. IM Nilsson and G von Heijne. Determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. J. Biol. Chem., ����� ����� � �

12. I Nilsson and G von Heijne. Glycosylation efficiency of Asn-Xaa-Thr sequons depends both on the distance from the C terminus and on the presence of a downstream transmembrane segment. J. Biol. Chem., ����������� �����

13. R Apweiler, H Hermjakob and N Sharon. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta, ������� � �

14. J Roth, Y Wang, AE Eckhardt and RL Hill. Subcellular localization of the UDP-N-acetyl-D-galactosamine: polypeptide N-acetylgalactosaminyltransferase-mediated O-glycosylation reaction in the submaxillary gland. Proc. Natl. Acad. Sci. USA, �� �� � � ��

15. JE Hansen, O Lund, J Engelbrecht, H Bohr, JO Nielsen, JES Hansen and S Brunak. Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase. Biochem. J., ������� � ��

16. JE Hansen, O Lund, N Tolstrup, AA Gooley, KL Williams and S Brunak. NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J., ���������� � �

17. R Gupta, H Birch, K Rapacki, S Brunak and JE Hansen. O-GLYCBASE version ���: a revised database of O-glycosylated proteins. Nucleic Acids Res., ��������� � �

18. GW Hart, KD Greis, LY Dong, MA Blomberg, TY Chou, MS Jiang, EP Roquemore, DM Snow, LK Kreppel and RN Cole. O-linked N-acetylglucosamine: the "yin-yang" of Ser/Thr phosphorylation. Nuclear and cytoplasmic glycosylation. Adv. Exp. Med. Biol., ����������� ��

19. DM Snow and GW Hart. Nuclear and Cytoplasmic Glycosylation. Int. Rev. Cytol., �������� � �

20. FI Comer and GW Hart. O-Glycosylation of Nuclear and Cytosolic Proteins. Dynamic Interplay Between O-GlcNAc and O-Phosphate. J. Biol. Chem., ����� �� �� ��� ����

21. N Blom, S Gammeltoft and S Brunak. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol., � ���������� � �

22. M Rechsteiner and SW Rogers. PEST sequences and regulation by proteolysis. Trends Biochem. Sci., ����������� � ��

23. EG Krebs. The growth of research on protein phosphorylation. Trends Biochem. Sci., � �� � � �

24. R Gupta, E Jung, AA Gooley, KL Williams, S Brunak and J Hansen. Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology, ���� ������ � �
