Glycoinformatics tools to analyze and curate large scale experimental datasets
Sriram Neelamegham
Departments of Chemical & Biological Engineering, Biomedical Engineering and Medicine
State University of New York, Buffalo, NY
9:10-9:50 am, March 6, 2018, Tokyo, Japan
Overview of research interests
Systems Biology: Input-output relationship
Inflammation
Circulation. 2003;107:929-934
Regenerative medicine
ATVB, 25:1321, 2005.
Thrombosis
Input-output response
• Their generation. Wet-lab: Next Generation Seq., LC-MS with CRISPR-Cas9 perturbations
• Their visualization, analysis and simulation. Dry lab: LC-MS data analysis programs, Pathway maps
VirtualGlycome.org: Systems level view of glycosylation
DNA
RNA
Protein
ktranslation,ENZ; ktranslation,SCAFFOLD
• 5'-capping & polyA tail
• Codon usage
• Ribosome binding affinity
• miRNA
• RNA binding proteins
GlycoEnzyme
ktranscription
• Transcription Factor activity
• Epigenetic regulation
Endoplasmic reticulum & Golgi cisternae (Cis, Medial, Trans, TGN) leading to secretory vesicles
kdegrad.,SCAFF.
kdegrad.,RNA
kdegrad.,ENZ
Feedback regulation
kdegrad.,GLYCOP.
• Golgi Transport
• Glycoenzyme localization & reaction rates
• Sugar-nucleotide biosynth.
kglycosylation
Glycoprotein
Neelamegham & Mahal, Curr Opin Struct Biol. 40:145-152, 2016
Open-source integration of knowledge across scales
• GNAT-Web: Glycosylation network analysis toolbox
• DrawGlycan-SNFG: Simple tool to convert IUPAC strings to SNFG sketches
• GlycoPAT: High-throughput analysis of LC-MSn data, with focus on glycoproteomics
1. GNAT-Web: Glycosylation Network Analysis Toolbox
• Define glycoenzymes in silico
• Develop reaction network from RNA-Seq and MS data processing
• Eventually, simulate reaction networks to bridge data across scales
Yusen Zhou
Liu G, et al. Bioinformatics. 24(23):2740-7, 2008; Glycobiology. 21(12):1541-53, 2011; Bioinformatics. 29(3):404-6, 2013; PLoS One. 9(6):e100939, 2014.
XML based glycoenzyme definition
Glyco-enzyme DB
Enzyme entry
General Information
Related BioPath
Related Database
Enzyme Specificity
Gene Symbol
Protein Name
Enzyme Class
CAZy Family
E.C.No
Reaction
KEGG ORTHOLOGY
KEGG PATHWAY
Organism
Uniprot
NCBI GeneID
GlyMap GeneID
DNA_RefSeq
Protein RefSeq
BRENDA
OMIM number
Glycan Type
Compartment
Substrate
Add/Remove
Preceding
Substrate Constraints
Product Constraints
Kinetic Parameters (MathML)
Vm (Max velocity)
Km (Michaelis-Menten constant)
Units
Enzyme specificity (e.g. MGAT4)
MGAT4A
Enzyme rule    Value
Add/Remove     Add GlcNAc(b1-4)
Substrate      ^Man(a1-3)
Preceding      GlcNAc(b1-2)
^: Caret is space for inclusion of arbitrary branches
Rule: add GlcNAc(β1-4) to substrate '^Man(a1-3)', provided it is preceded by GlcNAc(b1-2)
Constraint: MGAT4 acts before addition of:
a. Galactose (Gal), i.e. Gal cannot exist in string, or Gal#0
b. Bisecting MGAT3, i.e. GlcNAc(b1-4)^Man(b1-4)#0
Maximum # = 0 indicates NOT, but it could be any other number as well
SubstConstraints    Value
MaxSubsubst         Gal#0&GlcNAc(b1-4)^Man(b1-4)#0
References: a. Bennun et al. PLoS Comput Biol. 9(1):e1002813, 2013; b. Taniguchi, Fukuda, Narimatsu, Angata, Handbook of Glycosyltransferases and Related Genes; c. http://acgg.asia/db/ggdb/ (glycogene database)
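The rule fields on this slide can be held in a small data structure. The sketch below is illustrative only: the class `EnzymeRule` and the helper `parse_constraints` are hypothetical names, not part of GNAT, and the code only splits the `MaxSubsubst` string into (substructure, maximum count) pairs. It does not implement the caret (`^`) matching itself.

```python
from dataclasses import dataclass, field


@dataclass
class EnzymeRule:
    """Hypothetical container for the rule fields shown on the slide."""
    name: str
    add: str
    substrate: str
    preceding: str
    max_substructures: list = field(default_factory=list)


def parse_constraints(text):
    """Split 'A#n&B#m' into [(A, n), (B, m)]; '#0' means the substructure must be absent."""
    constraints = []
    for item in text.split("&"):
        # rpartition keeps any '#' free pattern text intact and isolates the count
        pattern, _, count = item.rpartition("#")
        constraints.append((pattern, int(count)))
    return constraints


mgat4a = EnzymeRule(
    name="MGAT4A",
    add="GlcNAc(b1-4)",
    substrate="^Man(a1-3)",
    preceding="GlcNAc(b1-2)",
    max_substructures=parse_constraints("Gal#0&GlcNAc(b1-4)^Man(b1-4)#0"),
)
print(mgat4a.max_substructures)
```

Keeping the count as an integer rather than a boolean follows the slide's note that the maximum number need not be 0.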
Integration into enzyme database
<?xml version="1.0" encoding="UTF-8"?>
<EnzymeDB>
  <Enzyme EnzClass="…">
    <GeneralInfo>…</GeneralInfo>
    <BioPath>…</BioPath>
    <RelatedDB>…</RelatedDB>
    <EnzSpecificity>…</EnzSpecificity>
    <EnzKinetics>…</EnzKinetics>
  </Enzyme>
  <Enzyme EnzClass="…">
    …
  </Enzyme>
  …
</EnzymeDB>
XML data structure for glycoenzyme
Example
Custom database generation
Obtain from existing databases
Enzyme specificity & kinetics
View database elements
Ref: 3rd Ed. of Essentials of Glycobiology
Create pathways: forward
Specify starting material and enzymes
Constraints to limit network size
Enzyme: MAN1A1 MAN1B1 MAN2A1 MGAT1 MGAT2 MGAT3 MGAT4A MGAT5 B4GALT1 B3GNT2
Create pathways: reverse
Specify ending glycans & enzymes
Enzyme: MAN1A1 MAN1B1 MAN2A1 MGAT1 MGAT2 MGAT3 MGAT4A MGAT5 B4GALT1
Pathway: 4 compartment CSTR
0(CSTR)
Compartments: Cis-, Medial-, Trans-, TGN
Dtransp. = 10.8/h
Gly1
Gly2
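Species balance equation (terms: in, reaction, out):
$$
\frac{d[Gly_{i,j}]}{dt} = \underbrace{D_{transp.}\,[Gly_{i,j-1}]}_{\text{in}} - \underbrace{\frac{V_{i,j}\,[Gly_{i,j}]}{V_c\,Km_{i,j}\left(1+\sum_{k}[Gly_{k,j}]/Km_{k,j}\right)}}_{\text{reaction}} - \underbrace{D_{transp.}\,[Gly_{i,j}]}_{\text{out}}
$$
i = 36 glycans, j = 4 compartments
Gly1,2 = 8.32 × 10^-9 pmol/h
Vc = 2.5 µm^3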
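As a rough illustration of the species balance above, the following sketch evaluates its right-hand side for every glycan i and compartment j. It is not GNAT code: the function name, the nested-list layout, and the treatment of the first compartment (fed by a fixed `feed` value in place of compartment j-1) are assumptions, and the numbers in the usage lines are arbitrary rather than the slide's parameters.

```python
def species_balance(gly, feed, d_transp, vmax, km, vc):
    """Return d[Gly_i,j]/dt as a nested list indexed [glycan i][compartment j].

    gly, vmax and km are nested lists with the same [i][j] layout.
    feed[i] stands in for [Gly_i,j-1] when j is the first compartment (assumption).
    """
    n_gly = len(gly)
    n_comp = len(gly[0])
    rates = [[0.0] * n_comp for _ in range(n_gly)]
    for j in range(n_comp):
        # Competition term shared by all glycans in compartment j: 1 + sum_k [Gly_k,j]/Km_k,j
        competition = 1.0 + sum(gly[k][j] / km[k][j] for k in range(n_gly))
        for i in range(n_gly):
            upstream = feed[i] if j == 0 else gly[i][j - 1]
            inflow = d_transp * upstream
            reaction = vmax[i][j] * gly[i][j] / (vc * km[i][j] * competition)
            outflow = d_transp * gly[i][j]
            rates[i][j] = inflow - reaction - outflow
    return rates


# One glycan, four compartments, arbitrary illustrative values
rates = species_balance(
    gly=[[1.0, 1.0, 1.0, 1.0]],
    feed=[1.0],
    d_transp=10.8,
    vmax=[[2.5, 2.5, 2.5, 2.5]],
    km=[[1.0, 1.0, 1.0, 1.0]],
    vc=2.5,
)
print(rates)
```

The returned rates can be passed to any ODE integrator for a deterministic simulation. As on the slide, only the consumption term of the reaction is included.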
In silico simulation
• Deterministic
• Stochastic
Compartment   High mannose   Bi-      Tri-    Tetra-   Bisecting
Cis-          11.62%         54.64%   4.61%   0.06%    48.38%
Medial-       2.73%          80.26%   8.85%   0.07%    70.33%
Trans-        2.44%          82.61%   9.12%   0.07%    71.11%
TGN           2.43%          82.73%   9.13%   0.07%    71.11%
O-glycosylation
Glycolipid biosynth.
Other pathways
Pathway generation times: short!
Forward inference
# of species   # of reactions   Time
144            300              ~8 s
405            692              ~30 s
916            3330             ~70 s
166            452              ~11 s
Reverse inference
# of species   # of reactions   Time
160            526              ~12 s
181            611              ~14 s
356            1246             ~37 s
96             267              ~6 s
What else could this be useful for?
*Collaboration with Lara Mahal (NYU)
• Mapping RNA-Seq and Glycomics data to construct pathway maps.
• GlycoMir: The glycogene microRNA targets
• Functional integration with other databases
2. DrawGlycan-SNFG
• From IUPAC to Symbolic Nomenclature for Glycans (SNFG)
• Draw glycopeptides
• Draw glycan and peptide fragmentation
• Other features…
Glycobiology. 2017 Mar 15;27(3):200-205
Kai Cheng
Advantages: straightforward, easy to read & write, adequate for common use
DrawGlycan-SNFG: Render glycans and glycopeptides
IG(-N "b2 171.1" -C "y8")EADFN[Gal(b1-4)GlcNAc(b1-6 -NR "366.1")[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2 -NR "737.2" -R "2077" -U "S")Man(a1-3)]Man(b1-4)GlcNAc(b1-4 -NR "2157" -R "1486")[Fuc(a1-6)]GlcNAc(b?-?)]RSK
Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)
Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)
Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2 -NR "737.2" -R "2077" -U "S")Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)
IGEADFN[Gal(b1-4)GlcNAc(b1-6 -NR "366.1")[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2 -NR "737.2" -R "2077" -U "S")Man(a1-3)]Man(b1-4)GlcNAc(b1-4 -NR "2157" -R "1486")[Fuc(a1-6)]GlcNAc(b?-?)]RSK
Fragmentation options
#   Option   Representation
1   -R       Glycan reducing end
2   -NR      Glycan non-reducing end
3   -N       Peptide backbone N-terminus
4   -C       Peptide backbone C-terminus
Monosaccharide options
#   Option   Representation
1   -U       Annotate above monosac.
2   -D       Annotate below monosac.
3   -P       Identify a perpendicular monosac.
4   -CHAR    Introduce arbitrary text or present monosac. in text form
Bond options
#   Option   Representation
1   -BOLD    Paint glycosidic bond bold
2   -ZIG     Paint glycosidic bond zigzag
3   -WAVY    Paint glycosidic bond wavy
4   -DASH    Paint glycosidic bond dashed
5   -WEDGE   Paint glycosidic bond wedge
Repeats, adducts and fuzzy options
#   Option   Representation
1   -RS      Repeating unit start
2   -RE      Repeating unit ends
3   -ADDUCT  Add glycan adduct
4   -CURLY   Ambiguous assignments/fuzzy structures
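To show how the annotated strings relate to plain IUPAC input, here is a minimal sketch that separates options of the form `-NAME "value"` from the underlying string. It is not DrawGlycan's own parser: the regular expression and the function name `split_options` are assumptions, and options written without a quoted value are not handled.

```python
import re

# An option is taken to be: optional whitespace, '-', capital letters, whitespace, a quoted value
OPTION = re.compile(r'\s*-([A-Z]+)\s+"([^"]*)"')


def split_options(annotated):
    """Return (plain IUPAC string, list of (option, value)) for an annotated string."""
    options = OPTION.findall(annotated)
    # Removing the options can leave empty parentheses after a peptide residue, e.g. 'IG()'
    plain = OPTION.sub("", annotated).replace("()", "")
    return plain, options


annotated = ('Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)'
             '[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]'
             'Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)')
plain, options = split_options(annotated)
print(plain)
print(options)
```

Applied to the annotated glycan from the slide, this recovers the unannotated glycan string and the two fragmentation labels attached to the Neu5Ac linkage.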
DrawGlycan: Web, GUI & Command-line version
VirtualGlycome.org/DrawGlycan
drawglycan.sourceforge.net
3. GlycoPAT
• Analyze high-throughput glycoproteomics data
• Comprehensive scoring and false discovery rate (FDR) calculation algorithm for MSn data analysis
Mol Cell Proteomics. 16:2032-47, Nov 2017.
Kai Cheng
GlycoPAT: High-throughput glycoproteomics analysis
Tandem mass spectrometry in Orbitrap
Protease Digestion (Trypsin, Glu-C etc.)
nanoLC
MS1
Separation based on precursor m/z
MS2
fragmentation (spectrum axes: m/z, Intensity)
Sialic acid Galactose Mannose GlcNAc GalNAc Fucose
Mono. Sugar
SmallGlyPep: The minimal representation of glycopeptide for MS
Peptide G-V-S-L-M-N-F-T-K (o) carrying an O-glycan and an N-glycan
Linearized to SGP1.0:
GVS{n{n{f}{h{s}}}{h{s}}}LM<o>N{n{n{h{h{n{h}}}{h{n{h}}}}}}FTK
(first braced group: O-glycan; second braced group: N-glycan)
Symbol to letter
Sugar   NeuAc   Gal   Man   GlcNAc   GalNAc   Fuc
SGP     s       h     h     n        n        f
[Figure: the same glycopeptide drawn as a tree with one-letter SGP symbols]
SmallGlyPep: Fragmentation
GVS{n{n{f}{h{s}}}{h{s}}}LM<o>N{n{n{h{h{n{h}}}{h{n{h}}}}}}FTK
(first braced group: O-glycan; second braced group: N-glycan)
Products: GVS{n{h{s}}}LM<o>N{n{n{h{h{n{h}}}}}}FTK + {n{f}{h{s}}} + {h{n{h}}}
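The SGP1.0 string above can be taken apart with a simple brace-matching pass. The sketch below is a reading of the slide's example, not GlycoPAT code: it assumes that `{...}` groups are glycans, that `<...>` groups are other modifications, and that each group is attached to the residue immediately before it. The function name `parse_sgp` is hypothetical.

```python
def parse_sgp(sgp):
    """Split an SGP string into (peptide, glycans, modifications).

    glycans and modifications are lists of (1-based residue position, text).
    Assumes balanced braces and closed angle brackets.
    """
    peptide = []
    glycans = []
    modifications = []
    i = 0
    while i < len(sgp):
        ch = sgp[i]
        if ch == "{":
            # Walk forward until the brace opened here is closed again
            start, depth = i, 0
            while True:
                if sgp[i] == "{":
                    depth += 1
                elif sgp[i] == "}":
                    depth -= 1
                i += 1
                if depth == 0:
                    break
            glycans.append((len(peptide), sgp[start:i]))
        elif ch == "<":
            end = sgp.index(">", i)
            modifications.append((len(peptide), sgp[i + 1:end]))
            i = end + 1
        else:
            peptide.append(ch)
            i += 1
    return "".join(peptide), glycans, modifications


peptide, glycans, modifications = parse_sgp(
    "GVS{n{n{f}{h{s}}}{h{s}}}LM<o>N{n{n{h{h{n{h}}}{h{n{h}}}}}}FTK"
)
print(peptide, glycans, modifications)
```

For the slide's example this places one glycan on residue 3 (S), one on residue 6 (N), and the `o` modification on residue 5 (M).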
Methods GlycoPAT: general workflow
Methods GlycoPAT: GUI
GlycoPAT: ensemble score (ES) and false discovery rate (FDR)
Scores shown: Cross corr., % Ion match, Top 10 peaks, P-value, Ensemble Score, False discovery rate
ES for candidate glycopeptide
ES for decoy glycopeptide
False discovery rate
[Figure: spectrum with axes Mass (m/z) and % Intensity; panel label Quad. II]
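The slide does not give the FDR formula, so the sketch below uses the common target-decoy estimate as a stand-in: the number of decoy hits at or above a score threshold, normalized by the number of decoys generated per candidate, divided by the number of candidate hits at or above that threshold. The function name and the `decoys_per_target` parameter are assumptions, and GlycoPAT's actual calculation may differ.

```python
def estimate_fdr(target_scores, decoy_scores, threshold, decoys_per_target=1):
    """Target-decoy FDR estimate at a given ensemble-score threshold."""
    n_targets = sum(score >= threshold for score in target_scores)
    n_decoys = sum(score >= threshold for score in decoy_scores)
    if n_targets == 0:
        # Nothing is accepted at this threshold, so there are no false discoveries to count
        return 0.0
    return min(1.0, (n_decoys / decoys_per_target) / n_targets)


# Illustrative scores only, not data from the talk
targets = [0.9, 0.8, 0.7, 0.4]
decoys = [0.75, 0.3, 0.2, 0.1]
print(estimate_fdr(targets, decoys, threshold=0.6))
```

Scanning the threshold from high to low and stopping at the desired FDR gives the ES cutoff for accepting candidate glycopeptides.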
GlycoPAT: scoring of HCD spectra
Basigen | HCD
J Proteome Res. 15(10):3904-3915, 2016
*Simultaneous breakage of peptide backbone and glycan structures
GlycoPAT: scoring of CID spectra
[Figure: two annotated CID spectra; x-axis m/z (200 to 2000), y-axis % Intensity (0 to 100); individual peak labels and charge states not reproduced]
Spectrum 1: VVHAVEVALATFNAESNGSYL QLVEISR, z = 5, m/z exp = 1176.327, m/z theor = 1176.313
Spectrum 2: LCPDCPLLAPLNDSR, z = 4, m/z exp = 1151.224, m/z theor = 1151.208
*Analysis of ladder-like breakdown of glycans
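The agreement between experimental and theoretical precursor m/z in the two spectra can be expressed as a relative error in parts per million. This is a standard calculation added for illustration; the function name `ppm_error` is an assumption and the slide itself does not report ppm values.

```python
def ppm_error(mz_exp, mz_theor):
    """Relative mass error in parts per million."""
    return (mz_exp - mz_theor) / mz_theor * 1e6


# Precursor values as listed for the two CID spectra
spectra = {
    "VVHAVEVALATFNAESNGSYL QLVEISR": (1176.327, 1176.313),
    "LCPDCPLLAPLNDSR": (1151.224, 1151.208),
}
for peptide, (mz_exp, mz_theor) in spectra.items():
    print(f"{peptide}: {ppm_error(mz_exp, mz_theor):.1f} ppm")
```

Both precursors come out at roughly 12 to 14 ppm above the theoretical value.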
Analysis of glycopeptides in whole prostate cancer lysates
Mol Cell Proteomics. 2015 Oct;14(10):2753-63
Work in progress
• Improving calculation speed: FOCUS!! Don't do everything for everyone!
Improve speed: The fragment warehouse
Before: Run fragment algorithm for all candidate glycopeptides (SGP)
Now: Candidate glycopeptide (SGP), then check: Available in warehouse?
• Yes: Retrieve from warehouse
• No: Run fragment algorithm, then insert into warehouse
Get fragments
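The warehouse idea is a cache keyed by the candidate SGP string. A minimal sketch, assuming an in-memory dictionary as the store: the class name `FragmentWarehouse` is hypothetical, and `toy_fragmenter` is only a placeholder for the real fragmentation algorithm.

```python
class FragmentWarehouse:
    """Look fragments up by SGP string; compute and insert them only on a miss."""

    def __init__(self, fragment_fn):
        self._fragment_fn = fragment_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_fragments(self, sgp):
        if sgp in self._store:
            # Available in warehouse: retrieve
            self.hits += 1
            return self._store[sgp]
        # Not available: run the fragment algorithm once, then insert
        self.misses += 1
        fragments = self._fragment_fn(sgp)
        self._store[sgp] = fragments
        return fragments


def toy_fragmenter(sgp):
    """Placeholder for the real algorithm: returns all proper prefixes of the string."""
    return tuple(sgp[:i] for i in range(1, len(sgp)))


warehouse = FragmentWarehouse(toy_fragmenter)
first = warehouse.get_fragments("GVS{n}LM")
second = warehouse.get_fragments("GVS{n}LM")
print(warehouse.hits, warehouse.misses)
```

The second request is served from the store, so the fragment algorithm runs once per distinct candidate.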
Improved speed: Selective scoring
Entry condition: if expt. monoisotopic mass = theo. mass
Before: Generate 25× decoys, then score all
Now: Generate 5× decoys, then rapid score, then check: Looks good?
• Yes: Generate 20× more decoys
• No: STOP!
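A sketch of the two-stage flow, under stated assumptions: decoys are made here by shuffling the residues, the score is a toy position-match fraction, and "looks good" is taken to mean that the candidate's rapid score reaches a threshold. None of these choices come from the slide, which specifies only the 5× and 20× decoy counts and the early stop.

```python
import random


def make_decoys(peptide, n, rng):
    """Stand-in decoy generator: shuffle the residues n times."""
    decoys = []
    for _ in range(n):
        residues = list(peptide)
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


def selective_scoring(candidate, score, threshold, rng, first_batch=5, second_batch=20):
    """Score 5 decoys first; add 20 more only if the candidate looks good."""
    candidate_score = score(candidate)
    decoy_scores = [score(d) for d in make_decoys(candidate, first_batch, rng)]
    if candidate_score < threshold:
        # Does not look good: stop without generating further decoys
        return {"candidate_score": candidate_score,
                "decoy_scores": decoy_scores,
                "stopped_early": True}
    decoy_scores.extend(score(d) for d in make_decoys(candidate, second_batch, rng))
    return {"candidate_score": candidate_score,
            "decoy_scores": decoy_scores,
            "stopped_early": False}


def toy_score(sequence, reference="LCPDCPLLAPLNDSR"):
    """Toy score: fraction of positions that match a reference sequence."""
    return sum(a == b for a, b in zip(sequence, reference)) / len(reference)


result = selective_scoring("LCPDCPLLAPLNDSR", toy_score, threshold=0.5, rng=random.Random(0))
print(result["stopped_early"], len(result["decoy_scores"]))
```

Candidates that fail the rapid check cost 5 decoy evaluations instead of 25, which is where the saving comes from.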
Work in progress
• Improving calculation speed: FOCUS!! Don't do everything for everyone!
• Streamlining result visualization: SIMPLIFY!! Tell the story in pictures!
Combine DrawGlycan with GlycoPAT
Confirm MS1 assignments & calculate AUC
Provide more details
Conclusions
• GNAT-Web: Build reaction networks efficiently
– Use this for in silico deterministic and stochastic simulations
– Display of experimental datasets
• DrawGlycan-SNFG: Easy and Robust
• GlycoPAT: MS data analysis toolbox
– Improve computational time
– Integrate analysis from different fragmentation modes
– Test in more biomedical applications
Acknowledgements
Lab members: Anju Kelkar, Ph.D.; Virginia del Solar Fernandez, Ph.D.
Graduate students: Ted Groth, Changjie Zhang, Kai Cheng, Xinheng Yu, Yuqi Zhu, Yusen Zhou, Arezoo Momeni, Gabbie Pawlowski
Collaborators: Alan Friedman and Jun Qu, Buffalo; Anne Dell, Stuart Haslam, Imperial College
Funding support:
NIGMS: General Medicine
NHLBI Systems Biology Collaborations