Post on 18-Jun-2020

Glycoinformatics tools to analyze and curate large scale experimental datasets

Sriram Neelamegham

Departments of Chemical & Biological Engineering, Biomedical Engineering and Medicine

State University of New York, Buffalo, NY

9:10-9:50 am, March 6, 2018, Tokyo, Japan

Overview of research interests

Systems Biology: Input-output relationship

Inflammation

Circulation. 2003;107:929-934

Regenerative medicine

ATVB, 25:1321, 2005.

Thrombosis

Input-output response

•  Their generation (wet lab): Next Generation Seq., LC-MS with CRISPR-Cas9 perturbations

•  Their visualization, analysis and simulation (dry lab): LC-MS data analysis programs, pathway maps

VirtualGlycome.org: Systems level view of glycosylation

[Figure: DNA → RNA → Protein (GlycoEnzyme, scaffold) → Glycoprotein, with the rate constants and regulators listed below]

k_transcription
•  Transcription factor activity
•  Epigenetic regulation

k_translation,ENZ and k_translation,SCAFFOLD
•  5'-capping & polyA tail
•  Codon usage
•  Ribosome binding affinity
•  miRNA
•  RNA binding proteins

k_glycosylation
•  Golgi transport
•  Glycoenzyme localization & reaction rates
•  Sugar-nucleotide biosynth.

Location: Endoplasmic reticulum & Golgi cisternae (Cis, Medial, Trans, TGN) leading to secretory vesicles

Degradation: k_degrad.,RNA; k_degrad.,ENZ; k_degrad.,SCAFF.; k_degrad.,GLYCOP.

Feedback regulation

Neelamegham & Mahal, Curr Opin Struct Biol. 40:145-152, 2016

Open-source integration of knowledge across scales

•  GNAT-Web: Glycosylation network analysis toolbox

•  DrawGlycan-SNFG: Simple tool to convert IUPAC strings to SNFG sketches

•  GlycoPAT: High-throughput analysis of LC-MSn data, with focus on glycoproteomics

1. GNAT-Web: Glycosylation Network Analysis Toolbox

•  Define glycoEnzymes in silico
•  Develop reaction network from RNA-Seq and MS data processing

•  Eventually, simulate reaction networks to bridge data across scales

Yusen Zhou

Liu G, et al. Bioinformatics. 24(23):2740-7, 2008; Glycobiology. 21(12):1541-53, 2011; Bioinformatics. 29(3):404-6, 2013; PLoS One. 9(6):e100939, 2014.

XML-based glycoenzyme definition

Glyco-enzyme DB

Enzyme entry

General Information

Related BioPath

Related Database

Enzyme Specificity

Gene Symbol

Protein Name

Enzyme Class

CAZy Family

E.C.No

Reaction

KEGG ORTHOLOGY

KEGG PATHWAY

Organism

Uniprot

NCBI GeneID

GlyMap GeneID

DNA_RefSeq

Protein RefSeq

BRENDA

OMIM number

Glycan Type

Compartment

Substrate

Add/Remove

Preceding

Substrate Constraints

Product Constraints

Kinetic Parameters (MathML)

Vm (Max velocity)

Km (Michaelis-Menten constant)

Units

Enzyme specificity (e.g. MGAT4)

MGAT4A

Enzyme rule   Value
Add           GlcNAc(b1-4)
Substrate     ^Man(a1-3)
Preceding     GlcNAc(b1-2)

^: Caret is a placeholder for inclusion of arbitrary branches

Rule: add GlcNAc(b1-4) to substrate '^Man(a1-3)', provided it is preceded by GlcNAc(b1-2)

Constraint: MGAT4 acts before addition of:
a.  Galactose (Gal), i.e. Gal cannot exist in the string, or Gal#0
b.  Bisecting GlcNAc (MGAT3), i.e. GlcNAc(b1-4)^Man(b1-4)#0

Maximum # = 0 indicates NOT, but it could be any other number as well

SubstConstraints   Value
MaxSubsubst        Gal#0&GlcNAc(b1-4)^Man(b1-4)#0

References: a. Bennun et al. PLoS Comput Biol. 9(1):e1002813, 2013; b. Taniguchi, Fukuda, Narimatsu, Angata, Handbook of Glycosyltransferases and Related Genes; c. http://acgg.asia/db/ggdb/ (glycogene database)
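The constraint string above can be read as a list of "pattern#maximum" terms joined by "&". The following is a minimal Python sketch of how such a string might be evaluated against a linear glycan string. It is not GNAT's implementation: the function names, the three test glycans, and the way the caret is interpreted (zero or more bracketed sibling branches between two residues) are assumptions made for illustration. Matching is by plain prefix, so "Gal" would also match "GalNAc", which is acceptable only for this N-glycan example.

```python
def parse_max_constraints(spec):
    """Split 'A#0&B#0' into [(pattern, max_count), ...]."""
    terms = []
    for term in spec.split("&"):
        pattern, _, limit = term.rpartition("#")
        terms.append((pattern, int(limit)))
    return terms


def skip_branches(glycan, pos):
    """Advance past zero or more balanced [...] groups starting at pos."""
    while pos < len(glycan) and glycan[pos] == "[":
        depth = 0
        while pos < len(glycan):
            if glycan[pos] == "[":
                depth += 1
            elif glycan[pos] == "]":
                depth -= 1
            pos += 1
            if depth == 0:
                break
    return pos


def count_matches(glycan, pattern):
    """Count pattern occurrences; '^' is read as 'any sibling branches' (assumption)."""
    parts = pattern.lstrip("^").split("^")
    count = 0
    for start in range(len(glycan)):
        if not glycan.startswith(parts[0], start):
            continue
        pos = start + len(parts[0])
        matched = True
        for part in parts[1:]:
            # The matched residue may itself sit at the end of a bracketed branch.
            if pos < len(glycan) and glycan[pos] == "]":
                pos += 1
            pos = skip_branches(glycan, pos)
            if not glycan.startswith(part, pos):
                matched = False
                break
            pos += len(part)
        count += matched
    return count


def satisfies(glycan, spec):
    """True when no pattern occurs more often than its '#' limit."""
    return all(count_matches(glycan, p) <= n for p, n in parse_max_constraints(spec))


# Constraint string from the slide; the glycans below are illustrative inputs.
MGAT4_CONSTRAINT = "Gal#0&GlcNAc(b1-4)^Man(b1-4)#0"
BIANTENNARY = "GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"
BISECTED = ("GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-4)][GlcNAc(b1-2)Man(a1-6)]"
            "Man(b1-4)GlcNAc(b1-4)GlcNAc")
GALACTOSYLATED = "Gal(b1-4)" + BIANTENNARY

for name, glycan in [("biantennary", BIANTENNARY), ("bisected", BISECTED),
                     ("galactosylated", GALACTOSYLATED)]:
    print(name, satisfies(glycan, MGAT4_CONSTRAINT))
```

A production matcher would work on the glycan tree rather than on the string, since string matching depends on the order in which branches are written.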

Integration into enzyme database

<?xml version="1.0" encoding="UTF-8"?>
<EnzymeDB>
  <Enzyme EnzClass="…">
    <GeneralInfo>…</GeneralInfo>
    <BioPath>…</BioPath>
    <RelatedDB>…</RelatedDB>
    <EnzSpecificity>…</EnzSpecificity>
    <EnzKinetics>…</EnzKinetics>
  </Enzyme>
  <Enzyme EnzClass="…">
    …
  </Enzyme>
  …
</EnzymeDB>
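A file with this layout can be read with Python's standard library. The sketch below is only an illustration: the element and attribute names come from the skeleton above, while every value in SAMPLE_DB and the function name summarize_enzymes are placeholders invented here, since the slide elides the actual contents.

```python
import xml.etree.ElementTree as ET

# Element and attribute names follow the slide; all values are placeholders.
SAMPLE_DB = b"""<?xml version="1.0" encoding="UTF-8"?>
<EnzymeDB>
  <Enzyme EnzClass="class-A">
    <GeneralInfo>general info placeholder</GeneralInfo>
    <BioPath>pathway placeholder</BioPath>
    <RelatedDB>database placeholder</RelatedDB>
    <EnzSpecificity>specificity placeholder</EnzSpecificity>
    <EnzKinetics>kinetics placeholder</EnzKinetics>
  </Enzyme>
  <Enzyme EnzClass="class-B">
    <GeneralInfo>another placeholder</GeneralInfo>
  </Enzyme>
</EnzymeDB>
"""


def summarize_enzymes(xml_bytes):
    """Return one dict per <Enzyme>: its EnzClass and the child sections present."""
    root = ET.fromstring(xml_bytes)
    summary = []
    for enzyme in root.findall("Enzyme"):
        summary.append({
            "EnzClass": enzyme.get("EnzClass"),
            "sections": [child.tag for child in enzyme],
        })
    return summary


for entry in summarize_enzymes(SAMPLE_DB):
    print(entry["EnzClass"], entry["sections"])
```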

XML data structure for glycoenzyme

Example

Custom database generation

Obtain from existing databases

Enzyme specificity & kinetics

View database elements

Ref: 3rd Ed. of Essentials of Glycobiology

Create pathways: forward

Specify starting material and enzymes

Constraints to limit network size

Enzymes: MAN1A1, MAN1B1, MAN2A1, MGAT1, MGAT2, MGAT3, MGAT4A, MGAT5, B4GALT1, B3GNT2

Create pathways: reverse

Specify ending glycans & enzymes

Enzymes: MAN1A1, MAN1B1, MAN2A1, MGAT1, MGAT2, MGAT3, MGAT4A, MGAT5, B4GALT1

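Pathway: 4 compartment CSTR

[Figure: CSTRs in series, labeled 0 (CSTR), Cis-, Medial-, Trans-, TGN, with inputs Gly1 and Gly2; D_transp. = 10.8/h]

Species balance equation:

\[
\frac{d[\mathrm{Gly}_{i,j}]}{dt}
= \underbrace{D_{\mathrm{transp.}}\,[\mathrm{Gly}_{i,j-1}]}_{\text{in}}
- \underbrace{\frac{V_{i,j}\,[\mathrm{Gly}_{i,j}]}{V_c\,\mathit{Km}_{i,j}\left(1+\sum_{k}\dfrac{[\mathrm{Gly}_{k,j}]}{\mathit{Km}_{k,j}}\right)}}_{\text{reaction}}
- \underbrace{D_{\mathrm{transp.}}\,[\mathrm{Gly}_{i,j}]}_{\text{out}}
\]

i: 36 glycans; j: 4 compartments

Gly1,2 = 8.32 × 10^-9 pmol/h

Vc = 2.5 µm^3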
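The species balance can be integrated numerically. The following is a minimal sketch in plain Python that evaluates exactly the three terms written on the slide (in, reaction, out) and steps them with forward Euler. Only D_transp. = 10.8/h is taken from the slide. The function names, the two-species system, the feed treated as an upstream concentration for the first compartment, and all Vmax and Km values are assumptions chosen for illustration. Product formation by upstream reactions is not part of the equation as shown and is therefore not modeled.

```python
def species_balance(conc, feed, d_transp, vmax, km, vc):
    """Right-hand side of the balance; conc[j][i] is glycan i in compartment j."""
    n_gly = len(feed)
    rates = []
    for j, row in enumerate(conc):
        upstream = feed if j == 0 else conc[j - 1]
        # Competitive term: 1 + sum_k [Gly_k,j] / Km_k,j
        competition = 1.0 + sum(row[k] / km[j][k] for k in range(n_gly))
        rate_row = []
        for i in range(n_gly):
            reaction = vmax[j][i] * row[i] / (vc * km[j][i] * competition)
            rate_row.append(d_transp * upstream[i] - reaction - d_transp * row[i])
        rates.append(rate_row)
    return rates


def euler_integrate(conc, feed, d_transp, vmax, km, vc, dt, n_steps):
    """Forward Euler time stepping; returns the final concentrations."""
    conc = [row[:] for row in conc]
    for _ in range(n_steps):
        rates = species_balance(conc, feed, d_transp, vmax, km, vc)
        conc = [[c + dt * r for c, r in zip(c_row, r_row)]
                for c_row, r_row in zip(conc, rates)]
    return conc


# Illustrative values only (arbitrary units), except D_TRANSP from the slide.
D_TRANSP = 10.8                         # 1/h
FEED = [1.0, 0.5]                       # assumed upstream concentrations
KM = [[1.0, 1.0] for _ in range(4)]     # assumed Km per compartment and glycan
VMAX = [[5.0, 2.0] for _ in range(4)]   # assumed Vmax per compartment and glycan
VC = 1.0                                # assumed compartment volume
START = [[0.0, 0.0] for _ in range(4)]

final = euler_integrate(START, FEED, D_TRANSP, VMAX, KM, VC, dt=0.001, n_steps=5000)
for name, row in zip(["Cis", "Medial", "Trans", "TGN"], final):
    print(name, [round(c, 4) for c in row])
```

Forward Euler is used only to keep the sketch dependency-free; a stiff ODE solver would be the usual choice for a 36-species network.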

In silico simulation

•  Deterministic
•  Stochastic

           High mannose   Bi-      Tri-    Tetra-   Bisecting
Cis-       11.62%         54.64%   4.61%   0.06%    48.38%
Medial-    2.73%          80.26%   8.85%   0.07%    70.33%
Trans-     2.44%          82.61%   9.12%   0.07%    71.11%
TGN        2.43%          82.73%   9.13%   0.07%    71.11%

O-glycosylation

Glycolipid biosynth.

Other pathways

Pathway generation times: short!

Forward inference
# of species   # of reactions   Time
144            300              ~8 s
405            692              ~30 s
916            3330             ~70 s
166            452              ~11 s

Reverse inference
# of species   # of reactions   Time
160            526              ~12 s
181            611              ~14 s
356            1246             ~37 s
96             267              ~6 s

What else could this be useful for?

* Collaboration with Lara Mahal (NYU)

•  Mapping RNA-Seq and Glycomics data to construct pathway maps.

•  GlycoMir: The glycogene microRNA targets
•  Functional integration with other databases

2. DrawGlycan-SNFG

•  From IUPAC to Symbol Nomenclature for Glycans (SNFG)

•  Draw glycopeptides
•  Draw glycan and peptide fragmentation
•  Other features…

Glycobiology. 2017 Mar 15;27(3):200-205

Kai Cheng

Advantages: straightforward, easy to read & write, adequate for common use

IG(-N "b2 171.1" -C "y8")EADFN[Gal(b1-4)GlcNAc(b1-6 -NR "366.1")[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2 -NR "737.2" -R "2077" -U "S")Man(a1-3)]Man(b1-4)GlcNAc(b1-4 -NR "2157" -R "1486")[Fuc(a1-6)]GlcNAc(b?-?)]RSK

Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)

Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)

Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2 -NR "737.2" -R "2077" -U "S")Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)

IGEADFN[Gal(b1-4)GlcNAc(b1-6 -NR "366.1")[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2 -NR "737.2" -R "2077" -U "S")Man(a1-3)]Man(b1-4)GlcNAc(b1-4 -NR "2157" -R "1486")[Fuc(a1-6)]GlcNAc(b?-?)]RSK

DrawGlycan-SNFG: Render glycans and glycopeptides

Fragmentation options
    Option    Representation
1   -R        Glycan reducing end
2   -NR       Glycan non-reducing end
3   -N        Peptide backbone N-terminus
4   -C        Peptide backbone C-terminus

Monosaccharide options
    Option    Representation
1   -U        Annotate above monosac.
2   -D        Annotate below monosac.
3   -P        Identify a perpendicular monosac.
4   -CHAR     Introduce arbitrary text or present monosac. in text form

Bond options
    Option    Representation
1   -BOLD     Paint glycosidic bond bold
2   -ZIG      Paint glycosidic bond zigzag
3   -WAVY     Paint glycosidic bond wavy
4   -DASH     Paint glycosidic bond dashed
5   -WEDGE    Paint glycosidic bond wedge

Repeats, adducts and fuzzy options
    Option    Representation
1   -RS       Repeating unit start
2   -RE       Repeating unit ends
3   -ADDUCT   Add glycan adduct
4   -CURLY    Ambiguous assignments/fuzzy structures
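In the annotated input strings, options appear inside the parentheses as a dash-prefixed uppercase name followed by a quoted value, e.g. -NR "292.1". The following Python sketch lists those annotations and strips them to recover the plain IUPAC string. It is not DrawGlycan's parser: the function names are invented here, and the regular expression assumes every option takes exactly one quoted argument, as in the slide examples. Options written without a quoted value would not be recognized.

```python
import re

# An option is '-NAME "value"', optionally preceded by whitespace (assumed syntax).
OPTION = re.compile(r'\s*-([A-Z]+)\s+"([^"]*)"')


def list_options(annotated):
    """Return [(option_name, value), ...] in order of appearance."""
    return OPTION.findall(annotated)


def strip_options(annotated):
    """Remove all options; drop parentheses left empty on peptide residues."""
    plain = OPTION.sub("", annotated)
    return plain.replace("()", "")


# Input strings taken from the slide examples.
ANNOTATED_GLYCAN = (
    'Gal(b1-4)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-2)]Man(a1-6)'
    '[Neu5Ac(a2-6 -NR "292.1" -R "3350")Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]'
    'Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b?-?)'
)
ANNOTATED_PEPTIDE = 'IG(-N "b2 171.1" -C "y8")EADFN'

print(list_options(ANNOTATED_GLYCAN))
print(strip_options(ANNOTATED_GLYCAN))
print(strip_options(ANNOTATED_PEPTIDE))
```

Linkage text such as b1-4 is left untouched because the character after the dash is a digit, not an uppercase letter.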

DrawGlycan: Web, GUI & Command-line version

VirtualGlycome.org/DrawGlycan
drawglycan.sourceforge.net

3. GlycoPAT

•  Analyze high-throughput glycoproteomics data

•  Comprehensive scoring and false discovery rate (FDR) calculation algorithm for MSn data analysis

Mol Cell Proteomics. 16:2032-47, Nov 2017.

Kai Cheng

GlycoPAT: High-throughput glycoproteomics analysis

Tandem mass spectrometry in Orbitrap

Protease digestion (Trypsin, Glu-C etc.) → nanoLC → MS1: separation based on precursor m/z → MS2: fragmentation

[Figure: MS spectra plotted as Intensity vs. m/z; monosaccharide symbol legend: Sialic acid, Galactose, Mannose, GlcNAc, GalNAc, Fucose]

SmallGlyPep: The minimal representation of glycopeptide for MS

[Figure: peptide G-V-S-L-M-N-F-T-K carrying an O-glycan, an oxidation (o) and an N-glycan, drawn with monosaccharide symbols and SGP letters]

Linearized to SGP1.0:

GVS{n{n{f}{h{s}}}{h{s}}}LM<o>N{n{n{h{h{n{h}}}{h{n{h}}}}}}FTK
(first braced group: O-glycan; second braced group: N-glycan)

Symbol to letter

Mono. sugar   NeuAc   Gal   Man   GlcNAc   GalNAc   Fuc
SGP sugar     s       h     h     n        n        f

SmallGlyPep: Fragmentation

GVS{n{n{f}{h{s}}}{h{s}}}LM<o>N{n{n{h{h{n{h}}}{h{n{h}}}}}}FTK
(first braced group: O-glycan; second braced group: N-glycan)

Products: GVS{n{h{s}}}LM<o>N{n{n{h{h{n{h}}}}}}FTK + {n{f}{h{s}}} + {h{n{h}}}
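Because an SGP string nests each glycan in braces directly after the residue that carries it, it can be split with a single pass over the characters. The sketch below is an illustrative reader, not GlycoPAT code: the function names and the 1-based position convention are assumptions. It separates the peptide, the glycans and the modifications, and checks that the fragmentation products shown above conserve the monosaccharide composition of the parent.

```python
from collections import Counter


def parse_sgp(sgp):
    """Return (peptide, glycans, mods); positions are 1-based residue indices."""
    peptide, glycans, mods = [], [], []
    i = 0
    while i < len(sgp):
        ch = sgp[i]
        if ch == "{":
            # Read one balanced {...} group: a glycan on the preceding residue.
            depth, j = 0, i
            while j < len(sgp):
                if sgp[j] == "{":
                    depth += 1
                elif sgp[j] == "}":
                    depth -= 1
                j += 1
                if depth == 0:
                    break
            glycans.append((len(peptide), sgp[i:j]))
            i = j
        elif ch == "<":
            # Read a <...> modification on the preceding residue.
            j = sgp.index(">", i)
            mods.append((len(peptide), sgp[i + 1:j]))
            i = j + 1
        else:
            peptide.append(ch)
            i += 1
    return "".join(peptide), glycans, mods


def composition(sgp):
    """Count SGP monosaccharide letters over all glycans in the string."""
    _, glycans, _ = parse_sgp(sgp)
    total = Counter()
    for _, glycan in glycans:
        total.update(ch for ch in glycan if ch not in "{}")
    return total


# Strings taken from the slide.
PARENT = "GVS{n{n{f}{h{s}}}{h{s}}}LM<o>N{n{n{h{h{n{h}}}{h{n{h}}}}}}FTK"
PRODUCTS = ["GVS{n{h{s}}}LM<o>N{n{n{h{h{n{h}}}}}}FTK", "{n{f}{h{s}}}", "{h{n{h}}}"]

peptide, glycans, mods = parse_sgp(PARENT)
print(peptide, glycans, mods)
print(dict(composition(PARENT)))
```

Note that the letters h and n are each shared by two monosaccharides (Gal/Man and GlcNAc/GalNAc), so the composition is a count of hexoses and HexNAcs rather than of individual sugars.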

Methods GlycoPAT: general workflow

Methods GlycoPAT: GUI

GlycoPAT: ensemble score (ES) and false discovery rate (FDR)

Cross corr., % Ion match, Top 10 peaks, P-value → Ensemble Score → False discovery rate

ES for candidate glycopeptide + ES for decoy glycopeptide → False discovery rate

[Figure: spectrum plot, % Intensity vs. Mass (m/z); panel label "Quad. II"]
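The slide states that the FDR is derived from the ensemble scores of candidate and decoy glycopeptides but does not give the formula. The sketch below therefore uses the common target-decoy estimate (decoys passing a threshold divided by candidates passing it), which is an assumption and may differ from GlycoPAT's calculation. The function names, the decoys_per_target normalization and the example scores are all invented for illustration.

```python
def estimate_fdr(target_scores, decoy_scores, threshold, decoys_per_target=1):
    """Target-decoy FDR estimate at an ensemble-score threshold (assumed formula)."""
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0
    # Normalize when several decoys were generated per candidate.
    return min(1.0, (n_decoy / decoys_per_target) / n_target)


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr, decoys_per_target=1):
    """Smallest candidate score usable as a cutoff with estimated FDR <= max_fdr."""
    for threshold in sorted(target_scores):
        fdr = estimate_fdr(target_scores, decoy_scores, threshold, decoys_per_target)
        if fdr <= max_fdr:
            return threshold
    return None


# Made-up ensemble scores, for illustration only.
TARGET_ES = [0.9, 0.8, 0.7, 0.4, 0.3]
DECOY_ES = [0.75, 0.35, 0.2, 0.1, 0.05]

print(estimate_fdr(TARGET_ES, DECOY_ES, threshold=0.6))
print(lowest_threshold_for_fdr(TARGET_ES, DECOY_ES, max_fdr=0.05))
```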

GlycoPAT: scoring of HCD spectra

Basigen | HCD

J Proteome Res. 15(10):3904-3915, 2016

* Simultaneous breakage of peptide backbone and glycan structures

GlycoPAT: scoring of CID spectra

[Figure: two annotated CID spectra, % Intensity (0-100) vs. m/z (200-2000), with charge-state labels on the fragment peaks]

Spectrum 1: VVHAVEVALATFNAESNGSYLQLVEISR, z = 5, m/z exp = 1176.327, m/z theor = 1176.313

Spectrum 2: LCPDCPLLAPLNDSR, z = 4, m/z exp = 1151.224, m/z theor = 1151.208

* Analysis of ladder-like breakdown of glycans

Analysis of glycopeptides in whole prostate cancer lysates

Mol Cell Proteomics. 2015 Oct;14(10):2753-63

Work in progress
•  Improving calculation speed: FOCUS!! Don't do everything for everyone!

Improve speed: The fragment warehouse

Before: Run fragment algorithm for all candidate glycopeptides

Now: Candidate glycopeptide (SGP) → Available in warehouse?
   Yes: Retrieve from warehouse → Get fragments
   No: Run fragment algorithm → Insert into warehouse → Get fragments
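The warehouse flow above is a cache keyed by the SGP string: fragments are computed once, stored, and retrieved on later requests. The following is a minimal Python sketch of that idea under stated assumptions. The class name, the hit and miss counters, and the toy fragment function are hypothetical. The real fragmentation algorithm and storage backend are not described on the slide.

```python
class FragmentWarehouse:
    """Cache of fragment lists keyed by SGP string (illustrative sketch)."""

    def __init__(self, fragment_fn):
        self._fragment_fn = fragment_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_fragments(self, sgp):
        # Available in warehouse? Yes: retrieve.
        if sgp in self._store:
            self.hits += 1
            return self._store[sgp]
        # No: run the fragment algorithm, then insert the result.
        fragments = self._fragment_fn(sgp)
        self._store[sgp] = fragments
        self.misses += 1
        return fragments


def toy_fragments(sgp):
    """Placeholder for the real algorithm: every proper prefix of the string."""
    return [sgp[:n] for n in range(1, len(sgp))]


warehouse = FragmentWarehouse(toy_fragments)
for candidate in ["GVSK", "LMNK", "GVSK"]:
    warehouse.get_fragments(candidate)
print(warehouse.hits, warehouse.misses)
```

For a pure function of the SGP string, functools.lru_cache would give the same behavior in memory. An explicit class is used here because a warehouse shared across runs would need its own store.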

Improved speed: Selective scoring

Before: Generate 25× decoys → Score all

Now: Generate 5× decoys → Rapid score → Looks good?
   Yes: Generate 20× more decoys
   No: STOP!

Check shown in the flowchart: if expt. monoisotopic mass = theo. mass

Work in progress
•  Improving calculation speed: FOCUS!! Don't do everything for everyone!
•  Streamlining result visualization: SIMPLIFY!! Tell the story in pictures!

Combine DrawGlycan with GlycoPAT

Confirm MS1 assignments & calculate AUC

Provide more details

Conclusions
•  GNAT-Web: Build reaction networks efficiently
   – Use this for in silico deterministic and stochastic simulations
   – Display of experimental datasets
•  DrawGlycan-SNFG: Easy and robust
•  GlycoPAT: MS data analysis toolbox
   – Improve computational time
   – Integrate analysis from different fragmentation modes
   – Test in more biomedical applications

Acknowledgements

Lab members: Anju Kelkar, Ph.D.; Virginia del Solar Fernandez, Ph.D.

Graduate students: Ted Groth, Changjie Zhang, Kai Cheng, Xinheng Yu, Yuqi Zhu, Yusen Zhou, Arezoo Momeni, Gabbie Pawlowski

Collaborators: Alan Friedman and Jun Qu, Buffalo; Anne Dell, Stuart Haslam, Imperial College

Funding support:

NIGMS: General Medicine

NHLBI Systems Biology Collaborations