1

Bioinformatics in Metabolomics
Shigehiko Kanaya
Nara Institute of Science and Technology, Graduate School of Information Science; Comparative Genomics Lab.

[1] Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS

[2] Bio-Database developed by our lab.
2.1 Species-metabolite relation database (KNApSAcK)
2.2 Easy Gene Classifier to Functional Group

2

Data processing: from data acquisition in a time-series experiment to description of cellular conditions

(a) Time-series experiments
[Figure: growth curve, OD600 (log scale, 0.1 to 10) versus time (0 to 800 min), with sampling points T1 to T8]

(b) Data preprocessing and constructing data matrix
[Figure: data matrix of ion intensities, time points × m/z]

(c) Classification of ions into metabolite-derivative groups (isotope ions and multivalent ions, e.g. M, M+1, M/2)

(d) Annotation of ions as metabolites

Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli

(e) Assessment of cellular condition by metabolite composition

Tools: DrDMASS, DPClus, KNApSAcK DB

3

FT-ICR/MS (Fourier transform ion cyclotron resonance MS)

Metabolomics research has been performed by GC-MS, LC-MS, CE-MS and NMR.

FT-ICR/MS can offer extremely high levels of resolution and sensitivity.

Highly accurate mass: assignment to a molecular formula

e.g. Experimental m/z = 662.1037

NAD+ theoretical m/z = 662.1019 (Δ = 0.0018, 2.7 ppm)
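The ppm figure quoted for NAD+ can be reproduced with a short calculation. This is a minimal sketch; the function name `ppm_error` is an illustrative choice and not part of any tool named in the slides.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million, relative to the theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Values taken from the NAD+ example on the slide.
delta = 662.1037 - 662.1019
ppm = ppm_error(662.1037, 662.1019)
print(f"delta = {delta:.4f}, error = {ppm:.1f} ppm")
```

The error is expressed relative to the theoretical value, so the same absolute deviation corresponds to a larger ppm error at low m/z than at high m/z.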

4

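Predicted number of molecular formulas from highly accurate mass

[Figure: number of candidate molecular formulas (y-axis, 0 to 600) versus mass error tolerance (±1, ±0.1, ±0.01, ±0.001); bar labels 597, 323, 251. Example: C10H10O6, MW 226.0477380528 (chorismic acid, isochorismic acid)]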
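The effect of mass accuracy on the number of candidate formulas can be illustrated with a brute-force enumeration. The sketch below is an assumption-laden toy: it considers only C, H, N and O within arbitrary count limits and applies no chemical valence rules, so its counts will not match the numbers in the figure. It only shows that the candidate set shrinks as the tolerance narrows.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def monoisotopic_mass(counts):
    """Sum of element masses for a composition such as {"C": 10, "H": 10, "O": 6}."""
    return sum(MONO[el] * n for el, n in counts.items())


def candidate_formulas(target, tol, max_counts=None):
    """Enumerate C/H/N/O compositions whose mass lies within target +/- tol."""
    # The upper limits are arbitrary illustration values, not from the slides.
    max_counts = max_counts or {"C": 20, "H": 40, "N": 5, "O": 10}
    elements = list(max_counts)
    hits = []
    for combo in product(*(range(max_counts[el] + 1) for el in elements)):
        counts = dict(zip(elements, combo))
        if abs(monoisotopic_mass(counts) - target) <= tol:
            hits.append(counts)
    return hits


target = monoisotopic_mass({"C": 10, "H": 10, "O": 6})  # C10H10O6
counts_by_tol = {
    tol: len(candidate_formulas(target, tol)) for tol in (1.0, 0.1, 0.01, 0.001)
}
for tol, n in counts_by_tol.items():
    print(f"+/-{tol}: {n} candidate compositions")
```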

5

Data set (E. coli time series)

Conditions
E. coli W3110
LB medium, 2000 ml
Agitation 300 rpm
Air 2 l/h
Temperature 37 °C
Start pH 7.4

[Figure: growth curve, OD600 (log scale, 0.001 to 10) versus time (0 to 800 min), with sampling points T1 to T8]

8 time points
･ Collect cells by membrane filter
･ Extract metabolites with methanol

6

DrDMASS+

(i) Peak correction
(ii) Multivariate data processing
(iii) Unsupervised learning: PCA, BL-SOM
(iv) Supervised learning: PLS

[Figure: spectra of Samples A1 to A3 and B1 to B3 are corrected and combined into a data matrix of samples × m/z]

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2M} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{t1} & x_{t2} & \cdots & x_{tj} & \cdots & x_{tM} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nj} & \cdots & x_{NM}
\end{pmatrix}
$$

Bioinformatics: KNApSAcK search
# of metabolites: 20752
# of species-metabolite pairs: 41206

7

DrDMASS+ (workflow figure repeated)

(i) Peak correction (ii) Multivariate data processing


8

Validation of data processing: (b) Data preprocessing and constructing data matrix

Oikawa et al. (2006)

(i) Peak correction: m/z correction among scans
(ii) Multivariate data processing: peak matching among 10 scans

[Figure: Scans 1 to 3 before and after alignment, with the IMC peak and a metabolite peak marked]

[Figure: coefficient of variation (CV, axis up to 3.5×10^-6) of Peak1, Peak2, Peak3, IMC and Peak4, before and after correcting]
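The validation idea, comparing the coefficient of variation of a peak's m/z across scans before and after correction, can be sketched as follows. The correction rule used here (rescaling each scan so that the IMC peak lands on a reference value) is an assumption made for illustration; the slides do not specify the algorithm. The reference m/z and all scan values are hypothetical.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation: population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


IMC_REFERENCE = 500.0000  # hypothetical reference m/z of the IMC peak

# Hypothetical scans: each maps a peak name to its measured m/z.
scans = [
    {"IMC": 500.0010, "Peak1": 300.0006},
    {"IMC": 499.9990, "Peak1": 299.9994},
    {"IMC": 500.0005, "Peak1": 300.0003},
]


def correct_scan(scan):
    """Rescale all m/z values in one scan so the IMC peak matches its reference."""
    factor = IMC_REFERENCE / scan["IMC"]
    return {name: mz * factor for name, mz in scan.items()}


corrected = [correct_scan(s) for s in scans]
cv_before = cv([s["Peak1"] for s in scans])
cv_after = cv([s["Peak1"] for s in corrected])
print(f"CV before: {cv_before:.2e}, CV after: {cv_after:.2e}")
```

In this toy data the drift is purely multiplicative, so the correction removes almost all scan-to-scan variation; real data would leave a residual.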

9

[Figure: only the axis label "time" is recoverable]

10

(c) Classification of ions into metabolite-derivative groups (DPClus)

[Figure: DPClus clustering result; clusters labelled 1-1 to 1-6, 2-1 to 2-3 and 3 to 11; ions labelled PG1 to PG10 and M-1 to M-17]

11

(d) Annotation of ions as metabolites using KNApSAcK DB

Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid | Escherichia coli
 | | C18H34O2 | 282.2559 | 0.0042 | cis-11-Octadecanoic acid | Lactobacillus plantarum
 | | C18H34O2 | 282.2559 | 0.0042 | omega-Cycloheptylundecanoic acid | Alicyclobacillus acidocaldarius
297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP | Escherichia coli K12
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP | Escherichia coli
 | | C10H14N5O7P | 347.0631 | 0.0012 | 3'-AMP | Escherichia coli
 | | C10H14N5O7P | 347.0631 | 0.0012 | dGMP | Escherichia coli
401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP | Escherichia coli
402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate | Escherichia coli
 | | C10H15N5O10P2 | 427.0294 | 0.0016 | ADP | Escherichia coli
 | | C10H15N5O10P2 | 427.0294 | 0.0016 | dGDP | Escherichia coli
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2 | Actinomadura spiralis MI178-34F18
 | | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18C2 | Actinomadura spiralis MI178-34F18
458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B | Streptomyces sp. strain RK-16
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A | Streptomyces murayamaensis sp. nov.
 | | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin C | Streptomyces murayamaensis sp. nov.
505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP, dGTP | Escherichia coli
547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose | Escherichia coli
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose | Escherichia coli
 | | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-galactose | Escherichia coli
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine | Escherichia coli
 | | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-glucosamine | Escherichia coli
618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-manno-heptopyranose | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli
741.4729 | 742.4801 | C32H62N12O8 | 742.4814 | 0.0012 | Argimicin A | Sphingomonas sp.
786.4712 | 787.4785 | C41H65N5O10 | 787.4731 | 0.0054 | BE 32030B | Nocardia sp. A32030
853.3166 | 854.3239 | C41H46N10O9S | 854.3170 | 0.0069 | Argyrin G | Archangium gephyra Ar 8082
 | | C45H56Cl2N2O10 | 854.3312 | 0.0073 | Decatromicin B | Actinomadura sp. MK73-NF4
 | | C39H50N8O12S | 854.3269 | 0.0030 | Napsamycin C | Streptomyces sp. HIL Y-82,11372
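The annotation step in the table above amounts to converting each detected m/z to a neutral mass and looking up database entries within an error window. The sketch below assumes the ions are deprotonated ([M-H]-), which is consistent with the roughly 1.0073 offset between the first two columns, and uses a hypothetical tolerance of 0.01. The four reference entries are copied from the table; the lookup function is illustrative and is not the KNApSAcK interface.

```python
PROTON_MASS = 1.00728  # u; added to an [M-H]- ion to estimate the neutral mass

# (candidate name, exact mass) pairs copied from the annotation table.
REFERENCE = [
    ("Glyoxylic acid", 74.0004),
    ("Octanoic acid", 144.1150),
    ("NAD", 663.1091),
    ("NADH", 665.1248),
]


def annotate(detected_mz, tolerance=0.01):
    """Return (name, absolute error) for reference masses within the tolerance."""
    neutral = detected_mz + PROTON_MASS
    hits = [
        (name, abs(neutral - mass))
        for name, mass in REFERENCE
        if abs(neutral - mass) <= tolerance
    ]
    return sorted(hits, key=lambda hit: hit[1])  # best match first


hits = annotate(662.1037)
for name, err in hits:
    print(f"{name}: error {err:.4f}")
```

Because several metabolites can share one molecular formula (for example AMP, 3'-AMP and dGMP), a mass match yields candidates rather than a unique identification.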

12

DrDMASS+ (workflow figure repeated)

(iii) Unsupervised learning (PCA)


13

PCA analysis

[Figure: PCA score plot of time points T1 to T8, PC1 (94.3%) versus PC2 (2.4%), shown with the growth curve (OD600 versus time, 0 to 800 min); exponential-phase and stationary-phase samples are marked]

220 dims. → 2 dims.

Metabolic profiling could distinguish between exponential and stationary phases (220 independent ions).

Which metabolite is representative at each stage?

(iii) Unsupervised learning (PCA)

The first two principal components, which explain 96.7% of the total variance, are enough to examine the differences among the 8 time points. (SUM = 1)
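The reduction from 220 dimensions to 2 can be sketched with a small PCA written in plain Python (power iteration with deflation). The data below are synthetic and seeded, not the E. coli measurements, so the explained-variance values it prints are unrelated to the 94.3% and 2.4% on the slide. In practice a numerical library would normally be used; the pure-Python version is chosen here only to keep the example dependency-free.

```python
import math
import random


def center(X):
    """Subtract the column mean from every column."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def top_component(X, iters=200):
    """Leading principal axis w (unit length) and scores t = X w, by power iteration."""
    n, m = len(X), len(X[0])
    w = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        t = [sum(a * b for a, b in zip(row, w)) for row in X]
        w_new = [sum(t[i] * X[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    t = [sum(a * b for a, b in zip(row, w)) for row in X]
    return w, t


def pca(X, n_components=2):
    """Return per-component scores and explained-variance ratios."""
    X = center(X)
    total = sum(v * v for row in X for v in row)
    scores, ratios = [], []
    for _ in range(n_components):
        w, t = top_component(X)
        ratios.append(sum(v * v for v in t) / total)
        scores.append(t)
        # Deflate: remove the part of X explained by this component.
        X = [[x - ti * wj for x, wj in zip(row, w)] for row, ti in zip(X, t)]
    return scores, ratios


# Synthetic stand-in: 8 time points x 220 ions driven by one growth trend plus noise.
random.seed(0)
N, M = 8, 220
trend = [i / (N - 1) for i in range(N)]
loadings = [random.uniform(-1, 1) for _ in range(M)]
X = [[trend[i] * loadings[j] + random.gauss(0, 0.05) for j in range(M)] for i in range(N)]

scores, ratios = pca(X)
print("explained variance ratios:", [round(r, 3) for r in ratios])
```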

14

DrDMASS+ (workflow figure repeated)

(iv) Supervised learning


15

PLS (Partial Least Squares): (iv) Supervised learning

[Figure: PLS relates X (observations × factors/predictors; N = 8, M = 220) to Y (observations × responses; N = 8, K = 1)]

PLS
- Is a supervised regression method.
- Can extract important combinations of variables.
- Can work with many responses.

We tried to estimate the cell condition as a function of the metabolite composition:

OD600 = a1 x1 + … + aj xj + … + aM xM

xj: the quantity of the jth metabolite (ion)

16

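Partial Least Squares Modeling

Data (rows ordered by time): the response vector y holds OD600, and the matrix X holds the metabolite quantity data.

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_i \\ \vdots \\ y_N \end{pmatrix},
\qquad
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2M} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{iM} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nj} & \cdots & x_{NM}
\end{pmatrix}
$$

y = a0 x0 + a1 x1 + … + aj xj + … + aM xM

Optimization of w_k for correlation between x_k and y:

$$
\frac{\partial G(w, \lambda)}{\partial w_k} = 0,
\qquad
G(w, \lambda) = \sum_{k=1}^{M} w_k \sum_{i=1}^{N} y_i x_{ik} - \lambda \left( \sum_{j=1}^{M} w_j^2 - 1 \right)
$$

$$
w_k = \frac{\sum_{i=1}^{N} y_i x_{ik}}{\sqrt{\sum_{k'=1}^{M} \left( \sum_{i=1}^{N} y_i x_{ik'} \right)^2}}
$$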

17

Partial Least Squares Modeling

$$
w_k = \frac{\sum_{i=1}^{N} y_i x_{ik}}{\sqrt{\sum_{k'=1}^{M} \left( \sum_{i=1}^{N} y_i x_{ik'} \right)^2}}
$$

$$
t = X w
$$

$$
a = W \left( P^{t} W \right)^{-1} q
$$

y = a0 x0 + a1 x1 + … + aj xj + … + aM xM

Minimization of square of error.
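The quantities on this slide can be traced in a one-component PLS sketch for a single response. It assumes X and y are mean-centered, computes the weight vector w from the sums of y_i x_ik, forms the scores t = Xw, fits y on t by least squares to get q, and returns the regression coefficients a. With a single component, the general expression W(PᵗW)⁻¹q reduces to q·w. The data values are hypothetical, and a full PLS would extract further components after deflating X.

```python
import math


def center_columns(X):
    """Subtract the column mean from every column of X."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def center_vector(y):
    mean_y = sum(y) / len(y)
    return [v - mean_y for v in y]


def pls1_one_component(X, y):
    """One-component PLS1 for centered X (N x M) and centered y (length N)."""
    n, m = len(X), len(X[0])
    # w_k is proportional to sum_i y_i x_ik, scaled to unit length.
    cov = [sum(y[i] * X[i][k] for i in range(n)) for k in range(m)]
    norm = math.sqrt(sum(v * v for v in cov))
    w = [v / norm for v in cov]
    # Scores t = X w.
    t = [sum(X[i][k] * w[k] for k in range(m)) for i in range(n)]
    # Least-squares fit of y on t (minimizes the squared error).
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    # Regression coefficients; with one component this is simply q * w.
    a = [q * wk for wk in w]
    return w, t, a


# Hypothetical data: 4 samples, 6 variables (fewer samples than variables).
X_raw = [
    [1, 2, 0, 1, 3, 2],
    [2, 3, 1, 1, 2, 2],
    [3, 5, 1, 2, 2, 3],
    [4, 6, 2, 2, 1, 3],
]
y_raw = [0.1, 0.5, 1.5, 3.0]

Xc, yc = center_columns(X_raw), center_vector(y_raw)
w, t, a = pls1_one_component(Xc, yc)
predicted = [sum(xk * ak for xk, ak in zip(row, a)) for row in Xc]
print("weights:", [round(v, 3) for v in w])
```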

18

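Advantages of PLS

y = a0 x0 + a1 x1 + … + aj xj + … + aM xM

PLS

$$
\frac{\partial G(w, \lambda)}{\partial w_k} = 0,
\qquad
G(w, \lambda) = \sum_{k=1}^{M} w_k \sum_{i=1}^{N} y_i x_{ik} - \lambda \left( \sum_{j=1}^{M} w_j^2 - 1 \right)
$$

Applicable when # of samples << # of variables.

MRA (multiple regression analysis)

$$
\frac{\partial G(a)}{\partial a_k} = 0,
\qquad
G(a) = \sum_{i=1}^{N} \left( y_i - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_M x_{iM} \right)^2
$$

Requires # of samples > # of variables.
× Correlation of variables (correlated variables are a problem for MRA).

[Figure: the response vector y and the data matrix X, as on the Partial Least Squares Modeling slide]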

19

PLS regression modeling

OD600 = a1 x1 + … + aj xj + … + aM xM

xj: the quantity of the jth metabolite (ion)
aj > 0: stationary-phase dominant metabolites
aj < 0: exponential-phase dominant metabolites

[Figure: predicted OD600 value versus observed OD600 value (both axes 0.0 to 5.0) for time points T1 to T8; r = 0.97]

Our constructed model
- Works well (r = 0.97 between observed and predicted OD600).
- Is informative for clarifying the relation between growth stage and metabolic profile.
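The r value reported for the model is a Pearson correlation between observed and predicted OD600. A minimal sketch of that calculation follows; the observed and predicted values are hypothetical placeholders, not the values from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical OD600 values for eight time points (T1 to T8).
observed = [0.1, 0.4, 1.0, 2.0, 3.0, 3.8, 4.2, 4.5]
predicted = [0.3, 0.2, 1.2, 1.8, 3.3, 3.5, 4.4, 4.3]

r = pearson_r(observed, predicted)
print(f"r = {r:.2f}")
```

With only eight samples and 220 predictors, a high r on the training data does not by itself show predictive power; cross-validation would be the usual check, although the slides do not say whether one was performed.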

20

Coefficients in the constructed model

OD600 = a1 x1 + … + aj xj + … + aM xM

aj > 0: stationary phase-dominant metabolites
aj < 0: exponential phase-dominant metabolites

[Figure: coefficients aj (axis from -0.15 to 0.1), ranging from exponential-phase dominant to stationary-phase dominant]

Ions with negative coefficients contribute negatively to the constructed model and are dominant in exponential phase; ions with positive coefficients contribute positively and are dominant in stationary phase.

21

DrDMASS+ (workflow figure repeated)

KNApSAcK search


22

Coefficients in the constructed model

OD600 = a1 x1 + … + aj xj + … + aM xM

aj > 0: stationary phase-dominant metabolites
aj < 0: exponential phase-dominant metabolites

[Figure: coefficients aj (axis from -0.15 to 0.1), ranging from exponential-phase dominant to stationary-phase dominant, with annotated ions. Red: E. coli metabolites; black: other bacterial metabolites]

Annotated ions shown in the figure:
UDP-glucose, UDP-galactose
NAD
Parasperone A
UDP-N-acetyl-D-glucosamine, UDP-N-acetyl-D-mannosamine
ADP, Adenosine 3',5'-bisphosphate, dGDP
UDP
omega-Cycloheptyl-alpha-hydroxyundecanoate (labelled twice)
Octanoic acid
dTMP, dGMP, 3'-AMP
NADH
Argyrin G
dTDP
ATP, dGTP
Lenthionine
omega-Cycloheptylnonanoate
dTDP-6-deoxy-L-mannose
omega-Cycloheptylundecanoate, cis-11-Octadecanoic acid
ADP-(D,L)-glycero-D-manno-heptose
Glyoxylate

Ions selected for MS/MS analyses:
719.4868 (PG1)
733.5056 (PG2)
747.5183 (PG3)
761.5293 (PG4)

23

MS/MS analyses

[Figure: four MS/MS spectra, ion intensity (0 to 100) versus m/z (100 to 800); fragment assignments listed below]

719.4868 (PG1): R1 = CO-C15H31, R2 = CO-C15H29
253.2181 [R2O]-
255.2337 [R1O]-
391.2260 [M - C3H6O2 - H - R2OH]-
465.2628 [M - H - R2OH]-
483.2735 [M - R2]-
719.4868 [M - H]-

733.5056 (PG2): R1 = CO-C15H31, R2 = CO-C16H31
255.2338 [R1O]-
267.2339 [R2O]-
391.2268 [M - C3H6O2 - H - R2OH]-
465.2639 [M - H - R2OH]-
483.2741 [M - R2]-
733.5056 [M - H]-

747.5183 (PG3): R1 = CO-C15H31, R2 = CO-C17H33
255.2345 [R1O]-
281.2502 [R2O]-
391.2281 [M - C3H6O2 - H - R2OH]-
465.2659 [M - H - R2OH]-
483.2744 [M - R2]-
747.5183 [M - H]-

761.5293 (PG4): R1 = CO-C15H31, R2 = CO-C18H35
255.2342 [R1O]-
295.2654 [R2O]-
391.2271 [M - C3H6O2 - H - R2OH]-
465.2651 [M - H - R2OH]-
483.2772 [M - R2]-
761.5293 [M - H]-

24

Summary of phosphatidylglycerols detected in this study

(a) Elucidated structures (PG1 to PG4)

[Figure: glycerol backbone (CH2-CH-CH2) carrying three O-linked substructures X1, X2 and X3]

ID | Combination of three substructures (X1, X2, X3)
PG1 | CO-C15H31, CO-C15H29, P(=O)(OH)-O-CH2-CHOH-CH2OH
PG2 | CO-C15H31, CO-C16H31, P(=O)(OH)-O-CH2-CHOH-CH2OH
PG3 | CO-C15H31, CO-C17H33, P(=O)(OH)-O-CH2-CHOH-CH2OH
PG4 | CO-C15H31, CO-C18H35, P(=O)(OH)-O-CH2-CHOH-CH2OH

(b) Relation of mass differences among PG1 to PG10

PG5: 30:1 (14:0, 16:1)
PG1: 32:1 (16:0, 16:1)
PG3: 34:1 (16:0, 18:1)
PG6: 31:0 (14:0, c17:0)
PG2: 33:0 (16:0, c17:0)
PG4: 35:0 (16:0, c19:0)
PG7: 34:2 (16:1, 18:1)
PG9: 36:2 (18:1, 18:1)
PG8: 35:1 (16:1, c19:1)
PG10: 37:1 (18:1, c19:0)

[Figure: network of PG1 to PG10 in two clusters (Cluster 1, Cluster 2). Edges are labelled CFA (cyclopropane fatty acid formation), ∆(CH2)2 or US, with observed mass differences 14.0170, 14.0187, 14.0110, 14.0181, 14.0197, 28.0281, 28.0315, 28.0298, 28.0237, 28.0330, 28.0314, 2.0138 and 2.0051]
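The mass-difference relations among the PG ions can be checked against the theoretical masses of the CH2, (CH2)2 and H2 units. The sketch below uses the four [M-H]- values reported for PG1 to PG4 and a hypothetical matching tolerance of 0.01; the `classify` helper and the choice of pairs are illustrative only.

```python
H = 1.00782503207  # monoisotopic mass of hydrogen (u)
CH2 = 12.0 + 2 * H

# Theoretical mass of each structural difference considered here.
EXPECTED = {"CH2": CH2, "(CH2)2": 2 * CH2, "H2": 2 * H}

# Detected [M-H]- m/z values of PG1 to PG4.
PG_MZ = {"PG1": 719.4868, "PG2": 733.5056, "PG3": 747.5183, "PG4": 761.5293}


def classify(delta, tolerance=0.01):
    """Name the structural unit whose mass matches delta, or None if no match."""
    for label, mass in EXPECTED.items():
        if abs(delta - mass) <= tolerance:
            return label
    return None


pairs = [("PG1", "PG2"), ("PG1", "PG3"), ("PG3", "PG4"), ("PG2", "PG4")]
diffs = {(a, b): PG_MZ[b] - PG_MZ[a] for a, b in pairs}
labels = {pair: classify(delta) for pair, delta in diffs.items()}

for (a, b), delta in diffs.items():
    print(f"{b} - {a} = {delta:.4f} -> {labels[(a, b)]}")
```

A CH2 difference is what cyclopropane ring formation on an unsaturated acyl chain would produce, while (CH2)2 corresponds to a chain two carbons longer.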

25

Cyclopropane fatty acid (CFA) formation

[Figure: structures of PG1, PG2, PG3 and PG4, and the ratio of relative ion intensity (PG2/PG1 and PG4/PG3; axis from -8.0 to 4.0) at time points T1 to T8]

The model constructed by PLS regression is useful for extracting characteristic variables. CFA formation in PGs occurs as E. coli enters stationary phase.

26

[1] Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS

[2] Bio-Database developed by our lab.
2.1 Species-metabolite relation database (KNApSAcK)
2.2 Easy Gene Classifier to Functional Group

27

[1] KNApSAcK

28

KNApSAcK link version: http://kanaya.naist.jp/knapsack_jsp/top.html

29

KNApSAcK (http://kanaya.naist.jp/KNApSAcK) (since 2004)

Authors who utilize KNApSAcK DB (Thanks!) (entries shown in red on the slide: in Japan)
Fardet, A. et al., J. Nutrition, 138, 1282-1287 (2008)
Takahashi, H., Anal. Bioanal. Chem. (in press) (2008)
Mintz-Oron, S., et al., Plant Physiol., 147, 823-825 (2008)
Iijima, Y., et al., Plant J., 54, 949-962 (2008)
Overy, D.P., et al., Nature Protocols, 3, 471-485 (2008)
Dunn, W.B., Physical Biol., 5, 1-24 (2008)
Want, E.J. et al., J. Proteome Res., 6, 459-468 (2007)
Sofia, M., et al., Trends in Anal. Chem., 26, 855-866 (2007)
Ohta, D., et al., Anal. Biol. Chem. (2007)
Nakamura, Y., et al., Planta (2007)
Suzuki, H., et al., Phytochemistry (2007)
Sakakibara, K., et al., J. Biol. Chem., 282, 14932-14941 (2007)
Saito, K. et al., Trends in Plant Sci., 13, 36-42 (2007)
Hummel, J., et al., Topics in Curr. Genet., 18, 75-95 (2007)
Gaida, A., and Neumann, S., J. Int. Bioinf. (2007)
Kikuchi, K. and Kakeya, H., Nature Chem. Biol., 2, 392-394 (2006)
Oikawa, A., et al., Plant Physiol., 142, 398-413 (2006)
Shinbo, Y., et al., Biotechnol. Agric. Forestry, 57, 166-181 (2006)
Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101 (2006)

(WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases
(UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/
(KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack
(LECO Corporation manual)

30

http://en.wikibooks.org/wiki/Metabolomics/Databases

31

Linked by KEGG DB: http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack

32

KNApSAcK: Lupin Alkaloids
http://kanaya.naist.jp/knapsack_jsp/lupin/top.html

33

[2] Other DB developed in our group: Function annotation DB for Arabidopsis thaliana

http://kanaya.naist.jp/arabidopsis/top.jsp

Functional annotations: 14502 genes
Cellular localization information: 2242 genes

34

Categorization of genes into functional classes

35

Categorization of gene pairs into pairs of functional classes

36

[3] DB for Edible Organisms: http://kanaya.naist.jp/LunchBox/top.jsp

37

Allium cepa

Link to KNApSAcK DB

38

Time series change of total number of detected ions

[Figure, panels (a) to (c): OD600 (0.1 to 10) and number of detected ions (0 to 120) versus time (0 to 800 min); relative ion intensity (0.0 to 1.0) at time points T1 to T8 for Clusters 1 to 5]

Date post:	20-Dec-2015