Date post: | 20-Dec-2015 |
1
Bioinformatics in Metabolomics
Shigehiko Kanaya
Nara Institute of Science and Technology
Graduate School of Information Science; Comparative Genomics Lab.
[1] Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
[2] Bio-Database developed by our lab.
2.1 Species-metabolite relation database (KNApSAcK)
2.2 Easy Gene Classifier to Functional Group
2
Data processing from data acquisition of a time-series experiment to description of cellular conditions
(a) Time series experiments
[Figure: growth curve, OD600 (0.1 to 10) versus time (min, 0 to 800), with sampling points T1 to T8]
(b) Data preprocessing and constructing data matrix
[Figure: data matrix of time point × m/z]
(c) Classification of ions into metabolite-derivative group
Metabolite-derivative group (isotope ions and multivalent ions): M, M+1, M/2
(d) Annotation of ions as metabolites
Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli
(e) Assessment of cellular condition by metabolite composition
Tools: DrDMASS, DPClus, KNApSAcK DB
3
FT-ICR/MS (Fourier transform ion cyclotron resonance MS)
Metabolomics research has been performed by GC-MS, LC-MS, CE-MS and NMR.
FT-ICR/MS can offer extremely high levels of resolution and sensitivity.
Highly accurate mass → assignment to a molecular formula
e.g., experimental m/z = 662.1037
NAD+ theoretical m/z = 662.1019 (Δ = 0.0018, 2.7 ppm)
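The ppm figure quoted for NAD+ can be checked with a few lines of Python. This is only a worked-arithmetic sketch; the function name `ppm_error` is an illustrative choice and not part of any tool named on the slides.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Values quoted on the slide for NAD+.
measured = 662.1037
theoretical = 662.1019

print(f"delta = {measured - theoretical:.4f}")              # absolute error
print(f"error = {ppm_error(measured, theoretical):.1f} ppm")  # relative error
```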
4
Predicted number of molecular formulae by highly accurate mass
Example: C10H10O6, MW 226.0477380528 (chorismic acid, isochorismic acid)
[Figure: number of candidates (axis 0 to 600) versus mass error tolerance (±1, ±0.1, ±0.01, ±0.001); values labeled in the plot: 597, 323, 251]
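How the tolerance narrows the candidate list can be sketched as follows. The monoisotopic atomic masses are standard values; the two neighbouring formulas (C9H10N2O5, C11H14O5) are hypothetical examples chosen only for illustration and do not come from the slide, and the helper names are assumptions.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C10H10O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def count_within(target: float, formulas, tolerance: float) -> int:
    """Number of formulas whose mass lies within +/- tolerance of target."""
    return sum(abs(monoisotopic_mass(f) - target) <= tolerance for f in formulas)


target = monoisotopic_mass("C10H10O6")  # chorismic / isochorismic acid
# Hypothetical candidate pool (illustration only).
pool = ["C10H10O6", "C9H10N2O5", "C11H14O5"]

for tol in (1.0, 0.1, 0.01, 0.001):
    print(f"+/-{tol}: {count_within(target, pool, tol)} candidate(s)")
```

Note that chorismic acid and isochorismic acid share the formula C10H10O6, so accurate mass alone cannot separate them.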
5
Data set (E. coli time series)
Conditions
E. coli W3110
LB medium 2000 ml
Agitation 300 rpm
Air 2 l/h
Temperature 37 ℃
Start pH 7.4
8 time points (T1 to T8)
・ Collect cells by membrane filter
・ Extract metabolites by methanol
[Figure: growth curve, OD600 (log scale, 0.001 to 10) versus time (min, 0 to 800), with sampling points T1 to T8]
6
DrDMASS+
(i) Peak correction
(ii) Multivariate data processing
(iii) Unsupervised learning: PCA, BL-SOM
(iv) Supervised learning: PLS
[Figure: spectra of samples A1 to A3 and B1 to B3 arranged into a sample × m/z data matrix]
Bioinformatics: KNApSAcK search
# of metabolites: 20752
# of species-metabolite pairs: 41206
7
DrDMASS+: (i) Peak correction, (ii) Multivariate data processing
8
(i) Peak correction (ii) Multivariate data processing
Validation of data processing: (b) Data preprocessing and constructing data matrix
Oikawa et al. (2006)
(i) m/z correction among each scan
(ii) Peak matching among 10 scans (multivariate data processing)
[Figure: Scans 1 to 3, each showing the IMC peak and a metabolite peak]
[Figure: coefficient of variation (CV) of Peak1, Peak2, Peak3, IMC and Peak4, before and after correcting; CV axis on the order of 10^-6]
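The effect of per-scan m/z correction on the coefficient of variation can be illustrated with a small sketch. It assumes a one-point multiplicative correction against a calibrant peak of known m/z; this is an assumption made for illustration, not necessarily the procedure used in DrDMASS+ or in Oikawa et al. (2006). All m/z values below are hypothetical.

```python
import statistics


def correct_mz(observed_mz, calibrant_observed, calibrant_theoretical):
    """Rescale one m/z value so that the calibrant lands on its known m/z."""
    return observed_mz * calibrant_theoretical / calibrant_observed


def coefficient_of_variation(values):
    """CV = population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


CALIBRANT_MZ = 500.0000  # hypothetical known m/z of the calibrant

# Hypothetical scans: the same metabolite peak drifts together with the calibrant.
scans = [
    {"calibrant": 500.0010, "peak": 300.0006},
    {"calibrant": 499.9990, "peak": 299.9994},
    {"calibrant": 500.0005, "peak": 300.0003},
]

raw = [s["peak"] for s in scans]
corrected = [correct_mz(s["peak"], s["calibrant"], CALIBRANT_MZ) for s in scans]

print("CV before correcting:", coefficient_of_variation(raw))
print("CV after correcting: ", coefficient_of_variation(corrected))
```

Once each peak has a stable m/z across scans, peaks from different scans can be matched into one column of the data matrix.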
9
10
(c) Classification of ions into metabolite-derivative group (DPClus)
[Figure: network of ion clusters; cluster labels 1-1 to 1-6, 2-1 to 2-3 and 3 to 11; node labels PG1 to PG10 and M-1 to M-17]
11
(d) Annotation of ions as metabolites using KNApSAcK DB
Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid | Escherichia coli
 | | C18H34O2 | 282.2559 | 0.0042 | cis-11-Octadecenoic acid | Lactobacillus plantarum
 | | C18H34O2 | 282.2559 | 0.0042 | omega-Cycloheptylundecanoic acid | Alicyclobacillus acidocaldarius
297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2467 | 298.2540 | C18H34O3 | 298.2508 | 0.0032 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2516 | 298.2589 | C18H34O3 | 298.2508 | 0.0081 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP | Escherichia coli K12
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP | Escherichia coli
 | | C10H14N5O7P | 347.0631 | 0.0012 | 3'-AMP | Escherichia coli
 | | C10H14N5O7P | 347.0631 | 0.0012 | dGMP | Escherichia coli
401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP | Escherichia coli
402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate | Escherichia coli
 | | C10H15N5O10P2 | 427.0294 | 0.0016 | ADP | Escherichia coli
 | | C10H15N5O10P2 | 427.0294 | 0.0016 | dGDP | Escherichia coli
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2 | Actinomadura spiralis MI178-34F18
 | | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18C2 | Actinomadura spiralis MI178-34F18
458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B | Streptomyces sp. strain RK-16
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A | Streptomyces murayamaensis sp. nov.
 | | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin C | Streptomyces murayamaensis sp. nov.
505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP, dGTP | Escherichia coli
547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose | Escherichia coli
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose | Escherichia coli
 | | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-galactose | Escherichia coli
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine | Escherichia coli
 | | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-glucosamine | Escherichia coli
618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-manno-heptopyranose | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli
741.4729 | 742.4801 | C32H62N12O8 | 742.4814 | 0.0012 | Argimicin A | Sphingomonas sp.
786.4712 | 787.4785 | C41H65N5O10 | 787.4731 | 0.0054 | BE 32030B | Nocardia sp. A32030
853.3166 | 854.3239 | C41H46N10O9S | 854.3170 | 0.0069 | Argyrin G | Archangium gephyra Ar 8082
 | | C45H56Cl2N2O10 | 854.3312 | 0.0073 | Decatromicin B | Actinomadura sp. MK73-NF4
 | | C39H50N8O12S | 854.3269 | 0.0030 | Napsamycin C | Streptomyces sp. HIL Y-82,11372
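The lookup behind such a table can be sketched as a tolerance search. In the table, the "theoretical m/z" column equals the detected m/z plus about 1.0073, which is consistent with converting an [M-H]- ion to a neutral mass by adding one proton mass; the sketch below assumes exactly that. The dictionary holds only a few exact masses copied from the table, the 0.01 tolerance is an assumption, and a real search would query the KNApSAcK DB rather than a hard-coded dictionary.

```python
PROTON_MASS = 1.007276  # Da; added to an [M-H]- ion to estimate the neutral mass

# Small excerpt of (candidate -> exact mass) pairs copied from the table.
EXACT_MASSES = {
    "Glyoxylic acid": 74.0004,
    "Octanoic acid": 144.1150,
    "NAD": 663.1091,
    "NADH": 665.1248,
}


def annotate(detected_mz, tolerance=0.01):
    """Return (candidate, error) pairs within the tolerance, best match first."""
    neutral_mass = detected_mz + PROTON_MASS
    hits = []
    for name, exact_mass in EXACT_MASSES.items():
        error = abs(neutral_mass - exact_mass)
        if error <= tolerance:
            hits.append((name, round(error, 4)))
    return sorted(hits, key=lambda hit: hit[1])


print(annotate(662.1037))  # ion listed as NAD in the table
print(annotate(100.0))     # no candidate in this excerpt
```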
12
DrDMASS+: (iii) Unsupervised learning (PCA)
13
(iii) Unsupervised learning (PCA)
PCA analysis
[Figure: PCA score plot of time points T1 to T8, PC1 (94.3%) versus PC2 (2.4%), with exponential-phase and stationary-phase samples marked; shown next to the growth curve, OD600 versus time (min)]
220 dims. → 2 dims.
Metabolic profiling could distinguish between exponential and stationary phases (220 independent ions).
The first two principal components, which explain 96.7% of the total variance, are enough to examine the differences among the 8 time points. (SUM=1)
Which metabolite is representative at each stage?
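The reduction from 220 dimensions to 2 can be sketched with NumPy. The matrix below is random placeholder data with the same shape as the data set (8 time points × 220 ions), so the printed variance ratios will not match the 94.3% and 2.4% on the slide. Scaling each row to sum to 1 is only one possible reading of the "SUM=1" note and is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder intensities: 8 time points x 220 ions (not the real data).
X = rng.random((8, 220))
X = X / X.sum(axis=1, keepdims=True)  # assumed normalization: each row sums to 1

# PCA via singular value decomposition of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = S**2 / np.sum(S**2)  # fraction of variance per component
scores = U[:, :2] * S[:2]        # coordinates of the 8 time points on PC1, PC2
loadings = Vt[:2].T              # contribution of each ion to PC1, PC2

print("variance explained by PC1, PC2:", explained[:2])
print("score matrix shape:", scores.shape)
```

The loadings indicate which ions drive the separation along each component, which leads to the question of which metabolites are representative at each stage.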
14
DrDMASS+: (iv) Supervised learning
15
(iv) Supervised learning
PLS (Partial Least Squares)
X: factors/predictors, N = 8 observations × M = 220 variables
Y: responses, N = 8 observations × K = 1
PLS
- Is a supervised regression method.
- Can extract important combinations of variables.
- Can work with many responses.
We tried to estimate the cell condition as a function of the composition of metabolites:
$\mathrm{OD}_{600} = a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
where $x_j$ is the quantity of the $j$-th ion.
16
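Partial Least Squares Modeling
Data (rows ordered by time): OD600 as the response and the metabolite quantity data as predictors,
$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_i \\ \vdots \\ y_N \end{pmatrix}, \qquad X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2M} \\ \vdots & & & & & \vdots \\ x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{iM} \\ \vdots & & & & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nj} & \cdots & x_{NM} \end{pmatrix}$$
Model: $y = a_0 x_0 + a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
Optimization of $w_k$ for correlation between $x_k$ and $y$, with Lagrange multiplier $\lambda$:
$$G(\mathbf{w}, \lambda) = \sum_{k=1}^{M} \sum_{i=1}^{N} w_k\, y_i\, x_{ik} - \lambda \left( \sum_{j=1}^{M} w_j^2 - 1 \right), \qquad \frac{\partial G}{\partial w_k} = 0$$
$$w_k = \frac{\sum_{i=1}^{N} y_i x_{ik}}{\sqrt{\sum_{k'=1}^{M} \left( \sum_{i=1}^{N} y_i x_{ik'} \right)^2}}$$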
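The closed form for $w_k$ is the vector of sums $\sum_i y_i x_{ik}$ rescaled to unit length. A minimal pure-Python sketch follows, using a tiny made-up data set in which $X$ and $y$ are already mean-centered; the numbers and function names are illustrative assumptions.

```python
import math


def first_pls_weight(X, y):
    """w_k = sum_i(y_i * x_ik) / sqrt(sum_k (sum_i y_i * x_ik)^2)."""
    n_samples, n_vars = len(X), len(X[0])
    cov = [sum(y[i] * X[i][k] for i in range(n_samples)) for k in range(n_vars)]
    norm = math.sqrt(sum(c * c for c in cov))
    return [c / norm for c in cov]


def scores(X, w):
    """Latent variable t = X w (one value per sample)."""
    return [sum(x_ik * w_k for x_ik, w_k in zip(row, w)) for row in X]


# Made-up, already centered data: 3 samples x 3 variables.
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [-1.0, -1.0, -3.0]]
y = [1.0, 0.0, -1.0]

w = first_pls_weight(X, y)
t = scores(X, w)
print("w =", w)
print("t =", t)
```

Variables that co-vary strongly with $y$ receive the largest weights, so $t$ is the combination of variables most aligned with the response under the unit-length constraint.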
17
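Partial Least Squares Modeling
Model: $y = a_0 x_0 + a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
$$w_k = \frac{\sum_{i=1}^{N} y_i x_{ik}}{\sqrt{\sum_{k'=1}^{M} \left( \sum_{i=1}^{N} y_i x_{ik'} \right)^2}}$$
$$\mathbf{t}_1 = X \mathbf{w}$$
Minimization of square of error:
$$\mathbf{a} = W \left( P^{t} W \right)^{-1} \mathbf{q}$$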
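One common way to arrive at coefficients of the form $\mathbf{a} = W(P^{t}W)^{-1}\mathbf{q}$ is the NIPALS-style PLS1 algorithm sketched below with NumPy. This is a generic textbook variant under stated assumptions, not necessarily the exact procedure behind the slides; the data are random placeholders shaped like the data set (N = 8, M = 220), and the number of components is an arbitrary choice.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients a = W (P^T W)^{-1} q for centered X and y."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)   # weight vector of unit length
        t = Xk @ w                  # latent variable (scores)
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        qk = yk @ t / tt            # least-squares fit of y on t
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - qk * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


rng = np.random.default_rng(1)
X = rng.random((8, 220))      # placeholder metabolite quantities
y = rng.random(8) * 5.0       # placeholder OD600 values

a = pls1_coefficients(X, y, n_components=3)
predicted = (X - X.mean(axis=0)) @ a + y.mean()
r = np.corrcoef(y, predicted)[0, 1]
print("coefficients:", a.shape, "correlation r =", round(float(r), 3))
```

With far more variables than samples, a high training correlation is easy to obtain, so in practice the number of components would normally be chosen by cross-validation.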
18
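Advantages of PLS
Model: $y = a_0 x_0 + a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
PLS:
$$G(\mathbf{w}, \lambda) = \sum_{k=1}^{M} \sum_{i=1}^{N} w_k\, y_i\, x_{ik} - \lambda \left( \sum_{j=1}^{M} w_j^2 - 1 \right), \qquad \frac{\partial G}{\partial w_k} = 0$$
MRA (multiple regression analysis):
$$G(\mathbf{a}) = \sum_{i=1}^{N} \left( y_i - a_1 x_{i1} - a_2 x_{i2} - \dots - a_M x_{iM} \right)^2, \qquad \frac{\partial G}{\partial a_k} = 0$$
PLS: applicable when # of samples << # of variables.
MRA: requires # of samples > # of variables; correlation of variables is a problem (marked × on the slide).
Data: response $\mathbf{y} = (y_1, y_2, \dots, y_N)^{t}$ and predictor matrix $X = (x_{ij})$ with $i = 1, \dots, N$ and $j = 1, \dots, M$.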
19
PLS regression modeling
$\mathrm{OD}_{600} = a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
$x_j$: the quantity of the $j$-th ion
$a_j > 0$: stationary-phase dominant metabolites
$a_j < 0$: exponential-phase dominant metabolites
[Figure: predicted OD600 value versus observed OD600 value (both axes 0.0 to 5.0) for T1 to T8; r = 0.97]
Our constructed model
- Worked well (r = 0.97 between observed and predicted OD600).
- Is informative for clarifying the relation between growth stage and metabolic profile.
20
Coefficients in the constructed model
$\mathrm{OD}_{600} = a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
$x_j$: the quantity of the $j$-th ion
$a_j > 0$: stationary-phase dominant metabolites
$a_j < 0$: exponential-phase dominant metabolites
[Figure: coefficients $a_j$ (axis from -0.15 to 0.1), from exponential-phase dominant to stationary-phase dominant]
Ions with negative coefficients contribute negatively to the constructed model and are dominant in exponential phase; ions with positive coefficients contribute positively and are dominant in stationary phase.
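Reading the coefficients by sign can be sketched as a simple split. The ion names and coefficient values below are hypothetical placeholders, not values from the constructed model.

```python
def split_by_sign(coefficients):
    """Group ions by the sign of their regression coefficient a_j."""
    exponential = {ion: a for ion, a in coefficients.items() if a < 0}
    stationary = {ion: a for ion, a in coefficients.items() if a > 0}
    return exponential, stationary


# Hypothetical coefficients for three ions (illustration only).
coefficients = {"ion_A": 0.08, "ion_B": -0.12, "ion_C": 0.0}

exponential, stationary = split_by_sign(coefficients)
print("exponential-phase dominant:", sorted(exponential))
print("stationary-phase dominant: ", sorted(stationary))
```

An ion with a coefficient of exactly zero falls in neither group, since it does not contribute to the predicted OD600.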
21
DrDMASS+: KNApSAcK search
22
Coefficients in the constructed model
$\mathrm{OD}_{600} = a_1 x_1 + \dots + a_j x_j + \dots + a_M x_M$
$x_j$: the quantity of the $j$-th ion
$a_j > 0$: stationary-phase dominant metabolites
$a_j < 0$: exponential-phase dominant metabolites
[Figure: coefficients $a_j$ (axis from -0.15 to 0.1), from exponential-phase dominant to stationary-phase dominant, with annotated ions labeled]
Labeled ions:
UDP-glucose, UDP-galactose
NAD
Parasperone A
UDP-N-acetyl-D-glucosamine, UDP-N-acetyl-D-mannosamine
ADP, Adenosine 3',5'-bisphosphate, dGDP
UDP
omega-Cycloheptyl-alpha-hydroxyundecanoate (two ions)
Octanoic acid
dTMP, dGMP, 3'-AMP
NADH
Argyrin G
dTDP
ATP, dGTP
Lenthionine
omega-Cycloheptylnonanoate
dTDP-6-deoxy-L-mannose
omega-Cycloheptylundecanoate, cis-11-Octadecenoic acid
ADP-(D,L)-glycero-D-manno-heptose
Glyoxylate
Red: E. coli metabolites
Black: Other bacterial metabolites
MS/MS analyses: 719.4868 (PG1), 733.5056 (PG2), 747.5183 (PG3), 761.5293 (PG4)
23
MS/MS analyses
[Figure: four MS/MS spectra, ion intensity (0 to 100) versus m/z (100 to 800)]
Fragment | PG1 | PG2 | PG3 | PG4
[M - H]- | 719.4868 | 733.5056 | 747.5183 | 761.5293
[M - R2]- | 483.2735 | 483.2741 | 483.2744 | 483.2772
[M - H - R2OH]- | 465.2628 | 465.2639 | 465.2659 | 465.2651
[M - C3H6O2 - H - R2OH]- | 391.2260 | 391.2268 | 391.2281 | 391.2271
[R2O]- | 253.2181 | 267.2339 | 281.2502 | 295.2654
[R1O]- | 255.2337 | 255.2338 | 255.2345 | 255.2342
R1 | CO-C15H31 | CO-C15H31 | CO-C15H31 | CO-C15H31
R2 | CO-C15H29 | CO-C16H31 | CO-C17H33 | CO-C18H35
24
Summary of phosphatidylglycerols detected in this study
(a) Elucidated structures (PG1 to PG4)
[Figure: glycerol backbone (CH2, CH, CH2) carrying three substructures X1, X2 and X3]
ID | X1 | X2 | X3
PG1 | CO-C15H31 | CO-C15H29 | phosphoglycerol group (P, O, OH; CH2-CHOH-CH2OH)
PG2 | CO-C15H31 | CO-C16H31 | same as PG1
PG3 | CO-C15H31 | CO-C17H33 | same as PG1
PG4 | CO-C15H31 | CO-C18H35 | same as PG1
(b) Relation of mass differences among PG1 to PG10
PG1: 32:1 (16:0, 16:1)
PG2: 33:0 (16:0, c17:0)
PG3: 34:1 (16:0, 18:1)
PG4: 35:0 (16:0, c19:0)
PG5: 30:1 (14:0, 16:1)
PG6: 31:0 (14:0, c17:0)
PG7: 34:2 (16:1, 18:1)
PG8: 35:1 (16:1, c19:1)
PG9: 36:2 (18:1, 18:1)
PG10: 37:1 (18:1, c19:0)
[Figure: network of PG1 to PG10 in two groups (Cluster 1, Cluster 2); edges are labeled CFA, US or ∆(CH2)2 and carry the mass differences 28.0281, 14.0170, 14.0187, 14.0110, 14.0181, 28.0315, 28.0298, 28.0237, 2.0138, 2.0051, 28.0330, 28.0314 and 14.0197]
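The mass differences can be compared with theoretical values for small structural units. The sketch below uses the [M - H]- m/z values of PG1 to PG4 reported for the MS/MS analyses; the set of units, the 0.01 tolerance and the helper name `explain` are assumptions made for illustration.

```python
from itertools import combinations

H = 1.00782503207   # monoisotopic mass of hydrogen (Da)
CH2 = 12.0 + 2 * H  # one methylene unit

# Theoretical mass differences for a few candidate units.
UNITS = {"CH2": CH2, "(CH2)2": 2 * CH2, "H2": 2 * H}

# [M - H]- m/z values of PG1 to PG4 from the slides.
PG_MZ = {"PG1": 719.4868, "PG2": 733.5056, "PG3": 747.5183, "PG4": 761.5293}


def explain(delta, tolerance=0.01):
    """Return the unit whose theoretical mass matches delta, or None."""
    for name, mass in UNITS.items():
        if abs(delta - mass) <= tolerance:
            return name
    return None


for low, high in combinations(PG_MZ, 2):
    delta = PG_MZ[high] - PG_MZ[low]
    print(f"{high} - {low} = {delta:.4f} -> {explain(delta)}")
```

A difference matching one CH2 unit between a PG with an unsaturated chain and one with a cyclopropane chain (for example PG1 and PG2) is consistent with the edges labeled CFA.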
25
Cyclopropane fatty acid (CFA) formation
[Figure: structures of PG1, PG2, PG3 and PG4, each with a C15H31 acyl chain and head group X3]
[Figure: ratio of relative ion intensity (axis from -8.0 to 4.0) for PG2/PG1 and PG4/PG3 at T1 to T8]
CFA formation occurs as the cells enter stationary phase.
The model constructed using PLS regression is useful for extracting characteristic variables. CFA formation of PGs occurs as E. coli enters stationary phase.
26
[1] Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
[2] Bio-Database developed by our lab.
2.1 Species-metabolite relation database (KNApSAcK)
2.2 Easy Gene Classifier to Functional Group
27
[1] KNApSAcK
28
KNApSAcK link version
http://kanaya.naist.jp/knapsack_jsp/top.html
29
KNApSAcK (http://kanaya.naist.jp/KNApSAcK) (Since 2004)
Authors who utilize KNApSAcK DB (Thanks!) (Red: in Japan)
Farder, A., et al., J. Nutrition, 138, 1282-1287, (2008)
Takahashi, H., Anal. Bioanal. Chem. (in press) (2008)
Mintz-Oron, S., et al., Plant Physiol., 147, 823-825, (2008)
Iijima, Y., et al., Plant J., 54, 949-962, (2008)
Overy, D.P., et al., Nature Protocols, 3, 471-485, (2008)
Dunn, W.B., Physical Biol., 1-24, 5, (2008)
Want, E.J., et al., J. Proteome Res., 6, 459-468, (2007)
Sofia, M., et al., Trends in Anal. Chem., 26, 855-866, (2007)
Ohta, D., et al., Anal. Biol. Chem. (2007)
Nakamura, Y., et al., Planta, (2007)
Suzuki, H., et al., Phytochemistry, (2007)
Sakakibara, K., et al., J. Biol. Chem., 282, 14932-14941, (2007)
Saito, K., et al., Trends in Plant Sci., 13, 36-42, (2007)
Hummel, J., et al., Topics in Curr. Genet., 18, 75-95, (2007)
Gaida, A., and Neumann, S., J. Int. Bioinf., (2007)
Kikuchi, K., and Kakeya, H., Nature Chem. Biol., 2, 392-394, (2006)
Oikawa, A., et al., Plant Physiol., 142, 398-413, (2006)
Shinbo, Y., et al., Biotechnol. Agric. Forestry, 57, 166-181, (2006)
Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101, (2006)
(WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases
(UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/
(KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack
(LECO Corporation manual)
30
http://en.wikibooks.org/wiki/Metabolomics/Databases
31
Linked by KEGG DB
http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack
32
KNApSAcK – Lupin Alkaloids
http://kanaya.naist.jp/knapsack_jsp/lupin/top.html
33
[2] Other DB developed in our group
Function annotation DB for Arabidopsis thaliana
http://kanaya.naist.jp/arabidopsis/top.jsp
Functional annotations: 14502 genes
Cellular localization information: 2242 genes
34
Categorization of genes into functional classes
35
Categorization of gene pairs into pairs of functional classes
36
[3] DB for Edible Organisms http://kanaya.naist.jp/LunchBox/top.jsp
37
Allium cepa
Link to KNApSAcK DB
38
Time series change of total number of detected ions
[Figure, panels (a) to (c): OD600 (0.1 to 10) and number of detected ions (0 to 120) versus time (min, 0 to 800) with time points T1 to T8; relative ion intensity (0.0 to 1.0) over the time points for Cluster 1 to Cluster 5]