
Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig

Flux-based classification of reactions reveals a functional bow-tie organization of complex metabolic networks

by Shalini Singh, Areejit Samal, Varun Giri, Sandeep Krishna, Nandula Raghuram, and Sanjay Jain

Preprint no.: 43 2013


Flux-based classification of reactions reveals a functional bow-tie organization of complex metabolic networks

Shalini Singh(1,2), Areejit Samal(1,3,4), Varun Giri(1), Sandeep Krishna(5), Nandula Raghuram(6), and Sanjay Jain(1,7,8,*)

(1) Department of Physics and Astrophysics, University of Delhi, Delhi 110007, India
(2) Department of Genetics, University of Delhi, South Campus, New Delhi, India
(3) Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
(4) Laboratoire de Physique Théorique et Modèles Statistiques, CNRS and Univ Paris-Sud, UMR 8626, F-91405 Orsay, France
(5) National Centre for Biological Sciences, UAS-GKVK Campus, Bangalore 560065, India
(6) School of Biotechnology, GGS Indraprastha University, Dwarka, New Delhi 110078, India
(7) Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
(8) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

Unraveling the structure of complex biological networks and relating it to their functional role is an important task in systems biology. Here we attempt to characterize the functional organization of the large-scale metabolic networks of three microorganisms. We apply flux balance analysis to study the optimal growth states of these organisms in different environments. By investigating the differential usage of reactions across flux patterns for different environments, we observe a striking bimodal distribution in the activity of reactions. Motivated by this, we propose a simple algorithm to decompose the metabolic network into three sub-networks. It turns out that our reaction classifier, which is blind to the biochemical role of pathways, leads to three functionally relevant sub-networks that correspond to input, output and intermediate parts of the metabolic network with distinct structural characteristics. Our decomposition method unveils a functional bow-tie organization of metabolic networks that is different from the bow-tie structure determined by graph-theoretic methods that do not incorporate functionality.

PACS numbers: 82.39.Rt 87.18.Vf 87.18.-h

I. INTRODUCTION

Biological systems provide many examples of the intricate relationship between the structure and functionality of complex networks [1–7]. Cellular metabolism is a complex biochemical network of several hundred metabolites that are processed and interconverted by enzyme-catalyzed reactions [8–13]. Metabolic networks have a dynamic flexibility that enables organisms to survive under diverse environmental conditions. A key goal of systems biology is to unveil the functional organization of metabolic networks explaining their system-level response to different environments. To this end, we have attempted to decompose metabolic networks into functionally relevant sub-networks. Flux balance analysis (FBA) has been widely used to harness the knowledge of large-scale metabolic networks and investigate genotype-phenotype relationships [14–16]. FBA has been successful in predicting the growth and deletion phenotypes of organisms [17–19]. Reaction fluxes carry information about the flows on metabolic networks and, as such, describe the functional use of the network by the organism. In this paper, we have used this information to decompose the network into functionally relevant sub-networks.

The paper is organized as follows: In section II we describe the modelling framework in which we study metabolic networks. In section III we discuss the classification of active reactions in metabolic networks into three categories by an algorithm that is blind to their biochemical roles. Section IV shows that the three categories are functionally relevant for the organism. In section V we compare the bow-tie architecture obtained by our functional classification of reactions with that obtained by graph-theoretic methods that do not employ functional information. In the last section we conclude with a summary.

II. THE MODELLING FRAMEWORK

A. Flux balance analysis (FBA)

Flux balance analysis (FBA) is a computational approach widely used to analyze the capabilities of genome-scale metabolic networks [14–16]. The stoichiometric matrix S encapsulates the stoichiometric coefficients of different metabolites involved in various reactions of the metabolic network. The stoichiometric matrix S = (S_pj) has dimensions P × N, where P denotes the number of metabolites and N denotes the number of reactions in the metabolic network. S_pj is the number of molecules of the metabolite p produced in reaction j (if metabolite p is consumed in reaction j, S_pj is negative). The stoichiometric matrix for a hypothetical reaction network is shown in Fig. 1. FBA primarily uses structural information of the metabolic network contained in the matrix S to predict the possible steady state flux distribution of all reactions and the maximum growth rate of an organism. In any metabolic steady state, the metabolites achieve a dynamic mass balance wherein the vector


TABLE I. Comparison of the three metabolic networks: E. coli, S. cerevisiae and S. aureus.

Property                                                    E. coli   S. cerevisiae   S. aureus
Number of metabolites                                           761            1061         648
Number of reactions in the model                                931            1149         641
Number of one-sided reactions in the equivalent network        1167            1576         863
Number of external metabolites                                  143             116          84
Number of organic external metabolites (carbon sources)         131             107          68
Number of biomass metabolites                                    49              42          56
Number of feasible minimal environments                          89              43          27
Number of active reactions                                      585             482         418
Number of reactions in category I                               185              89          84
Number of reactions in category IIa                             147             117         194
Number of reactions in category IIb                              42              46          28
Number of reactions in category III                             211             230         112

v of fluxes through the reactions satisfies the following equation representing the stoichiometric and mass balance constraints:

$$ S \cdot v = 0. \tag{1} $$

Equation 1 is an under-determined linear system of equations relating various reaction fluxes in genome-scale metabolic networks, leading to a large solution space of allowable fluxes. The space of allowable solutions can be reduced by incorporating thermodynamic and enzyme capacity constraints. To obtain a particular solution, linear programming is used to find a set of flux values (a particular flux vector v) that maximizes a biologically relevant linear objective function Z. The linear programming formulation of the FBA problem can be written as:

$$ \max Z = \max \{ c^{T} v \mid S \cdot v = 0,\ a \le v \le b \}, \tag{2} $$

where vectors a and b contain the lower and upper bounds of different fluxes in v and the vector c corresponds to the coefficients of the objective function Z. In FBA, the objective function Z is usually taken to be the growth rate of the organism. The environment, or medium, is defined in this approach by the components of a and b corresponding to the transport reactions, which determine, in particular, the set of metabolites whose uptake is allowed.
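As a rough illustration of the constraints in Eq. 2, the following Python sketch checks whether candidate flux vectors satisfy the mass balance and the bounds, and evaluates the objective for each. The four-reaction toy network, its bounds and the helper names `is_feasible` and `objective` are hypothetical and are not taken from the models studied here. A real FBA computation would pass S, a, b and c to a linear-programming solver instead of comparing hand-picked vectors.

```python
# Hypothetical toy network: r1: -> A (uptake), r2: A -> B, r3: A -> C,
# r4: B + C -> (biomass drain). Rows are metabolites A, B, C.
S = [
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
    [0,  0,  1, -1],   # C
]
a = [0.0, 0.0, 0.0, 0.0]              # lower bounds on the fluxes
b = [10.0, 1000.0, 1000.0, 1000.0]    # upper bounds; uptake r1 is limited to 10
c = [0.0, 0.0, 0.0, 1.0]              # objective Z = flux through r4


def is_feasible(S, v, a, b, tol=1e-9):
    """Return True if S.v = 0 and a <= v <= b hold within tol."""
    balanced = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= x <= hi + tol for lo, x, hi in zip(a, v, b))
    return balanced and bounded


def objective(c, v):
    """Linear objective Z = c^T v."""
    return sum(ci * vi for ci, vi in zip(c, v))


# At steady state v2 = v3 = v4 and v1 = 2 * v4, so the uptake bound
# caps the biomass flux of this toy network at 5.
v_opt = [10.0, 5.0, 5.0, 5.0]    # feasible, Z = 5
v_sub = [4.0, 2.0, 2.0, 2.0]     # feasible but sub-optimal, Z = 2
v_bad = [10.0, 10.0, 0.0, 5.0]   # violates the mass balance of B and C

for v in (v_opt, v_sub, v_bad):
    print(is_feasible(S, v, a, b), objective(c, v))
```

Changing the entry of b for the uptake reaction is how a different medium would be represented in this picture.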

B. Large-scale metabolic networks

In this work, we have analyzed the large-scale metabolic networks of three microorganisms: Escherichia coli (version iJR904 [20]), Saccharomyces cerevisiae (version iND750 [21]) and Staphylococcus aureus (version iSB619 [22]). Table I gives the number of metabolites and reactions in the metabolic networks of these three organisms. The metabolic networks contain internal and transport reactions. Internal reactions occur within the cell boundary. Transport reactions represent processes involving import or export of metabolites across the cell boundary. Each model also contains a pseudo biomass reaction that simulates the drain of various biomass precursor metabolites for growth in the specific organism. Starting from the published metabolic network, we obtain an equivalent reaction network as follows: Every reversible reaction in the network is converted into two one-sided (irreversible) reactions so that all reaction fluxes in the equivalent system are non-negative. A few reactions appear in duplicate in these networks, and only a single copy of each reaction is kept in the equivalent network. The equivalent metabolic network is a reaction set consisting of N unique one-sided reactions where N is 1167, 1576 and 863 for E. coli, S. cerevisiae and S. aureus, respectively (cf. Table I).
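The construction of the equivalent network can be sketched in a few lines of Python. The data layout, the `_rev` suffix and the function name `to_one_sided` are hypothetical, and treating two reactions as duplicates when their stoichiometry is identical is an assumption made for this sketch; the text does not state how duplicates were detected.

```python
def to_one_sided(reactions):
    """Split reversible reactions and drop duplicates.

    reactions: list of (name, stoichiometry dict, reversible flag), where
    negative coefficients mark consumed metabolites.
    Returns a list of unique one-sided reactions as (name, stoichiometry).
    """
    seen = set()
    one_sided = []
    for name, stoich, reversible in reactions:
        directions = [(name, stoich)]
        if reversible:
            # The backward direction has every coefficient negated.
            backward = {met: -coef for met, coef in stoich.items()}
            directions.append((name + "_rev", backward))
        for label, st in directions:
            key = frozenset(st.items())   # assumed duplicate criterion
            if key not in seen:
                seen.add(key)
                one_sided.append((label, st))
    return one_sided


# Made-up three-reaction model for illustration only.
model = [
    ("R1", {"A": -1, "B": 1}, True),        # A <-> B
    ("R2", {"B": -1, "C": 1}, False),       # B -> C
    ("R2_copy", {"B": -1, "C": 1}, False),  # duplicate of R2
]
equivalent = to_one_sided(model)
print([name for name, _ in equivalent])     # ['R1', 'R1_rev', 'R2']
```

With every reaction one-sided, all fluxes can be constrained to be non-negative, which is what makes the later definition of an active reaction (strictly positive flux) unambiguous.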

C. Feasible minimal environments and associated flux vectors

In this work, we have considered ‘minimal’ aerobic environments: minimal in the sense that each environment contains a single organic external metabolite that is the sole source of carbon, and single inorganic sources for each of the elements nitrogen, phosphorus, sulphur, oxygen, sodium, potassium and iron, apart from hydrogen ions and water. Aerobic means that molecular oxygen is available in the external medium. Furthermore, the minimal environments differ from each other solely in their organic carbon source; the set of inorganic sources is the same for all the minimal environments considered here for any given organism. Thus the number of environments we consider for each organism coincides with the number of organic external metabolites (carbon sources) in its metabolic network (cf. Table I). We further assume that each environment contains a limited amount of the organic carbon source and unlimited amounts of the inorganic metabolites, namely, ammonia (source of nitrogen), pyrophosphate (source of phosphorus), sulphate (source of sulphur), molecular oxygen, ions of sodium, potassium, iron and hydrogen, and water molecules. From this set of minimal environments, we used FBA to determine


Reaction Network

R1: 2A + B → C + 3D
R2: A + 3B → C + E
R3: A → 2B + D + E
R4: 4B → D + A
R5: D + 2B → C + 2E
R6: C + 4E → 3B + D

Stoichiometric Matrix

      R1   R2   R3   R4   R5   R6
A     -2   -1   -1    1    0    0
B     -1   -3    2   -4   -2    3
C      1    1    0    0    1   -1
D      3    0    1    1   -1    1
E      0    1    1    0    2   -4

FIG. 1. Example of stoichiometric matrix for a hypothetical reaction network. The hypothetical reaction network has 6 reactions involving 5 metabolites. The rows of the stoichiometric matrix correspond to various metabolites and the columns correspond to various reactions in the metabolic network.

the subset of minimal environments supporting growth in the metabolic networks of E. coli, S. cerevisiae and S. aureus. A minimal environment was termed feasible if the growth rate predicted by FBA was found to be nonzero for that environment. The number M of feasible minimal environments in E. coli, S. cerevisiae and S. aureus was found to be 89, 43 and 27, respectively (cf. Table I) [23]. For each organism, and for each feasible minimal environment for that organism, we obtained an N-dimensional optimal flux vector v using FBA whose component v_j gives the flux of reaction j. For every organism this led to a set of M flux vectors corresponding to the M feasible minimal environments, which were stored in the form of a matrix V = (v_j^α) of dimensions N × M, where the rows (j = 1, 2, ..., N) correspond to different reactions in the network and the columns (α = 1, 2, ..., M) to different feasible minimal environments. v_j^α is defined as the flux of reaction j in the optimal flux vector v obtained for environment α.

D. Active reactions

A given reaction j is termed active in an environment α if v_j^α > 0. The activity m of a reaction denotes the number of feasible minimal environments in which the reaction is active. The activity m of a reaction ranges from 0 to M, with M equal to 89, 43 and 27 for E. coli, S. cerevisiae and S. aureus, respectively. A reaction j is termed active in an organism if m ≥ 1 (i.e., if it is active in at least one feasible minimal environment for that organism). The number of active reactions in E. coli, S. cerevisiae and S. aureus was found to be 585, 482 and 418, respectively (cf. Table I). This paper primarily focuses on decomposing this set of active reactions into functionally relevant sub-networks.
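The definition of activity translates directly into a count over the rows of the flux matrix. The sketch below uses a made-up 4 × 4 matrix, not data from any of the three models. The optional `tol` argument is an assumption added here to allow for the numerical tolerance of an LP solver; the definition in the text corresponds to `tol = 0`.

```python
# V[j][alpha]: flux of reaction j in feasible minimal environment alpha
# (rows are reactions, columns are environments). Values are illustrative.
V = [
    [2.0, 0.0, 0.0, 0.0],   # active in one environment
    [1.0, 3.0, 0.5, 2.0],   # active in every environment
    [0.0, 1.5, 0.0, 4.0],   # active in some environments
    [0.0, 0.0, 0.0, 0.0],   # never active
]
M = len(V[0])               # number of feasible minimal environments


def activity(fluxes, tol=0.0):
    """Number of environments in which the flux exceeds tol."""
    return sum(1 for v in fluxes if v > tol)


m = [activity(row) for row in V]
active = [j for j, mj in enumerate(m) if mj >= 1]

print(m)        # [1, 4, 2, 0]
print(active)   # [0, 1, 2]
```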

III. CLASSIFICATION OF ACTIVE REACTIONS

We ask the question: How does the activity of a reaction vary across different environments? To address this question, we determine the frequency distribution of the activity of reactions in an organism. Fig. 2 shows the histogram of the activity of reactions in the E. coli metabolic network. The distribution is bimodal. Most reactions in E. coli are either once-active (m = 1) or always active (m = 89); the number of reactions for any given intermediate activity m in the range 1 < m < 89 is small. Thus, the largest number of active reactions in the metabolic network are used in either one environment or in all environments. The histograms of activity of reactions in S. cerevisiae and S. aureus also have a pattern similar to that in E. coli (cf. Fig. 2). The frequency distribution of activity of reactions in the three organisms suggests a natural classification of active reactions into three categories:

(a) Category I reactions or once-active reactions (m = 1)

(b) Category II reactions or always active reactions (m = M)

(c) Category III reactions with intermediate activity (1 < m < M)
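The three-way split above depends only on the activity m and the number M of feasible minimal environments. A minimal Python sketch, assuming M > 1 so that the once-active and always-active cases do not coincide, and using made-up activities in place of model output:

```python
from collections import Counter


def categorize(m, M):
    """Map an activity m to category 'I', 'II' or 'III' (None if inactive).

    Assumes M > 1; for M = 1 the cases m = 1 and m = M would coincide.
    """
    if m == 0:
        return None      # not an active reaction
    if m == 1:
        return "I"       # once-active
    if m == M:
        return "II"      # always active
    return "III"         # intermediate activity


# Illustrative activities for seven reactions and M = 4 environments.
activities = [1, 4, 2, 0, 1, 4, 3]
M = 4

# Frequency distribution of activity over the active reactions.
histogram = Counter(mj for mj in activities if mj >= 1)
categories = [categorize(mj, M) for mj in activities]

print(sorted(histogram.items()))   # [(1, 2), (2, 1), (3, 1), (4, 2)]
print(categories)
```

The function never looks at reaction names or pathway annotations, which is the sense in which the classification is blind to biochemical roles.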

A. Sub-classification based on correlation of reaction fluxes

Clustering of gene expression data using the correlation coefficient has been successful in predicting regulatory modules associated with a biological function across diverse conditions [24]. We used the correlation coefficient to identify sets of reactions whose fluxes are correlated across different environments. We used the set of M flux vectors corresponding to M feasible minimal


FIG. 2. (Color online) The histogram of activity of reactions in the E. coli metabolic network. The bars show the number of reactions that have an activity m, where m ranges from 1 to 89 feasible minimal environments in the E. coli metabolic network. The green bar represents 185 category I reactions which are once-active. The pink bar represents 147 category IIa reactions (a subset of 189 always active category II reactions) that have fluxes perfectly correlated across environments. The deep blue bar represents 42 category IIb reactions that account for the remaining category II reactions. The light blue bars account for 211 category III reactions with intermediate activity. Insets: Histograms of activity of reactions in S. cerevisiae and S. aureus. The three categories of reactions in S. cerevisiae and S. aureus were defined in a manner similar to E. coli.

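environments contained in the matrix V = (v_j^α) to obtain the matrix C = (C_jk), where C_jk is the correlation coefficient between two active reactions j and k and is given by:

$$ C_{jk} = \frac{1}{M} \sum_{\alpha=1}^{M} \frac{v_j^{\alpha} v_k^{\alpha}}{\phi_j \phi_k}, \tag{3} $$

where

$$ \phi_j = \sqrt{\frac{1}{M} \sum_{\alpha=1}^{M} \left( v_j^{\alpha} \right)^{2}}. $$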

If C_jk = 1 then reactions j and k are perfectly correlated with each other in the given set of environments. Perfect clusters in metabolic networks are maximal sets of reactions that are perfectly correlated to each other pairwise. Perfect clusters are similar to enzyme subsets [25, 26], correlated reaction sets [27, 28] or fully coupled sets [29], which have been used to detect modules in metabolic networks.

We use Eq. 3 to identify perfect clusters in the metabolic networks of E. coli, S. cerevisiae and S. aureus. In particular, a large perfect cluster of 147 reactions was found in E. coli that is a subset of category II reactions. We refer to this subset of perfectly correlated reactions within category II as category IIa reactions. The remaining 42 category II reactions that are always active but not perfectly clustered with category IIa reactions are part of category IIb. Similarly, large perfect clusters of sizes 117 and 194 were found in category II reactions of S. cerevisiae and S. aureus, respectively. In Fig. 2, category IIa and IIb reactions are shown in pink and blue colours, respectively. We have shown elsewhere that perfect clusters are metabolic modules that can be explained by studying the connectivity of their constituent metabolites [23].
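A small Python sketch may help to make Eq. 3 and the notion of a perfect cluster concrete. For non-negative fluxes, C_jk = 1 holds exactly when the two flux rows are proportional, so perfect correlation is transitive and each reaction only needs to be compared with one representative per cluster. The function names, the tolerance `tol` and the example matrix are assumptions of this sketch and are not taken from the analysis reported here.

```python
import math


def phi(row):
    """Root-mean-square flux of one reaction across environments."""
    return math.sqrt(sum(v * v for v in row) / len(row))


def correlation(row_j, row_k):
    """C_jk of Eq. 3 for two active reactions (nonzero flux somewhere)."""
    M = len(row_j)
    mean_product = sum(x * y for x, y in zip(row_j, row_k)) / M
    return mean_product / (phi(row_j) * phi(row_k))


def perfect_clusters(V, tol=1e-9):
    """Group the rows of V whose pairwise C_jk equals 1 within tol."""
    clusters = []
    for j, row in enumerate(V):
        for cluster in clusters:
            # Proportionality is transitive: one representative suffices.
            if abs(correlation(V[cluster[0]], row) - 1.0) <= tol:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters


# Illustrative fluxes: rows 0, 1 and 3 are proportional to each other.
V = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [1.0, 1.0, 1.0],
    [0.5, 1.0, 1.5],
    [3.0, 0.0, 0.0],
]
print(perfect_clusters(V))   # [[0, 1, 3], [2], [4]]
```

In floating-point arithmetic the equality C_jk = 1 can only be tested up to a tolerance, so the value of `tol` would need to be matched to the accuracy of the flux values.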

As mentioned earlier, we obtained the flux vectors by maximizing the objective function Z that corresponds to the growth rate of the cell. In FBA, cell growth stands for the production of all the ‘biomass metabolites’ in specified ratios that correspond to the composition of the average cell under consideration. The role of growth maximization is to obtain an explicit flux vector for each medium. While the magnitudes of the components of v obtained by maximization of the growth rate depend upon the precise ratios, the activity of a reaction, as defined above, depends not on the actual magnitude of the corresponding component of v, but only on whether the magnitude is zero or nonzero. The latter does not depend upon the precise ratios of the biomass metabolites in the objective function, but only on the set of metabolites that are present in the objective function. Thus our classification results are quite robust to the perturbation of the ratios in the objective function, as long as the set of biomass metabolites is held fixed (details not shown).

Note that we have used a single optimal flux vector v obtained using FBA for each of the M feasible minimal environments to determine the activity of a reaction and the set of active reactions in the metabolic network of an organism. However, it is well known that there exist multiple flux vectors or alternate optimal solutions in most large-scale metabolic networks that maximize growth in a given environment [28, 30–32]. In principle, due to the presence of alternate optima, the set of active reactions can change depending on the choice of the flux vectors. In Appendix A, we show the robustness of our reaction


FIG. 3. (Color online) Standard deviation versus mean flux of active reactions in the E. coli metabolic network. The plot shows standard deviation σ versus mean flux ⟨v⟩ of the 585 active reactions in the E. coli metabolic network across M = 89 feasible minimal environments on a logarithmic scale. The green, pink, dark blue and cyan dots represent category I, IIa, IIb and III reactions, respectively. The three categories of reactions show up quite distinctly (upper line, category I; lower line, category IIa; with category IIb and category III in between the two lines). The upper line is the expected curve σ = (M − 1)^{1/2}⟨v⟩ for category I reactions. The lower line is the expected curve σ = b⟨v⟩ for perfectly correlated category IIa reactions with b = 0.98 ± 0.1 obtained via best fit to the data. Insets: Scatter plots of σ versus ⟨v⟩ of active reactions in S. cerevisiae and S. aureus metabolic networks.

categories to the presence of alternate optima.

B. Scatter plot of standard deviation versus mean flux of reactions across environments discriminates between the three categories

For each active reaction, following Almaas et al. [33], we have calculated the mean flux ⟨v⟩ and the standard deviation σ around this mean by averaging the flux of the reaction over M feasible minimal environments. Fig. 3 shows the scatter plot of σ versus ⟨v⟩ for active reactions in E. coli. It is evident that the distribution of points is different for the various categories we have defined. All category I points lie on the upper line, all category IIa points lie on the lower line, while category IIb and category III points lie largely in between the two lines. The upper line in Fig. 3 is the expected curve σ = (M − 1)^{1/2}⟨v⟩ for category I reactions and the lower line is the curve σ = b⟨v⟩, where b is obtained via best fit of data for category IIa reactions. Appendix B gives the derivation of the relation between σ and ⟨v⟩ for category I and IIa reactions. Our classification of reactions into the three categories did not use the actual values of the fluxes of the reactions, but only the information about whether the flux was zero or nonzero in a particular medium. Fig. 3 uses information about the actual flux values. It shows that the different categories of reactions are distinct from each other by virtue of the statistical properties of their magnitudes as well.
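The two lines in Fig. 3 can be checked numerically on synthetic data. The Python sketch below assumes that σ is the population standard deviation (normalized by M, not M − 1), which is the convention under which a once-active reaction gives exactly σ = (M − 1)^{1/2}⟨v⟩. The flux values are invented for the check and are not model output.

```python
import math
from statistics import mean, pstdev

M = 89   # number of feasible minimal environments for E. coli

# Category I: nonzero flux f in exactly one environment, zero elsewhere.
f = 3.7
once_active = [f] + [0.0] * (M - 1)
ratio_I = pstdev(once_active) / mean(once_active)
print(ratio_I, math.sqrt(M - 1))   # the two values agree

# Category IIa-like: two reactions whose fluxes are proportional to a
# common profile across environments (perfectly correlated fluxes).
profile = [1.0 + 0.1 * (alpha % 7) for alpha in range(M)]
r1 = [2.0 * x for x in profile]
r2 = [0.3 * x for x in profile]
ratio_1 = pstdev(r1) / mean(r1)
ratio_2 = pstdev(r2) / mean(r2)
print(ratio_1, ratio_2)            # same slope b for both reactions
```

Because rescaling a flux profile multiplies σ and ⟨v⟩ by the same factor, all members of a perfect cluster share one ratio σ/⟨v⟩, which is why they fall on a single line σ = b⟨v⟩.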

IV. FUNCTIONAL RELEVANCE OF THE THREE CATEGORIES OF REACTIONS

Until now our classification of active reactions into the three categories was solely motivated by the activity of reactions in E. coli, S. cerevisiae and S. aureus with two very prominent peaks for once-active and always active reactions (cf. Fig. 2). However, we now show that our three categories I, II, and III obtained using a computational algorithm blind to the biochemical nature of pathways correspond to the input, output and intermediate sub-networks, respectively. Thus, each category of reactions is a sub-network with a distinct functional role in metabolism.

[Figure 4: bipartite graph of category I reactions in E. coli. Boxes group the input pathways by biochemical class: Carboxylates; Amino Acids and Amines; Sugars; Pentoses; Sugars and Sugar-Alcohols; Sugar Acids; Nucleosides and Nucleotides; Fatty Acids and Small Molecules; Aminosugars; Miscellaneous food sources from different biochemical classes whose input pathways are connected.]

FIG. 4. (Color online) Category I reactions in E. coli. This figure shows the bipartite graph of 185 category I reactions in E. coli. Rectangles represent reactions and ovals metabolites. External nutrient metabolites (organic carbon sources) are depicted in green and biomass metabolites in pink. For convenience, we have chosen to omit the high degree currency metabolites (such as ATP) from the figure in order to reduce clutter and focus on the biochemically relevant transformation in each reaction. Abbreviations of metabolites and reactions are as in the iJR904 model [20]. The figure was drawn using Graphviz software [34]. The high resolution electronic version of this figure can be zoomed in to read node labels and biochemical categories of boxes. We have classified the external metabolites and grouped together their input pathways in boxes based on biochemical similarity.

[Figure 5: portion of the category I graph for the box ‘Sugars’, divided into Monosaccharides and Disaccharides, with external metabolites g6p[e], gal[e], glc-D[e], lcts[e], melib[e], sucr[e] and tre[e].]

FIG. 5. (Color online) A small portion of category I sub-network in E. coli showing sugar input pathways. The figure shows category I reactions in the input pathways for external nutrient metabolites classified into the biochemical category ‘Sugars’. Two kinds of sugars are shown here: monosaccharides and disaccharides. The input pathways for 7 external sugar metabolites fan-in downstream into 3 monosaccharide metabolites which occur at the boundary between category I and III sub-networks. Conventions are the same as in Figure 4.

A. Category I: Fan-in of input pathways

Fig. 4 shows the sub-network of all 185 category I reactions in E. coli. The figure shows a number of essentially linear paths of one to about five reactions starting from an external nutrient metabolite, often converging to some other metabolite. These are the input pathways of those metabolites, typically starting from their transport reaction that brings them into the cell, and subsequent catabolic reactions that break them down into other metabolites. Input pathways of 86 out of the 89 external nutrient metabolites (carbon sources) characterizing different feasible minimal environments are contained in category I, thereby implying that category I essentially covers all the input pathways of metabolism. Similarly, we find that category I reactions in S. cerevisiae and S. aureus contain input pathways for most external nutrient metabolites characterizing different feasible minimal environments. Thus, category I essentially corresponds to the input part of the metabolic network.

Fig. 5 shows a portion of category I reactions belonging to sugar input pathways in E. coli where several external sugar metabolites converge downstream into a few intermediate metabolites. Thus, the input pathways in category I exhibit the fan-in property whereby diverse external nutrient metabolites are first catabolized into a smaller set of intermediate metabolites before being drawn into the interior of the metabolic network. Usually the external nutrients whose input pathways converge to a common metabolite belong to the same biochemical


class (cf. Figures 4 and 5). Fig. 4 contains a number of disconnected subgraphs each describing the input pathways of one or more biochemically similar metabolites; these disconnected paths get connected to the larger metabolic network via further downstream reactions that belong to other categories and are not shown in Fig. 4.

B. Category II: Output biosynthetic pathways

A key biological function of the metabolic network is to convert nutrient metabolites in the environment into biomass metabolites required for growth and maintenance of the cell. The biomass metabolites, which include all the amino acids, nucleotides, lipids and certain cofactors, may be considered to be the output of the metabolic network. Category II reactions are always active and have a nonzero flux for any feasible minimal environment. We found that the category II sub-network has biosynthetic pathways for 30 out of the 49 biomass metabolites in E. coli. These pathways are typically the sole production pathways of those biomass metabolites in E. coli [23]. Thus, this sub-network is at the output end of the metabolism.

Of the 189 category II reactions in E. coli, 147 reactions belong to category IIa, whose fluxes are perfectly correlated across the different minimal environments. Fig. 6 shows the graph of the category IIa sub-network in E. coli, which is the single largest perfect cluster of reactions. The remaining 42 reactions in category II constitute category IIb; these are always active but not perfectly correlated with category IIa reactions and with each other. Thus, the fluxes of category IIb reactions vary in a more complicated manner across minimal environments. Categories IIa and IIb exist with similar properties in the metabolic networks of the other two organisms (cf. Table I). In our previous work, we have shown that most of the category II reactions are essential for growth irrespective of the environment [23]. The set of category II reactions is a superset of reactions in the activity core found earlier by Almaas et al. [35], which are reactions always used across minimal as well as rich environments.

C. Category III: Intermediate pathways between input and output

Fig. 7 shows the sub-network of category III reactions in E. coli, which are neither once-active nor always-active; the activity of these reactions depends on the availability of nutrients in a more complicated manner. Category III reactions may be considered to constitute the intermediate part of the network. By comparing the structures of the three categories, it is evident that category III has a highly reticulate and complex architecture compared to categories I and II. There is a functional reason for the observed complexity in the category III sub-network. The biomass metabolites collectively contain several different types of chemical structures (moieties), and the E. coli metabolic network is capable of producing these biomass metabolites from different minimal environments, each containing a different (and single) carbon source. A typical external carbon source has one or a few moieties, with different nutrients containing different subsets of moieties. Category I reactions transport the carbon sources into the cell and break them down into a small set of moieties. The function of category III reactions is to start with a small set of moieties and produce all the moieties required for biomass production. This requires a complex set of internal transformations, and the exact set of transformations required depends on the nature of the input moieties. Thus, the activity of category III transforming reactions depends upon the biochemical nature of the available nutrients in different minimal environments. We find that category III contains most of the reactions in central metabolism, such as the citric acid cycle. A similar architecture of the category III sub-network was found in the metabolic networks of the other two organisms as well. Some of the biomass metabolites are produced in category III itself. For the other biomass metabolites, category III produces precursors which are then taken up in the biosynthetic pathways of category II to produce the biomass metabolites.

V. COMPARISON OF FUNCTIONAL BOW-TIE DECOMPOSITION WITH GRAPH-THEORETIC BOW-TIE DECOMPOSITION

Ma and Zeng [11, 36] have used graph-theoretic measures to reveal a bow-tie architecture of metabolic networks similar to that seen in the World Wide Web (WWW) [37], wherein the network can be decomposed into an in-component, an out-component and a giant strong component. Given a directed graph, a strong component is a maximal subgraph such that for any pair of nodes i and j in the subgraph there exists a directed path from i to j and from j to i within the subgraph. In general, a directed graph can have many strong components, and the strong component with the largest number of nodes is designated as the giant strong component (GSC). The associated in-component consists of nodes which have access to GSC nodes via some directed path, but cannot be reached from any GSC node via a directed path. The out-component consists of nodes which can be reached from the GSC nodes via some directed path, but lack access to any GSC node via a directed path. A picture of the ideal graph-theoretic bow-tie is shown in Fig. 8.
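The definitions of the GSC, in-component and out-component given above can be made concrete with a short sketch. It is a direct, unoptimized transcription of those definitions for a small directed graph given as an edge list; the function name `bow_tie` and the toy graph are assumptions for illustration and do not reproduce the implementation used in Refs. [11, 36]. In the bipartite representation of Fig. 8, a node may be either a metabolite or a reaction.

```python
def bow_tie(edges):
    """Split a directed graph into (GSC, in-component, out-component).

    edges: iterable of (source, target) pairs.
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
        pred.setdefault(v, set()).add(u)
        pred.setdefault(u, set())

    def reachable(start, nbrs):
        # Iterative depth-first search from a collection of start nodes.
        seen, stack = set(start), list(start)
        while stack:
            node = stack.pop()
            for nxt in nbrs[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # The strong component of a node is the set of nodes that it can
    # reach and that can also reach it. Keep the largest one as the GSC.
    gsc, assigned = set(), set()
    for node in succ:
        if node in assigned:
            continue
        comp = reachable([node], succ) & reachable([node], pred)
        assigned |= comp
        if len(comp) > len(gsc):
            gsc = comp

    downstream = reachable(gsc, succ)  # GSC plus everything it can reach
    upstream = reachable(gsc, pred)    # GSC plus everything that reaches it
    return gsc, upstream - gsc, downstream - gsc

# Toy graph: a feeds a cycle b -> c -> d -> b, which feeds e; x -> y is detached.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b"),
             ("d", "e"), ("x", "y")]
gsc, in_component, out_component = bow_tie(toy_edges)
print(gsc, in_component, out_component)
```

Nodes such as x and y, which neither reach the GSC nor are reached from it, fall outside all three sets. The repeated searches make this sketch quadratic in the worst case; a linear-time strong-component algorithm would be preferable for genome-scale networks.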

In this work, we have decomposed the metabolic network into three categories using a simple algorithm based on activity patterns of reactions across different minimal environments. Our categorization reveals a functional bow-tie architecture wherein the input pathways (category I reactions) fan into intermediate metabolism (category III reactions), which forms the knot of a bow-tie and from where the output pathways (category II reactions) for various biomass components fan out.
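The activity-based categorization described above can be sketched in a few lines. The sketch assumes that one flux value per reaction and per feasible minimal environment is already available (for example from FBA) and that there is more than one environment; the function name `classify_by_activity`, the zero-flux tolerance and the toy fluxes are assumptions for illustration.

```python
def classify_by_activity(fluxes, tol=1e-9):
    """Assign each active reaction to category I, II or III.

    fluxes: dict mapping reaction name -> list of fluxes, one per
    feasible minimal environment (all lists have the same length M > 1).
    """
    categories = {"I": [], "II": [], "III": []}
    for name, v in fluxes.items():
        m_total = len(v)
        # Activity: number of environments in which the flux is nonzero.
        activity = sum(1 for x in v if abs(x) > tol)
        if activity == 0:
            continue                         # never active: not classified
        if activity == 1:
            categories["I"].append(name)     # once-active
        elif activity == m_total:
            categories["II"].append(name)    # always-active
        else:
            categories["III"].append(name)   # intermediate activity
    return categories

# Hypothetical fluxes in M = 4 minimal environments.
toy_fluxes = {
    "uptake_x": [2.0, 0.0, 0.0, 0.0],
    "biosyn_y": [1.0, 0.8, 1.2, 0.5],
    "central_z": [3.0, 0.0, 1.5, 2.0],
    "silent_w": [0.0, 0.0, 0.0, 0.0],
}
categories = classify_by_activity(toy_fluxes)
print(categories)
```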


[Figure 6: graph of the category IIa sub-network in E. coli. Pathway boxes recovered from the figure: L-Tyrosine and L-Phenylalanine; dTTP; L-Isoleucine; UDPglucose and glycogen; L-Histidine; L-Leucine; L-Lysine and Peptidoglycan subunit of E. coli; 5-Methyltetrahydrofolate, L-Methionine and Spermidine; L-Cysteine; Phosphatidylserine, Phosphatidylethanolamine, Phosphatidylglycerol, Cardiolipin and lipopolysaccharide (E. coli); dATP; Currency Reactions.]

FIG. 6. (Color online) Category IIa reactions in E. coli. This figure shows the graph of 147 category IIa reactions in E. coli whose reaction fluxes are perfectly correlated across minimal environments. Conventions are the same as in Figure 4. The preponderance of biomass metabolites (pink ovals) in this figure signifies that these reactions are at the output end of the metabolic network. The reactions have been grouped together into boxes based on common biosynthetic pathways.

In our functional bow-tie decomposition, the three categories I, II and III of reactions discussed above broadly correspond to the in-component, out-component and GSC, respectively, of the graph-theoretic bow-tie decomposition by Ma and Zeng [11, 36]. However, the corresponding sets of reactions in the two decompositions differ in detail. For example, we find that the end products of several (and often long) chains of reactions in the category II sub-network are recycled, resulting in feedback loops. Such feedback loops in the category II sub-network presumably minimize wastage and could be instrumental in producing the biomass metabolites in the desired ratios. An example of such a feedback loop in the category II sub-network is the one involving metabolite 5mdr1p (which can be seen in the electronic version of Fig. 6 upon zooming). The biosynthetic pathways involved in such feedback loops appropriately belong to the output part of metabolism because they connect the precursor metabolites to the outputs. However, the graph-theoretic bow-tie decomposition would classify such category II reactions in feedback loops into the GSC. Thus, our functional bow-tie decomposition, based on fluxes of reactions across different environments, gives better insight and is biochemically more realistic. The picture of the metabolic network that our decomposition reveals is similar in spirit to the one envisioned by Csete and Doyle [12].

VI. DISCUSSION AND CONCLUSIONS

In this paper, we have performed flux balance analysis (FBA) for the metabolic networks of three microorganisms, E. coli, S. cerevisiae and S. aureus, to obtain fluxes of reactions in the network under diverse environmental conditions. We have followed a purely algorithmic approach, leveraging the predicted fluxes of reactions across different minimal environments to decompose the metabolic network into functionally relevant sub-networks. We find that the activity of a reaction, given by the number of minimal environments for which it has a nonzero flux, is an important indicator of the functional role of a reaction. We have classified the reactions into three functional categories based on their activity. Category I contains once-active reactions, which are used in only one minimal environment. Most reactions belonging to the category I sub-network are uptake pathways for external nutrients in feasible minimal environments, and the primary function of these reactions is to catabolize external nutrients into simpler metabolites which can be further processed by intermediary metabolism. Category II contains always-active reactions, which are used in all minimal environments. The category II sub-network is critical for the survival of the organism and accounts for the majority of the biosynthetic pathways for the production of the biomass metabolites at the output end of the metabolic network. Category III contains reactions which are used in an intermediate number of minimal environments, and is responsible for generating the 'precursor' molecules that are eventually converted into biomass metabolites by category II reactions. We find that while the category I and II sub-networks are dominated by simple linear pathways, the structure of the category III sub-network is highly reticulate. In summary, our decomposition method for large-scale metabolic networks based on activity of reactions captures the functional bow-tie organization proposed by Csete and Doyle: the input pathways (category I reactions) for nutrients in the environment fan into intermediate metabolism (category III reactions), which forms the knot of the bow-tie from where the output biosynthetic pathways (category II reactions) for biomass components fan out. Our results are valid for metabolic networks of three phylogenetically different organisms (two distinct prokaryotes and a eukaryote), which suggests that the observed functional bow-tie organization could be quite common in living systems.

FIG. 7. (Color online) Category III reactions in E. coli. This figure shows the network of reactions that are active in two or more minimal environments considered, but not in all the environments. Conventions are the same as in Figure 3. Comparing this graph of category III reactions with category I and IIa reactions (cf. Figures 4 and 5), it is evident that the category III sub-network has a highly reticulate structure with many loops.

FIG. 8. (Color online) The ideal graph-theoretic bow-tie for a directed bipartite graph. The figure depicts the ideal bow-tie decomposition of a directed bipartite graph into three components: in, out and giant strong component, corresponding to shaded regions green, pink and blue, respectively. Ovals represent objects (e.g., metabolites in the metabolic network) and rectangles represent processes (e.g., chemical reactions) that modify or combine objects to produce other objects. The figure shows pathways starting from the input nodes in the in component (green ovals in green region) and converging to an irreducible subgraph representing the giant strong component (blue region). Output paths fan out from the giant strong component and terminate in the output nodes in the out component (red ovals in pink region).

Our functional classification of reactions uses an important additional piece of information that the purely graph-theoretic classification does not, namely, the list of the biomass metabolites that are the outputs of metabolism. The question arises as to whether the classification predicted by the graph-theoretic approach could be significantly improved by including this information (say, by somehow tagging the biomass metabolites in the graph). We think that this is unlikely. There does not seem to be any obvious method of utilizing this information in a purely topological analysis of the network. One might consider declaring these tagged metabolites to be present only at the output end of the network and thus exclude them (by hand) from the intermediate pathways. However, we note that while biosynthetic pathways of 30 of the biomass metabolites were found in category II reactions, several of the biomass metabolites were synthesized in the category III reactions. The latter metabolites, such as alanine and valine, are thus not only the outputs of metabolism; they also play an important role in the intermediate pathways required for the interconversion and synthesis of other metabolites. Thus a declaration such as the above would not be appropriate.

We remark that in the present work we have classified only the reactions of the metabolic network into three broad categories: input, output and intermediate. The classification of metabolites is more subtle and we intend to report on this in another contribution. While some metabolites participate in reactions belonging to only one of the three categories, several participate in reactions belonging to more than one category. The latter include the currency metabolites such as ATP, ADP, NADP, NADPH, etc. It is important to note that our flux-based categorization of reactions does not involve the a priori exclusion of the high-degree currency metabolites, as was needed in the graph-theoretic bow-tie decomposition of the metabolic network [11, 36].

Cellular metabolism is only one of a large class of functional systems where inputs are transformed into outputs through 'reactions' or processes involving disintegrations, conversions, recombinations, etc. Other examples include any complex manufacturing facility, or even a production economy as a whole. Communication systems also share some of the features. The patterns of flows across the network, as captured by the fluxes of the reactions, carry important information about network architecture and functionality. The methods presented here could be useful in studying these patterns in fields other than cellular metabolism.

Appendix A: Robustness of categorization of reactions to alternate optimal solutions

In this work, flux balance analysis (FBA) was used to obtain a particular flux vector v or optimal solution that maximizes the objective function, taken as the growth rate in a given minimal environment. However, for large-scale metabolic networks, there exist multiple flux vectors v or alternate optimal solutions that maximize growth in a given minimal environment, i.e., there are many flux vectors v with exactly the same value of the objective function but which use different alternate pathways in the network [28, 30–32]. FBA finds one of many possible alternate optima for a given minimal environment that maximizes growth. In the main text, we have used a single optimal flux vector v for each of the M feasible minimal environments to determine the activity of a reaction and the set of active reactions in the metabolic network of an organism. Since, in principle, the activity of a reaction can change depending on the particular flux vector considered, we study the robustness of our categorization of reactions to the presence of alternate optima.

Flux variability analysis (FVA) [31] can be used to determine the set of reactions whose fluxes vary across alternate optima for a given minimal environment. Specifically, FVA determines the maximum and minimum flux value that each reaction can take across alternate optima for a given minimal environment. FVA involves the following steps:

(a) Determine using FBA the maximum value of the objective function Z or growth rate $v^{\alpha}_{\mathrm{biomass}}$ in a given minimal environment $\alpha$.

(b) Fix the flux of the biomass reaction equal to $v^{\alpha}_{\mathrm{biomass}}$.

(c) Change the objective function Z to be the flux of a reaction j.

(d) Using linear programming, determine the maximum flux value $v^{\alpha}_{j,\max}$ of reaction j in the minimal environment $\alpha$, constraining the biomass reaction to have a flux equal to $v^{\alpha}_{\mathrm{biomass}}$.

(e) Using linear programming, determine the minimum flux value $v^{\alpha}_{j,\min}$ of reaction j in the minimal environment $\alpha$, constraining the biomass reaction to have a flux equal to $v^{\alpha}_{\mathrm{biomass}}$.

(f) The range $v^{\alpha}_{j,\min}$ to $v^{\alpha}_{j,\max}$ gives the variability of flux of reaction j across different alternate optima.

(g) The above steps c, d, e and f can be repeated for every reaction j in the metabolic network to determine the flux variability of each reaction across alternate optima for a given minimal environment $\alpha$.
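The FVA steps listed above can be sketched for a toy network as follows. The sketch assumes that NumPy and SciPy are available and uses `scipy.optimize.linprog` as the linear programming solver; the function name `fva`, the small `slack` allowed when fixing the biomass flux (a numerical convenience, not part of the steps above), and the four-reaction network are all assumptions for illustration, not the setup used in this work.

```python
import numpy as np
from scipy.optimize import linprog

def fva(S, bounds, biomass_idx, slack=1e-9):
    """Flux variability at maximal growth for one environment.

    S: stoichiometric matrix (metabolites x reactions).
    bounds: list of (lower, upper) flux bounds, one pair per reaction.
    Returns (v_biomass, [(v_min, v_max) for each reaction]).
    """
    n = S.shape[1]
    b_eq = np.zeros(S.shape[0])       # steady state: S v = 0

    # Step (a): maximize growth (linprog minimizes, hence the sign).
    c = np.zeros(n)
    c[biomass_idx] = -1.0
    fba = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds)
    if not fba.success:
        raise ValueError("environment is not feasible")
    v_bio = fba.x[biomass_idx]

    # Step (b): pin the biomass flux to its optimum (within a tiny slack).
    fixed = list(bounds)
    fixed[biomass_idx] = (v_bio - slack, v_bio)

    # Steps (c)-(g): minimize and maximize each reaction flux in turn.
    ranges = []
    for j in range(n):
        c = np.zeros(n)
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=b_eq, bounds=fixed)
        hi = linprog(-c, A_eq=S, b_eq=b_eq, bounds=fixed)
        if not (lo.success and hi.success):
            raise ValueError(f"LP failed for reaction {j}")
        ranges.append((lo.fun, -hi.fun))
    return v_bio, ranges

# Toy network: uptake -> A; A -> B via two alternate routes; B -> biomass.
# Columns: uptake, route1, route2, biomass. Rows: metabolites A, B.
S = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 1.0, -1.0]])
bounds = [(0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0), (0.0, 1000.0)]
v_bio, ranges = fva(S, bounds, biomass_idx=3)
print(v_bio, ranges)
```

In this toy case the uptake flux is pinned at maximal growth, whereas each of the two alternate routes can carry anything between zero and the full flux, so neither route has a positive minimum flux.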

We have used FVA to determine $v^{\alpha}_{j,\max}$ and $v^{\alpha}_{j,\min}$ for each reaction j and for each feasible minimal environment $\alpha$ in the E. coli metabolic network. A reaction j is designated as blocked if $v^{\alpha}_{j,\max}=0$ for all M feasible minimal environments [29, 38]. We found 329 blocked reactions in the E. coli metabolic network. The remaining 838 reactions, for which $v^{\alpha}_{j,\max}>0$ for at least some environment $\alpha$, are designated as potentially active reactions. This set includes the 585 active reactions considered in the main text. We define a reaction j as essential for a given minimal environment $\alpha$ if $v^{\alpha}_{j,\min}>0$. 484 reactions were found to be essential for some $\alpha$ in the E. coli metabolic network; these are a subset of the 585 active reactions considered in the main text. We now classify these 484 reactions into the following three categories:

(a) Essential category I: Reactions which satisfy $v^{\alpha}_{j,\min}>0$ for exactly one minimal environment. We found 162 reactions in the E. coli metabolic network to be in Essential category I. Of these, 153 reactions belong to category I of the main text.

(b) Essential category II: Reactions which satisfy $v^{\alpha}_{j,\min}>0$ for all M minimal environments. We found 171 reactions in the E. coli metabolic network to be in Essential category II. All of these belong to category II of the main text.

(c) Essential category III: Reactions which satisfy $v^{\alpha}_{j,\min}>0$ for m minimal environments, where $1 < m < M$. We found 151 reactions in the E. coli metabolic network to be in Essential category III. Of these, 145 belong to category III of the main text.

Thus we find that the classification discussed in the main text, which uses a particular flux vector, correctly predicts the essential category I, II or III of 469 out of the 484 essential reactions.
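The definitions of blocked, potentially active and essential reactions used in this appendix can be summarized in a small sketch that labels one reaction from its FVA minima and maxima across environments. The function name `fva_label`, the label strings, the tolerance and the toy values are assumptions for illustration.

```python
def fva_label(vmin, vmax, tol=1e-9):
    """Label a reaction from its FVA ranges across M > 1 environments.

    vmin, vmax: lists with the minimum and maximum flux of the reaction
    in each feasible minimal environment.
    """
    m_total = len(vmin)
    if all(x <= tol for x in vmax):
        return "blocked"                  # v_max = 0 in every environment
    # Number of environments in which the reaction is essential.
    m = sum(1 for x in vmin if x > tol)
    if m == 0:
        return "potentially active, never essential"
    if m == 1:
        return "essential I"
    if m == m_total:
        return "essential II"
    return "essential III"

# Hypothetical (vmin, vmax) pairs for M = 3 environments.
toy_ranges = {
    "R_a": ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]),
    "R_b": ([0.0, 0.0, 0.0], [5.0, 0.0, 2.0]),
    "R_c": ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    "R_d": ([2.0, 3.0, 1.0], [2.0, 3.0, 4.0]),
    "R_e": ([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]),
}
labels = {name: fva_label(lo, hi) for name, (lo, hi) in toy_ranges.items()}
print(labels)
```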

Appendix B: Relation between standard deviation σ and mean flux ⟨v⟩ for category I and category IIa reactions

In Fig. 3, we plot the standard deviation σ versus the mean flux ⟨v⟩ for active reactions in a metabolic network across its M feasible minimal environments. Here, we derive the relation between mean flux ⟨v⟩ and standard deviation σ for reactions in category I and category IIa, shown as upper and lower lines, respectively, in Fig. 3.


1. Category I reactions

In a given organism, any reaction belonging to category I has activity m = 1, and is active for a single environment (say $\alpha_0$). The mean flux $\langle v_j \rangle$ of a category I reaction j across M feasible environments is given by:

$$
\langle v_j \rangle = \frac{1}{M}\sum_{\alpha=1}^{M} v_j^{\alpha} = \frac{v_j^{\alpha_0}}{M}, \tag{B1}
$$

where $v_j^{\alpha}$ is the flux of reaction j in the environment $\alpha$ ($\alpha = 1, 2, \ldots, M$), and $v_j^{\alpha_0}$ is the flux of reaction j in the only feasible minimal environment $\alpha_0$ where the reaction has a nonzero value; in all other feasible minimal environments the flux of reaction j is 0.

Thus, the standard deviation $\sigma_j$ for a category I reaction j is given by:

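$$
\begin{aligned}
\sigma_j &= \sqrt{\frac{1}{M}\sum_{\alpha=1}^{M}\left(v_j^{\alpha}-\langle v_j\rangle\right)^2} \\
&= \sqrt{\frac{1}{M}\left[(M-1)\langle v_j\rangle^2+\left(v_j^{\alpha_0}-\langle v_j\rangle\right)^2\right]} \\
&= \sqrt{M-1}\,\langle v_j\rangle,
\end{aligned}
\tag{B2}
$$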

where we have used the result in Eq. B1.
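As a quick numerical check of Eq. B2, the sketch below compares the population standard deviation of a once-active flux vector with $\sqrt{M-1}$ times its mean. The flux values are hypothetical and chosen only to illustrate the identity.

```python
from math import sqrt, isclose
from statistics import pstdev

# Hypothetical category I reaction: nonzero flux in exactly one
# of M = 5 minimal environments.
flux = [0.0, 0.0, 3.0, 0.0, 0.0]
M = len(flux)

mean_flux = sum(flux) / M            # Eq. B1: v^{alpha_0} / M
sigma = pstdev(flux)                 # population standard deviation (1/M)
predicted = sqrt(M - 1) * mean_flux  # right-hand side of Eq. B2

print(sigma, predicted)
```

The population form `pstdev` is used because Eq. B2 normalizes by M rather than by M − 1.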

2. Category IIa reactions

The fluxes of reactions in category IIa are perfectly correlated with each other. This means that the fluxes of category IIa reactions are proportional to each other, with the same proportionality constant for all minimal environments. Thus, for a minimal environment $\alpha$, we can write the flux of category IIa reaction j as:

$$
v_j^{\alpha} = c^{\alpha} v_j^{0}, \tag{B3}
$$

where $c^{\alpha}$ is a constant for the minimal environment $\alpha$ and $v_j^{0}$ is some number. For any two reactions j and k in category IIa with fluxes correlated across minimal environments, we have:

$$
\frac{v_j^{\alpha}}{v_k^{\alpha}} = \frac{c^{\alpha} v_j^{0}}{c^{\alpha} v_k^{0}} = \frac{v_j^{\alpha'}}{v_k^{\alpha'}}, \tag{B4}
$$

where $\alpha$ and $\alpha'$ are two different feasible minimal environments for the organism.

The mean flux of reaction j is:

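$$
\langle v_j \rangle = \frac{1}{M}\sum_{\alpha=1}^{M} v_j^{\alpha} = v_j^{0}\,\frac{1}{M}\sum_{\alpha=1}^{M} c^{\alpha} = v_j^{0}\langle c \rangle, \tag{B5}
$$

where $\langle c \rangle$ is the mean of $c^{\alpha}$ across the set of feasible minimal environments.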

The standard deviation $\sigma_j$ for category IIa reaction j is given by:

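$$
\begin{aligned}
\sigma_j &= \sqrt{\frac{1}{M}\sum_{\alpha=1}^{M}\left(v_j^{\alpha}-\langle v_j\rangle\right)^2} \\
&= v_j^{0}\sqrt{\frac{1}{M}\sum_{\alpha=1}^{M}\left(c^{\alpha}-\langle c\rangle\right)^2} \\
&= v_j^{0}\sigma_c \\
&= \frac{\sigma_c \langle v_j\rangle}{\langle c\rangle} \\
&= b\,\langle v_j\rangle,
\end{aligned}
\tag{B6}
$$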

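where $\sigma_c$ is the standard deviation of $c^{\alpha}$ across the feasible minimal environments, $b = \sigma_c/\langle c \rangle$ is the same for every category IIa reaction, and we have used the result in Eq. B5.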
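The statement of Eq. B6, that the ratio of standard deviation to mean flux is the same constant $\sigma_c/\langle c \rangle$ for every reaction obeying Eq. B3, can be checked numerically with a short sketch. The environment factors and reaction scales below are hypothetical values chosen for illustration.

```python
from math import isclose
from statistics import pstdev

# Hypothetical environment-dependent factors c^alpha (M = 4) and
# reaction-specific scales v_j^0, as in Eq. B3.
c = [1.0, 2.0, 4.0, 3.0]
v0 = {"R1": 0.5, "R2": 2.0, "R3": 7.0}

mean_c = sum(c) / len(c)
b = pstdev(c) / mean_c   # slope sigma_c / <c> of Eq. B6

ratios = {}
for name, scale in v0.items():
    flux = [ca * scale for ca in c]          # Eq. B3
    mean_flux = sum(flux) / len(flux)        # Eq. B5
    ratios[name] = pstdev(flux) / mean_flux  # sigma_j / <v_j>

print(b, ratios)
```

Every ratio coincides with b, which is why all category IIa reactions fall on a single straight line through the origin in a plot of σ against ⟨v⟩.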

ACKNOWLEDGMENTS

SS and VG acknowledge support from University Grants Commission (UGC), AS from Council for Scientific and Industrial Research (CSIR), and SJ from Department of Biotechnology (DBT), India.

[1] L. Hartwell, J. Hopfield, S. Leibler, and A. Murray, Nature 402, C47 (1999).

[2] S. Bornholdt, H. Schuster, and J. Wiley, Handbook of graphs and networks, Vol. 2 (Wiley Online Library, 2003).

[3] A. Barabasi and Z. Oltvai, Nature Reviews Genetics 5, 101 (2004).

[4] A. Wagner, Robustness and evolvability in living systems (Princeton University Press, Princeton, NJ, 2005).

[5] K. Sneppen and G. Zocchi, Physics in molecular biology (Cambridge University Press, 2005).

[6] U. Alon, An introduction to systems biology: design principles of biological circuits, Vol. 10 (Chapman & Hall/CRC, 2006).

[7] K. Kaneko, Life: An introduction to complex systems biology, Vol. 171 (Springer, Heidelberg, Germany, 2006).

[8] R. Heinrich and S. Schuster, The regulation of cellular systems, Vol. 416 (Chapman & Hall, New York, 1996).

[9] H. Jeong, B. Tombor, R. Albert, Z. Oltvai, and A. Barabasi, Nature 407, 651 (2000).

[10] A. Wagner and D. Fell, Proceedings of the Royal Society of London. Series B: Biological Sciences 268, 1803 (2001).

[11] H. Ma and A. Zeng, Bioinformatics 19, 1423 (2003).


[12] M. Csete and J. Doyle, Trends in Biotechnology 22, 446 (2004).

[13] B. Palsson, Systems biology: properties of reconstructed networks (Cambridge University Press, 2006).

[14] N. Price, J. Reed, and B. Palsson, Nature Reviews Microbiology 2, 886 (2004).

[15] A. Feist and B. Palsson, Nature Biotechnology 26, 659 (2008).

[16] M. Oberhardt, B. Palsson, and J. Papin, Molecular Systems Biology 5 (2009).

[17] J. Edwards, R. Ibarra, B. Palsson, et al., Nature Biotechnology 19, 125 (2001).

[18] R. Ibarra, J. Edwards, and B. Palsson, Nature 420, 186 (2002).

[19] D. Segre, D. Vitkup, and G. Church, Proceedings of the National Academy of Sciences 99, 15112 (2002).

[20] J. Reed, T. Vo, C. Schilling, B. Palsson, et al., Genome Biol 4, R54 (2003).

[21] N. Duarte, M. Herrgard, and B. Palsson, Genome Research 14, 1298 (2004).

[22] S. Becker and B. Palsson, BMC Microbiology 5, 8 (2005).

[23] A. Samal, S. Singh, V. Giri, S. Krishna, N. Raghuram, and S. Jain, BMC Bioinformatics 7, 118 (2006).

[24] M. Eisen, P. Spellman, P. Brown, and D. Botstein, Proceedings of the National Academy of Sciences 95, 14863 (1998).

[25] T. Pfeiffer, F. Montero, S. Schuster, et al., Bioinformatics 15, 251 (1999).

[26] J. Stelling, S. Klamt, K. Bettenbrock, S. Schuster, E. Gilles, et al., Nature 420, 190 (2002).

[27] J. Papin, N. Price, and B. Palsson, Genome Research 12, 1889 (2002).

[28] J. Reed and B. Palsson, Genome Research 14, 1797 (2004).

[29] A. Burgard, E. Nikolaev, C. Schilling, and C. Maranas, Genome Research 14, 301 (2004).

[30] S. Lee, C. Phalakornkule, M. Domach, and I. Grossmann, Computers & Chemical Engineering 24, 711 (2000).

[31] R. Mahadevan, C. Schilling, et al., Metabolic Engineering 5, 264 (2003).

[32] A. Samal, Systems and Synthetic Biology 2, 83 (2008).

[33] E. Almaas, B. Kovacs, T. Vicsek, Z. Oltvai, and A. Barabasi, Nature 427, 839 (2004).

[34] J. Ellson, E. Gansner, L. Koutsofios, S. North, and G. Woodhull, in Graph Drawing (Springer, 2002) pp. 594–597.

[35] E. Almaas, Z. Oltvai, and A. Barabasi, PLoS Computational Biology 1, e68 (2005).

[36] H. Ma, X. Zhao, Y. Yuan, and A. Zeng, Bioinformatics 20, 1870 (2004).

[37] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Computer Networks 33, 309 (2000).

[38] S. Schuster and R. Schuster, Journal of Mathematical Chemistry 6, 17 (1991).

