Optimization and Control for Metabolic Networks
Alexandre João Borralho Domingues (Licenciado)
Dissertation submitted for the degree of Master in
Electrical and Computer Engineering
Jury
Chair: Professor Doutor Carlos Jorge Ferreira Silvestre
Supervisor: Professor Doutor João Manuel Lage de Miranda Lemos
Co-supervisor: Professora Doutora Susana de Almeida Mendes Vinga Martins
Member: Professor Doutor António Pedro Rodrigues de Aguiar
November 2009
Acknowledgments
This work would not have been possible without the help of Prof. João Miranda Lemos, who provided me with the technical basis, pointed out the right directions and was always patient with the many problems encountered, and Prof. Susana Vinga, who always gave me all possible support and helpful comments. Thank you for all the help and for giving me this opportunity.
This dissertation was performed under the framework of project DynaMo (PTDC/EEA-ACR/69530/2006). I would like to thank all the KDBIO group. This work also benefited greatly from the contribution of Dr. Ana Rute Neves, Prof. Helena Santos and Dr. Paula Gaspar, from ITQB, who provided the data and valuable information.
Thank you Joana for encouraging me to do this Master and supporting me through all the bad-mood days. Thank you for always being caring; it is a gift to have you in my life.
Finally, a big thank you to my parents and Inês for supporting me in every possible way.
Abstract
The rapidly evolving area of Systems Biology aims to provide a deeper understanding of biological systems at the system level. A common application is the systematization of metabolic networks using mathematical models. Valid models can avoid time-consuming and expensive experiments when testing and acquiring data from these networks. The increasing availability of these models and data poses new challenges concerning optimization. Due to the high level of complexity and uncertainty associated with these networks, the suggested models often lack the detail and reliability required to determine proper optimization strategies. A possible approach to overcome this limitation is the combination of kinetic and stoichiometric models. The work reported addresses the optimization and control of metabolic networks along such lines.
In the first part of this dissertation, three control optimization methods, namely Direct Optimization and Bi-level optimization using two different inner-optimization procedures, with different levels of complexity and assuming various degrees of process information, are presented and their results compared using a prototype network. The results obtained show that the bi-level optimization provides a good approximation for networks with incomplete kinetic information.
The process of formulating Metabolic Network models and estimating their parameters is complex and there is no defined framework to obtain valid solutions. In the second part of this dissertation, a procedure to estimate parameters using data sets from different experiments is presented. The procedure is illustrated by a case study on the effect of Nisin on Mannitol production by Lactococcus lactis. The obtained results are encouraging, providing a consistent estimate of the model parameters.
Keywords
Metabolic Networks, Optimization, Control, Parameter Identification, Modeling.
Resumo (Summary)
The emerging field of Systems Biology seeks to deepen the knowledge of biological systems at the level of their structural components. One common application is the systematization of metabolic networks using mathematical models.
Formulating mathematical models for metabolic networks can avoid the expensive and time-consuming experiments needed to test these networks. The growing availability of these models and of the corresponding data poses new challenges regarding the optimization of these networks and their products. Owing to the great complexity and uncertainty associated with these networks, the proposed models often lack the detail and reliability that are indispensable for defining control strategies. A possible approach to overcome this limitation is the combination of kinetic and stoichiometric models. The work presented addresses the optimization and control of metabolic networks along these lines.
In the first part of this dissertation, three control optimization methods, with different levels of complexity and assuming different levels of information about the network, are presented. Their results are compared using a prototype network.
The process of formulating these models for metabolic networks and estimating their parameters is complex, and no systematic approach has been defined to obtain valid solutions. In the second part of this dissertation, a procedure is presented to estimate parameters using data sets from different experiments while guaranteeing the consistency of the estimates. This procedure is illustrated by the study of the effect of Nisin induction on Mannitol production in Lactococcus lactis.
Keywords
Metabolic Networks, Optimization, Control, Parameter Identification, Modeling.
Contents
1 Introduction
1.1 Motivation
1.2 Problem formulation
1.3 State of the art
1.4 Contributions
1.5 Document structure
2 Synthetic problem
2.1 Problem description
2.1.1 Metabolic network modeling tools
2.1.2 Prototype network model
2.1.3 The optimization problem
2.2 Optimization Methods
2.2.1 The control function
2.2.2 Pontryagin’s Maximum Principle
2.2.3 Flux Balance Analysis
2.2.4 Geometric Programming
2.3 Optimization Strategies
2.3.1 Direct optimization
2.3.2 Bi-Level Optimization algorithm structure
2.3.3 Inner-optimization using Geometric Programming
2.3.4 Inner-optimization using Linear Programming
2.3.5 Pontryagin’s Maximum Principle: Computational implementation
2.4 Results and Discussion
2.4.1 Direct optimization
2.4.2 Bi-Level Optimization
2.4.3 PMP: Computational implementation results
3 Model for Mannitol production with Nisin induction
3.1 Problem description
3.1.1 Mannitol model
3.1.2 Mannitol model with Nisin induction
3.1.3 Data sets description
3.1.4 Parameters estimation
3.2 Parameter estimation methods
3.2.1 Estimation using one data set
3.2.2 Estimation using multiple data sets
3.2.3 Estimation using the Nisin data sets
3.2.4 Further notes on estimation strategies
3.3 Results
3.3.1 Identification of set δ
3.3.2 Identification of set δ using Nisin data sets
3.3.3 Identification of set σ
4 Optimizing Mannitol production using Optimal Control
4.1 Control using a step function
4.2 Results
5 Conclusions
List of Figures
2.1 Prototype network: The maximization of the final value of u5 depends on the profile of the function f(t).
2.2 Inner-Optimization algorithm
2.3 Bi-Level optimization formulation
2.4 Result of the simulation using Direct optimization.
2.5 Comparison of three f(t) profiles. The solid line is the optimal treg obtained in the Direct optimization.
2.6 Temporal variation of metabolites x2, x4, and outputs u3 and u5 for three values of treg
2.7 Comparison of the temporal variation of u1, u3 and u5 with a fixed f(n)
2.8 Result of the optimization using the Inner Optimization with Geometric Programming (left) and Linear Programming (right).
2.9 Control function, Hamiltonian derivative and u5 evolution on several iterations.
3.1 Detail of a metabolic pathway of Lactococcus lactis [1]
3.2 Mannitol Model without Nisin induction
3.3 Aspect of a Hill Function with n = 20 and θ = 5
3.4 Mannitol Model with Nisin induction
3.5 Data Sets for Mannitol production. Vertical dashed lines represent the time of induction of Nisin.
3.6 Parameter estimation structure
3.7 Estimation of δ using the data set without Nisin.
3.8 Estimation of set δ using all the data sets. Each Nisin data set is modeled with an independent σ set.
3.9 Estimation of σ using the Nisin data sets and a fixed δ.
3.10 Plot of the obtained Hill-type functions for each Nisin data set.
List of Tables
2.1 Parameters used in the prototype network.
2.2 Results for the Direct Optimization using the Discrete form of the control function
3.1 Estimation of the parameters of set δ using the data set without Nisin.
3.2 Fine tuning of set δ using all data sets.
3.3 The three independent σ sets.
3.4 Common subset of σ obtained in the estimation using the Nisin data sets.
3.5 Independent σ parameters obtained for each of the Nisin data sets.
4.1 Three independent step function parameters, obtained on the first estimation with all data sets.
4.2 Three independent step function parameters, obtained on the second estimation with the Nisin data sets.
4.3 The common step function parameters, obtained on the third estimation #1 with the Nisin data sets.
4.4 Three independent values for tnisin, obtained on the third estimation #1 with the Nisin data sets.
4.5 The common step function parameters, obtained on the third estimation #2 with the Nisin data sets.
4.6 Three independent values for tnisin, obtained on the third estimation #2 with the Nisin data sets.
List of Acronyms
PMP Pontryagin’s Maximum Principle
FBA Flux Balance Analysis
dFBA Dynamic Flux Balance Analysis
GP Geometric Programming
LP Linear Programming
GMA General Mass Action
BST Biochemical Systems Theory
F6P Fructose 6-phosphate
1 Introduction
Contents
1.1 Motivation
1.2 Problem formulation
1.3 State of the art
1.4 Contributions
1.5 Document structure
1.1 Motivation
The emergent area of Systems Biology is gradually changing the paradigms associated with the study of biological systems. Systems Biology studies the various parts of a biological system, not as individual components, but as parts of the same system that interact to achieve a global function or characteristic.
The general nature of Systems Biology leads to different points of view. While some describe Systems Biology as a field of study in its own right, others consider it a paradigm. From an engineering point of view, it is commonly described as the application of dynamical systems theory to molecular biology.
A frequent area of application of Systems Biology is the systematization of biological systems by proposing mathematical models that describe the interactions of the molecular components. These models, together with the available genetic engineering tools, open exciting new areas of research.
Advances in genetic engineering have made available a wide selection of tools to manipulate organisms. Methods such as gene knock-down/up [2], where the expression of a certain gene is decreased/increased, gene knock-outs [3], where the expression of a certain gene is silenced, and gene substitutions, among others, provide degrees of freedom when manipulating an organism.
These tools are now common in genetic engineering and have proven to be useful in many situations. They can, for instance, be used to improve desired characteristics of certain organisms. A common example is the manipulation of metabolic networks in order to maximize the normal product yield, or even to redirect production to a flux that was residual or non-significant in the original network. Such an example is provided in [1], where a genetically modified strain of Lactococcus lactis was able to produce Mannitol.
Even though genetic engineering tools are robust, the high complexity and uncertainty associated with living organisms, and with the corresponding metabolic networks, make it extremely difficult to determine which manipulations and conditions are required to attain a certain objective.
Since a heuristic approach to such problems does not allow the full potential of metabolic engineering to be explored, these tools are now combined with methods from classical engineering areas, such as Electrical Engineering, and from Physics, among others. Tools that have been used for several years in technical contexts are now being applied to genetic engineering under new paradigms and conditions, which in turn raises new obstacles that need to be overcome when they are applied to Metabolic Engineering.
The combination of efforts from areas as diverse as Electronics, Control and Biology, among others, is very exciting and has already produced a variety of interesting results, ranging from the modification of small organisms to the manipulation of actual ecosystems [4].
1.2 Problem formulation
The work described in this thesis addresses two different problems.
A common situation in metabolic network engineering arises when a certain organism is, naturally or after genetic manipulation, capable of producing a product of interest and this product competes with the natural objective of the cell [1, 5, 6]. Since the primary objective of most living organisms is assumed to be ensuring the continuity of the species, the natural objective of the cell is normally assumed to be the production of biomass or the formation of a growth precursor [7, 8].
The first part of this dissertation addresses an optimization problem related to a situation where the trade-off between biomass formation and product formation can be explored with a control variable, e.g. pH, temperature or gene inductors. The objective is the maximization of the final concentration of a metabolite whose formation competes with the natural objective of the cell (e.g. maximization of biomass).
In order to solve this optimization problem a proper model of the organism is required. Unfortunately, the identification of the kinetic parameters of a metabolic network is still very difficult and represents an area of research by itself. A possible solution for the optimization problem is the combination of kinetic information with stoichiometric information, which depends only on the stoichiometry of the reactions.
In the first part of this work, a synthetic problem is formulated. A prototype network with the described behavior is taken as an example and the corresponding optimization problem is solved assuming three different levels of information about the network kinetics.
The second problem relates to Mannitol production in a modified strain of L. lactis. In [1] the strain L. lactis FI10089mtlD+Pase+ was created. This strain is able to produce Mannitol, whose formation competes with the natural pathway of the organism to produce biomass. This strain also has a Nisin-inducible gene that allows the overexpression of two enzymes responsible for the formation of Mannitol to be controlled.
The maximization of Mannitol production, controlled by the inductor, can be pursued heuristically by repeating the experiment several times. Due to the high costs, high complexity and long time scales associated with each experiment, this method is far from ideal. Since the pathway that leads to the production of both Mannitol and biomass is partially known, a mathematical model is proposed in order to explore, in a systematic way, the trade-off between the production of Mannitol and the formation of biomass controlled by Nisin.
1.3 State of the art
Although Metabolic Engineering [9] has developed very powerful approaches to optimize biotechnological processes, the systematic use of mathematical models and optimal control methods is still limited and poses many open issues.
Interesting examples are provided, some at the genome level, by [6, 10–12]. In [6] the use of a bi-level optimization method, with a linear programming problem in the inner level and a nonlinear optimization problem in the outer level, has the interesting feature of not requiring full model knowledge. This optimization method was used on an in silico model of E. coli and tested in vivo with promising results.
The work in [13] and [8] focuses on techniques to determine dynamic distributions of fluxes in metabolic network models where not all the kinetics are known.
The use of Nisin as an inductor to control the yield of a certain product has been tested several times. In [14] an optimization strategy, relying on practical experiments, is formulated to maximize yield by controlling variables such as the pH, the type of neutralizing agent, the fermentation temperature or the point of induction, among others.
1.4 Contributions
The major contributions of this thesis consist of two case studies on metabolic network modeling, control and optimization.
In the first part, three different methods are compared on a common basis. Two of these methods assume complete knowledge of the dynamic equations of a network model. While one relies on an Optimal Control approach, the other performs a steady-state optimization using Geometric Programming (GP) [15]. These methods provide a baseline performance against which the results obtained by other methods may be compared.
The third approach, in contrast, assumes only partial knowledge of the network kinetic model and relies on a bi-level optimization. Furthermore, using Pontryagin’s Maximum Principle (PMP) [16], it is shown that, for the class of problems considered, the manipulated variable may only assume values at the extremes of the admissible interval.
In the second part, a procedure to consistently estimate parameters using data sets from different experiments is presented. This is illustrated by a case study on parameter estimation in metabolic networks using data taken under different conditions. An initial model for Mannitol production is suggested and a sub-model is later added to account for the Nisin induction.
Although the initial model does not predict Nisin induction, the data taken with induction are also used to identify the model’s parameters. The resulting model can later be used to optimize the product yield without the need for complex and expensive experiments.
1.5 Document structure
The thesis is organized as follows. After the Introduction (Chapter 1), in which the problem is introduced and motivated and the state of the art is reviewed, Chapter 2 formulates the synthetic problem and presents three possible optimization methods.
Chapter 3 presents the Mannitol production problem in L. lactis and suggests a model. In Chapter 4 the Mannitol production problem is further detailed with the study of a possible control strategy.
Finally, conclusions are drawn in Chapter 5.
2 Synthetic problem
Part of this chapter was published in:
Domingues A., J.M. Lemos and S. Vinga. (2009)
Optimization strategies for metabolic networks.
In Proc. of the European Control Conference (ECC’09).
August 23-26, Budapest, Hungary.
Contents
2.1 Problem description
2.2 Optimization Methods
2.3 Optimization Strategies
2.4 Results and Discussion
2.1 Problem description
In this chapter, a prototype synthetic network, in which the formation of two of its metabolites competes with each other, is taken as an example. The ratio between the formation of the two metabolites is controlled by a single function. After ensuring that the network has the required characteristics, three different optimization strategies to maximize the yield of one of the products are explored. Each of the optimization strategies assumes a different level of information about the network.
2.1.1 Metabolic network modeling tools
Metabolic networks can be as diverse as life itself. While a heuristic approach to exploring them can give valuable information on their structure and molecular mechanisms, a structured approach based on a mathematical description is fundamental to gain deeper insight.
Thus, the application of mathematical models to metabolic networks is a valuable approach. Due to the high level of uncertainty and complexity associated with these networks, there is no defined framework to create these models.
A common procedure is to adopt a top-down approach, where all the available information about the structure of the biological system is gathered. This information is then translated into model equations, yielding a system of non-linear differential equations.
A common methodology to establish these equations is to use the Biochemical Systems Theory (BST) framework, where each flux is approximated by a power law that corresponds to a first-order Taylor series expansion in logarithmic space [17]. The fluxes from BST can be expressed using S-Systems [18–20] or General Mass Action (GMA) [21].
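For reference, the power-law representation mentioned above can be written out as follows. The symbols $\gamma$, $f_j$ and the operating point $X^0$ are generic BST notation introduced here for illustration; they are assumptions of this sketch and do not belong to the prototype model used later.

```latex
% Power-law approximation of a flux v that depends on concentrations X_1, ..., X_n:
% a product of powers is linear in logarithmic coordinates.
v(X) \approx \gamma \prod_{j=1}^{n} X_j^{f_j}
\quad\Longleftrightarrow\quad
\ln v \approx \ln \gamma + \sum_{j=1}^{n} f_j \ln X_j ,
\qquad
f_j = \left. \frac{\partial \ln v}{\partial \ln X_j} \right|_{X^0}

% S-system form: one aggregated production term and one aggregated degradation term per node.
\frac{dX_i}{dt} = \alpha_i \prod_{j} X_j^{g_{ij}} - \beta_i \prod_{j} X_j^{h_{ij}}

% GMA form: every individual flux keeps its own power-law term.
\frac{dX_i}{dt} = \sum_{k} \pm \gamma_{ik} \prod_{j} X_j^{f_{ijk}}
```

The kinetic orders are thus the slopes of the flux in log-log coordinates at the chosen operating point, which is why the approximation is local.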
In order for the model to properly describe the system, the set of parameters has to be estimated. The parameter identification procedure consists of minimizing an objective function, usually the weighted sum of squares of the residuals between simulated (parameter-dependent) and experimental data points.
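As a minimal sketch of such an objective function, the Python fragment below computes a weighted sum of squared residuals and uses it to pick the best value of a single parameter from a short candidate list. The model `decay_model`, the data and the candidate values are hypothetical and serve only to illustrate the idea; the thesis itself reports a MATLAB implementation.

```python
import math


def weighted_sse(simulated, measured, weights=None):
    """Weighted sum of squared residuals between simulated and measured points."""
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    if weights is None:
        weights = [1.0] * len(measured)  # unweighted case
    return sum(w * (s - m) ** 2 for w, s, m in zip(weights, simulated, measured))


def decay_model(rate, times):
    """Hypothetical one-parameter model y(t) = exp(-rate * t), for illustration only."""
    return [math.exp(-rate * t) for t in times]


# Synthetic, noise-free "measurements" generated with rate = 0.5 (an assumed value).
times = [0.0, 1.0, 2.0]
data = decay_model(0.5, times)

# Crude identification: evaluate the objective on a few candidate parameter values.
candidates = [0.1, 0.25, 0.5, 1.0]
best_rate = min(candidates, key=lambda r: weighted_sse(decay_model(r, times), data))
```

In practice the candidate scan would be replaced by a numerical optimizer, and the weights would typically reflect the measurement uncertainty of each data point.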
Another possible approach to obtain valid models is to use the stoichiometry of the reactions involved in the metabolic network. Such models are simpler to obtain, since there are already large and reliable databases [22] with the reactions involved in several networks, but they do not account for the dynamic nature of the organisms. Thus, regulatory mechanisms are hard to predict using stoichiometric models.
2.1.2 Prototype network model
A graphical representation of the network used is shown in Fig. 2.1.
This network is an adaptation of a previously suggested one [18].
Figure 2.1: Prototype network: The maximization of the final value of u5 depends on the profile of the function f(t).
The stoichiometric model is described by the following set of ordinary differential equations:
\[
\begin{aligned}
\frac{du_1}{dt} &= k - v_1 \\
\frac{dx_2}{dt} &= v_1 - v_2 (1 - f) - v_3 f \\
\frac{du_3}{dt} &= v_2 (1 - f) \\
\frac{dx_4}{dt} &= v_3 f - v_4 \\
\frac{du_5}{dt} &= v_4
\end{aligned}
\tag{2.1}
\]
Here ui, i = 1, 3, 5, and xi, i = 2, 4, are metabolite concentrations at the network nodes, vi, i = 1, ..., 4, are fluxes associated with the metabolic network branches, and k is a constant parameter that represents the yield of u1. In the equations, f represents a time-dependent control function f(t) that redirects the flux between the branches x2 → u3 and x2 → x4. A detailed description of this function is given in Section 2.2.1.
Fig. 2.1 shows a positive feedback from u3 to the flux v3. Stoichiometric models do not predict feedbacks, since the stoichiometry of the reactions remains unchanged. The approach used to model this feedback is presented in Section 2.3.4, together with the implementation of Flux Balance Analysis (FBA).
In the framework of S-systems [18] the system is described by:
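\[
\begin{aligned}
\frac{du_1}{dt} &= k - \beta_1 u_1^{h_{11}} \\
\frac{dx_2}{dt} &= \alpha_2 u_1^{g_{21}} - \beta_2 u_3^{h_{23}} x_2^{h_{22}} \\
\frac{du_3}{dt} &= \alpha_3 (1 - f)\, x_2^{g_{32}} \\
\frac{dx_4}{dt} &= \alpha_4 f\, u_3^{g_{43}} x_2^{g_{42}} - \beta_4 x_4^{h_{44}} \\
\frac{du_5}{dt} &= \alpha_5 x_4^{g_{54}}
\end{aligned}
\tag{2.2}
\]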
In this framework the kinetic orders are denoted $g_{ij}$ if they refer to fluxes that enter a node or metabolite ($V_i^+$), and $h_{ij}$ if they refer to fluxes that leave the node or metabolite ($V_i^-$). Finally, $\alpha_i$ and $\beta_i$ are constant parameters.
Their values were adapted from the initial model [18] and adjusted to obtain the desired response.
Table 2.1: Parameters used in the prototype network.
Param.  Value    Param.  Value
α2      8        h11     0.5
α3      4.0556   h22     1.4224
α4      1.8397   h23     0.6109
α5      4.0556   h44     0.5829
β1      1        g21     0.5
β2      5.1179   g32     0.4171
β4      4.0556   g42     2.8274
k       0.8      g43     1.4646
                 g54     0.5
Table 2.1 shows the list of parameters. Different letters are used to distinguish metabolite concentrations from inputs/outputs: x# represents the concentration of a metabolite and u# an input/output.
The degradation of metabolite x2 depends on fluxes v2 and v3. In (2.2) the two fluxes were lumped and are expressed as $\beta_2 u_3^{h_{23}} x_2^{h_{22}}$. If mass conservation were imposed, the lumped flux in the equation for $dx_2/dt$ should equal the sum of v2 and v3 from the equations for $du_3/dt$ and $dx_4/dt$, respectively. Thus,
\[
\beta_2 u_3^{h_{23}} x_2^{h_{22}} = \alpha_3 (1 - f)\, x_2^{g_{32}} + \alpha_4 f\, u_3^{g_{43}} x_2^{g_{42}} .
\]
Although mass conservation is a fundamental principle in a biochemical system, it was not enforced in this model for the sake of simplicity.
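The size of this imbalance can be checked numerically. The sketch below, written in Python purely for illustration, evaluates both sides of the mass conservation condition with the parameters of Table 2.1; the function names and the sample state are assumptions of this sketch. At u3 = x2 = 1 every power equals one, so the lumped term reduces to β2 = 5.1179 while the branch terms sum to a value between α4 = 1.8397 and α3 = 4.0556, whatever the value of f.

```python
# Parameters taken from Table 2.1.
alpha3, alpha4, beta2 = 4.0556, 1.8397, 5.1179
g32, g42, g43 = 0.4171, 2.8274, 1.4646
h22, h23 = 1.4224, 0.6109


def lumped_outflow(u3, x2):
    """Degradation term of x2 in (2.2)."""
    return beta2 * u3 ** h23 * x2 ** h22


def branch_inflows(u3, x2, f):
    """Sum of the production terms of u3 and x4 in (2.2)."""
    to_u3 = alpha3 * (1.0 - f) * x2 ** g32
    to_x4 = alpha4 * f * u3 ** g43 * x2 ** g42
    return to_u3 + to_x4


def mass_imbalance(u3, x2, f):
    """Flux leaving x2 minus flux arriving at u3 and x4 (zero if mass were conserved)."""
    return lumped_outflow(u3, x2) - branch_inflows(u3, x2, f)


# Sample state chosen for convenience only (hypothetical, not from the thesis).
imbalance_f0 = mass_imbalance(1.0, 1.0, 0.0)
imbalance_f1 = mass_imbalance(1.0, 1.0, 1.0)
```

Both imbalances are strictly positive at this state, which illustrates that the model removes more material from x2 than it delivers to the two branches.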
Assume that u3 represents a precursor of the cellular objective (such as growth) and u5 the desired product. If f(t) is biased towards the branch of v2, u3 is formed but there is little or no production of u5. If f(t) is biased towards the branch of v3, the production of u5 is limited by the low concentration of u3, since v3 depends on u3 through the positive feedback shown in Fig. 2.1.
Thus, there is an optimal profile for f(t) that maximizes the concentration of u5 at the pre-defined final time tfinal.
2.1.3 The optimization problem
The optimization problem consists of selecting f(t) for t ∈ [0, tfinal] such that the cost function
\[
J(f) = u_5(t_{final}) \tag{2.3}
\]
is maximized under the constraint that f(t) ∈ [0, 1], ∀ t ≥ 0.
This translates into the maximization of the desired product yield at the end of the experiment.
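To make the cost function concrete, the following Python sketch integrates the S-system model (2.2) with the parameters of Table 2.1 and returns u5 at the final time for a given control profile. Several ingredients are assumptions of this sketch and are not taken from the thesis: the final time `t_final`, the initial state `x0`, the forward Euler scheme with step `dt`, and the family of candidate profiles that switch once from 0 to 1 at a time `t_reg`.

```python
# Parameters of the S-system model (2.2), taken from Table 2.1.
P = dict(
    k=0.8, a2=8.0, a3=4.0556, a4=1.8397, a5=4.0556,
    b1=1.0, b2=5.1179, b4=4.0556,
    h11=0.5, h22=1.4224, h23=0.6109, h44=0.5829,
    g21=0.5, g32=0.4171, g42=2.8274, g43=1.4646, g54=0.5,
)


def rhs(state, f):
    """Right-hand side of (2.2) for state (u1, x2, u3, x4, u5) and control value f."""
    u1, x2, u3, x4, u5 = state
    return (
        P["k"] - P["b1"] * u1 ** P["h11"],
        P["a2"] * u1 ** P["g21"] - P["b2"] * u3 ** P["h23"] * x2 ** P["h22"],
        P["a3"] * (1.0 - f) * x2 ** P["g32"],
        P["a4"] * f * u3 ** P["g43"] * x2 ** P["g42"] - P["b4"] * x4 ** P["h44"],
        P["a5"] * x4 ** P["g54"],
    )


def cost(control, t_final=10.0, dt=1e-3, x0=(0.1, 0.1, 0.0, 0.0, 0.0)):
    """J(f) = u5(t_final), by forward Euler. t_final, dt and x0 are assumed values."""
    n_steps = int(round(t_final / dt))
    state = list(x0)
    for i in range(n_steps):
        f = min(1.0, max(0.0, control(i * dt)))  # enforce f in [0, 1]
        deriv = rhs(state, f)
        # Clamp at zero so that the fractional powers stay real-valued.
        state = [max(0.0, s + dt * d) for s, d in zip(state, deriv)]
    return state[4]


def step_control(t_reg):
    """Candidate profile: f = 0 up to t_reg, f = 1 afterwards (assumed family)."""
    return lambda t: 0.0 if t <= t_reg else 1.0


# Evaluate the cost for a few switching times (hypothetical values).
scan = {t_reg: cost(step_control(t_reg)) for t_reg in (2.0, 5.0, 8.0, 10.0)}
```

With the assumed initial state, a profile that never switches to f = 1 produces no u5 at all, whereas any switch strictly before the final time yields a positive final concentration. Comparing the entries of `scan` gives a first, coarse impression of how the cost depends on the control profile.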
2.2 Optimization Methods
The solution of the optimization problem formulated in Section 2.1.3 is now considered according to three different approaches.
Before presenting the different optimization strategies, the control function is described in detail and PMP is invoked to show that the optimal control function has a particular form. A short introduction to FBA and GP is also given for a better understanding of the optimization algorithms.
All the software was implemented in MATLAB, using standard functions and functions from the freely available Systems Biology Toolbox [23]. Linear Programming (LP) problems were solved using the function linprog and non-linear problems using the function fmincon. For GP problems, functions from the GGPLAB [15] package were used. The simulations were run on a laptop with a 1.6 GHz processor and 512 MB of RAM.
2.2.1 The control function
The function f is the control function that is selected in order to maximize the product yield. Two forms of this function are used.
The first form, the discrete form, is represented as f(n) and divides the time interval into N segments. In each segment the function can assume any value between the admissible lower and upper bounds, defined to be 0 and 1 respectively.
The discrete form is described as:
\[
f(n) = f_n \quad \text{for } n = \frac{t_{final}}{N}\, i, \quad i = 1, \dots, N, \quad f_n \in [0, 1] \tag{2.4}
\]
The second form, the step form, is represented as f(t). It starts at its minimum admissible value and then switches to its maximum value at a certain time instant, which will be called the time of regulation (treg) throughout the rest of this thesis. Thus, the step form is described as:
\[
f(t) =
\begin{cases}
0 & \text{if } t \le t_{reg} \\
1 & \text{if } t > t_{reg}
\end{cases}
, \quad t \in [0, t_{final}] \tag{2.5}
\]
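The two forms of the control function can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original MATLAB implementation: the function names are hypothetical, and the discrete form is interpreted as a piecewise-constant profile in which segment i covers the interval from (i − 1)·tfinal/N up to i·tfinal/N.

```python
def discrete_control(values, t_final):
    """Piecewise-constant control built from N values in [0, 1], in the spirit of (2.4)."""
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("every f_n must lie in [0, 1]")
    n_segments = len(values)

    def f(t):
        # Index of the segment containing t; the final instant maps to the last segment.
        index = min(int(t / t_final * n_segments), n_segments - 1)
        return values[index]

    return f


def step_control(t_reg):
    """Step control of (2.5): 0 up to and including t_reg, 1 afterwards."""
    return lambda t: 0.0 if t <= t_reg else 1.0


# Usage with hypothetical values: three segments on [0, 3] and a switch at t = 2.
f_discrete = discrete_control([0.0, 0.5, 1.0], t_final=3.0)
f_step = step_control(2.0)
```

The discrete form has N free parameters, one per segment, whereas the step form is described by the single parameter treg.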
2.2.2 Pontryagin’s Maximum Principle
The general tool to solve dynamic optimization problems such as the one considered here is Pontryagin’s Maximum Principle (PMP) [16].
Let x be the state of a dynamical system with control inputs u such that:
\[
\dot{x} = F(x, u), \quad x(0) = x_0, \quad u(t) \in U, \quad t \in [0, T] \tag{2.6}
\]
where U is the set of valid control inputs and T is the final time, assumed here to be constant.
The control function u must be chosen in order to maximize the functional J, defined by:
\[
J(u) = \psi(x(T)) + \int_0^T L(x(t), u(t))\, dt \tag{2.7}
\]
where ψ is the cost associated with the terminal condition of the system and L the Lagrangian.
To this end, define the adjoint equations, with final conditions,
\[
\dot{\lambda} = -L_x^T - F_x^T \lambda, \quad \lambda(T) = \psi_x \big|_{x = x(T)} \tag{2.8}
\]
and the Hamiltonian,
\[
H(\lambda(t), x(t), u(t), t) = \lambda^T F(x(t), u(t)) + L(x(t), u(t)) \tag{2.9}
\]
where the subscript x denotes the partial derivative with respect to the state, the co-state λ satisfies the adjoint equation (2.8) with suitable final time conditions [16] and x satisfies (2.6) with u being the optimal control.
According to PMP, a necessary condition for the optimal control is that, along the optimal solution for the state x, co-state λ and control u, the Hamiltonian H is maximum with respect to u.
Comparing the cost (2.3) with the generalized case (2.7) and taking into consideration that,
in the case at hand, given by (2.1), the dynamics vector field depends linearly on the control, it
follows that
\[
H(\lambda, x, u) = \lambda^T \phi(x)\, u \tag{2.10}
\]
where φ(x) is a function that does not depend on u. Since, according to (2.10), the Hamiltonian
is linear in u, its maximum is obtained at the boundary of the admissible control set U .
Hence, this shows that, for the metabolic network (2.1), the control that optimizes (2.3) only
assumes the values f = 0 or f = 1.
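The boundary argument can be written as a small rule. The sketch below assumes a Hamiltonian of the form H = σ·u, where σ stands for the scalar λᵀφ(x) in (2.10); the name `maximizing_control` and the handling of the case σ = 0 are illustrative choices, not part of the original method.

```python
def maximizing_control(switching_value, u_min=0.0, u_max=1.0):
    """Return the u in [u_min, u_max] that maximizes switching_value * u.

    A linear function on an interval attains its maximum at an end point,
    so only the sign of the coefficient matters.
    """
    if switching_value > 0:
        return u_max
    if switching_value < 0:
        return u_min
    # Coefficient exactly zero: H does not depend on u, so PMP alone
    # does not single out a value (returned as None in this sketch).
    return None
```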
2.2.3 Flux Balance Analysis
The difficulties associated with the creation of dynamic models, based on network kinetics,
promote the use of stoichiometric models and simpler methods for analysis of metabolic capabili-
ties of cellular systems. FBA has proven useful in the study of metabolic systems [7, 13, 24] and
is part of the optimization process of the current study.
Stoichiometric models describe an organism through its set of chemical reactions (metabolism),
the rate of each reaction being called a flux. Assuming, as explained before, that the
main objective of a given organism is to grow, the problem that flux balance analysis addresses
is, given a set of reactions, to find the combination of fluxes that maximizes the growth
rate.
The first step in FBA is the reconstruction of the metabolic network, such as in Fig. 2.1. Mass
balance equations are written for every metabolite as in (2.1), and known constraints (such as
lower and upper bounds for fluxes) are included (2.11).
\[
\alpha \le v_i \le \beta \tag{2.11}
\]
The system can be written in a generic form as (2.12).
\[
\frac{dX}{dt} = S\, v \tag{2.12}
\]
Here X is a vector with the concentration of each metabolite, S is a matrix describing the
stoichiometry of the catabolic reactions, v is a vector of the n metabolic reaction rates (fluxes) and α
and β are the lower and upper constraints for each flux.
In the case at hand (2.1), considering only the boxed metabolites from Fig. 2.1, (2.12) becomes:
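\[
\begin{bmatrix} \dfrac{dx_2}{dt} \\[2mm] \dfrac{dx_4}{dt} \end{bmatrix}
=
\begin{bmatrix} 1 & -(1 - f) & -f & 0 \\ 0 & 0 & f & -1 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} \tag{2.13}
\]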
If it is assumed that the system has reached steady state, thus removing the ability to describe
transient states or regulatory mechanisms, (2.12) becomes
\[
S\, v = 0
\]
which is typically an underdetermined system, since there are more fluxes than metabolites.
While there are a multitude of solutions for this problem, since the fluxes can be organized in
several ways, only one or a small set of solutions will maximize the growth rate.
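The sketch below illustrates why the steady-state system is underdetermined. It builds the matrix of (2.13) and checks whether a candidate flux vector satisfies S·v = 0 within its bounds. The helper names are hypothetical; the bounds are the ones listed in Section 2.3.4, evaluated for u3 = 1, and the candidate flux vectors are made up for illustration.

```python
def stoichiometric_matrix(f):
    """Matrix S of (2.13) for the boxed metabolites x2 and x4."""
    return [[1.0, -(1.0 - f), -f, 0.0],
            [0.0, 0.0, f, -1.0]]


def is_steady_state(S, v, lower, upper, tol=1e-9):
    """True if S*v = 0 (within tol) and every flux respects its bounds."""
    balanced = all(abs(sum(s * vi for s, vi in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= vi <= hi + tol
                  for vi, lo, hi in zip(v, lower, upper))
    return balanced and bounded


S = stoichiometric_matrix(f=0.0)
lower = [0.0, 0.0, 0.0, 0.0]
upper = [100.0, 1.85, 1.5, 35.0]   # bounds of Section 2.3.4 with u3 = 1

# Two different flux vectors that both satisfy the steady-state equations.
feasible_a = is_steady_state(S, [1.85, 1.85, 0.0, 0.0], lower, upper)
feasible_b = is_steady_state(S, [1.00, 1.00, 0.0, 0.0], lower, upper)
# A vector that violates the mass balance of x2.
infeasible = is_steady_state(S, [1.85, 1.00, 0.0, 0.0], lower, upper)
```

Both `feasible_a` and `feasible_b` are valid steady states, which is why an objective function is needed to select one of them.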
A valid solution is found by solving an LP problem with a proper objective function. In optimal
environmental conditions, with enough substrate, it is valid to assume that the cellular objective is
the maximization of biomass [24]. Thus the objective function of the LP can be a flux or a function
of fluxes known to be related to growth precursors.
In the case of (2.13) there are four undetermined fluxes and two equations. Thus, a valid flux
distribution is obtained by maximizing flux v2, and subsequently maximizing the biomass precursor
u3.
The FBA framework has been extended [8, 13] to incorporate the dynamics of the network.
Dynamic Flux Balance Analysis (dFBA) can predict the reprogramming of a metabolic network
and model the dynamics of certain metabolites over time. This is done by solving the steady-state
problem at several time instants and integrating the known fluxes during each time interval. FBA
and the principle of dFBA are used as part of the optimization procedures described below.
2.2.4 Geometric Programming
Geometric Programming (GP) is a powerful mathematical optimization tool that can be used in
problems where the objective and constraint functions have a special form [15]. GP is of particular
interest because it can solve large scale problems with extreme efficiency and reliability [25]. It
has been shown [26] that a problem formulated in S-Systems form can be solved with GP after a
minimal adaptation.
Let x = (x1, . . . , xn) be a vector of n real positive variables x1, . . . , xn. A function f(x) with the
form
\[
f(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}
\]
where c > 0 and ai ∈ R, is called a monomial function [26]. A sum of one or more monomials
is called a posynomial function [26] and any monomial is also a posynomial. The standard GP
problem is formulated as:
\[
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\
& g_i(x) = 1, \quad i = 1, \dots, p,
\end{aligned} \tag{2.14}
\]
where fi and f0 are posynomial functions, gi are monomials, and xi are the variables to be
optimized. Monomials are closed under multiplication and division: if f and g are both
monomials, then so are f × g and f ÷ g [26].
Transforming an S-Systems model (in steady state) into constraints of a GP problem is
straightforward. To that end, start from the standard form of S-Systems:
\[
\frac{dx_i}{dt} = \alpha_i \prod_j x_j^{g_{ij}} - \beta_i \prod_j x_j^{h_{ij}} \tag{2.15}
\]
Assuming steady state:
\[
0 = \alpha_i \prod_j x_j^{g_{ij}} - \beta_i \prod_j x_j^{h_{ij}} \tag{2.16}
\]
This expression is re-arranged (2.18) to yield the form of a GP problem constraint (2.14):
\[
\alpha_i \prod_j x_j^{g_{ij}} = \beta_i \prod_j x_j^{h_{ij}} \tag{2.17}
\]
\[
\frac{\alpha_i \prod_j x_j^{g_{ij}}}{\beta_i \prod_j x_j^{h_{ij}}} = 1 \tag{2.18}
\]
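The step from (2.17) to (2.18) relies on the fact that the ratio of two monomials is again a monomial. A minimal sketch of this idea, representing a monomial as a coefficient plus a dictionary of exponents, is given below. The representation, the helper names and the numerical values are all illustrative assumptions; this is not the GGPLAB interface.

```python
def monomial(c, exponents):
    """Monomial c * prod(x_j ** a_j), stored as (coefficient, {name: exponent})."""
    if c <= 0:
        raise ValueError("a monomial needs a positive coefficient")
    return (c, dict(exponents))


def divide(m1, m2):
    """Ratio of two monomials: coefficients divide, exponents subtract."""
    (c1, e1), (c2, e2) = m1, m2
    names = set(e1) | set(e2)
    return (c1 / c2, {n: e1.get(n, 0.0) - e2.get(n, 0.0) for n in names})


def evaluate(m, values):
    """Evaluate a monomial at positive variable values."""
    c, exponents = m
    result = c
    for name, a in exponents.items():
        result *= values[name] ** a
    return result


# Hypothetical S-System equation: dx2/dt = 2 * x1**0.5 - 1 * x2**1.0
production = monomial(2.0, {"x1": 0.5})
degradation = monomial(1.0, {"x2": 1.0})
constraint = divide(production, degradation)   # left-hand side of (2.18)

# At x1 = 4, x2 = 4 production equals degradation, so the ratio is 1.
ratio_at_steady_state = evaluate(constraint, {"x1": 4.0, "x2": 4.0})
```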
GP is used below as part of one of the optimization procedures considered.
2.3 Optimization Strategies
The control function, described in Section 2.2.1, is now optimized in order to obtain a maxi-
mum yield of u5, at the end of the run-time (tfinal), in the Prototype model (Section 2.1.2). Three
different methods, with different levels of information about the network, are presented to attain
this goal.
The first method, direct optimization, is used as a benchmark against which the results of the
other methods are compared.
The last two methods rely on a Bi-level optimization and illustrate a possible solution to the
optimization problem when the information about the network is incomplete. The three methods
are tested with both forms of the control function, step and discrete, and their results compared.
Finally, a numerical solution based on PMP and the respective computational implementation
are presented.
2.3.1 Direct optimization
The first method, Direct Optimization, is used mainly as a benchmark, to compare the results
of the following methods. Since it is assumed that all the information about the network kinetics
is known, the system of differential equations described in (2.2) is used. The initial conditions for
every integration were set to [u1 x2 u3 x4 u5] = [0.8 0 1 0 0]. The optimization was carried out for
both forms of the control function.
In the first optimization the step form (2.5) of the control function f(t) was used. The step form
imposes that branch v2 is active in the beginning (f(t) = 0), building up biomass, and that the
network switches, at treg, to branch v3 (f(t) = 1), activating the production of u5.
Given a function that receives treg as input and outputs the final yield of u5, this optimization
evaluates all the candidate values of treg and returns the function:
J(treg) = u5(tfinal)
The value of treg that results in the maximum product yield is thus determined.
This optimization can be done manually, by testing the several possible values of treg and plotting
the results or by passing the function as an argument to an optimization function in MATLAB,
such as fmincon. The run time for the optimization is dependent on the constraints applied to treg.
Assuming that treg is forced to be an integer and treg ∈ [0, 30] the optimization takes less than a
minute to finish.
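The exhaustive search over treg can be sketched as follows. Since model (2.2) and its parameters are not reproduced here, the objective below is a toy two-stage model invented for illustration: biomass grows exponentially while f = 0, and product is formed at a rate proportional to biomass while f = 1. The growth and production rates are arbitrary, so the numbers are not comparable with the results of this thesis.

```python
import math


def toy_final_yield(t_reg, t_final=30.0, growth_rate=0.1,
                    production_rate=1.0, biomass0=1.0):
    """Final product of a toy model under the step control (2.5).

    Hypothetical stand-in for integrating (2.2): biomass grows until t_reg,
    then stays constant and drives production until t_final.
    """
    biomass_at_switch = biomass0 * math.exp(growth_rate * t_reg)
    return production_rate * biomass_at_switch * (t_final - t_reg)


def grid_search(objective, candidates):
    """Evaluate the objective at every candidate and keep the best one."""
    best = max(candidates, key=objective)
    return best, objective(best)


# Integer candidates treg = 0, 1, ..., 30, as in the text.
best_t_reg, best_yield = grid_search(toy_final_yield, range(0, 31))
```

The toy objective shows the same trade-off as the prototype network: switching too early leaves little biomass, while switching too late leaves little time for production.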
In order to show that the optimal transition on the step form of the control function is f(t) =
0 → f(t) = 1 a simulation was run with the inverse profile (2.19).
\[
f(t) = \begin{cases} 1 & \text{if } t \le t_{reg} \\ 0 & \text{if } t > t_{reg} \end{cases}, \quad t \in [0, t_{final}] \tag{2.19}
\]
In the second optimization the discrete form (2.4) of the control function was used. An
optimization algorithm was run to determine the optimal value for each interval of f(n).
Increasing the number of intervals results in an increased time resolution for f(n) but also
increases the computation time. In this optimization, f(n) can assume any real value between 0
and 1 in every time interval. These extra degrees of freedom greatly increase the computational
time needed to obtain a valid result.
The algorithm was tested with several initial conditions for f(n). The initial conditions were found
to have a major influence both on the computational time and on the convergence of the algorithm
to the optimal results.
A manual implementation of this optimization is not viable. The optimization was tested with two
MATLAB functions: fmincon, from the standard optimization toolbox, which finds the minimum of a
constrained nonlinear multivariable function, and simannealingSB, from the Systems Biology Toolbox
[23], which performs simulated annealing optimization.
2.3.2 Bi-Level Optimization algorithm structure
The Bi-Level optimization [5, 6] was structured as a general algorithm to accommodate missing
information on the kinetics of networks. In order to apply the algorithm to the prototype network,
it is assumed that the two metabolites and the four fluxes inside the box in Fig. 2.1 are a part of
the network that might not be fully described in terms of its kinetics.
Given a certain control function f(t), in order to obtain the final yield of u5 it is necessary to
have an estimation of the temporal variation of the metabolite concentration or flux distribution.
In the Bi-level optimization algorithm, this problem is solved by an inner optimization process that
allows us to obtain the product yield, u5(tfinal), given a certain f(t), taking into account a valid
approximation of the network dynamics over the simulation time. The Bi-Level Optimization is
tested with two different levels of information on the network, which affect the inner-optimization
type.
In section 2.3.3 it is assumed that the kinetic parameters are known but the system is simulated
in steady-state. While this situation is not likely to happen in a real life problem, it is useful to
test the algorithm and to serve as a guideline in real problems. Assuming that the system is in
steady-state, the boxed metabolites concentrations are calculated at each time instant by solving
a GP problem.
Section 2.3.4 presents a realistic situation, where no kinetic information is available for the
boxed metabolites/fluxes. The missing kinetic information is replaced with stoichiometric data and
FBA is used to obtain a valid flux distribution at each time instant.
The first step of the inner optimization process is to define the initial conditions of the input u1
and outputs u3, u5. Since there is a constant yield of substrate, u1(0) was set to zero; u3(0) was
set to 1, so there is an initial amount of biomass. Finally, u5(0) was also set to zero.
Given the initial conditions for the input and outputs, and depending on the method, a valid distri-
bution for the fluxes (v1, v2, v3, v4) or a valid concentration of the metabolites (x2, x4) is obtained
by solving an LP or a GP respectively, with a proper objective function.
After obtaining the flux distribution/metabolite concentrations, new values for the input/outputs can
be calculated by integrating their expressions in the considered time interval.
During this time interval the function f(t) and the values of the fluxes/metabolites are kept con-
stant. This process is repeated from t = 0 to t = tfinal. The time interval for the integration was
defined to be 1 second. The inner optimization process is shown in Fig. 2.2.
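The loop described above can be sketched as follows. The inner problem (LP or GP) and the rate expressions are passed in as callbacks, because their details depend on the method. All names are hypothetical, and the toy callbacks at the end are placeholders rather than the prototype network.

```python
def run_inner_optimization(control, solve_inner, rates, u0, t_final, dt=1.0):
    """Simulate inputs/outputs by repeatedly solving a steady-state inner problem.

    control(t)            -> value of the control function at time t
    solve_inner(u, f)     -> fluxes or concentrations (stands in for the LP/GP)
    rates(u, inner, f)    -> time derivatives of the inputs/outputs
    """
    u = dict(u0)
    history = [dict(u)]
    n_steps = int(round(t_final / dt))
    for k in range(n_steps):
        t = k * dt
        f = control(t)                 # kept constant during the interval
        inner = solve_inner(u, f)      # kept constant during the interval
        du = rates(u, inner, f)
        # Explicit Euler update over one time interval.
        u = {name: u[name] + dt * du[name] for name in u}
        history.append(dict(u))
    return history


# Toy usage with made-up stand-ins for the inner problem and the rates.
def toy_solve_inner(u, f):
    return {"v_product": 1.0}


def toy_rates(u, inner, f):
    return {"u5": inner["v_product"] * f}


history = run_inner_optimization(
    control=lambda t: 0.0 if t <= 2.0 else 1.0,
    solve_inner=toy_solve_inner,
    rates=toy_rates,
    u0={"u5": 0.0},
    t_final=5.0,
    dt=1.0,
)
```

In the toy run the control is 0 for the steps starting at t = 0, 1, 2 and 1 for those starting at t = 3, 4, so the product accumulates only during the last two intervals.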
The inner-optimization is subject to a non-linear outer-optimization which will optimize the
control function f(t) in order to obtain a maximum yield of u5. Depending on the optimization
function used, the outer optimization runs the inner optimization every time it needs to evaluate
the value of the final yield of u5. In this way, the bi-level optimization maximizes the final yield of
u5 while guaranteeing a valid temporal flux/metabolite concentration distribution.
The bi-level optimization algorithm is schematically represented in Fig. 2.3.
Figure 2.2: Inner-Optimization algorithm
Figure 2.3: Bi-Level optimization formulation
2.3.3 Inner-optimization using Geometric Programming
On the first implementation of the Bi-Level optimization algorithm the dynamics of the boxed
metabolites from Fig. 2.1 are used but, following the algorithm structure, steady-state is assumed.
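Thus, x2 and x4 from (2.2) become:
\[
\frac{dx_2}{dt} = \alpha_2 u_1^{g_{21}} - \beta_2 u_3^{h_{23}} x_2^{h_{22}} = 0 \tag{2.20}
\]
\[
\frac{dx_4}{dt} = \alpha_4 u_3^{g_{43}} x_2^{g_{42}} (u) - \beta_4 x_4^{h_{44}} = 0 \tag{2.21}
\]
The equations are then manipulated to have a valid form for a GP problem constraint:
\[
\frac{\alpha_2 u_1^{g_{21}}}{\beta_2 u_3^{h_{23}} x_2^{h_{22}}} = 1 \tag{2.22}
\]
\[
\frac{\alpha_4 u_3^{g_{43}} x_2^{g_{42}} (u)}{\beta_4 x_4^{h_{44}}} = 1 \tag{2.23}
\]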
In this implementation of the algorithm, the inner optimization problem determines the profile of
the metabolites, instead of the fluxes, due to the nature of the equations.
The metabolite concentrations are calculated at the beginning of each time interval by solving the
GP problem with the objective function being flux v2, given by $\alpha_3 x_2^{g_{32}}$. The obtained concentrations
are then used with (2.2) to integrate the values of u1, u3 and u5 at the beginning of each interval.
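As a cross-check of a GP solver's output, note that if u1, u3 and u are treated as known constants and (2.22) and (2.23) are the only constraints, the two equalities can be solved for x2 and x4 directly, provided h22 and h44 are non-zero. The sketch below does this. The parameter values are made up for illustration, since Table 2.1 is not reproduced here, and the function name is hypothetical.

```python
def steady_state_concentrations(u1, u3, u, p):
    """Solve (2.22) for x2 and then (2.23) for x4 in closed form.

    Requires p["h22"] != 0 and p["h44"] != 0, and u > 0 so that x4 > 0.
    """
    x2 = (p["alpha2"] * u1 ** p["g21"]
          / (p["beta2"] * u3 ** p["h23"])) ** (1.0 / p["h22"])
    x4 = (p["alpha4"] * u3 ** p["g43"] * x2 ** p["g42"] * u
          / p["beta4"]) ** (1.0 / p["h44"])
    return x2, x4


# Hypothetical parameter values, chosen only to make the example concrete.
params = {"alpha2": 2.0, "g21": 0.5, "beta2": 1.0, "h23": 0.5, "h22": 0.5,
          "alpha4": 1.0, "g43": 0.5, "g42": 1.0, "beta4": 0.5, "h44": 0.5}

x2, x4 = steady_state_concentrations(u1=0.8, u3=1.0, u=1.0, p=params)

# Residuals of (2.20) and (2.21); both should vanish at steady state.
residual_x2 = (params["alpha2"] * 0.8 ** params["g21"]
               - params["beta2"] * 1.0 ** params["h23"] * x2 ** params["h22"])
residual_x4 = (params["alpha4"] * 1.0 ** params["g43"] * x2 ** params["g42"] * 1.0
               - params["beta4"] * x4 ** params["h44"])
```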
2.3.4 Inner-optimization using Linear Programming
On the second implementation it is assumed that only stoichiometric information is available
for the reactions inside the box of Fig. 2.1. Assuming steady state, the equations of x2 and x4
become:
\[
\frac{dx_2}{dt} = v_1 - v_2 (1 - f) - v_3 f = 0
\]
\[
\frac{dx_4}{dt} = v_3 f - v_4 = 0
\]
Fig. 2.1 shows a forward feedback from u3 (Biomass) to flux v3. Since stoichiometric models
do not account for feedbacks, the effect of u3 cannot be integrated directly in the equations.
Assuming that the forward feedback leads to an overexpression of flux v3, a valid solution
is to model it as a variation of the constraints applied to flux v3. Thus, the
constraints applied to fluxes [v1 v2 v3 v4] to solve the FBA problem are:
Lower bounds = [0 0 0 0]
Upper bounds = [100 1.85 (1.5u3) 35]
The initial guesses for the upper bounds were taken from the maximum fluxes obtained in the direct
optimization, and adapted to yield the expected behavior.
Setting flux v2 (precursor of Biomass formation) as the objective function, the FBA problem is
solved with the previous equations and constraints to obtain a valid and unique flux distribution at
each time step. In the context of the inner-optimization, these fluxes are then used to calculate the
values of the input/outputs. Due to the simple nature of the prototype network, the concentrations
of u1, u3 and u5 can be calculated directly by replacing the obtained fluxes in (2.1), and therefore,
the equations in (2.2) are not used in this case.
In a more complex case, a relation between the dynamics of the inputs/outputs and the flux
distributions would be needed. For instance, in E. coli a valid relation between the product
concentration variation (metabolite) and the growth rate (flux) is
\[
\frac{d\,\mathrm{Product}}{dt} = (\mathrm{GrowthRate}) \times \mathrm{Biomass}
\]
[5]; the same relation applies to the Biomass variation.
2.3.5 Pontryagin's Maximum Principle: Computational implementation
As seen in Section 2.2.2, the optimal control function for the type of optimization problem
considered will only assume values on the boundary of the allowed range.
In this section, PMP is applied to the system considered and the computational implementation is
described.
In the case at hand, we are interested in maximizing the final value of the state u5. Since the
Lagrangian (L) is zero, (2.7) becomes J(u) = ψ(xi(T)). Thus, the functional J to be maximized
is:
\[
\psi(x(T)) = u_5(T_{final}) \tag{2.24}
\]
as shown before in (2.3).
Taking into account that L = 0, the adjoint equations reduce to
\[
\dot{\lambda} = -f_x^T \lambda \tag{2.25}
\]
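The network is described by the system of ordinary differential equations in (2.2). If we consider
the state model in the form f(x, u), where u is the control function, the Jacobian fx(x, u) becomes:
\[
f_x(x, u) =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} & \frac{\partial f_1}{\partial x_4} & \frac{\partial f_1}{\partial x_5} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} & \frac{\partial f_2}{\partial x_4} & \frac{\partial f_2}{\partial x_5} \\
\frac{\partial f_3}{\partial x_1} & \frac{\partial f_3}{\partial x_2} & \frac{\partial f_3}{\partial x_3} & \frac{\partial f_3}{\partial x_4} & \frac{\partial f_3}{\partial x_5} \\
\frac{\partial f_4}{\partial x_1} & \frac{\partial f_4}{\partial x_2} & \frac{\partial f_4}{\partial x_3} & \frac{\partial f_4}{\partial x_4} & \frac{\partial f_4}{\partial x_5} \\
\frac{\partial f_5}{\partial x_1} & \frac{\partial f_5}{\partial x_2} & \frac{\partial f_5}{\partial x_3} & \frac{\partial f_5}{\partial x_4} & \frac{\partial f_5}{\partial x_5}
\end{bmatrix} \tag{2.26}
\]
\[
=
\begin{bmatrix}
-\beta_1 h_{11} x_1^{h_{11}-1} & 0 & 0 & 0 & 0 \\
\alpha_2 g_{21} x_1^{g_{21}-1} & -\beta_2 x_3^{h_{23}} h_{22} x_2^{h_{22}-1} & -\beta_2 x_2^{h_{22}} h_{23} x_3^{h_{23}-1} & 0 & 0 \\
0 & (\alpha_3 g_{32} x_2^{g_{32}-1})(1 - u) & 0 & 0 & 0 \\
0 & \alpha_4 x_3^{g_{43}} u\, g_{42} x_2^{g_{42}-1} & \alpha_4 u\, x_2^{g_{42}} g_{43} x_3^{g_{43}-1} & -\beta_4 h_{44} x_4^{h_{44}-1} & 0 \\
0 & 0 & 0 & \alpha_5 g_{54} x_4^{g_{54}-1} & 0
\end{bmatrix} \tag{2.27}
\]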
and
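\[
f_x^T(x, u) =
\begin{bmatrix}
-\beta_1 h_{11} x_1^{h_{11}-1} & \alpha_2 g_{21} x_1^{g_{21}-1} & 0 & 0 & 0 \\
0 & -\beta_2 x_3^{h_{23}} h_{22} x_2^{h_{22}-1} & (\alpha_3 g_{32} x_2^{g_{32}-1})(1 - u) & \alpha_4 x_3^{g_{43}} u\, g_{42} x_2^{g_{42}-1} & 0 \\
0 & -\beta_2 x_2^{h_{22}} h_{23} x_3^{h_{23}-1} & 0 & \alpha_4 u\, x_2^{g_{42}} g_{43} x_3^{g_{43}-1} & 0 \\
0 & 0 & 0 & -\beta_4 h_{44} x_4^{h_{44}-1} & \alpha_5 g_{54} x_4^{g_{54}-1} \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{2.28}
\]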
Thus
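\[
\begin{aligned}
\dot{\lambda}_1 &= \beta_1 h_{11} x_1^{h_{11}-1} \lambda_1 - \alpha_2 g_{21} x_1^{g_{21}-1} \lambda_2 \\
\dot{\lambda}_2 &= \beta_2 x_3^{h_{23}} h_{22} x_2^{h_{22}-1} \lambda_2 - (\alpha_3 g_{32} x_2^{g_{32}-1})(1 - u) \lambda_3 - \alpha_4 x_3^{g_{43}} u\, g_{42} x_2^{g_{42}-1} \lambda_4 \\
\dot{\lambda}_3 &= \beta_2 x_2^{h_{22}} h_{23} x_3^{h_{23}-1} \lambda_2 - \alpha_4 u\, x_2^{g_{42}} g_{43} x_3^{g_{43}-1} \lambda_4 \\
\dot{\lambda}_4 &= \beta_4 h_{44} x_4^{h_{44}-1} \lambda_4 - \alpha_5 g_{54} x_4^{g_{54}-1} \lambda_5 \\
\dot{\lambda}_5 &= 0
\end{aligned} \tag{2.29}
\]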
The terminal conditions for the co-states λ are
\[
\lambda_n(T) = \left. \frac{\partial \psi}{\partial x} \right|_{x = x(T)} = [0 \; 0 \; 0 \; 0 \; 1] \tag{2.30}
\]
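Since L = 0, the Hamiltonian (2.9) is given by $\lambda^T f(x, u)$.
Substituting the state equations and after some manipulation, it becomes:
\[
\begin{aligned}
H(\lambda(t), x(t), u(t), t) = {} & \lambda_1 \left( k - \beta_1 x_1^{h_{11}} \right) + \lambda_2 \left( \alpha_2 x_1^{g_{21}} - \beta_2 x_3^{h_{23}} x_2^{h_{22}} \right) + \lambda_5 \left( \alpha_5 x_4^{g_{54}} \right) \\
& + \lambda_3 \alpha_3 x_2^{g_{32}} - \lambda_4 \beta_4 x_4^{h_{44}} + \left( \lambda_4 \alpha_4 x_3^{g_{43}} x_2^{g_{42}} - \lambda_3 \alpha_3 x_2^{g_{32}} \right) u
\end{aligned} \tag{2.31}
\]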
which depends linearly on the control function u, as expected.
The derivative of the Hamiltonian with respect to the control function is:
\[
H_u = \lambda^T f_u = -\lambda_3 \alpha_3 x_2^{g_{32}} + \lambda_4 \alpha_4 x_3^{g_{43}} x_2^{g_{42}} \tag{2.32}
\]
The algorithm to compute the optimal control function is:
1. Given the initial conditions for x1, ..., x5 and an initial estimate for the function f(n) (2.5),
integrate the state equations (2.2) from t = 0 until t = tfinal, where tfinal in this experimental
procedure is typically 30.
2. Integrate the system of adjoint equations (2.29) from t = tfinal until t = 0, with final
conditions from (2.30).
3. Update the control function by calculating δu(t) = K Hu(t), where K is a small value
(typically 0.1 . . . 0.001), and adding this variation to the previous control function: u(t)new =
u(t)old + δu(t).
4. If the stop condition is not reached (number of iterations or minimum δu(t)), go back to the
first step.
Starting with a rough estimate of the control function, the algorithm estimates, at each
iteration, a variation that brings the control function closer to the optimal control. This variation is
added to the previous control function and the operation is repeated. The iterative process stops
when a certain stop condition is reached.
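The four steps can be sketched on a small example. Because model (2.2) and its parameters are not reproduced here, the sketch uses a toy two-state system invented for illustration: biomass b with db/dt = r(1 − u)b and product p with dp/dt = k·u·b, where p(T) is maximized. Two further assumptions go beyond the algorithm as stated: the updated control is clipped to the admissible range [0, 1], and the stop condition is simply a fixed number of iterations.

```python
def forward_backward_sweep(n_steps=300, t_final=30.0, growth_rate=0.1,
                           production_rate=1.0, gain=0.1, n_iterations=200):
    """Gradient iteration u <- u + K * H_u on a toy two-state problem."""
    dt = t_final / n_steps
    u = [0.5] * n_steps                  # rough initial estimate of the control
    yields = []                          # p(T) obtained at each iteration
    for _ in range(n_iterations):
        # Step 1: integrate the states forward in time (explicit Euler).
        biomass = [1.0]
        product = 0.0
        for k in range(n_steps):
            product += dt * production_rate * u[k] * biomass[k]
            biomass.append(biomass[k]
                           + dt * growth_rate * (1.0 - u[k]) * biomass[k])
        yields.append(product)
        # Step 2: integrate the co-states backward from lambda(T) = [0, 1].
        lam_product = 1.0                # its derivative is zero
        lam_biomass = [0.0] * (n_steps + 1)
        for k in range(n_steps - 1, -1, -1):
            dlam = -(growth_rate * (1.0 - u[k]) * lam_biomass[k + 1]
                     + production_rate * u[k] * lam_product)
            lam_biomass[k] = lam_biomass[k + 1] - dt * dlam
        # Step 3: update the control with H_u and clip it to [0, 1].
        for k in range(n_steps):
            h_u = biomass[k] * (production_rate * lam_product
                                - growth_rate * lam_biomass[k])
            u[k] = min(1.0, max(0.0, u[k] + gain * h_u))
        # Step 4: repeat until the iteration budget is exhausted.
    return u, yields


u_opt, yields = forward_backward_sweep()
```

In this toy problem the control is driven to 0 at early times and to 1 at late times, while the samples close to the switching instant change slowly because Hu is small there.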
2.4 Results and Discussion
In this section, the results of the various optimizations are presented as well as some con-
siderations about the adjustments needed to obtain valid results when necessary. Since we are
dealing with a prototype network the scales do not have any physical meaning. Thus, the units in
the time scales were purposely omitted and the values obtained for product yields are absolute
values or normalized values.
2.4.1 Direct optimization
Direct optimization used model (2.2) with the set of parameters from Table 2.1 and was first
tested for the step form (2.5) of the control function.
In a first approach, all integer values of treg in the interval [1, 30] were tested, for a total of
30 candidate values. The optimization took around 3 min to run. Fig. 2.4
plots the resulting function J(treg) = u5(tfinal).
It is clear from the figure that there is an optimal time of regulation that maximizes the yield of u5.
[Plot "Direct Optimization": x-axis, Time of Regulation (Treg); y-axis, Final Product Concentration (u5(final)).]
Figure 2.4: Result of the simulation using Direct optimization.
The optimal time of regulation is treg = 9 with a final yield of u5 = 293.
The existence of a maximum may be interpreted as follows:
If f(t) switches from 0 to 1 before the optimal treg, the biomass formed will not be enough to
maximize u5(tfinal). On the other hand, if f(t) switches from 0 to 1 after the optimal treg, there will be
enough biomass but not enough time left to produce the maximum possible amount of u5.
In order to increase the time resolution, the spacing between adjacent candidate values of treg
was decreased. Optimizations were run with spacings of 0.5, 0.25 and 0.125, for which 60, 120
and 240 candidate values of treg were tested, respectively.
The results were similar to Fig. 2.4, with an optimal time of regulation of treg = 9 and the same
maximum yield of u5.
A second optimization was performed with the profile for f(t) shown in (2.19). This profile
forces branch x2 → x4 to be active in the beginning, switching then to branch x2 → u3. As
expected, the obtained u5 yield was always low and no optimal treg was observed.
To better illustrate the behavior of the prototype network, simulations were made for treg = 4,
treg = 9 and treg = 14. The obtained optimal treg = 9 is compared with lower and upper values in
order to show the different temporal evolution of the metabolites.
Fig. 2.5 plots J(treg) = u5(tfinal) for treg = 4, treg = 9 and treg = 14. As expected, the function
f(t) with treg = 9 has the highest product yield.
Figure 2.5: Comparison of three f(t) profiles. The solid line is the optimal treg obtained in the Direct optimization.
Fig. 2.6 plots the temporal variation of metabolites x2, x4, and outputs u3 and u5 for the three
values of treg. It can be seen that, for treg = 14 the final concentration of Biomass (u3) is high but
there is not enough time for the production of x4 and, subsequently, u5. On the other hand, for
treg = 4 the formation of x4 starts earlier but the lack of biomass does not allow a large production
of u5.
[Four-panel plot against Time: X2; U3 -> Biomass; X4; U5 -> Final Product Yield. Curves: Treg = 4, Treg = 9, Treg = 14.]
Figure 2.6: Temporal variation of metabolites x2, x4, and outputs u3 and u5 for three values of treg
Direct optimization was then tested with the discrete form of the control function (2.4). Using
this form, the optimization is not as straightforward as with the step form. The run time and the
result are highly dependent on the number N of intervals used and on the initial guess for f(n).
For a higher number of intervals the optimization function frequently froze, which can
be related to the heavy computational load.
There were also several situations where the returned values were far from the optimum, which
probably correspond to local minima. The option to output the temporary results of the function was
set to ON, when available. Another preventive measure was the use of variables to store the
temporary results of the functions, to be restored in case of an unexpected interruption of the function.
Table 2.2 summarizes the results obtained for several values of N, for three different initial
conditions.
Table 2.2: Results for the Direct Optimization using the Discrete form of the control function
Segments | uinitial               | uoptimal                                     | u5(tfinal)
2        | [0 0]                  | [0.0765 1.0]                                 | 255
2        | [0.5 0.5]              | [0.1601 1.0]                                 | 259
2        | [1 1]                  | [0.1254 1.0]                                 | 258
3        | [0 0 0]                | [0 0.9569 1.0]                               | 286
3        | [0.5 0.5 0.5]          | [0 1 1]                                      | 287
3        | [1 1 1]                | [0 1 1]                                      | 287
4        | [0 0 0 0]              | [0 0.413 1.0 1.0]                            | 285
4        | [0.5 0.5 0.5 0.5]      | [0 0.4925 1.0 1.0]                           | 286
4        | [1 1 1 1]              | [0 0.452 1.0 1.0]                            | 286
5        | [0 0 0 0 0]            | [0 0.413 1.0 1.0]                            | 285
5        | [0.5 0.5 0.5 0.5 0.5]  | [0 0.1220 1.0 1.0 1.0]                       | 292.9
5        | [1 1 1 1 1]            | [0 0.1742 1.0 1.0 1.0]                       | 293
6        | [0 0 0 0 0 0]          | [0 0 1 1 1 1]                                | 295
6        | [0.5 0.5 0.5 0.5 0.5 0.5] | [0 0 1 1 1 1]                             | 295
6        | [1 1 1 1 1 1]          | [0 0 1 1 1 1]                                | 295
15       | [0 for n = 1...15]     | [0 for n = 1...4, 0.13, 0, 1 for n = 7...15]  | 294.8
15       | [0.5 for n = 1...15]   | [0 for n = 1...4, 0.5, 0.5, 1 for n = 7...15] | 293
15       | [1 for n = 1...15]     | [0 for n = 1...4, 0.8, 0.15, 1 for n = 7...15] | 293
In general the obtained results were within the expected values. For t << treg all the
simulations converged to 0, and for t >> treg all simulations converged to 1. This is in agreement with
PMP and also with the assumption that the optimal switching is 0 → 1.
The switching time for all simulations was always centered around t = 9. The critical time points
are the ones next to treg, and the result of the optimization is highly dependent on the initial
conditions and the number of intervals.
The dependency on the initial conditions is related to the algorithms used by the optimization
functions. Initial conditions close to the optimal result are less prone
to lead the algorithm to local minima, and a valid result is obtained in less time.
The dependency on the number of intervals can also be related to the optimization algorithm,
since the number of degrees of freedom increases, but it is also strongly connected with the
temporal resolution. Since the system is run from 0 to 30, each time step of the control function
corresponds to a time interval of 30/N. Thus, when integrating the equations, the control function
is constant over each interval of length 30/N.
As we saw in the results for the step form f(t), the optimal treg is around 9, so if N is such that
(30/N) · i ≈ 9, where i is an integer, then it is more likely that the transition will be f(n) = 0 →
f(n + 1) = 1.
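The alignment condition can be checked numerically. The sketch below lists, for a given N, the segment boundary closest to a target switching time. The function name is hypothetical, and the target of 9 is taken from the step-form result above.

```python
def closest_switch_time(n_segments, t_target=9.0, t_final=30.0):
    """Segment boundary (30/N) * i that lies closest to the target time."""
    width = t_final / n_segments
    boundaries = [i * width for i in range(n_segments + 1)]
    return min(boundaries, key=lambda t: abs(t - t_target))


# Distance between the best available boundary and t = 9 for a few values of N.
misalignment = {n: abs(closest_switch_time(n) - 9.0) for n in (3, 6, 10, 30)}
```

With N = 10 a boundary falls exactly at t = 9, whereas with N = 3 or N = 6 the closest boundary is at t = 10.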
While there is some variation in the results, they all fall within one of three cases:
• The optimization resulted in an optimal function where the transition was f(n) = 0 → f(n +
1) = 1, especially when the number of intervals was low (< 15) and (30/N) · i ≈ 9, such as in f(n)
with 3, 6 or 16 intervals. While this is the best possible scenario, with an immediate switch
from 0 to 1, this result only appeared for a relatively small group of N values.
• In some cases, f(n) assumes a value different from 0 or 1 during one or more time samples
near t = treg. This happens mostly for higher numbers of intervals. These cases are due to
convergence problems in the optimization algorithm, so forcing those samples to be 1 or
0 results in a higher value for u5(tfinal). Such an example can be seen when optimizing f(n)
with N = 15 intervals. The output of the optimization is:
\[
f(n) = [\, 0_{n=1,\dots,4},\ 0.5,\ 0.5,\ 1_{n=7,\dots,15} \,]
\]
Forcing the function f(n) to
\[
f(n) = [\, 0_{n=1,\dots,5},\ 1_{n=6,\dots,15} \,]
\]
results in a slightly higher yield of u5.
• Finally, in some cases, f(n) assumes values different from 0 or 1 near treg and forcing those
values to 0 or 1 does not increase the final yield. It is important to note that, in these situations,
the difference between u5(tfinal) with the f(n) calculated by the algorithm and with the f(n) with
forced 0's and 1's is relatively small. For example, for N = 30,
\[
f(n) = [\, 0_{n=1,\dots,7},\ 0.1,\ 0.5,\ 0.5,\ 0.5,\ 0.8,\ 1_{n=13,\dots,30} \,]
\]
gives a final product yield of 294.8774, while forcing f(n) to
\[
f(n) = [\, 0_{n=1,\dots,9},\ 1_{n=10,\dots,30} \,]
\]
results in a yield of 293.7503, which is only approximately 0.4% smaller than the previous one.
This means that both solutions are in the optimal region of f(n), and numerical issues in
the algorithm might be responsible for this behavior.
The computational time varied with both initial conditions and number of intervals. For in-
stance, with N = 2 the computational time was around 90 seconds, with N = 6 the computational
time was 498 seconds for f(n)initial = 0 and 1168 seconds for f(n)initial = 1.
2.4.2 Bi-Level Optimization
Before testing the Bi-Level optimization it is important to guarantee that the Inner-Optimization,
described in Section 2.3.2, is able to give a valid estimation of the temporal variation of the input
and outputs of the network.
The inner-optimization was tested with a fixed discrete form f(n) function for the two cases, GP
and LP. The f(n) used was:
\[
f(n) = [\, 0_{n=1,\dots,15},\ 1_{n=16,\dots,30} \,]
\]
The results were compared with the integration of the complete model with the same control
function. Figure 2.7 shows the results obtained. While the values of u1 and u3 are absolute values, u5
was normalized. It can be seen from the figure that the temporal variations in the three cases are
very similar. The variation of the substrate u1 with LP is the only one that does not have the same profile.
In the case of u5, the normalized variations for the two inner-optimizations overlap, hence only one
line is seen in the plot.
[Three-panel plot against Time: u1 -> Substrate; u3 -> Biomass; u5 -> Product yield. Curves: Complete model, Inner-Opt w/ GP, Inner-Opt w/ LP.]
Figure 2.7: Comparison of the temporal variation of u1, u3 and u5 with a fixed f(n)
From these preliminary results, the inner optimization seems to be a valid option to obtain an
approximation of the temporal variation of the inputs/outputs.
The Bi-Level optimization was then tested with the two forms of the control function. Once
again, for the step form of f(t) the optimization consisted in testing all the possible values of
treg and plotting the function J(treg) = u5(tfinal). In comparison with the direct optimization,
described in the previous section, this optimization does not use the whole model but uses the
inner-optimizations instead.
Fig. 2.8 plots the normalized curves for J(treg) = u5(tfinal) for the two optimizations, GP and
LP. Comparing Fig. 2.8 with Fig. 2.4 it can be seen that the profiles remain similar. The final
product yield, u5(tfinal), increases with treg until the optimal value is reached, then it starts de-
creasing.
The optimal time of regulation obtained with both GP and LP on the inner optimization was
treg = 9. The profile of J(treg) = u5(tfinal) with LP on the inner-optimization is not as smooth as
using GP or the whole set of equations.
The Bi-Level optimization was finally tested with the discrete form of the control function.
The obtained results were very similar to the case of the Direct Optimization. All the obtained
f(n) functions converged to f(n) = 0 when n << treg and f(n) = 1 when n >> treg. Once again
the critical time point was at n = treg and the same three cases described in the previous section
were observed:
[Two-panel plot, "Inner-Optimization with LP" and "Inner-Optimization with GP": x-axis, Time of Regulation (Treg); y-axis, Final Product Concentration (u5(final)).]
Figure 2.8: Result of the optimization using the Inner Optimization with Geometric Programming (left) and Linear Programming (right).
• The optimal case, where the switching is f(n) = 0 → f(n + 1) = 1, was more frequent than
with the Direct Optimization, especially when using LP in the Inner-Optimization. The fact
that the inner-optimization results in less temporal data variation might be responsible for
these optimal switches.
• The case where the optimization returns wrong values, different from 0 or 1 near t = treg,
is very frequent in the Bi-Level optimization. In fact, it is quite frequent for the optimization
function to return values far from optimal, even for initial conditions that would return optimal
values in the Direct Optimization.
• The last case, where the optimal solution contains values different from 0 or 1 near t = treg,
is less frequent and is probably explained by the same reason given in the first case.
In terms of coherence, the three methods (Direct Optimization and Bi-Level Optimization with
the two different Inner-Optimizations) exhibit high consistency in the resulting optimal values.
For a given N, the optimal solution for the Bi-Level Optimization with GP in the Inner Optimization
was, for all the tested values, the same as for the Direct Optimization.
In the case of the Bi-Level Optimization with LP in the Inner Optimization there are some
discrepancies. One example is N = 10. The optimal solution for the first two cases is
f(n) = [0001111111]
which indicates that the optimal solution is found for treg = 9.
In the Bi-Level Optimization with LP in the Inner Optimization the optimal solution is:
f(n) = [0000111111]
While this case is not frequent, it was found for some values of N. For N = 30 and N = 60 all
three results are coherent, and for N = 120 the Inner-Optimization optimum again deviates by
one time step.
Due to the iterative nature of the inner-optimization, the integration of the inputs and outputs
must be done manually. As explained in Section 2.3.3 and Section 2.3.4, the obtained flux
distributions or metabolite concentrations are used, in each iteration, to calculate the variation ∆u of
each input and output.
Assuming a control function f(n) with N segments, at each step n, with duration tn = 30/N, the
new value of the input/output u is calculated by un = un−1 + ∆u tn.
When N is small, tn assumes large values. For example, if N = 3, then tn = 10 and the
inputs/outputs are only calculated 3 times, which is insufficient and leads to erroneous results.
If the manual integration is bound to the number of intervals of f(n), large values of N must be
used and consistency among integrations is not guaranteed since, in Euler's method,
the size of the integration step affects the results. Thus, the implementation of the manual
integration must use a fixed time step and, at each step of the integration, f(n) is estimated by
interpolation.
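The fixed-step scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the names control_at and euler_integrate are hypothetical, the state is a single scalar, and the "interpolation" of f(n) is taken to be a zero-order-hold lookup (the value of the segment that contains t), which may differ from the interpolation actually used in this work.

```python
def control_at(t, f, t_final=30.0):
    """Zero-order-hold lookup (assumed scheme): value of the segment of f containing t."""
    n_segments = len(f)
    index = int(t * n_segments / t_final)
    return f[min(index, n_segments - 1)]  # clamp so that t = t_final maps to the last segment


def euler_integrate(rate, u0, f, t_final=30.0, dt=0.01):
    """Euler integration of du/dt = rate(u, f(t)) with a step dt that does not depend on len(f)."""
    n_steps = int(round(t_final / dt))
    u = u0
    for k in range(n_steps):
        t = k * dt
        u = u + rate(u, control_at(t, f, t_final)) * dt
    return u


# Hypothetical toy rate: the input simply accumulates the control value.
def toy_rate(u, control):
    return control


coarse = euler_integrate(toy_rate, 0.0, [0, 1, 1])           # N = 3
fine = euler_integrate(toy_rate, 0.0, [0, 0, 1, 1, 1, 1])    # N = 6, same switching time
```

Because the step dt is decoupled from N, the two calls above describe the same switching time and give the same result up to the step size, which is the consistency property the text asks for.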
The results obtained for the Bi-Level Optimization are encouraging. In the case of the
Inner-Optimization, the network behavior was correctly predicted with only a fraction of the original
information on the network. In the current example, some stoichiometric values were adapted to
approximate the desired behavior. One example is the rate of consumption of the substrate,
which was tweaked to approximate the dynamic case. While tweaking the parameters to
obtain the desired response might seem to go against the proposed objective, this is only necessary
because we are dealing with a prototype network with no physical meaning. On a real network, real
stoichiometric parameters would be used to approximate the dynamics of the system.
2.4.3 PMP: Computational implementation results
The implementation of the numerical method described in Section 2.3.5 proved to be more
complex than expected. Each iteration of the algorithm includes the integration of two sets of
equations, the second of which (the backward integration of the co-states) is particularly
demanding in terms of computational time. The initial results were not as expected and several
adjustments had to be made to the various steps of the algorithm.
In the integration of the state variables, the initial conditions were set to [u1 x2 u3 x4 u5] = [0 0 1 0 0]
and the first estimate of the control function f(n) was set to f(n) = [0_{1...N/3} 1_{N/3+1...N}], that is,
0 on the first N/3 intervals and 1 on the remaining ones, where N is
the number of intervals of the control function.
The algorithm was also tested with other initial conditions for f(n), but the effect on the final result
was negligible, since the algorithm always converged to the same result.
N was initially set to low values, such as 15, 30 or 60, but in the final implementation it was set to 30,000,
as explained below.
In the previous sections some considerations were already made regarding the necessary
interpolations. In this case the interpolations proved to be a bottleneck for the convergence of the
algorithm. The function used to integrate both the state variables and the co-states was ode45,
from the standard MATLAB package. This function solves non-stiff differential equations with a
variable time step.
When integrating the equations for λ, the function needs an estimate of the value of the states
x at each time step. Since ode45 does not use a fixed time step, the values of x are
interpolated.
When calculating the Hamiltonian, the same problem arises. For every time step, an estimate of
λ and x is needed; since they are not sampled at the same instants, they have to be interpolated.
An initial approach calculated the value of the Hamiltonian function for N intervals. Since the initial
values of N were low, the time points at which the Hamiltonian was evaluated were not enough.
Thus, the final solution uses a high N, and both the states and co-states are integrated and forced
to be evaluated at the same N time points. N was set to 30,000 and the integration is made over
t = [0, 30], which means 1000 points per time unit.
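A minimal sketch of the resampling step discussed above, assuming plain linear interpolation (the interpolation method actually used is not stated in the text). The helper names resample and uniform_grid are hypothetical; the idea is only that samples taken at irregular solver instants are mapped onto one shared uniform grid, so that states and co-states can be combined point by point.

```python
from bisect import bisect_right


def resample(ts, ys, grid):
    """Linearly interpolate the samples (ts, ys) onto the time points in grid."""
    out = []
    for t in grid:
        i = bisect_right(ts, t) - 1        # index of the last sample at or before t
        i = max(0, min(i, len(ts) - 2))    # clamp so that i + 1 is a valid index
        w = (t - ts[i]) / (ts[i + 1] - ts[i])
        out.append(ys[i] + w * (ys[i + 1] - ys[i]))
    return out


def uniform_grid(t_final=30.0, n_points=30000):
    """Uniform grid on [0, t_final]; 30,000 intervals over 30 time units is 1000 per unit."""
    dt = t_final / n_points
    return [k * dt for k in range(n_points + 1)]


# Irregular samples (as a variable-step solver would return) mapped to chosen instants.
values = resample([0.0, 1.0, 3.0], [0.0, 10.0, 30.0], [0.5, 2.0, 3.0])
grid = uniform_grid()
```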
As explained in Section 2.3.5, the integration of the co-state equations must be done backwards,
in the interval t = [30, 0], since only the final value of the co-states is known.
MATLAB's functions support backward integration, and ode45 was used for it in a first
approach. Although one of the options of the function forces the variables to be non-negative,
it is not possible to force them to be strictly greater than zero. During the backward integration
it was quite frequent for the co-states to reach zero, which led to undetermined values and
subsequently to bad integrations. A solution including safeguards, where a small value (1e−10) was
added to the co-states, was tried, but this led to even longer (when feasible) integration times. To
solve this problem, a simple function based on Euler's method was implemented. This function
does the backward integration and evaluates the co-states at the defined N time steps.
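As an illustration of such a backward Euler-type routine, the sketch below integrates a single scalar co-state from its known final value down to t = 0 on a fixed grid. The function name integrate_backward and the toy co-state equation are assumptions made for the example; the actual co-state equations of Section 2.3.5 are vector-valued and also depend on the states.

```python
import math


def integrate_backward(costate_rate, lam_final, t_final=30.0, n_steps=30000):
    """Explicit Euler steps from t_final down to 0.

    Returns a list lam where lam[k] approximates the co-state at t = k * dt,
    so the result is available on the same fixed grid as the states.
    """
    dt = t_final / n_steps
    lam = [0.0] * (n_steps + 1)
    lam[n_steps] = lam_final
    for k in range(n_steps, 0, -1):
        t = k * dt
        lam[k - 1] = lam[k] - dt * costate_rate(t, lam[k])
    return lam


# Toy check with d(lambda)/dt = lambda and lambda(1) = 1, whose exact solution
# gives lambda(0) = exp(-1).
lam = integrate_backward(lambda t, value: value, 1.0, t_final=1.0, n_steps=1000)
error = abs(lam[0] - math.exp(-1.0))
```

With 1000 steps the toy problem is reproduced to within about 1e-3, which shows the first-order accuracy expected from Euler's method.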
Finally, the update of the control function at each iteration was done by adding δu(t) = K Hu(t)
to the previous estimate of the control function. Several values of K were tried: large values
made the algorithm oscillate around the optimal value, while small values required many
iterations to converge to the optimal solution. The value chosen to balance this trade-off was
K = 0.05. A possible alternative for calculating δu would be a dynamic value of K, starting
with a large value that allows fast convergence and decreasing it when the control function is near
the optimal solution.
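The update rule and the suggested dynamic gain can be sketched as below. Two points are assumptions that do not come from the text: the updated control is clipped to the interval [0, 1], and the gain decays geometrically with the iteration number. The names update_control and decayed_gain are hypothetical.

```python
def update_control(f, h_u, gain=0.05):
    """One update f <- f + K * H_u per time point, clipped to [0, 1] (clipping is assumed)."""
    return [min(1.0, max(0.0, fi + gain * hi)) for fi, hi in zip(f, h_u)]


def decayed_gain(initial_gain, iteration, decay=0.9):
    """Hypothetical schedule: large gain at first, geometrically smaller later."""
    return initial_gain * decay ** iteration


# A negative Hamiltonian derivative pushes the control towards 0, a positive one towards 1.
updated = update_control([0.5, 0.5, 0.98], [-0.2, 0.2, 1.0], gain=0.05)
```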
Another possibility to update the control function is to calculate the Hamiltonian (instead of its
derivative) at each iteration. As illustrated in Section 2.3.5, the value of the control function that
maximizes the Hamiltonian is, at each time instant, either 0 or 1. Thus, the control function can
be updated by calculating the value (0 or 1) that maximizes the Hamiltonian at each instant. This
method was found to oscillate around the optimal solution, so the final implementation uses the
derivative of the Hamiltonian.
A simulation was run with the maximum number of iterations set to 30; the run-time was 4281 seconds.
Figure 2.9 shows the evolution of the control function (initially set to f(n) = 0.5, n =
[0 . . . 30]), the derivative of the Hamiltonian and the u5 yield for 6 non-consecutive iterations of this
simulation.
[Figure 2.9: three panels plotted against Time: Control function, Hamiltonian Derivative and u5 yield; legend: Iteration 0, Iteration 5, Iteration 10, Iteration 20.]
Figure 2.9: Control function, Hamiltonian derivative and u5 evolution on several iterations.
On the first iteration, with the control function set to 0.5 on all time steps, the u5 yield is low
and the Hamiltonian derivative is smaller than 0 for time values before t ≈ treg and higher than 0
for time values after t ≈ treg. This shows that the optimal control function will approach 0 for
t < treg and 1 for t > treg. The u5 yield increases in the following iterations and the variation in
the control function is clearly noticeable until the 10th iteration.
The final result for f(n) after 30 iterations is still not optimal. For t << treg, f(n) always
converges to 0 and for t >> treg, f(n) always converges to 1, but near treg the transition
from 0 to 1 is slow.
The algorithm was run with more iterations and even with larger values of K, but the result
did not change.
As shown mathematically in Section 2.3.5, the optimal control function is either 0 or 1, but all the
simulations that use the discrete form of the control function, for the different optimization
strategies, have shown the same behavior for values of t near treg.
Although its cause has not been identified, this common behavior points to a numerical
problem.
3 Model for Mannitol production with Nisin induction
Contents
3.1 Problem description
3.2 Parameter estimation methods
3.3 Results
3.1 Problem description
The work developed in [1] towards the improvement of Mannitol production in Lactococcus
lactis led to the creation of several different strains.
A particular strain, FI10089mtlD+Pase+, was created with the ability to simultaneously overexpress
two genes known to code for two enzymes responsible for the pathway that leads to
Mannitol formation. The overexpression of these genes is controlled by an inducer, Nisin.
The overexpression of the genes led to increases in the activity of the enzymes of up to ten times
in one case and up to 1400 times in the other.
In separate experiments, Nisin was added at distinct time points, resulting in different yields of
Mannitol and suggesting that the time of induction can be used as a control variable for the
product yield.
Maximization of Mannitol production can be seen as an optimization problem where several
parameters must be fine-tuned, among them the pH, the temperature and Nisin induction [12, 14].
In this chapter the available data sets are presented and two simple models for the production
of Mannitol in the genetically manipulated strain FI10089mtlD+Pase+ are suggested. Finally, a
consistent parameter identification process that uses the four different data sets simultaneously
is presented.
3.1.1 Mannitol model
The complete metabolic pathway of Lactococcus lactis that leads to the production of Mannitol
is yet to be fully understood. Fig. 3.1 shows the identified pathways and the corresponding
metabolites and enzymes involved. The metabolic pathway depends on the availability of
Glucose (substrate), which is transformed into Fructose 6-phosphate (F6P). From F6P there
are two possible paths: one that leads to the formation of Mannitol, and another that leads to the
formation of Pyruvate and subsequently to the formation of Biomass.
The studies in [1] suggest that Mannitol production in L. lactis is highly dependent on the
available substrate and on the total amount of biomass. Thus, a simple model with these three variables
(Mannitol, biomass and substrate) was formulated. These three variables were chosen
not only for their known close relation but also because of the limitations on the
practical acquisition of data.
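Once again the S-System formalism was used to describe the system:
\begin{align}
\frac{dx_1}{dt} &= -\beta_1 x_1^{h_{11}} x_2^{h_{12}} x_3^{h_{13}} \nonumber \\
\frac{dx_2}{dt} &= \alpha_2 x_1^{g_{21}} x_3^{g_{23}} - \beta_2 x_2^{h_{22}} \tag{3.1} \\
\frac{dx_3}{dt} &= \alpha_3 x_1^{g_{31}} x_3^{g_{33}} - \beta_3 x_3^{h_{33}} \nonumber
\end{align}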
Figure 3.1: Detail of a metabolic pathway of Lactococcus lactis [1]
Here x1 represents the amount of available Glucose, x2 the amount of Mannitol and x3 the
biomass, measured in terms of its dry weight. The formation of biomass depends on the amount
of available Glucose and on the amount of the biomass itself. The production of Mannitol depends
on the amount of biomass and on the available Glucose.
The set of parameters that are part of the S-System is referred to as δ throughout the
rest of this thesis.
δ = {α2, α3, g21, g23, g31, g33, β1, β2, β3, h11, h12, h13, h22, h33}
The model is schematically represented in Fig. 3.2.
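For readers who want to experiment with model (3.1), its right-hand side can be written as a short function. This is a minimal sketch, not the MATLAB code used in this work: the function name s_system_rhs and the dictionary keys (a2 for α2, b1 for β1, and so on) are hypothetical naming choices, and the states are assumed to be strictly positive so that negative exponents are well defined.

```python
def s_system_rhs(x, p):
    """Right-hand side of model (3.1); x = (glucose, mannitol, biomass), all assumed > 0."""
    x1, x2, x3 = x
    dx1 = -p["b1"] * x1 ** p["h11"] * x2 ** p["h12"] * x3 ** p["h13"]
    dx2 = p["a2"] * x1 ** p["g21"] * x3 ** p["g23"] - p["b2"] * x2 ** p["h22"]
    dx3 = p["a3"] * x1 ** p["g31"] * x3 ** p["g33"] - p["b3"] * x3 ** p["h33"]
    return (dx1, dx2, dx3)


# Illustrative parameter set: the 14 elements of delta all equal to 1.
DELTA_KEYS = ("a2", "a3", "g21", "g23", "g31", "g33",
              "b1", "b2", "b3", "h11", "h12", "h13", "h22", "h33")
unit_delta = {key: 1.0 for key in DELTA_KEYS}
rates = s_system_rhs((2.0, 1.0, 1.0), unit_delta)
```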
3.1.2 Mannitol model with Nisin induction
The addition of Nisin to the pathway shown in Fig. 3.1 leads to the overexpression of genes
mtlD and M1Pase (not shown in the figure). Both these genes code for enzymes that contribute
to the production of Mannitol. Thus, the addition of Nisin increases the fluxes that lead to the
production of Mannitol and, consequently, decreases the fluxes that lead to the production of
Biomass.
In order to incorporate the induction using Nisin, the previous model was modified. The time
profile of the Nisin concentration is unknown, but it is assumed that the maximum concentration
in the solution is reached shortly after the addition. It is also assumed that this concentration
remains constant throughout the experiment.
Given these assumptions, a Hill-type Function was used to approximate the Nisin concentration.
Figure 3.2: Mannitol Model without Nisin induction
A Hill function has the form (3.2), where n controls the steepness of the curve and θ is the point
where f(t) = (fmax − fmin)/2.
Fig. 3.3 plots a Hill Function where n = 20 and θ = 5.
\begin{equation}
f(t) = \frac{t^n}{\theta^n + t^n} \tag{3.2}
\end{equation}
[Figure 3.3: plot of f(t) against Time, for t from 0 to 10 and f(t) from 0 to 1.]
Figure 3.3: Aspect of a Hill Function with n = 20 and θ = 5
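The behavior of (3.2) is easy to check numerically. The sketch below evaluates the function for the values used in Fig. 3.3 (n = 20, θ = 5); the function name hill is a naming choice for this example only.

```python
def hill(t, theta, n):
    """Hill function (3.2): f(t) = t^n / (theta^n + t^n)."""
    return t ** n / (theta ** n + t ** n)


at_start = hill(0.0, 5.0, 20)    # well before theta: the function is at its minimum
at_theta = hill(5.0, 5.0, 20)    # at t = theta the function is halfway between min and max
at_end = hill(10.0, 5.0, 20)     # well after theta: the function is close to its maximum
```

With a large n the transition around θ is sharp, so the curve approximates a step at t = θ; with a small n the transition is spread over a long time interval.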
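The model described in (3.1) was adapted to the induction using Nisin by multiplying a term of each
metabolite equation by a scaled Hill Function. Thus, the model becomes:
\begin{align}
\frac{dx_1}{dt} &= -\beta_1 x_1^{h_{11}} x_2^{h_{12}} x_3^{h_{13}} \, \alpha_{1n} \left(1 + \left(\frac{t^n}{\theta^n + t^n}\right)^{h_{1n}}\right) \nonumber \\
\frac{dx_2}{dt} &= \alpha_2 x_1^{g_{21}} x_3^{g_{23}} \, \alpha_{2n} \left(1 + \left(\frac{t^n}{\theta^n + t^n}\right)^{h_{2n}}\right) - \beta_2 x_2^{h_{22}} \tag{3.3} \\
\frac{dx_3}{dt} &= \alpha_3 x_1^{g_{31}} x_3^{g_{33}} \, \alpha_{3n} \left(1 - \left(\frac{t^n}{\theta^n + t^n}\right)^{h_{3n}}\right) - \beta_3 x_3^{h_{33}} \nonumber
\end{align}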
For Glucose, the Hill-type function was added to the degradation term of the equation, which
results in a faster glucose consumption after the addition of Nisin.
For Mannitol and Biomass, the Hill-type function was added to the formation term of the equations.
Since we expect that Nisin slows down Biomass production and speeds up Mannitol production,
the Hill terms in these two equations have opposite signs.
The set of parameters of the Hill Function and scaling is referred to as σ throughout
the rest of this thesis.
σ = {α1n, α2n, α3n, h1n, h2n, h3n, θ, n}
It is important to point out that (3.3) does not obey the S-System formalism.
Fig. 3.4 schematically represents the model including Nisin induction.
Figure 3.4: Mannitol Model with Nisin induction
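Model (3.3) can be sketched in the same way as (3.1). The function name nisin_model_rhs and the dictionary keys (a1n for α1n, h1n, theta, n, and so on) are hypothetical, and the code is an illustration rather than the implementation used for the estimations. Note that the Hill term raised to a negative exponent is undefined at t = 0, so the sketch assumes positive exponents or t > 0.

```python
def nisin_model_rhs(t, x, d, s):
    """Right-hand side of model (3.3); d holds the delta set, s holds the sigma set."""
    x1, x2, x3 = x
    hill = t ** s["n"] / (s["theta"] ** s["n"] + t ** s["n"])
    k1 = s["a1n"] * (1.0 + hill ** s["h1n"])   # speeds up glucose consumption
    k2 = s["a2n"] * (1.0 + hill ** s["h2n"])   # speeds up mannitol formation
    k3 = s["a3n"] * (1.0 - hill ** s["h3n"])   # slows down biomass formation
    dx1 = -d["b1"] * x1 ** d["h11"] * x2 ** d["h12"] * x3 ** d["h13"] * k1
    dx2 = d["a2"] * x1 ** d["g21"] * x3 ** d["g23"] * k2 - d["b2"] * x2 ** d["h22"]
    dx3 = d["a3"] * x1 ** d["g31"] * x3 ** d["g33"] * k3 - d["b3"] * x3 ** d["h33"]
    return (dx1, dx2, dx3)


# Illustrative values only: every parameter of delta and sigma equal to 1.
delta = {k: 1.0 for k in ("a2", "a3", "g21", "g23", "g31", "g33",
                          "b1", "b2", "b3", "h11", "h12", "h13", "h22", "h33")}
sigma = {k: 1.0 for k in ("a1n", "a2n", "a3n", "h1n", "h2n", "h3n", "theta", "n")}
before = nisin_model_rhs(0.0, (1.0, 1.0, 1.0), delta, sigma)
at_theta = nisin_model_rhs(1.0, (1.0, 1.0, 1.0), delta, sigma)
```

With all scaling factors equal to 1 and t = 0 the Hill term vanishes, so the rates coincide with those of model (3.1) for the same δ.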
3.1.3 Data sets description
The available data sets have information about the temporal variation of: Glucose, Mannitol,
Lactate, Formate, Acetate, Acetoin, 2,3-Bd, Ethanol, Optical Density (OD600) and Dry Weight
(mg/ml) [1].
The optical density is obtained from the light absorbance at 600 nm and is used to measure the
cellular density of a colony, which is proportional to its size. In this context, it was used to control the
time of addition of Nisin.
The comparison between the metabolites available in the data sets and Fig. 3.1 supports the
choice of variables for the Mannitol Model.
Four data sets were used for the parameter estimations.
The first data set describes the Glucose and Mannitol concentrations and Biomass dry weight, over
a 25-hour period, in FI10089mtlD+Pase+ with no Nisin added.
The remaining three data sets describe the Glucose and Mannitol concentrations and Biomass
in FI10089mtlD+Pase+ with Nisin added at OD600 = 0.1, OD600 = 0.3 and OD600 = 0.8 or, in
terms of time, Nisin added at t = 2, t = 3 and t = 5 hours.
The four data sets are plotted in Fig. 3.5.
[Figure 3.5: three panels plotted against Time (hours): Glucose, Mannitol Production and Dry Weight (mg/ml); legend: No Nisin, Nisin @ OD600 = 0.1, Nisin @ OD600 = 0.3, Nisin @ OD600 = 0.8.]
Figure 3.5: Data Sets for Mannitol production. Vertical dashed lines represent the time of induction of Nisin.
The data sets were not acquired with a fixed sampling time. Since the sampling is manual, the
data was mainly acquired in periods where the dynamics of the system were relevant.
In terms of mathematical modeling and parameter estimation it would be useful to have more data
points. For example, in the period after 15 hours of experiment, only the OD600 = 0.1 data set has
measurements before the final time.
This leads to a biased weighting in the estimation, giving more importance to the time period
before t = 15 hours. A possible approach to solve this issue would be the use of interpolation.
Analyzing the figure, it can be seen that the lowest Mannitol yield is obtained for the case with no Nisin
added.
When Nisin is added at OD600 = 0.1 the growth rate decreases and the Mannitol yield is higher than
in the case without Nisin. The consumption of Glucose is strongly affected and decreases.
When Nisin is added at OD600 = 0.8 the growth and Glucose consumption rates are similar to
the no-Nisin case, and the Mannitol production is approximately the same as in the OD600 = 0.1 case.
Finally, the maximum product yield is obtained for OD600 = 0.3, with low biomass formation and
a low glucose consumption rate.
From a visual analysis the data sets appear to be coherent, with one exception. The data for
OD600 = 0.1 exhibits a noticeably different curve for Glucose consumption, which is also visible
in the much lower biomass production. The effect of the addition of Nisin is only clearly visible
in the Mannitol production rate. It can be seen that Mannitol production starts shortly after the
addition of Nisin, in every data set.
These data sets illustrate real data from a complex network, and there are underlying
mechanisms behind this data that are still not understood. Still, from a simplified point of view, it is
possible to draw a parallel with the case explored in Section 2.1.2: adding
Nisin too soon affects Mannitol production because of the lack of biomass, while adding Nisin too late
affects Mannitol production because of the lack of substrate and run time, even if there is enough
biomass. Thus, the trade-off that needs to be explored is similar to the case of the previous
chapter.
3.1.4 Parameters estimation
The suggested Mannitol model has 14 parameters (δ) to be estimated. The second model,
including the Nisin induction, adds 8 more parameters (σ). The estimation problem consists, in a
first stage, of estimating the parameters of the Mannitol model using only the data acquired without
the addition of Nisin.
In a second stage, the parameters of the Mannitol model are estimated using both the data
acquired with and without Nisin.
Finally, the parameters of the Nisin part of the second model are estimated and fine-tuned using
only the data acquired with Nisin added.
3.2 Parameter estimation methods
The estimations of the parameters of both models were made using MATLAB. In a first approach
the estimations were made using the freely available toolboxes SBTOOLBOX and SBPD [23],
but the need for easy customization of the cost functions and for transparency in the estimation
process led to the use of purpose-written scripts.
The general parameter estimation algorithm is summarized as follows:
• Initial parameters and constraints are defined. The S-System parameters
(δ) were constrained between a lower bound of -4 and an upper bound of 4 to reproduce biochemically
reasonable rates and kinetic parameters. In the cases where σ parameters were estimated,
no constraints were applied.
• An optimizing function is called. Three functions were tested: fminunc, fmincon and
simannealingSB. The first two functions belong to the MATLAB Optimization Toolbox and perform
unconstrained and constrained optimization, respectively. The last function belongs to
SBTOOLBOX and performs minimization by simulated annealing.
1. Inside the optimization function the set of differential equations of the model is inte-
grated, using the initial set of parameters. Depending on the estimation, this set of
parameters is δ, σ or both.
2. After the integration a cost function calculates the cost, normally as the
sum of squared residuals.
3. If the obtained cost obeys the stop condition of the optimization function, the optimiza-
tion is stopped. Otherwise, a new set of parameters is tested.
The parameter estimation process is illustrated in Fig. 3.6.
Figure 3.6: Parameter estimation structure
The estimations were made on a laptop with 4GB of RAM and a dual processor. Estimation times
varied between less than a minute and several minutes.
3.2.1 Estimation using one data set
The first estimation uses the data set obtained without the addition of Nisin and model (3.1).
A first, rough estimation used SBTOOLBOX, with the initial parameters set to 1. After this
estimation a purpose-written script was used. The cost function was defined as the sum of
the squared residuals, where the residuals are the differences between the modeled data and
the observed data.
\begin{equation}
J = \sum_{i=1}^{3} \sum_{j=1}^{timepoints} \left[ \frac{y_{ij}(\delta) - y_{ij}}{\omega_i} \right]^2 \tag{3.4}
\end{equation}
Here i = 1, 2, 3 refers to the three metabolites, Glucose, Mannitol and Biomass; yij(δ) refers to
sample j of metabolite i of the model data, integrated with the set of parameters δ; and yij refers
to sample j of metabolite i of the experimental data.
The weighting factor ωi makes it possible to give more or less weight to each metabolite during the
estimation process.
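The cost (3.4) translates directly into a double loop over metabolites and samples. The sketch below assumes that the model output has already been evaluated at the experimental time points; the function name weighted_cost and the numbers in the example are made up for illustration.

```python
def weighted_cost(model, data, weights):
    """Cost (3.4): sum over metabolites i and samples j of ((model - data) / w_i)^2."""
    total = 0.0
    for y_model, y_data, w in zip(model, data, weights):
        for ym, yd in zip(y_model, y_data):
            total += ((ym - yd) / w) ** 2
    return total


# Made-up example: 3 metabolites with 2 samples each.
model = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
data = [[1.0, 1.0], [1.0, 4.0], [5.0, 8.0]]
cost = weighted_cost(model, data, weights=[1.0, 2.0, 2.0])
```

Dividing each residual by ωi before squaring means that a larger ωi reduces the influence of metabolite i, which is one way to balance variables with very different magnitudes, such as Glucose and Biomass.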
This estimation identifies the 14 parameters δ belonging to the S-System, but gives no
guarantee that the identified parameters are valid for the data sets with Nisin induction.
3.2.2 Estimation using multiple data sets
In the second estimation both models (3.1) and (3.3) are used, as well as the four data sets,
with and without the addition of Nisin. It is important to note that the data sets obtained
with Nisin added will not fit (3.1) with the parameters δ obtained in the previous section. However, since
model (3.3) is an extension of (3.1), the common parameters δ have to be equal to ensure
consistency between the models.
The cost function becomes:
J = J1 + J2 + J3 + J4 (3.5)
where J1 corresponds to the cost associated with the No Nisin data set, and J2, J3 and J4
correspond to the costs associated with the data sets with Nisin added at OD600 = 0.1, OD600 = 0.3
and OD600 = 0.8.
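The cost functions are defined as:
\begin{align}
J_1 &= \sum_{i=1}^{3} \sum_{j=1}^{timepoints} \left[ \frac{y_{ij}(\delta) - y_{ij}}{\omega_i} \right]^2 \nonumber \\
J_{2,3,4} &= \sum_{i=1}^{3} \sum_{j=1}^{timepoints} \left[ \frac{y_{ij}(\delta, \sigma) - y_{ij}}{\omega_i} \right]^2 \nonumber
\end{align}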
and σ is the set of parameters of the Hill Function and scaling.
This estimation refines the δ parameters obtained previously, ensuring that this set is common
to the data sets with and without the addition of Nisin.
Since model (3.3) is used, σ must also be estimated. For the sake of simplicity, only the δ
parameters are forced to be common to the 4 data sets; set σ is left free and will have different values
for each Nisin data set. Thus, 14 (δ) + 3 × 8 (σ) = 38 parameters are identified in this section.
The initial conditions for δ were set to the estimation obtained in the previous section and the
σ parameters were all set to 1.
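One way to organize the 38 unknowns for a generic optimizer is a single flat vector holding the common δ followed by the three σ sets. The layout and the helper names initial_vector and unpack below are assumptions made for illustration and do not describe the scripts used in this work.

```python
N_DELTA = 14       # S-System parameters, common to all four data sets
N_SIGMA = 8        # Hill function and scaling parameters, one set per Nisin data set
N_NISIN_SETS = 3


def initial_vector(delta_start):
    """Flat starting point: previously estimated delta, then every sigma parameter set to 1."""
    return list(delta_start) + [1.0] * (N_NISIN_SETS * N_SIGMA)


def unpack(vector):
    """Split the flat vector into the common delta and the three independent sigma sets."""
    if len(vector) != N_DELTA + N_NISIN_SETS * N_SIGMA:
        raise ValueError("expected 38 parameters")
    delta = vector[:N_DELTA]
    sigmas = [vector[N_DELTA + k * N_SIGMA: N_DELTA + (k + 1) * N_SIGMA]
              for k in range(N_NISIN_SETS)]
    return delta, sigmas


start = initial_vector([0.1] * N_DELTA)   # placeholder delta values, not the estimated ones
delta, sigmas = unpack(start)
```

Inside the cost function, J1 would then be evaluated with delta alone and J2, J3 and J4 with delta and the corresponding entry of sigmas, which enforces a common δ by construction.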
3.2.3 Estimation using the Nisin data sets
Having estimated the set of parameters of the S-System, δ, the Hill Function and the respective
scaling parameters, σ, can be fine-tuned.
Even though the three sets of parameters σ obtained in the previous section are able to fit the
experimental data, from a control point of view it is useful to reduce the number of control variables. Thus,
σ is divided, and the set {α1n, α2n, α3n, h1n, h2n, h3n} is forced to be common to the three Nisin
data sets.
The only variables/parameters left free are {θ, n}.
This decision is based on the fact that {θ, n} directly manipulate the shape and position of the
Hill-type function, as seen in (3.2). More specifically, varying θ changes the position of the function on
the time axis, creating a time control variable. This time control variable subsequently models the
time of addition of Nisin.
The estimation algorithm estimates
{α1n, α2n, α3n, h1n, h2n, h3n} + 3 × {θ, n}, that is, 6 + 3 × 2 = 12
parameters.
3.2.4 Further notes on estimation strategies
The possibility of defining a custom cost function makes it possible to define other estimation strategies
using the four data sets. Two possible examples are:
• Estimating simultaneously {δ} + 3 × {δ + σ} parameters (one whole set of parameters per
data set) and defining a cost function that minimizes both the difference between the modeled
and real data and the difference between the sets of parameters δ. This strategy is
computationally heavy but gives an acceptable approximate estimation in the cases where
the other strategies fail to find the common parameters.
• Another strategy is a combination of the second and third strategies. The cost function is as
in (3.5) but the estimated parameters are {δ, α1n, α2n, α3n, h1n, h2n, h3n} + 3 × {θ, n}, that is,
14 + 6 + 3 × 2 = 26 parameters.
In this strategy both δ and σ are estimated at the same time but three pairs of {θ, n} are
obtained (one for each Nisin data set).
3.3 Results
3.3.1 Identification of set δ
The first estimation identifies the parameters of set δ using only the data set without Nisin.
Starting from the initial condition δ1...14 = 1, the parameters that best fit the real data were identified
with a final cost (3.4) of J = 0.042.
Table 3.1 shows the identified parameters.
Table 3.1: Estimation of the parameters of set δ using the data set without Nisin.
Param.   Value      Param.   Value
α2       0.1267     α3       0.0204
β1       0.4861     β2       0.0001
β3       0.0367     g21      0.6417
g23      2.0663     g31      0.6498
g33      0.3644     h11      0.7931
h12      −0.1292    h13      1.6108
h22      −0.0018    h33      −0.1780
Fig. 3.7 plots the modeled and the real data, showing that the estimated parameters accurately
fit the experimental data with one exception. For t ≈ 15 the modeled data does not fit the real
data of the Biomass concentration. Since there is only one data point after t = 15, it is not possible
to know whether the concentration of Biomass decreases from t = 14 to t = 15 and increases after that, or
whether the lower value for t = 15 is a measurement error.
[Figure 3.7: three panels plotted against Time (hours): Glucose, Mannitol and Biomass; legend: Modeled data, Real data.]
Figure 3.7: Estimation of δ using the data set without Nisin.
3.3.2 Identification of set δ using Nisin data sets
In the second estimation all data sets are used to estimate δ. The initial parameters were set
to the previously obtained δ (Table 3.1) and σ1..8 = 1.
The final cost was J = 0.619 and the obtained δ set is shown in Table 3.2.
Table 3.2: Fine tuning of set δ using all data sets.
Param.   Value      Param.   Value
α2       0.1267     α3       0.0196
β1       0.3660     β2       −0.0079
β3       −0.0080    g21      0.6842
g23      1.6550     g31      0.6011
g33      0.8168     h11      0.7988
h12      0.1432     h13      1.3605
h22      0.5963     h33      −0.1577
Comparing the parameters of the two tables, the main differences are found in parameters
β2, g33, h12 and h22.
While the variations in these parameters are hard to justify without a proper sensitivity analysis,
based on the system equations one can observe that the change in the pair β2, h22 leads to a higher
decay of the Mannitol concentration, although the value is still very small. The increase in parameter g33
results in a higher dependency of the system on Biomass concentration, which also affects the
forward feedback that increases Mannitol production. Finally, the increase in h12 gives more weight
to the decay of Glucose due to Mannitol production.
The three obtained σ sets are shown in Table 3.3.
Table 3.3: The three independent σ sets.
Param.   OD600 = 0.1   OD600 = 0.3   OD600 = 0.8
α1n      1.0166        1.2592        0.9208
α2n      1.7323        2.6433        1.0987
α3n      1.1775        1.5914        1.6015
h1n      1.8747        2.3056        2.1681
h2n      −0.033        −0.411        0.9763
h3n      0.9467        1.6172        1.7065
n        −0.043        0.2937        0.0886
θ        1.6023        1.8125        1.8863
Given the new sets of parameters, the two models were integrated and the results plotted
against the real data (Fig. 3.8). The algorithm was able to estimate a common set δ that fits all
the data sets, with the variations due to Nisin induction all modeled in set σ.
3.3.3 Identification of set σ
Having a common δ set of parameters to all four data sets, a final estimation tries to find a
common subset of σ that will allow us to fit all data simply by varying the parameters n and θ.
Thus, the final estimation uses only the Nisin data sets.
[Figure 3.8: four groups of panels (No Nisin added, Nisin added @ OD600 = 0.1, Nisin added @ OD600 = 0.3, Nisin added @ OD600 = 0.8), each showing Glucose, Mannitol and Biomass against time; legend: Modeled Data, Real Data.]
Figure 3.8: Estimation of set δ using all the data sets. Each Nisin data set is modeled with an independent σ set.
In this estimation, set δ is not estimated, but it is necessary as an input to integrate the systems. The
set used was the one obtained in the previous section and shown in Table 3.2.
For the subset of σ that will be common to the three data sets, {α1n, α2n, α3n, h1n, h2n, h3n},
the initial conditions were set to the values obtained for Nisin added at OD600 = 0.1, shown in the
second column of Table 3.3.
The three initial pairs of n and θ were set to 1.
This estimation resulted in a final cost of J = 1.28 and the results are plotted in Fig. 3.9.
[Figure 3.9: four groups of panels (No Nisin added, Nisin added @ OD600 = 0.1, Nisin added @ OD600 = 0.3, Nisin added @ OD600 = 0.8), each showing Glucose, Mannitol and Biomass against time.]
Figure 3.9: Estimation of σ using the Nisin data sets and a fixed δ.
The estimation successfully found a subset of σ, {α1n, α2n, α3n, h1n, h2n, h3n}, that makes it possible to
model all the Nisin data sets by varying only the values of n and θ.
The obtained parameters are shown in Table 3.4 and Table 3.5.
Table 3.4: Common subset of σ obtained in the estimation using the Nisin data sets.
Param.   Value
α1n      0.9875
α2n      2.0556
α3n      1.5726
h1n      1.0793
h2n      0.3992
h3n      1.4508
Table 3.5: Independent σ parameters obtained for each of the Nisin data sets.
Param.   OD600 = 0.1   OD600 = 0.3   OD600 = 0.8
n        0.3040        0.8771        0.3375
θ        0.2893        3.8653        2.0929
The results shown in Table 3.5 are not the ones expected. The obtained n values characterize
a Hill Function with a very slow transition from the minimum to the maximum, which goes against the
initial predictions.
It was also expected that the θ parameters would be proportional to the different OD600 values
used or proportional to the time of induction.
To better understand the effect of the Hill-Function on the Mannitol model, the Hill-function curves
were plotted and are shown in Fig. 3.10.
The effect of Nisin induction was modeled using a Hill-Function because its characteristics are
similar to the theoretical concentration curve of Nisin. However, the results in Fig. 3.10 suggest
that, for this model, the effect of Nisin can be modeled by a simple straight line.
While this result is not encouraging from the control point of view, the results were satisfactory
since the primary objective, to model Mannitol production with and without Nisin induction, was
fulfilled.
The use of S-Systems to model Mannitol production was based on the fact that it is a standard
formalism for modeling metabolic systems. Given the simplicity of the system (only three variables), the
number of parameters is excessive. A future model for Mannitol production should be formulated
with a reduced number of parameters.
Chapter 4 briefly explores the possible control strategies given the described results.
[Figure 3.10 (plot not reproduced): panels for Glucose, Mannitol and Biomass, horizontal axis from 0 to 30, with one curve each for OD600 = 0.1, OD600 = 0.3 and OD600 = 0.8.]
Figure 3.10: Plot of the obtained Hill-type functions for each Nisin data set.
4 Optimizing Mannitol production using Optimal Control
Contents
4.1 Control using a step function
4.2 Results
4.1 Control using a step function
The results obtained in Section 3.3.3 have shown that, for this network, a Hill function is not
ideal for modeling the temporal control achieved with Nisin induction. A simplification of the Hill
function is a step function, with the same form as the step control
function f(t) used in the prototype network in Section 2.2.1.
Thus, the model for Mannitol production with Nisin induction becomes:
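\[
\begin{aligned}
&\text{if } t < t_{nisin}: \\
&\qquad \frac{dx_1}{dt} = -\beta_1 x_1^{h_{11}} x_2^{h_{12}} x_3^{h_{13}} (1 + u_{11}) \\
&\qquad \frac{dx_2}{dt} = \alpha_2 x_1^{g_{21}} x_3^{g_{23}} (1 + u_{21}) - \beta_2 x_2^{h_{22}} \\
&\qquad \frac{dx_3}{dt} = \alpha_3 x_1^{g_{31}} x_3^{g_{33}} (1 + u_{31}) - \beta_3 x_3^{h_{33}} \\
&\text{else:} \\
&\qquad \frac{dx_1}{dt} = -\beta_1 x_1^{h_{11}} x_2^{h_{12}} x_3^{h_{13}} (1 + u_{12}) \\
&\qquad \frac{dx_2}{dt} = \alpha_2 x_1^{g_{21}} x_3^{g_{23}} (1 + u_{22}) - \beta_2 x_2^{h_{22}} \\
&\qquad \frac{dx_3}{dt} = \alpha_3 x_1^{g_{31}} x_3^{g_{33}} (1 + u_{32}) - \beta_3 x_3^{h_{33}}
\end{aligned}
\tag{4.1}
\]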
Here, tnisin is the time of addition of Nisin or a term proportional to it. The seven parameters
that model the step function are u11, u21, u31, u12, u22, u32 and tnisin.
Since there are two distinct sets of parameters, set δ from the S-System equations and the
step function parameters, the estimation was done in three different ways.
• In the first approach, all parameters were estimated: δ + 3 × (step function parameters) =
35 parameters. All data sets, including the non-Nisin data set, were used.
• In the second estimation, the set δ was fixed to the one obtained in Section 3.3, and
only the step function parameters were estimated. These parameters were
independent for each data set, so 3 × (step function parameters) =
21 parameters were estimated.
• Finally, with the same fixed δ, the step function parameters u## were forced to
be common to all data sets, with the exception of the three tnisin parameters. This ensures that
the control depends on a single time variable. Only the Nisin data sets were used, and
u## + 3 × tnisin = 9 parameters were estimated.
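The model of Eq. (4.1) can be simulated with a few lines of code. The sketch below is illustrative only: the kinetic values in `PLACEHOLDER_PARAMS`, the initial state `X0` and the function names are hypothetical (the estimated set δ is not reproduced here), and a fixed-step Runge-Kutta integrator stands in for whatever solver was actually used. State ordering follows the text: x1 is Glucose, x2 Mannitol and x3 Biomass.

```python
def make_rhs(p, u_before, u_after, t_nisin):
    """Right-hand side of Eq. (4.1): the (u1, u2, u3) triple switches at t_nisin."""
    def rhs(t, x):
        u1, u2, u3 = u_before if t < t_nisin else u_after
        x1, x2, x3 = x
        dx1 = -p["b1"] * x1**p["h11"] * x2**p["h12"] * x3**p["h13"] * (1 + u1)
        dx2 = p["a2"] * x1**p["g21"] * x3**p["g23"] * (1 + u2) - p["b2"] * x2**p["h22"]
        dx3 = p["a3"] * x1**p["g31"] * x3**p["g33"] * (1 + u3) - p["b3"] * x3**p["h33"]
        return (dx1, dx2, dx3)
    return rhs


def simulate(p, u_before, u_after, t_nisin, x0, t_end=10.0, dt=0.01):
    """Integrate with fixed-step classical Runge-Kutta and return the final state."""
    rhs = make_rhs(p, u_before, u_after, t_nisin)

    def shift(x, k, h):
        return tuple(xi + h * ki for xi, ki in zip(x, k))

    t, x = 0.0, tuple(x0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, shift(x, k1, dt / 2))
        k3 = rhs(t + dt / 2, shift(x, k2, dt / 2))
        k4 = rhs(t + dt, shift(x, k3, dt))
        x = tuple(xi + dt / 6 * (a + 2 * b + 2 * c + d)
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
        t += dt
    return x


# Hypothetical values chosen only so that the example runs; NOT the estimated delta.
PLACEHOLDER_PARAMS = {
    "b1": 0.1, "h11": 1.0, "h12": 0.0, "h13": 0.5,
    "a2": 0.05, "g21": 1.0, "g23": 0.5, "b2": 0.01, "h22": 1.0,
    "a3": 0.02, "g31": 0.5, "g33": 0.5, "b3": 0.01, "h33": 1.0,
}
X0 = (50.0, 1.0, 0.2)  # hypothetical initial Glucose, Mannitol, Biomass

if __name__ == "__main__":
    no_step = simulate(PLACEHOLDER_PARAMS, (0, 0, 0), (0, 0, 0), 5.0, X0)
    with_step = simulate(PLACEHOLDER_PARAMS, (0, 0, 0), (0, 1.0, 0), 5.0, X0)
    print("final Mannitol without step:", no_step[1])
    print("final Mannitol with u22 = 1:", with_step[1])
```

In an estimation loop, `simulate` (or a variant returning the full trajectory) would be called once per data set inside the cost function, with either a separate or a shared u## set depending on which of the three estimation setups is being run.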
4.2 Results
The first estimation returned a low value of the functional, J = 0.65, and the model fitted the
data.
The initial estimate for δ was the one shown in Table 3.2. By the end of the estimation procedure
only small changes in δ were observed.
The obtained parameters for the step function are shown in Table 4.1.
Table 4.1: Three independent sets of step function parameters, obtained in the first estimation with all data sets.
Param.   OD600 = 0.1   OD600 = 0.3   OD600 = 0.8
u11      1.9006        1.1973        1.9677
u12      0.8843        1.0106        0.1798
u21      1.4158        1.7018        1.1581
u22      2.8653        3.8892        0.8279
u31      0.3373        0.0398        −0.0447
u32      −0.1123       −0.0678       0.0183
tnisin   0.2909        0.5796        1.2695

The step function parameters for Glucose, u11 and u12, show a reduction of consumption after
tnisin. For Mannitol, u21 and u22 show an increase in production for OD600 = 0.1 and OD600 = 0.3
but a decrease for OD600 = 0.8. For Biomass (u31 and u32) the OD600 = 0.8 data set again behaves
differently from the other two, although here production decreases for OD600 = 0.1 and OD600 = 0.3
and increases slightly for OD600 = 0.8. The obtained values for tnisin are encouraging, since they
increase with the time of addition of Nisin.
The second estimation returned a functional of J = 0.7, the results being shown in Table 4.2.
The parameters for Glucose, u11 and u12, show a decrease of consumption for OD600 = 0.1
and OD600 = 0.8 after tnisin. Mannitol production increases after tnisin for OD600 = 0.1 and
OD600 = 0.3 and Biomass production increases after tnisin in all cases. Once again, the values
of tnisin increase with the time of addition of Nisin.
Table 4.2: Three independent sets of step function parameters, obtained in the second estimation with the Nisin data sets.
Param.   OD600 = 0.1   OD600 = 0.3   OD600 = 0.8
u11      1.6251        0.7102        0.7925
u12      0.1514        0.7947        −0.0553
u21      0.8024        1.1865        0.6190
u22      1.1551        2.6113        0.3193
u31      −0.7084       −0.7184       −0.7690
u32      −0.4217       0.0656        0.2952
tnisin   1.5365        1.8139        2.4397
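Because each u enters Eq. (4.1) through the factor (1 + u), the change at tnisin can be read as the ratio (1 + u after) / (1 + u before). The short sketch below applies this reading to the Table 4.2 values; the names `gain_ratio` and `direction` are introduced here for illustration, and the ratio is one possible way of comparing the two regimes, not necessarily the comparison used when the table was interpreted.

```python
# Step function parameters from Table 4.2, one column per Nisin data set.
TABLE_4_2 = {
    "OD600 = 0.1": {"u11": 1.6251, "u12": 0.1514, "u21": 0.8024,
                    "u22": 1.1551, "u31": -0.7084, "u32": -0.4217},
    "OD600 = 0.3": {"u11": 0.7102, "u12": 0.7947, "u21": 1.1865,
                    "u22": 2.6113, "u31": -0.7184, "u32": 0.0656},
    "OD600 = 0.8": {"u11": 0.7925, "u12": -0.0553, "u21": 0.6190,
                    "u22": 0.3193, "u31": -0.7690, "u32": 0.2952},
}

# (parameter before tnisin, parameter after tnisin) for each modulated term.
TERMS = {
    "Glucose consumption": ("u11", "u12"),
    "Mannitol production": ("u21", "u22"),
    "Biomass production": ("u31", "u32"),
}


def gain_ratio(u_before, u_after):
    """Factor by which the (1 + u) multiplier changes when t crosses tnisin."""
    return (1.0 + u_after) / (1.0 + u_before)


def direction(column, term):
    """Return 'increase' or 'decrease' for one term of one data set."""
    before, after = TERMS[term]
    return "increase" if gain_ratio(column[before], column[after]) > 1.0 else "decrease"


if __name__ == "__main__":
    for name, column in TABLE_4_2.items():
        summary = ", ".join(f"{term}: {direction(column, term)}" for term in TERMS)
        print(f"{name} -> {summary}")
```

With the Table 4.2 values this reproduces the pattern described in the text: Glucose consumption decreases for OD600 = 0.1 and 0.8, Mannitol production increases for OD600 = 0.1 and 0.3, and Biomass production increases in all three cases.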
The final estimation forced the step function parameters to be common to all data sets. A first
optimization finished with a functional of J = 2.4, which is far from ideal: the mismatch
between the modeled and the real data is easily visible. The obtained parameters are listed in Table 4.3
and Table 4.4.
Table 4.3: The common step function parameters, obtained in the third estimation #1 with the Nisin data sets.
Param.   Value
u11      2.3718
u12      0.8186
u21      1.4296
u22      2.6559
u31      0.3382
u32      −0.1285

Table 4.4: Three independent values for tnisin, obtained in the third estimation #1 with the Nisin data sets.
Param.   OD600 = 0.1   OD600 = 0.3   OD600 = 0.8
tnisin   0.2981        0.6994        1.1334
Although the values of tnisin still increase with the time of addition of Nisin, the
modeled data no longer agree with the real data. In this case, the final modeled yield of Mannitol
for OD600 = 0.8 is greater than the yield for OD600 = 0.3, which calls into question the
validity of the model and the possibility of determining the optimal Nisin induction time.
To confirm the last result, a second optimization was run, finishing with a functional of J = 1.3.
The obtained results are listed in Table 4.5 and Table 4.6.
Table 4.5: The common step function parameters, obtained in the third estimation #2 with the Nisin data sets.
Param.   Value
u11      2.1825
u12      0.5478
u21      2.1977
u22      2.1486
u31      0.3661
u32      −0.1574
Table 4.6: Three independent values for tnisin, obtained in the third estimation #2 with the Nisin data sets.
Param.   OD600 = 0.1   OD600 = 0.3   OD600 = 0.8
tnisin   0.3209        0.9237        0.3892

Table 4.5 shows that the Glucose consumption (u11 and u12) increases after tnisin, Mannitol
production (u21 and u22) increases and Biomass production (u31 and u32) decreases. The values
for tnisin do not increase with the time of addition of Nisin.
While the results for u## are partly in agreement with expectations (increased Mannitol
production and decreased Biomass production), the lack of coherence between estimations and the
obtained values for tnisin do not allow a generalization of the results.
The results obtained confirm that the model used is not able to describe the variation of Mannitol
production, controlled by the time of induction with Nisin, by varying the parameter tnisin alone.
The results presented in the last paragraphs were confirmed several times by running new
optimizations with different initial conditions.
When the set of parameters u## is allowed to be independent for each Nisin data set, it is possible
to properly fit the data and obtain increasing tnisin values. For a common u## set, a valid fit
is obtained only with tnisin values that do not have the desired characteristic.
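The "desired characteristic" is that tnisin grows with the time of Nisin addition, that is, from the OD600 = 0.1 data set to the OD600 = 0.8 data set. The sketch below checks this ordering for the four estimations using the tnisin rows of Tables 4.1, 4.2, 4.4 and 4.6 and the reported values of J; the dictionary layout and the helper name `strictly_increasing` are choices made for this illustration.

```python
# tnisin for OD600 = 0.1, 0.3, 0.8 (in that order) and the reported functional J.
ESTIMATIONS = {
    "first (Table 4.1)":     {"J": 0.65, "tnisin": (0.2909, 0.5796, 1.2695)},
    "second (Table 4.2)":    {"J": 0.70, "tnisin": (1.5365, 1.8139, 2.4397)},
    "third, #1 (Table 4.4)": {"J": 2.40, "tnisin": (0.2981, 0.6994, 1.1334)},
    "third, #2 (Table 4.6)": {"J": 1.30, "tnisin": (0.3209, 0.9237, 0.3892)},
}


def strictly_increasing(values):
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    for name, result in ESTIMATIONS.items():
        ordered = strictly_increasing(result["tnisin"])
        print(f"{name}: J = {result['J']}, tnisin increasing: {ordered}")
```

The check makes the trade-off explicit: with independent u## sets the ordering holds and J is low, whereas with a common u## set either the ordering holds with a poor fit (J = 2.4) or the fit improves (J = 1.3) and the ordering is lost.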
Applying further restrictions to the estimation algorithm (for example, forcing tnisin to be
proportional to the time of induction) and fine-tuning the initial conditions would probably make it
possible to obtain common δ and u## sets. Still, the validity of the model and its ability to predict new results
would be questionable.
As mentioned before, inspection of Fig. 3.5 shows that the effect of Nisin addition is only clear for the
Mannitol production data. From the figure, and from [1], it is not possible to infer a rule for the effect of
Nisin on the consumption of Glucose and the formation of Biomass. In fact, the variation
of the metabolites in Fig. 3.5 suggests that the difference between the data sets is not the time of addition but the
amount of Nisin added. This empirical observation is consistent with the results obtained for the Hill-type
functions in Section 3.3.3.
Since it is known that these data sets were obtained for the same strain of L. lactis,
under the same laboratory conditions and with the same amount of Nisin added, and that the only difference
was the time of addition, one can only conclude that the problem lies in the mathematical model.
As explained before (Fig. 3.1), the metabolic pathways for the production of Mannitol in L. lactis are
still subject to many uncertainties. In [1], many unexpected results were obtained that actually
contributed to the formulation of new tests and to the progressive unraveling of the metabolic
pathway structure.
The results obtained in this chapter are not the expected ones, but they may indicate that Mannitol
formation with and without Nisin induction is more complex than predicted, and that other
metabolites must be included in order to build a proper predictive model.
5 Conclusions
The work presented in this thesis addresses, in a logical sequence, several questions that can
arise when formulating a strategy to optimize and control the production of a certain metabolite
in a metabolic network.
Although metabolic engineering is not a new concept, its complexity means that many
questions are still unsolved and are expected to remain so for many years to come. The great
variety of metabolic networks makes it hard to define a modeling, optimization or control strategy
that is applicable to all of them. Thus, when dealing with a new problem, it is wise to gather all
the available information about that specific problem and to combine solutions from various problems.
In Chapter 2 a prototype metabolic network was presented. Although quite simple, it exhibits
a trade-off behavior between two metabolites that often occurs in real life. It is shown that, for a
class of networks in which the yield of the product that favors cell population growth (the “natural”
product) competes with the desired product yield, with the manipulated variable affecting linearly
the fluxes, the optimal control that explores this trade-off assumes only extreme values.
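The "only extreme values" property follows from a standard argument in optimal control theory. The block below is a generic sketch of that argument, not the derivation of Chapter 2: the symbols f_0, f_1, φ, λ and s(t) are introduced here for illustration, and the control is assumed to be scalar and bounded.

```latex
% Generic dynamics in which the bounded control u enters linearly (assumption):
\dot{x} = f_0(x) + f_1(x)\,u, \qquad u_{\min} \le u(t) \le u_{\max}

% Hamiltonian for maximizing a terminal yield \phi(x(T)), with costate \lambda:
H(x, \lambda, u) = \lambda^{\top} f_0(x) + s(t)\,u, \qquad s(t) = \lambda^{\top} f_1(x)

% H is linear in u, so its maximum over [u_min, u_max] is attained at an endpoint:
u^{*}(t) =
\begin{cases}
u_{\max}, & s(t) > 0 \\
u_{\min}, & s(t) < 0
\end{cases}
```

The sign of the switching function s(t) decides which bound is active. Intervals on which s(t) vanishes identically (singular arcs) are not covered by this argument and have to be treated separately.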
While the implementation of control poses no challenge in in silico metabolic networks, in real
metabolic networks complex bioengineering skills are required. Gene knockout manipulations are
not adequate for this kind of control problem because of the long time scale associated with these
techniques. The manipulation of specific enzyme levels, controlled by modulating the expression of
the corresponding genes using promoter systems and inducers, is a possible solution to this kind
of control problem [6].
Since the lack of detailed information on the kinetics of the networks is frequent, a bi-level opti-
mization was presented and tested for three levels of information on the network. It is shown that
the use of a bi-level optimization strategy, that maximizes the natural product in the inner level by
manipulating the fluxes, leads to a good approximation to the optimal solution, with the advantage
of not requiring the full knowledge of the network model.
The presented optimization strategies are not a valid solution to every optimization problem
related to metabolic networks. While the algorithms and results are valid for the network in question,
their contribution is mostly a guideline for future optimizations. The different strategies
complement each other and, while some might never be used in practice, such as the optimization
using GP when the full kinetics of the network are known, they introduce techniques that can be
used in the same context. The example network used here is very simple, whereas real networks are
extremely complex and exhibit relations between metabolites that are not always expected or fully
understood. This emphasizes the need for good in silico models. The prototype network
has proved useful for testing the optimization strategies, but a more complex network should be
used to confirm that the strategy scales to a bigger network.
Having studied a synthetic prototype network and possible optimization strategies, a real-life
case was taken as an example in Chapter 3.
A model for the production of Mannitol in a specific strain of L. lactis was created. The model
covers two situations: Mannitol production with and without the addition of Nisin, where Nisin
acts as an inducer of two enzymes whose activation leads to the production of Mannitol.
This network was used as a case study on the identification of the model's parameters. The
challenge was to identify the parameters using simultaneously the data sets obtained
for Mannitol production with and without Nisin induction. Since the two models have a common
part, the common parameters should be the same.
The estimation of the parameters using multiple data sets can be done in several ways. The
ability to freely manipulate the estimation algorithm and the cost function to be minimized is of
great importance, since one can adjust the estimation to each particular case. This strategy for
parameter identification provides consistency to the estimation and is, hopefully, a step forward
in the creation of predictive models, instead of simple descriptive models.
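One way to express "common parameters across data sets" in an estimation routine is a single parameter vector holding one shared block followed by one private block per data set, with a cost that sums the squared errors over all data sets. The sketch below illustrates that layout under stated assumptions: the names `unpack`, `cost` and `toy_model`, the sum-of-squares form and the toy linear model are all hypothetical and do not reproduce the functional J used in this work.

```python
def unpack(theta, n_shared, n_private, n_sets):
    """Split a flat parameter vector into a shared block and per-data-set blocks."""
    shared = list(theta[:n_shared])
    private = [list(theta[n_shared + k * n_private: n_shared + (k + 1) * n_private])
               for k in range(n_sets)]
    return shared, private


def cost(theta, data_sets, model, n_shared, n_private):
    """Sum of squared errors over all data sets (assumed form of the cost)."""
    shared, private = unpack(theta, n_shared, n_private, len(data_sets))
    total = 0.0
    for (times, values), own in zip(data_sets, private):
        for t, y in zip(times, values):
            total += (model(t, shared, own) - y) ** 2
    return total


def toy_model(t, shared, own):
    """Hypothetical stand-in model: common slope, data-set-specific offset."""
    return shared[0] * t + own[0]


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]
    true_theta = [2.0, 0.5, 1.5, 3.0]          # one shared slope, three offsets
    data_sets = [(times, [toy_model(t, [2.0], [b]) for t in times])
                 for b in (0.5, 1.5, 3.0)]
    print("cost at the generating parameters:", cost(true_theta, data_sets, toy_model, 1, 1))
    print("cost with a wrong shared slope:   ",
          cost([2.5, 0.5, 1.5, 3.0], data_sets, toy_model, 1, 1))
```

With `n_shared = 6` and `n_private = 1` for three data sets, the same layout gives the 9-parameter vector of the third estimation in Chapter 4 (a common u## set plus three tnisin values). Any general-purpose minimizer can then be applied to `cost`.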
Finally, the validity of using a Hill-type function to model Nisin induction was tested at the end
of Chapter 3. The results were not encouraging, since the model was unable to identify distinct
times of induction with Nisin.
In Chapter 4 the Hill-type function approach was relaxed and a simple step function was tested
to model Nisin induction. Although the modeled data was able to properly fit the real data, once
again it was not possible to identify distinct times of induction and, subsequently, to formulate an
optimization strategy based on a temporal control variable. This problem remains
open but, given the reliability of the data sets, the solution must rely on a new approach to
modeling Mannitol production that takes into account other underlying mechanisms of Mannitol
formation.
The work presented in this thesis on network optimization, parameter estimation and control
strategies provides clues for future problems and networks with similar characteristics. The
problems and new questions raised can be used as a starting point for many new research paths.
Bibliography
[1] P. Gaspar, A. R. Neves, A. Ramos, M. J. Gasson, C. A. Shearman, and H. Santos, “Engineering Lactococcus lactis for production of mannitol: High yields from food-grade strains deficient in lactate dehydrogenase and the mannitol transport system,” Appl Environ Microbiol, vol. 70.
[2] J. Summerton, “Morpholino antisense oligomers: the case for an RNase H-independent structural type,” Biochim Biophys Acta, vol. 1489, no. 1, pp. 141–58, 1999.
[3] H. Gu, J. D. Marth, P. C. Orban, H. Mossmann, and K. Rajewsky, “Deletion of a DNA polymerase beta gene segment in T cells using cell type-specific gene targeting,” Science, vol. 265, no. 5168, pp. 103–6, 1994.
[4] P. Masci, O. Bernard, F. Grognard, E. Latrille, J.-B. Sorba, and J.-P. Steyer, “Driving competition in a complex ecosystem: Application to anaerobic digestion,” in Proc. of the European Control Conference (ECC’09), Budapest, Hungary, August 23-26, 2009.
[5] K. G. Gadkar, F. J. Doyle III, J. S. Edwards, and R. Mahadevan, “Estimating optimal profiles of genetic alterations using constraint-based models,” Biotechnol Bioeng, vol. 89, no. 2, pp. 243–51, 2005.
[6] K. G. Gadkar, R. Mahadevan, and F. J. Doyle III, “Optimal genetic manipulations in batch bioreactor control,” Automatica, vol. 42, no. 10, pp. 1723–1733, 2006.
[7] J. S. Edwards, M. Covert, and B. Palsson, “Metabolic modelling of microbes: the flux-balance approach,” Environ Microbiol, vol. 4, no. 3, pp. 133–40, 2002.
[8] R. Mahadevan, J. S. Edwards, and F. J. Doyle III, “Dynamic flux balance analysis of diauxic growth in Escherichia coli,” Biophys J, vol. 83, no. 3, pp. 1331–40, 2002.
[9] J. Nielsen, “Metabolic engineering,” Appl Microbiol Biotechnol, vol. 55, no. 3, pp. 263–83, 2001.
[10] Y. Liu, H. B. Sun, and H. Yokota, “Regulating gene expression using optimal control theory,” Bioinformatic and Bioengineering, IEEE International Symposium on, vol. 0, p. 313, 2003.
[11] A. Datta and E. Dougherty, Introduction to genomic signal processing with control. CRC Press (Taylor & Francis Group), 2007.
[12] P. Pharkya and C. D. Maranas, “An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems,” Metab Eng, vol. 8, no. 1, pp. 1–13, 2006.
[13] A. Varma and B. O. Palsson, “Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110,” Appl Environ Microbiol, vol. 60, no. 10, pp. 3724–31, 1994.
[14] I. Mierau, K. Olieman, J. Mond, and E. J. Smid, “Optimization of the Lactococcus lactis nisin-controlled gene expression system NICE for industrial applications,” Microb Cell Fact, vol. 4, no. 1, p. 16, 2005.
[15] K. Koh, S. Kim, A. Mutapcic, and S. Boyd, “GGPLAB: A simple MATLAB toolbox for geometric programming,” 2006.
[16] F. Lewis and V. Syrmos, Optimal Control, 2nd ed. New York: John Wiley & Sons Inc., 1995.
[17] M. A. Savageau, “Biochemical systems analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions,” J Theor Biol, vol. 25, no. 3, pp. 365–9, 1969.
[18] A. Sorribas, B. Hernandez-Bermejo, E. Vilaprinyo, and R. Alves, “Cooperativity and saturation in biochemical networks: a saturable formalism using Taylor series approximations,” Biotechnol Bioeng, vol. 97, no. 5, pp. 1259–77, 2007.
[19] M. A. Savageau, “Biochemical systems analysis. 3. Dynamic solutions using a power-law approximation,” J Theor Biol, vol. 26, no. 2, pp. 215–26, 1970.
[20] ——, “Biochemical systems analysis. II. The steady-state solutions for an n-pool system using a power-law approximation,” J Theor Biol, vol. 25, no. 3, pp. 370–9, 1969.
[21] E. O. Voit and S. W. Omholt, “Computational analysis of biochemical systems. A practical guide for biochemists and molecular biologists, Cambridge University Press, 2000, 531 pages (ISBN 0-521-78579-0; paperback),” Mathematical Biosciences, vol. 181, no. 1, pp. 107–109, 2003.
[22] Kyoto University and U. of Tokyo, “KEGG pathway database,” http://www.genome.jp/kegg/pathway.html.
[23] H. Schmidt and M. Jirstrand, “Systems biology toolbox for MATLAB: a computational platform for research in systems biology,” Bioinformatics, vol. 22, no. 4, pp. 514–515, February 2006. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/bti799
[24] C. H. Schilling, J. S. Edwards, D. Letscher, and B. O. Palsson, “Combining pathway analysis with flux balance analysis for the comprehensive study of metabolic systems,” Biotechnol Bioeng, vol. 71, no. 4, pp. 286–306, 2000.
[25] S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[26] A. Marin-Sanguino, E. O. Voit, C. Gonzalez-Alcon, and N. V. Torres, “Optimization of biotechnological systems through geometric programming,” Theor Biol Med Model, vol. 4, p. 38, 2007.