The challenges of informatics in synthetic biology: from biomolecular networks to artificial organisms

Gil Alterovitz, Taro Muso and Marco F. Ramoni

Submitted: 5th August 2009; Received (in revised form): 23rd September 2009

Abstract

The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and the information flow among these components requires the augmentation of biological insight with the power of a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe the state of the art of the in silico support of synthetic biology, from the specific data exchange formats to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow, and discuss genetic fidelity in DNA manipulation, development strategies of biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.

Keywords: informatics; synthetic biology; systems biology; networks

INTRODUCTION

The processing and management of information is a critical part of synthetic biology, a field that approaches the design of biologically based machines from a systems engineering perspective, as a complement to systems biology. Whereas systems biology studies how biological parts give rise to the emergent properties and functions of a unified organism, the main goal of synthetic biology is to start with a set of functions and properties, and build a suitable system out of biological components. In other words, systems biology and synthetic biology represent two sides of the same coin: analysis and design [1].

The development of biologically based solutions to human problems is as old as mankind. For thousands of years, man has been breeding plants for agriculture, horses for transportation and pets for companionship. Genetic engineering pioneered the use of natural genes to modify organisms. Synthetic biologists also alter natural systems for human consumption, but with a different approach: they engineer biological systems starting from artificial components. As in systems engineering, biological modules could be developed from an eclectic set of natural sources and rapidly combined to arrive at innovations that would be far beyond incremental, time-consuming adjustments of natural organisms. The imminent departure from traditional biological engineering inspires novel ways to solve age-old problems, such as those in alternative energy [2], drug manufacture [3, 4], therapeutics [5] and green chemistry [6]. In other words, synthetic biology opens the door to unprecedented biochemical flexibility, a marked departure from an incremental pattern of progress.

Gil Alterovitz is a Harvard Medical School faculty member in the Children’s Hospital Informatics Program at the Harvard/MIT Division of Health Sciences and Technology (HST).

Taro Muso is a graduate of the Harvard/MIT Division of Health Sciences and Technology (HST) and an affiliate of the Partners Healthcare Center for Personalized Genetic Medicine.

Corresponding author. Taro Muso. E-mail: [email protected]

Marco F. Ramoni is the Associate Professor of Pediatrics and Medicine at Harvard Medical School, and the Director of the Biomedical Cybernetics Laboratory at the Partners Healthcare Center for Personalized Genetic Medicine.

BRIEFINGS IN BIOINFORMATICS. VOL 11. NO 1. 80–95. doi:10.1093/bib/bbp054. Advance Access published on 11 November 2009.

© The Author 2009. Published by Oxford University Press. For Permissions, please email: [email protected]

In theory, the synthetic biologist should be able to start with a set of desired features, design a biological circuitry that meets those requirements, and implement that design in vivo. The reality is not so straightforward (Figure 1). The current practice of producing complex biological systems usually requires an iterative optimization, partly because biological parts are subject to apoptosis, crosstalk, mutations and perturbations. In addition, a biological component can exhibit context dependence: it can stop working when it is transplanted from its native context into another cell type. Synthesized biological circuitry also suffers from biological noise and undesirable initial conditions. The issues inherent in this field become most apparent when one considers that biological components, when put together, give rise to emergent properties in the whole. The existence of emergent properties indicates that our biological knowledge and design capabilities are not yet at the level of sophistication needed for a priori design and production of a prototype with a fair shot at success.

It is clear that the acknowledgement of the existence of emergent properties implies the need for a better understanding of systems biology. What is less obvious is that efficiently building a robust infrastructure for synthetic biology requires a careful management of relevant information by the research community. Such information would include biological device data exchanged by collaborators, network models exported by software and signals transduced from one biological device to another. The complexity and amount of information needed imply an opportunity for synergy through standardized communication. However, reviews on synthetic biology from an informatics perspective are rare. This review addresses this gap in the literature.

IN SILICO

Computer-based design and simulation are key elements of synthetic biology, and there is a need for efficient communication between both human beings and software programs. Taken together, these facts imply the need for standardization of synthetic biology data in silico.

Information standards

Most of the efforts in synthetic biology computer data standardization can be grouped into two areas. One starts with a network perspective, and the other has a ‘bottom-up’ approach that emphasizes the fundamental building block of synthetic biology, the biological part. The dominant parts format appears to be the BioBrick Standard [7] (Figure 2), which is used by the Registry of Standard Biological Parts (http://partsregistry.org) and the international Genetically Engineered Machine (iGEM) competition [8]. The BioBrick Standard is a set of rules that define features of a DNA sequence so that each BioBrick can be easily combined into larger compositions in vitro. In other words, each BioBrick is an easily clonable DNA sequence which codes for a biological part. While the ease of DNA construction is addressed, extending the format to support the functional composition of these modules remains an important challenge [9]. The BioBrick format bases its parts characterization on promoter structure and sequences, and this is not easily translated into functional characterization within the context of interacting networks [10]. Sequence-based descriptions of parts would be appropriate in designing small systems where potential interactions could be intuitively processed (for example, by ignoring ‘nonessential’ DNA segments), but this becomes impractical for the design of large networks. This is because even ‘nonessential’ portions of biological sequences still affect functional efficiency in DNA promoters, RNA, and proteins [11]. (This paper [11] not only published new biological parts but also proposed a general strategy that addresses problems of emergent properties and design inaccuracy. This paper convincingly argued for a new way to develop and characterize components, and will likely influence the way future biological parts are presented in databases and publications.) Minor changes of nonessential sequences affect individual components in minor amounts that are only quantitatively noticeable, but small changes to one component can still have a dramatic impact on network behavior. Therefore, quantitative characterizations of component functions are necessary for efficient network design. Canton et al. [12] (this paper proposed to augment the BioBrick documentation standards) proposed to extend the BioBrick Standard by adding quantified descriptions formatted into datasheets akin to those common in electrical engineering. However, different biological parts may require different types of information [9]. In other words, the Registry may require more than one datasheet format.

Figure 1: The synthetic biology infrastructure. Solid lines indicate the components of synthetic biology and the connections among them. Bold solid lines emphasize the main path from given requirements to finished product. Boxes with thin solid lines indicate support structures that need to be developed in order to make synthetic biology a practical reality. The cycles within the graph convey that the current technology requires an iterative approach to arrive at a useful biological system. In a larger context, synthetic biology (design) and systems biology (analysis) feed into each other (see dashed lines). For example, in vivo tests result in data which feed back into synthetic biology. The Database box represents organized knowledge from systems biology and quantitative data on biological parts.

Other enhancements of the BioBrick Standard have also been proposed. Recent experimental tests to confirm the validity of plasmid inserts for a collection of clones have resulted in unexpected discrepancies, so a quality control scheme has been proposed [13] (this paper proposes a quality control scheme for the Registry of Biological Parts). A provisional BioBrick language (PoBoL) was created to define a data exchange standard (http://pobol.org) [14]. More specifically, PoBoL aims to define minimal information requirements for BioBricks, provide annotation methods for BioBricks, maintain interlinking possibilities and set the stage for further language extensions.

Of equal importance to biological parts standardization is an agreement on how network designs should be described. To model biological systems, it seems logical to start with conventions developed in the systems biology community, such as the Systems Biology Markup Language (SBML) [15–18], Cellular Markup Language (CellML) [19, 20], MIRIAM [21], Systems Biology Graphical Notation (SBGN) [22, 23] ([23] formally presents a set of conventions in graphical notation that will help biologists communicate clearly and efficiently) and BioPAX [24].

Figure 2: The BioBrick Standard [7]. (a) Basic sequence template of the BioBrick Standard. The insert of the BioBrick is flanked upstream and downstream with restriction sites. EcoRI and XbaI restriction sites are at the 5′-end (prefix). SpeI and PstI restriction sites are at the 3′-end (suffix). Each insert is a genetic component that can code for a promoter, ribosome-binding site, open reading frame, transcription termination sequence, or any combination of these. Restriction site sequences are not allowed within the genetic component. (b) Schematic of the joining process. To attach insert 1 upstream of insert 2, use restriction enzymes SpeI and XbaI respectively. The ends can then be joined together to form a scar, which cannot be cleaved again by either one of the restriction enzymes.
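The sequence rules summarized in the Figure 2 caption lend themselves to a simple automated check. The following is a minimal sketch, not the Registry's own validation code: it assumes the commonly cited recognition sequences for EcoRI, XbaI, SpeI and PstI (these sequences are not given in the text above), and the function names are hypothetical. It scans a candidate insert for forbidden sites and shows why the joined SpeI/XbaI ends form a scar that neither enzyme recognizes.

```python
# Commonly cited recognition sequences (an assumption; not listed in the review).
# All four sites are palindromic, so scanning one strand is sufficient.
FORBIDDEN_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_forbidden_sites(insert):
    """Return {enzyme: [0-based positions]} for every site found in the insert."""
    seq = insert.upper()
    hits = {}
    for enzyme, site in FORBIDDEN_SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


def is_biobrick_compatible(insert):
    """True if the insert contains none of the four restriction sites."""
    return not find_forbidden_sites(insert)


def scar_sequence():
    """Sequence formed when a SpeI-cut end is ligated to an XbaI-cut end.

    SpeI cuts A^CTAGT and XbaI cuts T^CTAGA, so both leave a CTAG overhang.
    The junction keeps the 'A' from the SpeI side and the 'A' from the XbaI side.
    """
    return FORBIDDEN_SITES["SpeI"][0] + "CTAG" + FORBIDDEN_SITES["XbaI"][-1]


if __name__ == "__main__":
    print(find_forbidden_sites("AAGAATTCAA"))      # an insert with one EcoRI site
    print(is_biobrick_compatible("ATGGCTAAAGGT"))  # an insert with no sites
    print(scar_sequence())                         # the mixed junction sequence
```

Because the scar matches neither the SpeI nor the XbaI recognition sequence, the composite part keeps an intact prefix and suffix and can be used in a further round of assembly.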

SBML was developed to exchange biological process information in the systems biology community [15–18]. It can be used to model a variety of phenomena, such as metabolic pathways, gene regulation and cell signaling pathways. Its success can be attributed to a number of factors. First, SBML has incorporated a number of other useful standards: MathML 2.0 [25], which provides a common mathematical expression language; the Resource Description Framework (RDF) [26], which allows for machine-readable metadata; and the Systems Biology Ontology (SBO) [27, 28], which is a set of six controlled vocabularies. Second, SBML provides community-driven software support [29] (http://sbml.org/SBML_Software_Guide). A particularly useful software platform is an application programming interface (API) library called LIBSBML [30], which makes SBML file manipulation accessible to scripting languages. Current translation scripts have bridged SBML-structured data and other formats [31]. Third, the SBML format is used in the BioModels Database [32] (http://www.ebi.ac.uk/biomodels-main/). Recent developments demonstrate both language extensions and applications. Its utility has been extended for stochastic simulations [33]. SBML has been used in the analysis of iron metabolism [34] and the RB/E2F pathway [35].
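To make the idea of a machine-readable model exchange format concrete, the sketch below reads a small hand-written SBML-like fragment with Python's standard XML parser. It is only an illustration under stated assumptions: the model, species and reaction identifiers are hypothetical, the fragment omits kinetic laws and other elements a complete model would carry, and a real workflow would more likely rely on a dedicated library such as LIBSBML rather than on generic XML parsing.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4 (assumed here for the toy fragment).
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical toy model: all ids below are made up for illustration.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_expression">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="mRNA" compartment="cell" initialAmount="0"/>
      <species id="protein" compartment="cell" initialAmount="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="translation" reversible="false">
        <listOfProducts>
          <speciesReference species="protein"/>
        </listOfProducts>
      </reaction>
      <reaction id="protein_degradation" reversible="false">
        <listOfReactants>
          <speciesReference species="protein"/>
        </listOfReactants>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def q(tag):
    """Qualify a tag name with the SBML namespace for ElementTree lookups."""
    return f"{{{SBML_NS}}}{tag}"


def list_species(sbml_text):
    """Return the species ids in document order."""
    root = ET.fromstring(sbml_text.strip())
    return [s.get("id") for s in root.iter(q("species"))]


def list_reactions(sbml_text):
    """Return {reaction id: (reactant ids, product ids)}."""
    root = ET.fromstring(sbml_text.strip())
    reactions = {}
    for r in root.iter(q("reaction")):
        reactants = [ref.get("species") for ref in
                     r.findall(f"{q('listOfReactants')}/{q('speciesReference')}")]
        products = [ref.get("species") for ref in
                    r.findall(f"{q('listOfProducts')}/{q('speciesReference')}")]
        reactions[r.get("id")] = (reactants, products)
    return reactions


if __name__ == "__main__":
    print(list_species(SBML_TEXT))
    print(list_reactions(SBML_TEXT))
```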

CellML, an alternative to SBML, is an extensible markup language that models the cell as a set of ordinary differential equations [19, 20]. Its more modular structure is convenient for multi-scale modeling and reuse of parts but has less emphasis on the biochemistry. CellML also incorporates MathML and RDF. It also has some community-driven software support [36] (http://www.cellml.org/tools). There are translators that bridge SBML and CellML [37]. Community adoption of this standard has resulted in the CellML Model Repository, which is a publicly accessible database of curated biological models [38] (this paper [38] presents the current state of the model repository). CellML’s flexibility stems from its ability to represent biological phenomena through mathematical and model building constructs, but sometimes it is useful to have explicit biological descriptions. To this end Wimalaratne et al. [39] have developed a biophysical annotation framework.
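As an illustration of what "a set of ordinary differential equations" means in practice, the following sketch integrates a two-variable gene expression model with the forward Euler method. It is a generic example rather than a CellML model: the equations (constant transcription, linear translation, first-order decay), the parameter names and the values are all assumptions chosen for illustration.

```python
def simulate_expression(k_tx, d_m, k_tl, d_p, dt=0.01, t_end=200.0):
    """Integrate dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p by forward Euler.

    m is the mRNA level and p the protein level; both start at zero.
    Returns the final (m, p) pair.
    """
    m, p = 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dm = k_tx - d_m * m          # transcription minus mRNA decay
        dp = k_tl * m - d_p * p      # translation minus protein decay
        m += dt * dm
        p += dt * dp
    return m, p


def steady_state(k_tx, d_m, k_tl, d_p):
    """Analytical steady state obtained by setting both derivatives to zero."""
    m_ss = k_tx / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss


if __name__ == "__main__":
    params = dict(k_tx=2.0, d_m=0.5, k_tl=1.0, d_p=0.1)  # illustrative values
    print(simulate_expression(**params))
    print(steady_state(**params))
```

Comparing the simulated end point with the analytical steady state is a quick sanity check on both the model equations and the integration step size.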

MIRIAM, or Minimum Information Requested In the Annotation of biochemical Models, is a scheme to provide extensive documentation in the model file in a structured manner [21]. Models can only be useful if there is enough annotation. Controlled annotations are achieved with the help of uniform resource identifiers (URIs) [40]. The MIRIAM approach provides a common annotation format as well as controlled vocabularies and databases [41].

SBGN is a recent attempt at standardizing the visual representation of biological networks (Figure 3) [22]. Recently, automatic equation generation for SBML from SBGN diagrams was made possible [42].

BioPAX is an effort to represent pathway data with ontological annotations [24, 43]. BioPAX complements formats like CellML and SBML because it focuses on the integration of large qualitative pathways rather than on mathematical modeling [10, 44].

The synthetic biology community also has other approaches that border on standardization. For example, Pedersen et al. [45] introduced a formal language called Genetic Engineering of Cells (GEC), which allows a modular modeling of interactions between potentially undetermined proteins and genes.

Ideally, a synthetic biology design approach would have the versatility to employ both network- and component-centric standards so that multiple levels of detail could be considered at the same time. In addition to importing publicly accessible data in common formats, the workflow would integrate problem-specific data and formats as well. Integration of the network and component perspectives is occurring or anticipated on multiple fronts. The BioBrick format is expected to support the design of ever more complex networks by incorporating integration approaches akin to BioPAX [10] that allow for ontological annotations. In contrast, standards like CellML and SBML that already allow mathematical network modeling would benefit from extending their formalisms to leverage synthetic biology constructs, such as DNA sequences and device-level information [10]. A third front is composed of integration efforts that proceed not through explicit dialogue on standards but through software development. OpenCell (PCEnv), a CellML-based platform, can model both quantitative networks and synthetic biology constructs [19].

The result of these efforts would be a comprehensive description framework, but the classic tradeoff between detail-driven accuracy and analytical efficiency will persist. Because a tradeoff naturally implies numerous possible approaches to addressing both accuracy and efficiency, each subgroup within synthetic biology may opt to pursue its own specialized formats for data management. For example, a network that depends on transcriptional regulation and a model that depends on protein–protein interaction may have different description requirements for modules and control kinetics equations. Such specializations may be easily achieved through the custom tag facility of XML [46], which is already familiar to developers of SBML and CellML.
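The custom tag facility mentioned above can be illustrated with XML namespaces, which let a subgroup attach its own elements to a shared format without colliding with the core vocabulary. The sketch below is hypothetical throughout: the namespace URI, the `part` and `sequence` tag names and the identifiers are invented for illustration and do not belong to any published SBML or CellML extension.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a group-specific extension (not a real standard).
CUSTOM_NS = "http://example.org/synbio-ext"
ET.register_namespace("synbio", CUSTOM_NS)


def add_part_annotation(element, part_id, sequence):
    """Attach a custom, namespaced <part> annotation to a model element."""
    annotation = ET.SubElement(element, "annotation")
    part = ET.SubElement(annotation, f"{{{CUSTOM_NS}}}part", {"id": part_id})
    seq = ET.SubElement(part, f"{{{CUSTOM_NS}}}sequence")
    seq.text = sequence
    return part


def read_part_sequence(xml_text):
    """Recover the custom sequence annotation from serialized XML."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{{{CUSTOM_NS}}}sequence")
    return None if node is None else node.text


if __name__ == "__main__":
    species = ET.Element("species", {"id": "reporter"})
    add_part_annotation(species, "hypothetical_part_1", "ATGGCT")
    text = ET.tostring(species, encoding="unicode")
    print(text)
    print(read_part_sequence(text))
```

Software that does not understand the custom namespace can simply ignore the annotation, which is what allows specialized formats to coexist with a common core.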

Figure 3: SBGN network example of inter-cellular signaling near the neuromuscular junction [22, 23]. Biological concepts are organized with glyphs, or named containers. Some glyphs represent entity pool nodes, each of which is a population of entities that are not distinguished from one another in the current SBGN framework. Circular (not ellipsoid) glyphs represent ‘simple molecules’ like ATP and calcium ions. Rectangles with four rounded corners represent ‘macromolecules’ such as myosin. Glyphs can be adorned with additional information, such as the nicotinic acetylcholine receptors (nAChR), which are attached to the ‘state variable’ glyphs ‘open’ and ‘closed’. Note that the transition from ‘closed’ to ‘open’ is designated by an arrow with the ‘transition’ glyph, a small square. Another process glyph used here is ‘association’, where two lines converge to form one arrow, and a filled disk is placed downstream of the connection. By having a carefully planned set of conventions for depicting biological processes, collaborators can communicate with each other with minimal ambiguity in graphical notation.

Databases and software tools

No single data standard in synthetic biology has yet achieved the scope necessary to account for all useful information, such as epigenetic data [9]. Nevertheless, the current data formats are still useful for organizing biological information in databases and software. Synthetic Biology Software Suite (SynBioSS), designed for modeling synthetic genetic constructs, uses the Registry of Standard Biological Parts as well as a kinetic parameter database [47]. GenoCAD aims to streamline the design of synthetic DNA sequences [48]. This program appears to imply a debate in the synthetic biology community about the need for well-formatted ends for easy connection of coding sequences. The software takes advantage of the BioBrick-formatted DNA registry, but it also aims to do away with the standardization of the means by which the parts are connected. This implies a BioBrick-independent, general means of producing long stretches of error-free DNA (discussed later). CellML has software support through OpenCell (formerly PCEnv), Cellular Open Resource (COR) [49], InsilicoIDE and JSIM [19]. Cytoscape can visualize and analyze complex networks for biological research [50]. Plug-ins, which confer additional features, are actively being developed [51–54]. Funahashi’s CellDesigner [55], an editor for SBML, was designed as a tool to model network dynamics. It has a plug-in facility that enables third parties to extend the software capability. CellDesigner’s utility has been extended for stochastic simulations [33] and automatic equation generation from SBGN diagrams [42]. CellDesigner has been used in the analyses of iron metabolism [34] and the RB/E2F pathway [35]. The Process Modeling Tool (ProMoT) is a ‘drag and drop’ design platform [29]. Other software developments can be found at format-specific resource pages [29, 36, 56]. In short, concurrent with the efforts to reach consensus on information standards are attempts to employ data and standards in the design of synthetic networks.

Algorithms and heuristics

Computer-based informatics also has the advantage of relatively low-cost, quick simulations prior to in vitro implementation. Loewe [57] proposed a framework that combined systems biology and evolutionary theory to simulate mutations whose effects are too subtle to be detected in vitro. Chen et al. [58] proposed a stochastic game theory-based approach to address complications due to uncertain initial conditions and extra-cellular disturbances. They also proposed managing uncertainties by addressing four design specifications [59]. Banga [60] has recently reviewed optimization in computational systems biology. Computational limits make model simplification a useful strategy. To this end enzyme kinetic models are translated in a number of formats to reduce the model complexity. Hadlich et al. [61] developed an algorithm to automate the process of kinetic format translation. Bentley [36] proposes methods called systemic computation (SC) and fractal proteins for improving the simulations of biological systems. OptCircuit is an optimization-based method for automatically identifying the required circuits from a database of components and kinetic parameters [62]; this method may work well with Ellis et al.’s strategy of designing networks from quantitatively characterized libraries of diversified components [11]. Cantone et al. [63] developed a small synthetic gene network to assess current modeling and reverse-engineering algorithms. Models based on ordinary differential equations and Bayesian networks were qualitatively accurate, but it is not yet clear if these conclusions are generalizable to the analysis of larger networks. We see that the need for an unambiguous, quantitative, and collaborative exchange of digital, computerized information is currently being addressed by a variety of standards, databases and software.
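Because synthesized circuitry is subject to biological noise, quick in silico simulations often need a stochastic as well as a deterministic view. The sketch below is one minimal example, assuming a single-species birth-death process (constant production, first-order degradation) simulated with Gillespie's direct method. This is a standard textbook technique and is not claimed to be the method of any tool or paper cited above; the rate names and values are illustrative assumptions.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Simulate X -> X+1 at rate k_prod and X -> X-1 at rate k_deg*X.

    Returns (times, counts): the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod            # propensity of production
        a_deg = k_deg * x          # propensity of degradation (zero when x == 0)
        a_total = a_prod + a_deg
        if a_total == 0:
            break                  # nothing can happen any more
        t += rng.expovariate(a_total)   # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean of a piecewise-constant trajectory on [0, t_end]."""
    total = 0.0
    for i, count in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += count * (t_next - times[i])
    return total / t_end


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, x0=0, t_end=500.0)
    # The deterministic steady state is k_prod / k_deg; the stochastic
    # trajectory fluctuates around that value.
    print(time_average(times, counts, 500.0))
```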

Improvements in algorithms for analyzing networks in synthetic and systems biology are needed, because our current, relatively simple models do not have the capacity to handle the abundant data acquired from complex biological systems [31]. Issues in network analysis are exemplified by the fact that inferences from small-sized networks cannot be simply extrapolated to larger networks, as Stumpf et al. [64] have shown that sub-networks of a scale-free network are not necessarily scale free. In general, a rigorous statistical analysis of network data is difficult because there are numerous correlations [31].

IN VITRO

The informatics approach can also reframe the in vitro aspects of synthetic biology. In this light, DNA synthesis from computer-aided design is essentially a format conversion from bytes to basepairs. Biological parts development often involves a refinement of signal transduction, or data flow within a biological circuit. Protein complexes can be modeled as instances of noisy communication channels [65, 66]. Indeed, information-processing devices such as logic gates have already been implemented in vitro (Figure 4). In other words, critical informatics technology in synthetic biology resides not only in computers but also in biological circuitry.

Figure 4: Transcription-based logic gates constructed from modular transcription units [67]. Electronic logic gates are the fundamental building blocks of computational ability. For each logic gate, the table presents the Boolean logic (column 2), the design of a biological module (column 3) and an expression profile that emulates the electronic counterpart (column 4). Each network architecture represents a synthetically designed component.
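A rough feel for how transcriptional regulation can emulate Boolean logic can be had from a small numerical model. The sketch below is an assumption-laden illustration, not the design reported in [67]: it uses Hill functions, a common modeling choice for promoter activity, with arbitrary threshold and cooperativity values, and treats two regulators acting on one promoter as independent.

```python
def hill_repression(repressor, k=1.0, n=2.0):
    """Normalized promoter activity under a repressor (1 = fully on)."""
    return 1.0 / (1.0 + (repressor / k) ** n)


def hill_activation(activator, k=1.0, n=2.0):
    """Normalized promoter activity under an activator (1 = fully on)."""
    x = (activator / k) ** n
    return x / (1.0 + x)


def not_gate(a):
    return hill_repression(a)


def nor_gate(a, b):
    # Two independent repressors on one promoter: either one shuts it off.
    return hill_repression(a) * hill_repression(b)


def and_gate(a, b):
    # Two independent activators, both required for expression.
    return hill_activation(a) * hill_activation(b)


LOW, HIGH = 0.0, 10.0  # illustrative input concentrations (arbitrary units)


def truth_table(gate, threshold=0.5):
    """Map each LOW/HIGH input pair to a Boolean output by thresholding."""
    table = {}
    for a in (LOW, HIGH):
        for b in (LOW, HIGH):
            table[(a > 0, b > 0)] = gate(a, b) > threshold
    return table


if __name__ == "__main__":
    print(truth_table(nor_gate))
    print(truth_table(and_gate))
    print(not_gate(LOW) > 0.5, not_gate(HIGH) > 0.5)
```

The analog outputs only approximate digital behavior; how sharply they switch depends on the assumed cooperativity, which is one reason quantitative characterization of parts matters for network design.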

DNA synthesis

Following a successful simulation, the computer-based network design must be translated into an in vitro DNA sequence. BioBrick-formatted synthetic genes can provide a set of required, proofread sequences that one can splice together (Figure 5). Combined, the much longer sequence codes for the synthetic biological circuitry. On the other hand, doing away with the BioBrick parts connection formats can streamline the design of synthetic DNA sequences [48], as long as sequence proofreading can still be done. In other words, an approach independent of the build-by-parts strategy requires a high-fidelity method for writing the basepair sequence, because even a single basepair mutation has been shown to cause system-wide disorders such as sickle-cell anemia. Linshiz et al. [68] (this paper proposes a strategy to make large, error-free DNA target molecules) developed a method for writing long, error-free DNA from potentially faulty building blocks (Figure 6). Gibson et al. [69] (this paper demonstrates that it is possible to handle an entire Mycoplasma genome with high fidelity) developed a method for constructing large DNA molecules, such as a 582 970-basepair Mycoplasma genitalium genome.
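The in silico side of writing long DNA from short pieces can be sketched in a few lines. The example below is not the algorithm of Linshiz et al. [68] or Gibson et al. [69]; it only illustrates, under simplifying assumptions, the general idea of dividing a design into overlapping fragments, proofreading a sequenced fragment against the design, and reassembling fragments through their shared overlaps. The fragment length, overlap length and test sequence are arbitrary, and the function names are hypothetical.

```python
def split_with_overlap(seq, frag_len, overlap):
    """Divide seq into fragments of up to frag_len bases sharing `overlap` bases."""
    if not 0 < overlap < frag_len:
        raise ValueError("overlap must be positive and shorter than frag_len")
    step = frag_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break
        start += step
    return fragments


def mismatches(designed, sequenced):
    """0-based positions where a sequenced fragment differs from the design."""
    if len(designed) != len(sequenced):
        raise ValueError("this sketch only compares sequences of equal length")
    return [i for i, (a, b) in enumerate(zip(designed, sequenced)) if a != b]


def assemble(fragments, overlap):
    """Rejoin fragments, checking that each overlap agrees with its neighbor."""
    result = fragments[0]
    for frag in fragments[1:]:
        if result[-overlap:] != frag[:overlap]:
            raise ValueError("overlap mismatch between adjacent fragments")
        result += frag[overlap:]
    return result


if __name__ == "__main__":
    design = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACC"  # arbitrary 30-base example
    pieces = split_with_overlap(design, frag_len=12, overlap=4)
    print(pieces)
    print(assemble(pieces, overlap=4) == design)
    # A single-base substitution is caught by comparison with the design.
    print(mismatches("GAG", "GTG"))
```

In a real workflow, fragments that fail the comparison would be discarded or resynthesized before the next round of assembly.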

Biological component design

Just as electrical circuits need devices that control data flow, biological networks need biological parts that modulate signal transduction. Informatics issues in components and the network overlap with each other. We will start with components and transition into network informatics.

Figure 5: Assembling DNA molecules with BioBrick parts [70]. Gene A is to be added to the standardized plasmid p1. Neither Gene A nor any gene within p1 may have sequences that can be recognized by the four restriction enzymes used during the main assembly process. Gene A is flanked by ‘prefix’ and ‘suffix’ sequences which are delivered by primers during PCR. Alternatively, one can acquire a plasmid pA that already has Gene A with the necessary prefix and suffix. Plasmid p1 and Gene A undergo separate restriction enzyme digests, and are later combined to form p1A. The plasmid p1A is now ready to receive another gene.
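The constraint stated in the Figure 5 caption, that a gene must not contain any site recognized by the assembly enzymes, is straightforward to check in silico. The sketch below assumes the four enzymes are EcoRI, XbaI, SpeI and PstI, as in the original BioBrick assembly standard; the caption does not name them, so the site table and the function names should be read as assumptions.

```python
from typing import Dict, List

# Recognition sites ASSUMED for the four assembly enzymes
# (EcoRI, XbaI, SpeI, PstI); the source text does not name them.
FORBIDDEN_SITES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_forbidden_sites(sequence: str) -> Dict[str, List[int]]:
    """Return {enzyme: [0-based start positions]} for every assumed site found.

    All four sites equal their own reverse complement, so scanning
    a single strand is sufficient.
    """
    seq = sequence.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, site in FORBIDDEN_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


def is_assembly_compatible(sequence: str) -> bool:
    """True when none of the assumed recognition sites occurs in the sequence."""
    return not find_forbidden_sites(sequence)
```

A gene that fails this check would need its internal sites removed, for example by synonymous codon changes, before it could enter the assembly process.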

Figure 6: Recursive construction of error-free DNA molecules from imperfect oligonucleotides [68]. (A) GFP DNA construction. The entire sequence is divided into overlapping pieces in silico. These pieces are synthesized conventionally. Assembly by overlapping ssDNA results in target molecules, which are then sequenced to find errors. Error-free segments are derived and amplified, and assembly by overlapping ssDNA takes place again. This loop continues until an error-free target molecule is formed. (B) Construction of ssDNA from two overlapping sequences. During PCR, one primer is phosphorylated, so the strand it primes becomes a degradation target of Lambda exonuclease.

Synthetic biological devices are often derived from natural devices that were optimized by evolution. Natural components may therefore have a context dependence that prevents them from connecting compatibly with other devices. One example is the codon mismatch that occurs when a biological part is transferred from one organism to a host of a different kingdom [71]. In order to adapt natural parts to the needs of synthetic biology, they must be standardized. Lucks et al. [72] proposed a set of general features to consider when developing a biological device. An ideal part would be independent, reliable, tunable, orthogonal and composable. In other words, it does not interfere with other circuitry, functions as intended (context independent), can function in a range of selectable modes, can be tuned so that it does not interfere with similar devices, and can be combined to function in a system predictably. In addition, DNA sequences must adhere to the rules of transcription control [73]. Suarez and Jaramillo [74] discuss the challenges in the computational design of proteins. Martin et al. [71] review guidelines for engineering synthetic enzymes. Recent synthetic biology devices include a cellular counter in Escherichia coli [75], a tunable synthetic mammalian oscillator [76], an aptazyme-based riboswitch [77], a tunable synthetic gene oscillator [78] and a double inversion recombination switch [79]. Incidentally, Tsai et al. [80] argue that biological oscillators sometimes contain positive feedback loops in order to achieve frequency control without amplitude change. Dawid et al. [81] designed synthetic RNA regulatory elements based on transcription attenuator control.

Arkin [79] proposed developing a group of devices from a common core structure by altering a particular key property. Calling them a ‘family of parts’, Arkin argued that related devices are likely to share characterization protocols. Common protocols for a versatile set of devices would simplify the physical composition process, and this would have important ramifications for design strategies as well as for parts organization within the Registry. However, it is important to keep in mind that similar devices raise the risk of crosstalk and interference with each other [10]. Unlike in electrical circuits, the same ‘logic gate’ probably cannot be reused within the same space.

Ellis et al. [11] proposed the development of libraries of diversified components (parts that are functionally equivalent but differ in their nonessential sequences) for improving design strategy. Differences in nonessential sequences affect the quantitative functional efficiencies of components, and this in turn can have a large impact on overall network behavior. If the required libraries are established and documented prior to design, then one can accurately simulate and fine-tune a system by picking the components with appropriate functional efficiencies. In other words, Ellis et al. [11] propose to move component ‘tweaking’ to the front-end of the synthetic biology infrastructure and upstream of software-based network design. Such ‘diversified’ parts would address issues of emergent properties, biological noise and tunability. They may also address the need for compatible inputs and outputs in serial connectivity. Ellis et al. [11] successfully employed the above strategy in the development of a feed-forward loop network and a gene timer network. Establishment of such libraries will probably occur not only for DNA but for RNA and proteins as well.
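Picking components by functional efficiency can be pictured as a lookup against a characterized library. In the sketch below the promoter names and strength values are invented for illustration, and the predicted output is assumed to scale linearly with promoter strength; none of this is taken from Ellis et al. [11].

```python
from typing import Dict, Tuple

# HYPOTHETICAL library: promoter name -> measured relative strength
PROMOTER_LIBRARY: Dict[str, float] = {
    "P_weak": 0.12,
    "P_low": 0.35,
    "P_mid": 0.58,
    "P_high": 0.81,
    "P_max": 1.00,
}


def pick_promoter(target_output: float, gain: float,
                  library: Dict[str, float] = PROMOTER_LIBRARY) -> Tuple[str, float]:
    """Return (name, predicted output) of the part closest to the target.

    Predicted output = gain * strength, an assumed linear model.
    """
    if not library:
        raise ValueError("library is empty")
    best = min(library, key=lambda name: abs(gain * library[name] - target_output))
    return best, gain * library[best]
```

With a richer model, the same selection step could be run inside a network simulation so that tuning happens before any DNA is built.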

Biological noise presents problems for information flow through biological parts. A digital step-like interface between components may reduce the effect that noise would have on an analog system [82].
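One way to see why a step-like interface can suppress noise is to compare how much an input fluctuation changes the output of a graded response and of a switch-like one. The sketch uses a Hill function as a stand-in transfer function; the Hill coefficients, the threshold and the 20% fluctuation are illustrative assumptions, not values from [82].

```python
def hill(x: float, k: float, n: float) -> float:
    """Activating Hill function with threshold k and Hill coefficient n."""
    return x ** n / (k ** n + x ** n)


def output_spread(x: float, rel_noise: float, k: float, n: float) -> float:
    """Change in output when the input fluctuates by +/- rel_noise around x."""
    return hill(x * (1 + rel_noise), k, n) - hill(x * (1 - rel_noise), k, n)


if __name__ == "__main__":
    for n in (1, 8):  # graded (n=1) versus switch-like (n=8) response
        far = output_spread(3.0, 0.2, k=1.0, n=n)   # input well above threshold
        near = output_spread(1.0, 0.2, k=1.0, n=n)  # input at the threshold
        print(f"n={n}: spread far from threshold={far:.4f}, at threshold={near:.4f}")
```

The benefit holds only when the input sits well away from the threshold; close to it, a steep response amplifies fluctuations instead of damping them.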

Network informatics

Information flow can also be addressed from the perspective of networks. The oldest synthetic biological circuits were based on transcriptional regulation. Within the transcriptional network, two genes were connected by having one gene encode the transcription factor that acts on the promoter of the other gene. Carrera et al. [83] (this paper demonstrates a method to model and modify the transcription regulation network of E. coli) proposed to rewire the transcription regulation network by exchanging the endogenous promoters. Other biological circuit experiments have involved RNA-based regulation and metabolism [84]. Recently, Bashor et al. [85] [this paper introduces and demonstrates the idea of using protein scaffolds (and hence protein–protein interactions) to control synthetic regulatory networks] constructed a biological network through protein–protein interactions. Compared to translation-dependent regulatory circuits, protein-level connections have the potential for quicker response with lower cellular resource consumption rates. Engineering of protein–protein interactions becomes a tractable problem if system design leverages well-characterized protein domains [86] that enable a combinatorial strategy for generating synthetic proteins and signaling pathways. In anticipation of multi-cellular assemblies with synthetic signaling requirements, Weber et al. developed a metabolite-controlled intercellular signaling method [87]. To achieve transient system dynamics, Yin et al. [88] argued for augmenting target structure sequences with the capability to automatically construct self-assembly and disassembly pathways, and implemented such a system with a DNA hairpin motif.
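The transcriptional connection described at the start of this section, in which the product of one gene regulates the promoter of another, can be written as a pair of ordinary differential equations. The sketch below assumes that protein A represses gene B through a Hill-type term and integrates the system with a simple Euler scheme; the model form and all parameter values are illustrative assumptions.

```python
def simulate_cascade(alpha=10.0, beta=20.0, delta=1.0, k=5.0, n=2.0,
                     t_end=20.0, dt=0.001):
    """Euler integration of a two-gene cascade in which A represses B.

        dA/dt = alpha - delta * A
        dB/dt = beta / (1 + (A / k)**n) - delta * B

    Returns the concentrations (A, B) at time t_end, starting from zero.
    """
    a = b = 0.0
    for _ in range(int(round(t_end / dt))):
        da = alpha - delta * a
        db = beta / (1.0 + (a / k) ** n) - delta * b
        a += dt * da
        b += dt * db
    return a, b


def steady_state(alpha=10.0, beta=20.0, delta=1.0, k=5.0, n=2.0):
    """Analytical steady state of the same model, for comparison."""
    a_ss = alpha / delta
    b_ss = beta / (1.0 + (a_ss / k) ** n) / delta
    return a_ss, b_ss
```

Comparing the simulated end point with the analytical steady state is a quick sanity check before a model of this kind is extended with more genes.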

Biological noise is also a problem at the network level. Studying noise in complex networks traditionally involves computational perturbation methods, because an in vitro implementation of an arbitrary noise source is not always trivial. To bridge this gap, Lu et al. [89] have developed a means of implementing simple in silico perturbation sources as in vitro molecular noise generators.
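One common way to study noise computationally is exact stochastic simulation. As a minimal illustration (it is not the noise generator of Lu et al. [89]), the sketch below applies Gillespie's algorithm to a single species that is produced at a constant rate and degraded in proportion to its copy number; the rate constants and function name are arbitrary choices.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, seed=0, x0=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * x).

    Returns the time-averaged copy number over [0, t_end].
    Requires k_prod > 0 so that the total propensity never reaches zero.
    """
    rng = random.Random(seed)
    t, x, area = 0.0, x0, 0.0
    while True:
        total = k_prod + k_deg * x
        wait = rng.expovariate(total)  # time until the next reaction
        if t + wait >= t_end:
            area += x * (t_end - t)
            break
        area += x * wait
        t += wait
        if rng.random() * total < k_prod:
            x += 1  # production event
        else:
            x -= 1  # degradation event (only possible when x > 0)
    return area / t_end
```

For this model the long-run mean copy number is k_prod / k_deg, so the time average gives a simple check that the simulation behaves as expected.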

IN VIVO

Whereas in vitro synthetic biology enables biochemical flexibility, in vivo synthetic biology endows a biological network with large-scale production capacity [90]. The first step in the transition from in vitro to in vivo is the insertion of the constructed DNA into a biological chassis where transcription and translation can take place, such as a bacterium’s genome. Itaya et al. [91] addressed physicochemical stability issues of large DNA by developing the Bacillus subtilis genome (BGM) vector, which accommodates large DNA as part of the B. subtilis genome. This vector might combine well with cell-free expression systems in the future [92]. Shao et al. [93] developed a method for assembling a 19 kb recombinant DNA molecule in Saccharomyces cerevisiae. Minaeva et al. [56] integrated two recombination methods (phage site-specific and Red/ET-mediated) into a straightforward, convenient protocol. This method, called the Dual-In/Out Strategy, was applied successfully on plasmid-less marker-less E. coli.

When a biological network is expressed by synthetic DNA sequences within the host, or engineering chassis, crosstalk between the host and synthetic circuitry can adversely affect performance. For example, endogenous carotenoid pathways in higher plants seem to resist synthetic alterations [94]. Emergent problems from crosstalk are not surprising, even for commonly studied organisms like E. coli, because significant portions of organismal gene regulatory networks are not yet known [95]. Hence, minimizing or at least controlling crosstalk is a desired goal in network information control. One approach is to reach community consensus on a ‘standard’ organism in which developed ‘standard’ parts exhibit negligible crosstalk and other desired properties. The obvious candidates are those that already have methods for accommodating large DNA molecules: S. cerevisiae [93] and E. coli [56]. However, both species will probably require crosstalk reduction through numerous deletions of nonessential genes.

The logical endpoint of systematic nonessential gene deletion is the concept of the minimal cell [96, 97], which in theory is composed only of genetic material critical to survival. Natural minimal cells like Pelagibacter ubique that thrive in resource-deficient environments may also be good starting points for the development of a standard artificial organism [97]. The standard artificial organism, however, is not necessarily a minimal cell, because effective crosstalk elimination may occur before all nonessential genes are deleted. In addition, the genomes of parasitic minimal cells and artificially minimized cells may present fastidious habits and lack the reliability of a bulkier genome [82]. Synthetic biology needs a host that minimizes interference while providing robust cellular infrastructure, and minimal cells do not guarantee that.

Another way to address crosstalk is to develop orthogonal ribosomes and mRNA that interact only with each other and with neither the ribosome nor the genetic material of the host organism [98]. Evolved ribosome–mRNA pairs can then be used to construct cellular networks [99]. With this approach, a synthetic type 1 coherent feed-forward loop was developed in E. coli [100] (this paper demonstrates that synthetic circuits can be based on orthogonal transcription–translation networks). With enough orthogonal components, it may be possible to build a parallel metabolism within the cell [101].
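The behavior expected of a type 1 coherent feed-forward loop can be sketched with a standard textbook simplification in which X activates Y and Z, Y also activates Z, and Z responds only when both inputs are present (AND logic). The sketch assumes that X switches on fully at time zero and computes the delay before Y crosses the threshold for Z activation; it is not a model of the orthogonal ribosome circuit in [100], and all names and parameter values are hypothetical.

```python
import math


def ffl_on_delay(b_y=1.0, a_y=1.0, k_yz=0.5, dt=0.001, t_max=50.0):
    """Time until Y first reaches k_yz after X switches on at t = 0.

    Y follows dY/dt = b_y - a_y * Y (Euler integration). Under the assumed
    AND logic, Z production can only start once Y >= k_yz.
    Returns None if the threshold is not reached within t_max.
    """
    y, t = 0.0, 0.0
    while t < t_max:
        if y >= k_yz:
            return t
        y += dt * (b_y - a_y * y)
        t += dt
    return None


def analytic_delay(b_y=1.0, a_y=1.0, k_yz=0.5):
    """Closed-form delay for the same model; valid when k_yz < b_y / a_y."""
    y_st = b_y / a_y  # steady-state level of Y
    return (1.0 / a_y) * math.log(1.0 / (1.0 - k_yz / y_st))
```

In this simplified picture, the delay acts as a filter: a brief pulse of X ends before Y accumulates, so Z is never switched on.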

Ultimately, however, it may be necessary to implement physicochemical partitions with the phospholipid bilayer, whose adoption in natural modules makes a convincing argument for its use in synthetic biology. The bilayer can form a liposome into which one can incorporate several biochemical modules; Forster and Church [96] roughly outline the series of steps needed. This is essentially a ‘ground-up’ approach to the minimal cell, and the option to use artificial, low-interference modules suggests a higher chance of success than the ‘top-down’ approach of multiple gene deletions. Recently, Kuruma et al. [102] (this paper represents the latest progress in the development of the liposome into a viable chassis) developed a liposome-based system that synthesizes phosphatidic acid, a major constituent of cell membranes. A cell-free translation system was encapsulated in a liposome, in which functional membrane enzymes were synthesized. This represents a significant step toward liposome-encapsulated phospholipid bilayer biosynthesis and points toward synthetic modules with autopoietic capabilities.

At the border of in vitro and in vivo synthetic biology is the cell-free system, a platform for implementing complex biological processes outside a cell membrane. Historically, it has been difficult to activate more than one biochemical network in a single platform, but Jewett et al. [103] (this paper represents the latest progress on integrating multiple biochemical networks in a single cell-free system) have recently developed a cell-free system capable of co-activating central catabolism, oxidative phosphorylation, and protein synthesis.

Once a synthetic network has been fully implemented in vivo, the combined host-guest network must be characterized for performance and potential crosstalk. However, experimental perturbations inevitably lead to data noise. In fact, for protein interaction networks the rate of false-positive and false-negative results may be as high as 40% [104, 105]. To address this problem, Lappe and Holm [106] have devised a means of efficiently deriving interaction networks. Cantone et al. [63] found that reverse-engineering methods based on ordinary differential equations and Bayesian networks were effective at inferring the structure of a small, synthetic gene regulatory network.

CONCLUSION

The survey of the role of information processing in synthetic biology reveals how future developments may be influenced by current ones (Table 1). Consolidation of and additions to data exchange formats are needed to enable efficient communication between people and software. The likely improvement in quantitative precision of component functional data will reduce network design unpredictability and post hoc tweaking. Current hosts for in vivo synthetic biology include E. coli and S. cerevisiae, but future hosts may take a more minimalist approach and incorporate orthogonal metabolic systems.

Synthetic biology is the next step in the progress of engineering biological systems. The key informatics challenges (some of which overlap with those of systems biology) are standardization, development of appropriate statistical analysis methods, digital data integrity, biological noise control and limitation of crosstalk (Table 2). When these issues are properly addressed, the result will be artificial organisms unrivaled in their biochemical sophistication.

Table 1: Recent major developments in synthetic biology. For each development, the row indicates its immediate impact niche, and the column indicates the informatics scope. However, all items noted have the potential to deeply influence the progress of synthetic biology in the next decade.

In silico | In vitro | In vivo

Biological part
Proposals to extend parts documentation standards [11, 12]
Proposal for a revised quality control scheme for the Parts Registry [13]
Proposal to develop libraries of diversified components [11]

Network
SBGN [23]
CellML Model Repository [38]
Increased size of high-fidelity DNA [68, 69]
Synthetic networks based on protein–protein interactions [85]

Organism
Redesigning global transcription regulation [83]
Semi-synthetic minimal cells [102]
Orthogonal transcription–translation networks [100]
Integrated cell-free metabolic platform [103]


Table 2: Idealized recipe for synthetic biology. For each step, potentially useful Tools are identified. All steps exhibit Issues at this time. Emergent properties can be thought of as the result of biological noise. Note that many of the problems can be traced to just a couple of infrastructure issues. For example, having many choices for parts with very precise characterizations would address issues of emergent properties at the in vitro level. If gene regulatory networks of the chassis were to be better documented, then crosstalk issues (and therefore in vivo emergent properties) would probably pose less of a problem.

Step: Network design and simulation
Tools: OpenCell [19], Cytoscape [50], CellDesigner [55], and other software [29, 36, 56]; Parts Registry database
Major issues: No comprehensive data standard [9]; Insufficient parts documentation [9]
Proposals: Design-friendly parts documentation [9–12]; Parts Registry quality control [13]; Computational design methods [29, 36, 62]; Libraries of diversified components [11]

Step: DNA synthesis
Tools: BioBrick ligation scheme (Figure 2) [7]
Major issues: The integrity of large DNA target molecules
Proposals: High fidelity construction methods for large DNA molecules [68, 69]; Streamlined sequence design method [48]

Step: In vitro testing
Tools: PCR to infer network structure [93]
Major issues: Emergent properties; Lack of robust statistical tools to analyze networks
Proposals: Strategy to derive interaction networks [106]; Simulation strategies to increase predictability [36]; Increased precision in parts selection and documentation [11]

Step: Chassis loading
Tools: E. coli [56, 75, 100]; S. cerevisiae [93]
Major issues: Crosstalk [10]; Codon mismatch [71]; Largely unmapped gene regulatory networks [95]
Proposals: B. subtilis [91]; Cell-free system [92, 103]; Gene deletions of host genome; Modeling [58]; Orthogonal expression systems [98–101]

Step: In vivo testing
Tools: Fluorescent protein expression [75]
Major issues: Mutations; Apoptosis; Emergent properties
Proposals: Strategy to study subtle mutations [57]; Strategy to derive interaction networks [106]


Key Points

• The main goal of synthetic biology is to start with a set of functions and properties, and build a suitable system out of biological components.

• Component data standards (such as BioBrick) will likely require extensions to account for quantitative performance data, so that network design can become more predictable.

• Data standards for networks and components will likely consolidate in order to increase the accuracy of design simulations and efficiency of collaborations.

• Biological parts development will likely employ the strategy of building quantitatively characterized libraries of diversified components, because these libraries will increase the accuracy of network-level simulations.

• Host interference of synthetic networks might be effectively addressed by gene deletions and the use of orthogonal protein expression systems.

FUNDING

This work was supported in part by the National Library of Medicine (NLM/NIH) under grant K99 LM009826 and the National Human Genome Research Institute (NHGRI/NIH) under grants 1R01HG003354 and 1R01HG004836.

References

1. Barrett CL, Kim TY, Kim HU, et al. Systems biology as a foundation for genome-scale synthetic biology. Curr Opin Biotechnol 2006;17:488–92.

2. Lee SK, Chou H, Ham TS, et al. Metabolic engineering of microorganisms for biofuels production: from bugs to synthetic biology to fuels. Curr Opin Biotechnol 2008;19:556–63.

3. Chang MCY, Keasling JD. Production of isoprenoid pharmaceuticals by engineered microbes. Nat Chem Biol 2006;2:674–81.

4. Weber W, Schoenmakers R, Keller B, et al. A synthetic mammalian gene circuit reveals antituberculosis compounds. Proc Natl Acad Sci 2008;105:9994–8.

5. Lu TK, Collins JJ. Engineered bacteriophage targeting gene networks as adjuvants for antibiotic therapy. Proc Natl Acad Sci 2009;106:4629–34.

6. Marguet P, Balagadde F, Tan C, et al. Biology by design: reduction and synthesis of cellular components and behaviour. J R Soc Interface 2007;4:607–23.

7. Knight T. Idempotent Vector Design for Standard Assembly of Biobricks. MIT Synth Biol Wkg Grp 2003;1:1–11. http://hdl.handle.net/1721.1/21168 (23 October 2009, date last accessed).

8. Brown J. The iGEM competition: building with biology. IET Synth Biol 2007;1:3–6.

9. Purnick PEM, Weiss R. The second wave of synthetic biology: from modules to systems. Nat Rev Mol Cell Biol 2009;10:410–22.

10. Matsuoka Y, Ghosh S, Kitano H. Consistent design schematics for biological systems: standardization of representation in biological engineering. J R Soc Interface (Advance online version) 2009, doi: 10.1098/rsif.2009.0046.focus (23 October 2009, date last accessed).

11. Ellis T, Wang X, Collins JJ. Diversity-based, model-guided construction of synthetic gene networks with predicted functions. Nat Biotech 2009;27:465–71.

12. Canton B, Labno A, Endy D. Refinement and standardization of synthetic biological parts and devices. Nat Biotech 2008;26:787–93.

13. Peccoud J, Blauvelt MF, Cai Y, et al. Targeted development of registries of biological parts. PLoS ONE 2008;3:e2671.

14. Participants. PoBoL: provisional BioBrick language. In: Standards and Specifications in Synthetic Biology Workshop April 26–27; Seattle, WA, USA, 2008.

15. Hucka M, Finney A, Sauro HM, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003;19:524–31.

16. Finney A, Hucka M. Systems biology markup language: Level 2 and beyond. Biochem Soc Trans 2003;31:1472–3.

17. Hucka M, Finney A, Bornstein BJ, et al. Evolving a lingua franca and associated software infrastructure for computational systems biology: the Systems Biology Markup Language (SBML) project. Syst Biol (Stevenage) 2004;1:41–53.

18. Endler L, Rodriguez N, Juty N, et al. Designing and encoding models for synthetic biology. J R Soc Interface 2009;6:S405–17.

19. Beard DA, Britten R, Cooling MT, et al. CellML metadata standards, associated tools and repositories. Philos Transact A Math Phys Eng Sci 2009;367:1845–67.

20. Lloyd CM, Halstead MD, Nielsen PF. CellML: its future, present and past. Prog Biophys Mol Biol 2004;85:433–50.

21. Novere NL, Finney A, Hucka M, et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotech 2005;23:1509–15.

22. Le Novere N, Moodie S, Sorokin A, et al. Systems biology graphical notation: process diagram level 1. Nature Precedings 2008;1–75. http://hdl.handle.net/10101/npre.2008.2320.1 (23 October 2009, date last accessed).

23. Novere NL, Hucka M, Mi H, et al. The systems biology graphical notation. Nat Biotech 2009;27:735–741.

24. Workgroup B. BioPAX – biological pathways exchange language, level 3, release candidate 3 (version 0.92) documentation. BioPAX Workgroup 2007.

25. Ausbrooks R, Buswell S, Carlisle D, et al. Mathematical Markup Language (MathML) Version 2.0, 2nd edn. Edited by (NAG) DC, Patrick Ion (Mathematical Reviews AMS, Robert Miner (Design Science I, Scope) NPP: W3C; REC-MathML2-20031021, 2003. http://dret.net/biblio/reference/mathml2sec (23 October 2009, date last accessed).

26. RDF/XML Syntax Specification (Revised) on World Wide Web URL: http://www.w3.org/TR/rdf-syntax-grammar/ (23 October 2009, date last accessed).

27. Le Novere N, Courtot M, Laibe C. Adding semantics in kinetics models of biochemical pathways. In: Proceedings of the 2nd International Symposium on experimental standard conditions of enzyme characterizations (ESEC 2006) 19–23 March 2006; Beilstein Institute, Frankfurt am Main, Germany, 2007, pp. 137–53. http://www.beilstein-institut.de/escec2006/proceedings/LeNovere/LeNovere.pdf (23 October 2009, date last accessed).


28. Le Novere N. Model storage, exchange and integration. BMC Neurosci 2006;7(Suppl 1):S11.

29. Marchisio MA, Stelling J. Computational design of synthetic gene circuits with composable parts. Bioinformatics 2008;24:1903–10.

30. Bornstein BJ, Keating SM, Jouraku A, et al. LibSBML: an API library for SBML. Bioinformatics 2008;24:880–1.

31. de Silva E, Stumpf MPH. Complex networks and simple models in biology. J R Soc Int 2005;2:419–30.

32. Le Novere N, Bornstein B, Broicher A, et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res 2006;34:D689–91.

33. Erhard F, Friedel CC, Zimmer R. FERN - a Java framework for stochastic simulation and evaluation of reaction networks. BMC Bioinformatics 2008;9:356.

34. Hower V, Mendes P, Torti FM, et al. A general map of iron metabolism and tissue-specific subnetworks. Mol Biosyst 2009;5:422–43.

35. Calzone L, Gelay A, Zinovyev A, et al. A comprehensive modular map of molecular interactions in RB/E2F pathway. Mol Syst Biol 2008;4:173.

36. Bentley PJ. Methods for improving simulations of biological systems: systemic computation and fractal proteins. J R Soc Int 2009;6:S451–66.

37. Schilstra MJ, Li L, Matthews J, et al. CellML2SBML: conversion of CellML into SBML. Bioinformatics 2006;22:1018–20.

38. Lloyd CM, Lawson JR, Hunter PJ, et al. The CellML model repository. Bioinformatics 2008;24:2122–3.

39. Wimalaratne SM, Halstead MDB, Lloyd CM, et al. Biophysical annotation and representation of CellML models. Bioinformatics 2009;25:2263–70.

40. Berners-Lee T, Fielding R, Masinter L. Uniform Resource Identifier (URI): Generic Syntax. Request For Comments Archive 2005, RFC3986: http://www.ietf.org/rfc/rfc3986.txt (23 October 2009, date last accessed).

41. Laibe C, Le Novere N. MIRIAM resources: tools to generate and resolve robust cross-references in systems biology. BMC Syst Biol 2007;1:58.

42. Drager A, Hassis N, Supper J, et al. SBMLsqueezer: a CellDesigner plug-in to generate kinetic rate equations for biochemical networks. BMC Syst Biol 2008;2:39.

43. Luciano JS. PAX of mind for pathway researchers. Drug Discov Today 2005;10:937–42.

44. Stromback L, Lambrix P. Representations of molecular pathways: an evaluation of SBML, PSI MI and BioPAX. Bioinformatics 2005;21:4401–07.

45. Pedersen M, Phillips A. Towards programming languages for genetic engineering of living cells. J R Soc Interface 2009;6:S437–50.

46. Bray T, Paoli J, Sperberg-McQueen CM, et al. Extensible Markup Language (XML) 1.0. 5th edn. 2008. http://www.w3.org/TR/xml11/ (23 October 2009, date last accessed).

47. Hill AD, Tomshine JR, Weeding EMB, et al. SynBioSS: the synthetic biology modeling suite. Bioinformatics 2008;24:2551–3.

48. Czar MJ, Cai Y, Peccoud J. Writing DNA with GenoCAD. Nucleic Acids Res 2009;37:W40–W47.

49. Garny A, Noble D, Hunter PJ, et al. Cellular Open Resource (COR): current status and future directions. Philos T Roy Soc A: Math Phys Eng Sci 2009;367:1885–905.

50. Cline MS, Smoot M, Cerami E, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protocols 2007;2:2366–82.

51. Guziolowski C, Bourde A, Moreews F, et al. BioQuali Cytoscape plugin: analysing the global consistency of regulatory networks. BMC Genomics 2009;10:244.

52. Bindea G, Mlecnik B, Hackl H, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 2009;25:1091–3.

53. Clement-Ziza M, Malabat C, Weber C, et al. Genoscape: a Cytoscape plug-in to automate the retrieval and integration of gene expression data and molecular networks. Bioinformatics 2009;25:2617–8.

54. Gao J, Ade AS, Tarcea VG, et al. Integrating and annotating the interactome using the MiMI plugin for cytoscape. Bioinformatics 2009;25:137–8.

55. Funahashi A, Morohashi M, Kitano H, et al. CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO 2003;1:159–62.

56. Minaeva NI, Gak ER, Zimenkov DV, et al. Dual-In/Out strategy for genes integration into bacterial chromosome: a novel approach to step-by-step construction of plasmid-less marker-less recombinant E. coli strains with predesigned genome structure. BMC Biotechnol 2008;8:63.

57. Loewe L. A framework for evolutionary systems biology. BMC Syst Biol 2009;3:27.

58. Chen BS, Chang CH, Lee HC. Robust synthetic biology design: stochastic game theory approach. Bioinformatics 2009;25:1822–30.

59. Chen B-S, Wu C-H. A systematic design method for robust synthetic biology to satisfy design specifications. BMC Systems Biology 2009;3:66.

60. Banga J. Optimization in computational systems biology. BMC Syst Biol 2008;2:47.

61. Hadlich F, Noack S, Wiechert W. Translating biochemical network models between different kinetic formats. Metab Engineering 2009;11:87–100.

62. Dasika M, Maranas C. OptCircuit: An optimization based method for computational design of genetic circuits. BMC Systems Biology 2008;2:24.

63. Cantone I, Marucci L, Iorio F, et al. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 2009;137:172–81.

64. Stumpf MP, Wiuf C, May RM. Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci USA 2005;102:4221–4.

65. Lenaerts T, Ferkinghoff-Borg J, Schymkowitz J, et al. Information theoretical quantification of cooperativity in signalling complexes. BMC Syst Biol 2009;3:9.

66. Lenaerts T, Ferkinghoff-Borg J, Stricher F, et al. Quantifying information transfer by protein domains: analysis of the Fyn SH2 domain structure. BMC Struct Biol 2008;8:43.

67. Greber D, Fussenegger M. Mammalian synthetic biology: engineering of sophisticated gene networks. J Biotechnol 2007;130:329–45.


68. Linshiz G, Yehezkel TB, Kaplan S, et al. Recursiveconstruction of perfect DNA molecules from imperfectoligonucleotides. Mol Syst Biol 2008;4:191.

69. Gibson DG, Benders GA, Andrews-Pfannkoch C, et al.Complete chemical synthesis, assembly, and cloning of amycoplasma genitalium genome. Science 2008;319:1215–20.

70. Leonard E, Nielsen D, Solomon K, et al. Engineeringmicrobes with synthetic biology frameworks. TrendsBiotechnol 2008;26:674–81.

71. Martin CH, Nielsen DR, Solomon KV, et al. Syntheticmetabolism: engineering biology at the protein and path-way scales. Chemistry & Biology 2009;16:277–86.

72. Lucks JB, Qi L, Whitaker WR, et al. Toward scalable partsfamilies for predictable design of biological circuits. CurrOpinMicrobiol 2008;11:567–73.

73. Welch M, Villalobos A, Gustafsson C, et al. You’re one ina googol: optimizing genes for protein expression. J R SocInterface 2009;6:S467–76.

74. Suarez M, Jaramillo A. Challenges in the computationaldesign of proteins. J RSoc Interface 2009;6:S477–91.

75. Friedland AE, Lu TK, Wang X, et al. Synthetic genenetworks that count. Science 2009;324:1199–202.

76. Tigges M, Marquez-Lago TT, Stelling J, et al. A tunablesynthetic mammalian oscillator. Nature 2009;457:309–312.

77. Atsushi O, Mizuo M. An Artificial Aptazyme-BasedRiboswitch and its Cascading System in E. coli. Chem BioChem 2008;9:206–9.

78. Stricker J, Cookson S, Bennett MR, et al. A fast, robustand tunable synthetic gene oscillator. Nature 2008;456:516–9.

79. Ham TS, Lee SK, Keasling JD, et al. Design andconstruction of a double inversion recombination switchfor heritable sequential genetic memory. PLoSONE 2008;3:e2815.

80. Tsai TY-C, Choi YS, Ma W, et al. Robust, TunableBiological Oscillations from Interlinked Positive andNegative Feedback Loops. Science 2008;321:126–9.

81. Dawid A, Cayrol B, Isambert H. RNA synthetic biologyinspired from bacteria: construction of transcription attenua-tors under antisense regulation. Physical Biology 2009;6:025007.

82. Andrianantoandro E, Basu S, Karig DK, et al. Syntheticbiology: new engineering rules for an emerging discipline.Mol Syst Biol 2006;2:2006.0028.

83. Carrera J, Rodrigo G, Jaramillo A. Model-based redesign of global transcription regulation. Nucleic Acids Res 2009;37:e38.

84. Guye P, Weiss R. Customized signaling with reconfigurable protein scaffolds. Nat Biotech 2008;26:526–8.

85. Bashor CJ, Helman NC, Yan S, et al. Using engineered scaffold interactions to reshape MAP kinase pathway signaling dynamics. Science 2008;319:1539–43.

86. Pawson T, Nash P. Assembly of Cell Regulatory Systems Through Protein Interaction Domains. Science 2003;300:445–52.

87. Weber W, Schuetz M, Denervaud N, et al. A synthetic metabolite-based mammalian inter-cell signaling system. Mol Biosyst 2009;5:757–63.

88. Yin P, Choi HMT, Calvert CR, et al. Programming biomolecular self-assembly pathways. Nature 2008;451:318–22.

89. Lu T, Ferry M, Weiss R, et al. A molecular noise generator. Phys Biol 2008;5:036006.

90. Forster AC, Church GM. Synthetic biology projects in vitro. Genome Res 2007;17:1–6.

91. Itaya M, Fujita K, Kuroki A, et al. Bottom-up genome assembly using the Bacillus subtilis genome vector. Nat Meth 2008;5:41–3.

92. Yoshihiro S, Yutetsu K, Bei-Wen Y, et al. Cell-free translation systems for protein engineering. FEBS J 2006;273:4133–40.

93. Shao Z, Zhao H, Zhao H. DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res 2009;37:e16.

94. Fraser PD, Enfissi EMA, Bramley PM. Genetic engineering of carotenoid formation in tomato fruit and the potential application of systems and synthetic biology approaches. Arch Biochem Biophys 2009;483:196–204.

95. Keseler IM, Bonavides-Martinez C, Collado-Vides J, et al. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res 2009;37:D464–70.

96. Forster AC, Church GM. Towards synthesis of a minimal cell. Mol Syst Biol 2006;2:45.

97. Moya A, Gil R, Latorre A, et al. Toward minimal bacterial cells: evolution vs. design. FEMS Microbiol Rev 2009;33:225–35.

98. Rackham O, Chin JW. A network of orthogonal ribosome-mRNA pairs. Nat Chem Biol 2005;1:159–66.

99. Rackham O, Chin JW. Synthesizing cellular networks from evolved ribosome-mRNA pairs. Biochem Soc Trans 2006;34:328–9.

100. An W, Chin JW. Synthesis of orthogonal transcription-translation networks. Proc Natl Acad Sci 2009;106:8477–82.

101. Filipovska A, Rackham O. Building a parallel metabolism within the cell. ACS Chem Biol 2008;3:51–63.

102. Kuruma Y, Stano P, Ueda T, et al. A synthetic biology approach to the construction of membrane proteins in semi-synthetic minimal cells. Biochimica et Biophysica Acta (BBA) - Biomembranes 2009;1788:567–74.

103. Jewett MC, Calhoun KA, Voloshin A, et al. An integrated cell-free metabolic platform for protein production and synthetic biology. Mol Syst Biol 2008;4:220.

104. Tong AH, Lesage G, Bader GD, et al. Global mapping of the yeast genetic interaction network. Science 2004;303:808–13.

105. Bader JS, Chaudhuri A, Rothberg JM, et al. Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol 2004;22:78–85.

106. Lappe M, Holm L. Unraveling protein interaction networks with near-optimal efficiency. Nat Biotechnol 2004;22:98–103.


