1. Lecture WS 2004/05
Bioinformatics III 1
Bioinformatics III: “Systems biology”, “Integrative cell biology”
Course will address two areas:
25% genomics: single protein phylogenies versus genome rearrangement,
comparative genomics
75% integrated view of cellular networks
Content
Week1 scale-free networks in biology
Week2 transcription, regulatory networks
Week3 protein complexes (Cellzome, Aloy et al. 2004)
Week4 protein networks: exp. data (Y2H; MS), computational data (Rosetta)
Week5 protein networks: graphical layout (force minimization)
Week6 protein networks: quality check (Bayesian analysis)
Week7 protein networks: modularity?
Week8 phylogeny
Week9 genome rearrangement (breakpoint analysis)
Week10+11 metabolic networks: metabolic flux analysis, extreme pathways,
elementary modes, C13 method
Week12 mathematical modelling of signal transduction networks
Week13 integration of protein networks with metabolic pathways
Week14 exam
Literature
lecture slides will be available 1-2 days prior to lecture
suggested reading: links will be put up on course website
http://gepard.bioinformatik.uni-saarland.de/teaching...
assignments
12 weekly assignments planned
Homework assignments are handed out in the Thursday lectures and are
available on the course website on the same day.
Solutions must be returned by 14.00 on Thursday of the following week
to Tihamer Geyer in room 1.09, Geb. 17.1, first floor, or handed in prior (!) to the
lecture starting at 14.15. Two students may submit one joint solution.
Also possible: submit solution by e-mail as 1 printable PDF-file to
Tutorial: participation is recommended but not mandatory. Tue 11-13.
Homeworks submitted on Thursdays will be discussed on the following Tuesday.
In case of illness please send E-mail to:
[email protected] and provide a medical certificate.
Schein = successful written exam
Successful participation in the lecture course („Schein“) will be certified upon
successful completion of the written exam in February 2005.
The exam is open to students who have received at least 50% of the
credit points for the 12 assignments.
Unless announced otherwise on the course website at least 3 weeks before the exam,
the exam will be based on all material covered in the lectures and in the
assignments.
In case of illness please send E-mail to:
[email protected] and provide a medical certificate.
A „second and final chance“ exam will be offered in April 2005.
tutor
Dr. Tihamer Geyer – assignments
Geb. 17.1, room 1.09
Systems biology
Biological research in the 1900s followed a reductionist approach:
detect an unusual phenotype → isolate/purify one protein/gene → determine its
function.
However, it is increasingly clear that a discrete biological function can only rarely
be attributed to an individual molecule.
→ New task: understanding the structure and dynamics of the complex
intercellular web of interactions that contribute to the structure and function of
a living cell.
Systems biology
The development of high-throughput data-collection techniques,
e.g. microarrays, protein chips, yeast two-hybrid screens,
allows us to interrogate all cell components simultaneously at any given time.
→ There exist various types of interaction webs/networks:
- protein-protein interaction network
- metabolic network
- signalling network
- transcription/regulatory network ...
These networks are not independent but form a „network of networks“.
DOE initiative: Genomes to Life, a coordinated effort
Slides borrowed from a talk by
Marvin Frazier, Life Sciences Division, U.S. Dept of Energy
Facility I Production and Characterization of Proteins
Estimating Microbial Genome Capability
• Computational Analysis
– Genome analysis of genes, proteins, and operons
– Metabolic pathways analysis from reference data
– Protein machines estimate from PM reference data
• Knowledge Captured
– Initial annotation of genome
– Initial perceptions of pathways and processes
– Recognized machines, function, and homology
– Novel proteins/machines (including prioritization)
– Production conditions and experience
Facility II Whole Proteome Analysis
Modeling Proteome Expression, Regulation, and Pathways
• Analysis and Modeling
– Mass spectrometry expression analysis
– Metabolic and regulatory pathway/network analysis and modeling
• Knowledge Captured
– Expression data and conditions
– Novel pathways and processes
– Functional inferences about novel proteins/machines
– Genome super annotation: regulation, function,
and processes (deep knowledge about cellular subsystems)
Facility III Characterization and Imaging of Molecular Machines
Exploring Molecular Machine Geometry and Dynamics
• Computational Analysis, Modeling and Simulation
– Image analysis/cryoelectron microscopy
– Protein interaction analysis/mass spec
– Machine geometry and docking modeling
– Machine biophysical dynamic simulation
• Knowledge Captured
– Machine composition, organization, geometry,
assembly and disassembly
– Component docking and dynamic simulations
of machines
Facility IV Analysis and Modeling of Cellular Systems
Simulating Cell and Community Dynamics
• Analysis, Modeling and Simulation
– Couple knowledge of pathways, networks, and
machines to generate an understanding of
cellular and multi-cellular systems
– Metabolism, regulation, and machine simulation
– Cell and multicell modeling and flux visualization
• Knowledge Captured
– Cell and community measurement data sets
– Protein machine assembly time-course data sets
– Dynamic models and simulations of cell processes
“Genomes To Life” Computing Roadmap
[Figure: roadmap chart. Axis labels: Biological Complexity; Computing and Information Infrastructure Capabilities. Labelled items: Comparative Genomics; Constraint-Based Flexible Docking; Constrained rigid docking; Genome-scale protein threading; Community metabolic regulatory, signaling simulations; Molecular machine classical simulation; Protein machine Interactions; Cell, pathway, and network simulation; Molecule-based cell simulation; Current U.S. Computing.]
First breakthrough: scale-free metabolic networks
(d) The degree distribution, P(k), of the metabolic network illustrates its scale-free topology.
(e) The scaling of the clustering coefficient C(k) with the degree k illustrates the hierarchical
architecture of metabolism (The data shown in d and e represent an average over 43
organisms).
(f) The flux distribution in the central metabolism of Escherichia coli follows a power law,
which indicates that most reactions have small metabolic flux, whereas a few reactions, with
high fluxes, carry most of the metabolic activity. Note that on all three plots the
axes are logarithmic, and a straight line on such log–log plots indicates power-law scaling.
CTP, cytidine triphosphate; GLC, aldo-hexose glucose; UDP, uridine diphosphate; UMP,
uridine monophosphate; UTP, uridine triphosphate.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
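A minimal sketch of why a straight line on a log–log plot indicates a power law: taking logarithms of p = c·k^(-γ) gives log p = log c − γ·log k, so a least-squares fit of log p against log k returns the slope −γ. The data below are synthetic (an exact power law with an assumed exponent of 2.2), not the measured values from the figure, and the function name is only illustrative.

```python
import math

def loglog_slope(ks, ps):
    """Least-squares slope of log(p) plotted against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic data only: an exact power law with assumed exponent 2.2.
ks = list(range(1, 101))
ps = [k ** -2.2 for k in ks]
slope = loglog_slope(ks, ps)  # slope of the straight line, i.e. minus the exponent
```

For real, noisy degree data a plain log–log regression is only a rough estimate of the exponent; it is used here purely to illustrate the straight-line argument.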
Second breakthrough: Yeast protein interaction network: first example of a scale-free network
A map of protein–protein interactions in
Saccharomyces cerevisiae, which is
based on early yeast two-hybrid
measurements, illustrates that a few
highly connected nodes (which are also
known as hubs) hold the network
together.
The largest cluster, which contains
78% of all proteins, is shown. The colour
of a node indicates the phenotypic effect
of removing the corresponding protein
(red = lethal, green = non-lethal, orange
= slow growth, yellow = unknown).
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
Characterising metabolic networks
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
To study the network characteristics of metabolism, a graph-theoretic description needs to
be established.
(a) illustrates the graph-theoretic description for a simple pathway (catalysed by Mg2+-dependent
enzymes).
(b) In the most abstract approach, all interacting metabolites are considered equally. The
links between nodes represent reactions that interconvert one substrate into another.
(c) For many biological applications it is useful to ignore co-factors, such as the high-energy-phosphate
donor ATP, which results in a second type of mapping that connects only the main source
metabolites to the main products.
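The two mappings (b) and (c) can be sketched in a few lines. The reaction list below is an illustrative assumption (three nucleotide conversions written as substrate and product sets), not a transcription of the figure, and the set of ignored compounds is likewise a choice made for this example.

```python
# Illustrative reactions as (substrates, products); an assumed toy pathway.
reactions = [
    ({"UMP", "ATP"}, {"UDP", "ADP"}),
    ({"UDP", "ATP"}, {"UTP", "ADP"}),
    ({"UTP", "ATP", "NH3"}, {"CTP", "ADP", "Pi"}),
]

# Co-factors and small side compounds to drop in mapping (c); assumed choice.
IGNORED = frozenset({"ATP", "ADP", "NH3", "Pi"})

def metabolite_links(reactions, ignore=frozenset()):
    """Link every remaining substrate of a reaction to every remaining product."""
    links = set()
    for substrates, products in reactions:
        for s in substrates - ignore:
            for p in products - ignore:
                links.add((s, p))
    return links

full_map = metabolite_links(reactions)                     # mapping (b): all metabolites
reduced_map = metabolite_links(reactions, ignore=IGNORED)  # mapping (c): main metabolites only
```

In this toy case the reduced map is the simple chain UMP → UDP → UTP → CTP, while the full map is dominated by links through ATP and ADP.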
Degree
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
The most elementary characteristic of a node is its
degree (or connectivity), k, which tells us how
many links the node has to other nodes.
(a) In the undirected network, node A has $k = 5$.
(b) In networks in which each link has a selected
direction there is an incoming degree, $k_{\mathrm{in}}$, which
denotes the number of links that point to a node,
and an outgoing degree, $k_{\mathrm{out}}$, which denotes the
number of links that start from it.
E.g., node A in (b) has $k_{\mathrm{in}} = 4$ and $k_{\mathrm{out}} = 1$.
An undirected network with $N$ nodes and $L$ links is
characterized by an average degree $\langle k \rangle = 2L/N$
(where $\langle \rangle$ denotes the average).
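The degree definitions can be checked on a small example. The edge lists below are hypothetical: they are chosen so that node A has the values quoted above (k = 5, k_in = 4, k_out = 1) and are not the edges of the original figure.

```python
from collections import Counter

def degrees(edges):
    """Degree k of every node in an undirected edge list."""
    k = Counter()
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    return k

def in_out_degrees(arcs):
    """Incoming and outgoing degree for a list of directed links (source, target)."""
    k_in, k_out = Counter(), Counter()
    for src, dst in arcs:
        k_out[src] += 1
        k_in[dst] += 1
    return k_in, k_out

# Hypothetical undirected network in which node A has five links.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("A", "F"), ("B", "C"), ("F", "G")]
k = degrees(edges)
N, L = len(k), len(edges)
mean_k = sum(k.values()) / N      # equals 2L/N, since every link adds 2 to the degree sum

# Hypothetical directed network: four links point to A, one starts from it.
arcs = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"), ("A", "F")]
k_in, k_out = in_out_degrees(arcs)
```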
Degree distribution
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
The degree distribution, P(k), gives the probability
that a selected node has exactly k links.
P(k) is obtained by counting the number of nodes
N(k) with k = 1,2... links and dividing by the total
number of nodes N.
The degree distribution allows us to distinguish
between different classes of networks.
For example, a peaked degree distribution, as
seen in a random network, indicates that the
system has a characteristic degree and that there
are no highly connected nodes (which are also
known as hubs).
By contrast, a power-law degree distribution
indicates that a few hubs hold together numerous
small nodes.
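As a sketch of the counting recipe for P(k): count the nodes N(k) for each degree k and divide by N. The degree table is a hypothetical toy example.

```python
from collections import Counter

def degree_distribution(degree_by_node):
    """P(k) = N(k) / N for every degree k that occurs in the network."""
    n = len(degree_by_node)
    counts = Counter(degree_by_node.values())   # N(k)
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical degrees of a seven-node network with one hub (A).
toy_degrees = {"A": 5, "B": 2, "C": 2, "D": 1, "E": 1, "F": 2, "G": 1}
P = degree_distribution(toy_degrees)
```

By construction the values of P(k) sum to 1; the single hub shows up as the small entry at k = 5.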
Random networks
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
(Aa) The Erdős–Rényi (ER) model of a random network starts with $N$
nodes and connects each pair of nodes with probability $p$, which
creates a graph with approximately $pN(N-1)/2$ randomly placed
links.
(Ab) The node degrees follow a Poisson distribution, where most
nodes have approximately the same number of links (close to
the average degree $\langle k \rangle$). The tail (high-$k$ region) of the degree
distribution $P(k)$ decreases exponentially, which indicates that
nodes that significantly deviate from the average are extremely
rare.
(Ac) The clustering coefficient is independent of a node's degree, so
$C(k)$ appears as a horizontal line if plotted as a function of $k$.
The mean path length is proportional to the logarithm of the
network size, $\ell \sim \log N$, which indicates that it is characterized by
the small-world property.
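A minimal sketch of the ER construction: visit every pair of nodes once and keep the link with probability p. The parameter values and the fixed seed are assumptions made so that the example is reproducible.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """Return the links of a random graph: each node pair is linked with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

n, p = 200, 0.05                       # assumed example values
edges = erdos_renyi(n, p)
expected_links = p * n * (n - 1) / 2   # expected number of links, pN(N-1)/2
mean_degree = 2 * len(edges) / n       # should lie close to p(N-1)
```

The actual number of links fluctuates around the expected value from run to run; only the expectation is fixed by N and p.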
Origin of scale-free topology and hubs in biological networks
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
The origin of the scale-free topology in complex networks
can be reduced to two basic mechanisms: growth and
preferential attachment. Growth means that the network
emerges through the subsequent addition of new nodes,
such as the new red node that is added to the network that
is shown in part (a). Preferential attachment means that new
nodes prefer to link to more connected nodes. For
example, the probability that the red node will connect to
node 1 is twice as large as that of connecting to node 2, as the
degree of node 1 ($k_1 = 4$) is twice the degree of node 2 ($k_2 = 2$).
Growth and preferential attachment generate hubs
through a 'rich-gets-richer' mechanism: the more connected
a node is, the more likely it is that new nodes will link to it,
which allows the highly connected nodes to acquire new
links faster than their less connected peers.
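The "twice as likely" statement follows directly if the attachment probability is taken to be proportional to the degree. In the sketch below only the degrees of nodes 1 and 2 come from the example above; the remaining degrees are hypothetical.

```python
def attachment_probabilities(degree_by_node):
    """Probability of a new link attaching to each node, proportional to its degree."""
    total = sum(degree_by_node.values())
    return {node: k / total for node, k in degree_by_node.items()}

# Degrees of nodes 1 and 2 as in the example; nodes 3-5 are hypothetical.
degree_by_node = {1: 4, 2: 2, 3: 2, 4: 1, 5: 1}
probs = attachment_probabilities(degree_by_node)
ratio = probs[1] / probs[2]   # k1 / k2, independent of the other nodes
```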
Scale-free networks
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
Scale-free networks are characterized by a power-law degree
distribution; the probability that a node has $k$ links follows $P(k) \sim k^{-\gamma}$,
where $\gamma$ is the degree exponent. A node is much more likely to be highly
connected than in a random graph, and the
network's properties are often determined by a relatively small number
of highly connected nodes („hubs“, see blue nodes in Ba).
In the Barabási–Albert model of a scale-free network, at each time point
a node with $M$ links is added to the network. Each link connects to an already
existing node $I$ with probability $\Pi_I = k_I / \sum_J k_J$, where $k_I$ is the degree of node
$I$ and $J$ is the index running over the network nodes in the sum. The network
that is generated by this growth process has a power-law degree
distribution with $\gamma = 3$.
(Bb) Such distributions are seen as a straight line on a log–log plot. The
network that is created by the Barabási–Albert model does not have an
inherent modularity, so $C(k)$ is independent of $k$ (Bc).
Scale-free networks with degree exponents $2 < \gamma < 3$, a range that is
observed in most biological and non-biological networks, are ultra-small,
with the average path length following $\ell \sim \log \log N$, which is significantly
shorter than the $\log N$ that characterizes random small-world networks.
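A sketch of the Barabási–Albert growth process under two stated assumptions that the text does not fix: the network starts from a small fully connected seed of m + 1 nodes, and the m links of a new node go to m distinct existing nodes. Degree-proportional choice is implemented by drawing uniformly from a list that contains every node once per link end.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=0):
    """Grow a network to n nodes; each new node brings m links (growth + preferential attachment)."""
    rng = random.Random(seed)
    # Assumed seed network: complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears once per link end, so a uniform draw is degree-proportional.
    link_ends = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:            # assumed: m distinct targets per new node
            targets.add(rng.choice(link_ends))
        for t in sorted(targets):
            edges.append((new, t))
            link_ends.extend((new, t))
    return edges

edges = barabasi_albert(n=500, m=2)
degree = Counter(v for edge in edges for v in edge)
hubs = degree.most_common(5)               # the most connected nodes and their degrees
```

The early nodes typically end up as the hubs, which is the 'rich-gets-richer' effect; estimating the exponent reliably needs far larger networks than this toy run.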
Network measures
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
Scale-free networks and the degree exponent
Most biological networks are scale-free, which means that their
degree distribution approximates a power law, $P(k) \sim k^{-\gamma}$, where
$\gamma$ is the degree exponent and $\sim$ indicates 'proportional to'. The
value of $\gamma$ determines many properties of the system. The
smaller the value of $\gamma$, the more important the role of the hubs
is in the network. Whereas for $\gamma > 3$ the hubs are not relevant, for
$2 < \gamma < 3$ there is a hierarchy of hubs, with the most connected
hub being in contact with a small fraction of all nodes, and for
$\gamma = 2$ a hub-and-spoke network emerges, with the largest hub
being in contact with a large fraction of all nodes. In general, the
unusual properties of scale-free networks are valid only for $\gamma < 3$,
when the dispersion of the $P(k)$ distribution, which is defined
as $\sigma^2 = \langle k^2 \rangle - \langle k \rangle^2$, increases with the number of nodes (that
is, $\sigma$ diverges), resulting in a series of unexpected features,
such as a high degree of robustness against accidental node
failures. For $\gamma > 3$, however, most unusual features are absent,
and in many respects the scale-free network behaves like a
random one.
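Why the dispersion grows for $\gamma < 3$ can be seen from a short calculation. This is a sketch under an assumption not made explicit on the slide: the degree is treated as a continuous variable with $P(k) = C k^{-\gamma}$ between a smallest degree $k_{\min}$ and a largest degree $k_{\max}$.

```latex
\langle k^2 \rangle
  = \int_{k_{\min}}^{k_{\max}} k^2 \, C k^{-\gamma} \, dk
  = \frac{C}{3-\gamma} \left( k_{\max}^{\,3-\gamma} - k_{\min}^{\,3-\gamma} \right),
  \qquad \gamma \neq 3 .
```

For $\gamma < 3$ the exponent $3-\gamma$ is positive, so $\langle k^2 \rangle$ grows without bound as $k_{\max}$ increases, and the largest degree increases with the number of nodes. For $\gamma > 3$ the integral converges to $C k_{\min}^{\,3-\gamma}/(\gamma-3)$, so $\sigma^2$ stays finite, as in a random network.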
Shortest path and mean path length
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
Distance in networks is measured with the path length,
which tells us how many links we need to pass through to
travel between two nodes. As there are many alternative
paths between two nodes, the shortest path — the path
with the smallest number of links between the selected
nodes — has a special role.
In directed networks, the distance $\ell_{AB}$ from node A to
node B is often different from the distance $\ell_{BA}$ from B to
A. E.g. in (b), $\ell_{BA} = 1$, whereas $\ell_{AB} = 3$. Often there is no
direct path between two nodes. As shown in (b), although
there is a path from C to A, there is no path from A to C.
The mean path length, <ℓ>, represents the average over
the shortest paths between all pairs of nodes and offers
a measure of a network's overall navigability.
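Shortest path lengths in an unweighted network can be found by breadth-first search. The directed network below is hypothetical: it is built to reproduce the values quoted above (distance 1 from B to A, distance 3 from A to B, a path from C to A but none from A to C) and is not the network of the figure. Averaging only over pairs that are actually connected is an assumption of this sketch.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: number of links on the shortest path to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_path_length(adj):
    """Average shortest path length over all ordered pairs joined by a path."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    lengths = [d for s in nodes
               for t, d in shortest_path_lengths(adj, s).items() if t != s]
    return sum(lengths) / len(lengths)

# Hypothetical directed network (node -> nodes it points to).
arcs = {"A": ["D"], "B": ["A"], "C": ["A"], "D": ["E"], "E": ["B"]}
from_a = shortest_path_lengths(arcs, "A")
from_b = shortest_path_lengths(arcs, "B")
from_c = shortest_path_lengths(arcs, "C")
mean_l = mean_path_length(arcs)
```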
Clustering coefficient
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
In many networks, if node A is connected to B, and B is connected to C,
then it is highly probable that A also has a direct link to C. This
phenomenon can be quantified using the clustering coefficient
$C_I = 2n_I / (k_I(k_I - 1))$, where $n_I$ is the number of links connecting the $k_I$ neighbours of
node $I$ to each other. In other words, $n_I$ gives the number of 'triangles'
that go through node $I$, whereas $k_I(k_I - 1)/2$ is the total number of triangles
that could pass through node $I$, should all of node $I$'s neighbours be
connected to each other; $C_I$ is the ratio of the two. For example, only one pair of node A's five
neighbours in (a) are linked together (B and C), which gives $n_A = 1$ and
$C_A = 2/20$. By contrast, none of node F's neighbours link to each other,
giving $C_F = 0$. The average clustering coefficient, $\langle C \rangle$, characterizes the
overall tendency of nodes to form clusters or groups. An important
measure of the network's structure is the function $C(k)$, which is defined
as the average clustering coefficient of all nodes with $k$ links. For many
real networks $C(k) \sim k^{-1}$, which is an indication of a network's
hierarchical character.
The average degree $\langle k \rangle$, average path length $\langle \ell \rangle$ and average
clustering coefficient $\langle C \rangle$ depend on the number of nodes and links ($N$
and $L$) in the network. By contrast, the $P(k)$ and $C(k)$ functions are
independent of the network's size and they therefore capture a network's
generic features, which allows them to be used to classify various
networks.
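The clustering coefficient and the function C(k) can be sketched as follows. The network is hypothetical: it is chosen so that node A has five neighbours of which only B and C are linked, and node F has two unlinked neighbours, matching the values quoted above rather than the original figure.

```python
from collections import defaultdict
from itertools import combinations

def clustering(adj, node):
    """C = 2n / (k(k-1)): fraction of neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0                       # no neighbour pairs, so no possible triangles
    n_links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))

def clustering_by_degree(adj):
    """C(k): average clustering coefficient of all nodes with k links."""
    by_degree = defaultdict(list)
    for node, neighbours in adj.items():
        by_degree[len(neighbours)].append(clustering(adj, node))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}

# Hypothetical undirected network.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("A", "F"), ("B", "C"), ("F", "G")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

c_a = clustering(adj, "A")
c_f = clustering(adj, "F")
c_of_k = clustering_by_degree(adj)
```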
Hierarchical networks
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
To account for the coexistence of modularity, local clustering and scale-free
topology in many real systems it has to be assumed that clusters
combine in an iterative manner, generating a hierarchical network.
The starting point of this construction is a small cluster of 4 densely
linked nodes (4 central nodes in Ca).
Next, 3 replicas of this module are generated and the 3 external nodes of
the replicated clusters connected to the central node of the old cluster,
which produces a large 16-node module.
3 replicas of this 16-node module are then generated and the 16
peripheral nodes connected to the central node of the old module, which
produces a new module of 64 nodes. The hierarchical network model
seamlessly integrates a scale-free topology with an inherent modular
structure by generating a network that has a power-law degree
distribution with degree exponent $\gamma = 1 + \ln 4/\ln 3 \approx 2.26$ (Cb) and a
large, system-size independent average clustering coefficient $\langle C \rangle \sim 0.6$.
The most important signature of hierarchical modularity is the scaling of
the clustering coefficient, which follows $C(k) \sim k^{-1}$, a straight line of slope $-1$
on a log–log plot (Cc). A hierarchical architecture implies that sparsely
connected nodes are part of highly clustered areas, with communication
between the different highly clustered neighbourhoods being maintained
by a few hubs (Ca).
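The iterative construction can be sketched as below. This is one possible reading of the description, stated as an assumption: in every step the current module is copied three times, and the copies of the current peripheral nodes (initially the 3 external nodes of the 4-node cluster) are linked to the central node of the original module. Function and variable names are illustrative.

```python
import math

def hierarchical_network(iterations):
    """Return (number of nodes, set of links) after the given number of replication steps."""
    center = 0
    edges = {(i, j) for i in range(4) for j in range(i)}   # fully linked 4-node cluster
    peripheral = [1, 2, 3]                                  # external nodes of the cluster
    size = 4
    for _ in range(iterations):
        new_edges = set(edges)
        new_peripheral = []
        for copy in (1, 2, 3):                              # three replicas of the module
            offset = copy * size
            new_edges |= {(u + offset, v + offset) for u, v in edges}
            new_peripheral += [p + offset for p in peripheral]
        # Assumed rule: peripheral nodes of the replicas link to the old central node.
        new_edges |= {(p, center) for p in new_peripheral}
        edges, peripheral, size = new_edges, new_peripheral, 4 * size
    return size, edges

sizes = [hierarchical_network(i)[0] for i in range(3)]      # module sizes per step
n_nodes, links = hierarchical_network(2)
center_degree = sum(1 for u, v in links if 0 in (u, v))     # degree of the central hub
gamma = 1 + math.log(4) / math.log(3)                       # degree exponent quoted above
```

The module sizes reproduce the 4, 16 and 64 nodes of the description, and the central node becomes the largest hub.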
Summary
Many cellular networks show properties of scale-free networks
- protein-protein interaction networks
- metabolic networks
- genetic regulatory networks (where nodes are individual genes and links are
derived from expression correlation e.g. by microarray data)
- protein domain networks
However, not all cellular networks are scale-free.
E.g. the transcription regulatory networks of S. cerevisiae and E. coli are examples
of mixed scale-free and exponential characteristics.
Next lecture:
- mathematical properties of networks
- origin of scale-free topology
- topological robustness
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)