DISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORKS
Ideker et al., Bioinformatics 2002
Presented by: Omrit Zemach April 3 2013
Seminar in Algorithmic Challenges in Analyzing Big Data in Biology and Medicine, TAU
OUTLINE
Introduction: biological terms
Motivation
Methods
  Basic z-score calculation
  Simulated annealing
Results
Discussion
PROTEIN-PROTEIN INTERACTION
All living organisms consist of living cells.
All cells comprise the same building blocks: DNA, RNA and protein.
Protein sequences are encoded in DNA.
Proteins play major roles in all cellular processes.
DNA REPLICATION
TRANSCRIPTION INTO mRNA
TRANSLATION OF mRNA
PROTEIN-DNA INTERACTIONS
A protein binds a molecule of DNA.
The binding regulates the biological function of the DNA, usually the expression of a gene.
Example: transcription factors, which activate or repress gene expression.
GENE EXPRESSION
A gene is a sequence of DNA that encodes a protein.
Gene expression is the process by which information from a gene is used in the synthesis of a functional protein.
It is informative to measure gene expression under multiple conditions (experiments) and to find the genes that are differentially expressed between them.
DNA chips / microarrays: simultaneous measurement of the expression levels of all genes.
MOTIVATION
Databases of PROTEIN-PROTEIN & PROTEIN-DNA interactions
Widely available mRNA expression data
Generate concrete hypotheses for the underlying mechanisms governing the observed changes in gene expression
MOTIVATION
Expose the yeast galactose utilization pathway to 20 perturbations.
Construct a molecular interaction network by screening a database of protein-protein and protein-DNA interactions.
Select 362 interactions linking genes that were differentially expressed under one or more perturbations.
Analyze changes in expression.
Conclusion: pairs of genes linked in this network were more likely to have correlated expression profiles than genes chosen at random.
However, the general task of associating gene expression changes with higher-order groups of interactions was not addressed.
DISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORKS
Introduces a method for searching the network to find 'active subnetworks'.
Over multiple conditions, determines which conditions significantly affect gene expression in each subnetwork.
METHODS
Z-SCORE CALCULATION
Assign each gene i a value $p_i$: the significance (p-value) of differential expression of gene i.
Convert it to a z-score for gene i: $z_i = \Phi^{-1}(1 - p_i)$, where $\Phi^{-1}$ is the inverse standard normal CDF.
Compute an aggregate z-score $z_A$ for subnetwork A.
Calibrate $z_A$ against the background distribution.
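The scoring steps on this slide can be sketched in Python. The aggregate and calibration formulas are not legible on the slide, so the sketch assumes the definitions given in Ideker et al. (2002): z_A = (1/sqrt(k)) * sum of z_i over the k genes in A, and a corrected score s_A = (z_A - mu_k) / sigma_k, where mu_k and sigma_k are estimated by sampling random gene sets of size k. The function names, the sample count and the seed are illustrative choices, not part of the paper.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def gene_z(p):
    """z_i = Phi^{-1}(1 - p_i); small p-values map to large positive z."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def aggregate_z(z_values):
    """z_A = sum(z_i) / sqrt(k) for a subnetwork of k genes (assumed form)."""
    return sum(z_values) / math.sqrt(len(z_values))


def calibrated_score(z_subnet, z_all, n_samples=1000, seed=0):
    """s_A = (z_A - mu_k) / sigma_k, with mu_k and sigma_k estimated
    from aggregate scores of randomly sampled gene sets of size k."""
    rng = random.Random(seed)
    k = len(z_subnet)
    background = [aggregate_z(rng.sample(z_all, k)) for _ in range(n_samples)]
    return (aggregate_z(z_subnet) - mean(background)) / stdev(background)
```

Dividing by sqrt(k) keeps z_A on a standard normal scale when the z_i are independent standard normal values, so subnetworks of different sizes can be compared; the calibration step then corrects for how the scores actually behave in the data at hand.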
SCORING OVER MULTIPLE CONDITIONS
Extend the scoring system over multiple conditions.
Create a matrix of z-scores: rows are the m conditions, columns are the genes.
Produce m different aggregate scores (one for each condition) and sort them from highest to lowest: $z_{A[1]} \ge z_{A[2]} \ge \dots \ge z_{A[m]}$.
Compute $r_{A[j]}$ for each j = 1, ..., m as follows:
$P_Z = 1 - \Phi(z_{A[j]})$ (the probability that any single condition has a z-score above $z_{A[j]}$)
$p_{A[j]} = \sum_{h=j}^{m} \binom{m}{h} P_Z^{h} (1 - P_Z)^{m-h}$ (the probability that at least j of the m conditions have scores above $z_{A[j]}$)
$r_{A[j]} = \Phi^{-1}(1 - p_{A[j]})$
Finally, compute $r_A^{max} = \max_j r_{A[j]}$.
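A minimal Python sketch of the multi-condition score. It assumes that the "at least j of m" probability is the binomial tail with per-condition success probability P_Z, which is how the description on the slide reads. The function names and the clamping constant are illustrative.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps the argument of inv_cdf strictly inside (0, 1)


def rank_adjusted_scores(condition_scores):
    """Return [r_A[1], ..., r_A[m]] for one subnetwork, given its m
    per-condition aggregate z-scores."""
    m = len(condition_scores)
    ordered = sorted(condition_scores, reverse=True)  # z_A[1] >= ... >= z_A[m]
    scores = []
    for j, z in enumerate(ordered, start=1):
        p_z = 1.0 - STD_NORMAL.cdf(z)
        # Probability that at least j of the m conditions exceed z_A[j].
        p_aj = sum(comb(m, h) * p_z**h * (1.0 - p_z) ** (m - h)
                   for h in range(j, m + 1))
        q = min(max(1.0 - p_aj, EPS), 1.0 - EPS)
        scores.append(STD_NORMAL.inv_cdf(q))
    return scores


def max_rank_adjusted_score(condition_scores):
    """r_A^max = max over j of r_A[j]."""
    return max(rank_adjusted_scores(condition_scores))
```

With a single condition (m = 1) the procedure returns the aggregate z-score unchanged, which is a quick sanity check on the formulas.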
[Figure: worked example of scoring over multiple conditions. The z-scores of each gene under conditions 1 to 4 are combined into aggregate scores zA1 ... zAm, which are sorted and converted to rA[1] ... rA[m]; the maximum over j is taken and calibrated against the background distribution.]
SIMULATED ANNEALING
A strategy for finding a global maximum: to avoid getting trapped in a local maximum, we must sometimes select new points that do not improve the solution.
Annealing: gradual cooling of a liquid.
Incorporate a temperature parameter into the maximization procedure.
At high temperatures, explore the parameter space.
At lower temperatures, restrict exploration.
SIMULATED ANNEALING STRATEGY
Start with some sample.
Propose a change.
Decide whether to accept the change.
SIMULATED ANNEALING STRATEGY: HOW TO DECIDE WHETHER TO ACCEPT A CHANGE
Consider a decreasing series of temperatures.
For each temperature, iterate these steps:
  Propose an update and evaluate the function.
  Accept updates that improve the solution.
  Accept some updates that don't improve the solution.
The acceptance probability depends on the "temperature" parameter.
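The acceptance decision can be sketched as follows. The slide only says that the acceptance probability depends on the temperature; the exponential (Metropolis-style) rule and the geometric cooling schedule below are common choices and are assumptions here, as are the function names.

```python
import math
import random


def accept(delta, temperature, rng=random):
    """Decide whether to keep a proposed change when maximizing.

    delta is (new score - old score). Improvements are always kept;
    a worsening move is kept with probability exp(delta / temperature).
    """
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)


def geometric_schedule(t_start, t_end, n_steps):
    """Yield n_steps temperatures that fall geometrically from t_start
    towards t_end (t_end itself is the limit, not one of the values)."""
    ratio = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    for _ in range(n_steps):
        yield t
        t *= ratio
```

At high temperature exp(delta / T) is close to 1 even for clearly worse moves, so the search wanders; as T falls the same move is almost never kept, and the search settles into a maximum.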
SEARCHING FOR HIGH SCORING SUBNETWORKS VIA SIMULATED ANNEALING
• Associate an active/inactive state with each node
• G_W denotes the working subgraph of G induced by the active nodes
THE ALGORITHM
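The algorithm itself did not survive on this slide. The following is a minimal sketch of such a search, under these assumptions: every node starts active with probability 1/2, one randomly chosen node is toggled per iteration, a state is scored by its highest-scoring connected component of active nodes, a worse state is kept with probability exp((s_new - s_old) / T), and T falls geometrically from t_start to t_end. For brevity it scores components with the uncalibrated aggregate sum(z) / sqrt(k). All function and parameter names are hypothetical.

```python
import math
import random


def components(adj, active):
    """Connected components of the subgraph induced by the active nodes."""
    seen, comps = set(), []
    for start in active:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w in active and w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def score(comp, z):
    """Uncalibrated aggregate score of one component (a simplification)."""
    return sum(z[v] for v in comp) / math.sqrt(len(comp))


def best_component(adj, active, z):
    """Highest-scoring component and its score; -inf if nothing is active."""
    comps = components(adj, active)
    if not comps:
        return set(), float("-inf")
    best = max(comps, key=lambda c: score(c, z))
    return best, score(best, z)


def anneal(adj, z, n_iter=5000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    nodes = sorted(adj)
    active = {v for v in nodes if rng.random() < 0.5}
    _, current = best_component(adj, active, z)
    for i in range(n_iter):
        t = t_start * (t_end / t_start) ** (i / n_iter)
        v = rng.choice(nodes)
        active ^= {v}  # toggle the state of v
        _, proposed = best_component(adj, active, z)
        delta = proposed - current
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = proposed  # keep the toggle
        else:
            active ^= {v}  # undo the toggle
    return best_component(adj, active, z)
```

Here adj maps each node to the set of its neighbours and z maps each node to its z-score; the function returns the top-scoring component found and its score.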
HEURISTICS FOR IMPROVED ANNEALING
Search for M subnetworks simultaneously.
Increase the efficiency of annealing in networks with many 'hubs'.
[Figure label: high-score node]
Solution: change step 3 of the algorithm.
  Define dmin at the beginning of the algorithm.
  If deg(node) > dmin, remove all neighbors that are not in the top-scoring component.
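One possible reading of the hub rule, as a sketch. It assumes that "remove" means setting a neighbour to inactive, and that the top-scoring component has already been computed by the caller; the function name and argument layout are hypothetical.

```python
def prune_hub_neighbors(adj, active, v, top_component, d_min):
    """Return the active set after applying the hub rule to node v.

    If deg(v) > d_min, every active neighbour of v that is not in the
    top-scoring component is made inactive; otherwise nothing changes.
    """
    if len(adj[v]) <= d_min:
        return set(active)
    return {u for u in active if u not in adj[v] or u in top_component}
```

The intent is that toggling a hub does not drag all of its low-scoring neighbours into one large component, which would otherwise dilute the score of the component the hub joins.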
RESULTS
RESULTS
Small network with a single perturbation
[Figure: subnetwork with nodes labelled by z-score (7.7, 3.1, 2.3, 2.8, 2.5); GAL4 is marked as a transcription factor.]
Simulated annealing was performed with parameters:
N = 100,000, Tstart = 1, Tend = 0.01, M = 5, dmin = 100
Distribution of subnetwork scores in actual and randomized data
Large network with several perturbations
DISCUSSION
SUBNETWORKS ARE CONSISTENT WITH KNOWN REGULATORY CIRCUITS
SUBNETWORKS VERSUS GENE EXPRESSION CLUSTERS
Our approach groups genes subject to the constraints of the molecular interaction network.
Subnetworks are scored over only a subset of conditions.
Our approach groups genes only by the significance of change, whereas clustering methods group genes by both magnitude and direction of change.
Our method leaves some genes unaffiliated with any subnetwork, unlike clustering, which assigns every gene to a distinct cluster.
FUTURE WORK
Investigating in the laboratory the subnetworks we found.
Accommodating new types of interaction networks (proteins and small molecules).
Annotating each interaction with its directionality and compartment.
QUESTIONS?
THANKS