Page 1: Bioinformatics Dealing with expression data Kristel Van Steen, PhD, ScD (kristel.vansteen@ulg.ac.be) Université de Liege - Institut Montefiore 2008-2009.

Bioinformatics
Dealing with expression data

Kristel Van Steen, PhD, ScD
(kristel.vansteen@ulg.ac.be)
Université de Liege - Institut Montefiore
2008-2009

Page 2

Acknowledgements

Material based on:
- Slides from Patrik D’haeseleer, Shoudan Liang and Roland Somogyi (genetic network inference)
- Slides from Steve Horvath and Jun Dong (co-expression networks)
- Slides from Sargur Srihari (bagging and boosting)

Page 3

Class Outline

- Genetic networks
- A primer to co-expression network analysis
- Bagging and boosting (as promised …)
- Consensus microarray data analysis
  - Theory
  - Application

Page 4

Genetic networks

Page 5

Outline

- Introduction
- A conceptual approach to complex network dynamics
- Inference of regulation through clustering of gene expression data
- Modeling methodologies
- Gene network inference: reverse engineering

Page 6

- Genes encode proteins, some of which in turn regulate other genes
- Determine the structure of this intricate network of genetic regulatory interactions

Page 7

- Traditional approach: local
  - Examining and collecting data on a single gene, a single protein or a single reaction at a time
- Functional genomics

Page 8

Functional Genomics

- Specifically, functional genomics refers to the development and application of global experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics.
- High-throughput, large-scale experimental methodologies combined with statistical and computational analysis of the results.

Page 9

Functional Genomics (Cont.)

We need to define the mapping from sequence space to functional space.

Page 10

Intermediate representation

- Focus at the level of single cells
- A biological system can be considered to be a state machine, where the change in internal state of the system depends on both its current internal state and any external inputs.

Page 11

The goal

- Observe the state of a cell and how it changes under different circumstances, and from this derive a model of how these state changes are generated
- The state of a cell
  - All those variables determining its behavior

Page 12

Example

A simple, 6-node regulatory network

Page 13

Outline

- Introduction
- A conceptual approach to complex network dynamics
- Inference of regulation through clustering of gene expression data
- Modeling methodologies
- Gene network inference: reverse engineering
- Conclusions and Outlook

Page 14

- The global gene expression pattern is the result of the collective behavior of individual regulatory pathways
- Gene function depends on its cellular context; thus understanding the network as a whole is essential.

Page 15

Boolean Networks

- Each gene is considered as a binary variable (either ON or OFF) regulated by other genes through logical or Boolean functions.
- Even with this simplification, the network behavior is already extremely rich.
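The idea of genes as ON/OFF variables updated by logical rules can be illustrated with a minimal sketch. The three genes `A`, `B`, `C`, their wiring, and the synchronous update scheme are hypothetical choices made for this example; they are not taken from the slides.

```python
# Hypothetical 3-gene Boolean network; the wiring below is illustrative only.
RULES = {
    "A": lambda s: s["C"],                 # A is switched on by C
    "B": lambda s: s["A"] and not s["C"],  # B needs A on and C off
    "C": lambda s: not s["B"],             # C is repressed by B
}

def step(state):
    """Synchronously update every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def trajectory(state, max_steps=20):
    """Follow the dynamics until a state repeats; return (visited, repeated_state)."""
    visited = []
    while state not in visited and len(visited) < max_steps:
        visited.append(state)
        state = step(state)
    return visited, state

start = {"A": False, "B": False, "C": False}
visited, attractor_state = trajectory(start)
```

Even this toy network has two attractors: a fixed point (A and C ON, B OFF) and a two-state cycle entered from A ON, B OFF, C OFF.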

Page 16

Boolean Networks (Cont.)

Cell differentiation corresponds to transitions from one global gene expression pattern to another.

Page 17

Outline

- Introduction
- A conceptual approach to complex network dynamics
- Inference of regulation through clustering of gene expression data
- Modeling methodologies
- Gene network inference: reverse engineering
- Conclusions and Outlook

Page 18

Scoring methods

- Whether there has been a significant change at any one condition
- Whether there has been a significant aggregate change over all conditions
- Whether the fluctuation pattern shows high diversity according to Shannon entropy
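The entropy criterion can be sketched as follows. The slides do not say how expression values are discretized, so the equal-width binning and the helper names `discretize` and `shannon_entropy` are assumptions made for illustration.

```python
import math
from collections import Counter

def discretize(profile, n_bins=3):
    """Map each value to one of n_bins equal-width bins over the profile's range."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)  # a flat profile falls into a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]

def shannon_entropy(levels):
    """Entropy in bits of the empirical distribution of the discrete levels."""
    n = len(levels)
    counts = Counter(levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

flat_score = shannon_entropy(discretize([2.0, 2.0, 2.0, 2.0]))
varied_score = shannon_entropy(discretize([0, 1, 2, 3, 4, 5]))
```

A gene whose profile never leaves one bin scores zero, while a profile spread evenly over all bins reaches the maximum of log2(n_bins) bits.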

Page 19

Guilt By Association

- Select a gene
- Determine its nearest neighbors in expression space within a certain user-defined distance cut-off
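A minimal sketch of the neighbor search, assuming a genes-by-samples matrix and Euclidean distance. The function name `neighbors_within`, the toy matrix and the cut-off value are illustrative assumptions.

```python
import numpy as np

def neighbors_within(expr, gene, cutoff):
    """Indices of genes whose Euclidean distance to `gene` is at most `cutoff`."""
    dist = np.linalg.norm(expr - expr[gene], axis=1)  # distance of every row to the query row
    hits = np.flatnonzero(dist <= cutoff)
    return [int(i) for i in hits if i != gene]        # drop the query gene itself

# Toy data: rows are genes, columns are samples (values are made up).
expr = np.array([[1.0, 2.0, 3.0],
                 [1.1, 2.1, 2.9],
                 [5.0, 0.0, 1.0],
                 [0.9, 1.8, 3.2]])
close_to_gene0 = neighbors_within(expr, gene=0, cutoff=0.5)
```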

Page 20

Clustering

Extract groups of genes that are tightly co-expressed over a range of different experiments.

Page 21

Caution

- Different clustering methods can have very different results
- It’s not yet clear which clustering methods are most useful for gene expression analysis.

Page 22

Definition: Gene Expression Profile

- An expression profile $e_j$ of an ordered list of $N$ samples ($k = 1$ to $N$) for a particular gene $j$ is a vector of scaled expression values $v_{jk}$
- The expression profile is:

$$e_j = (v_{j1}, v_{j2}, v_{j3}, \ldots, v_{jN})$$

Page 23

Definition: Gene Expression Profile (Cont.)

- A difference between two genes $p$ and $q$ may be estimated as an $N$-dimensional metric “distance” between $e_p$ and $e_q$.
- Euclidean distance, with $k$ running over the $N$ samples as in the definition of $e_j$:

$$d_{pq} = \sqrt{\sum_{k=1}^{N} (v_{pk} - v_{qk})^2}$$

Page 24

Clustering algorithms

- Non-hierarchical methods
  - Cluster N objects into K groups in an iterative process until certain goodness criteria are optimized
  - E.g. K-means
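As an illustration of such an iterative process, here is a minimal sketch of the standard K-means (Lloyd) iteration, in which the goodness criterion is the within-cluster sum of squared distances. The function name, the random initialization from data points, and the toy data are assumptions of this example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, k).
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # New center = mean of assigned points; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers

# Two well-separated toy groups (made-up values).
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(points, k=2)
```

K-means only finds a local optimum, so in practice it is usually run from several initializations.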

Page 25

Clustering algorithms

- Hierarchical methods
  - Return a hierarchy of nested clusters, where each cluster typically consists of the union of two or more smaller clusters.
  - Agglomerative methods
    - Start with single object clusters and recursively merge them into larger clusters
  - Divisive methods
    - Start with the cluster containing all objects and recursively divide it into smaller clusters
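The agglomerative variant can be sketched in a few lines. Single linkage (distance between clusters = distance between their closest members) is an assumption made here; the slides do not name a linkage rule, and the function name and toy data are illustrative.

```python
import numpy as np

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering; returns final clusters and the merge history."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per object
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))  # record which clusters merged
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

points = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
clusters, merges = agglomerate(points, n_clusters=2)
```

The merge history is the nested hierarchy: each recorded merge forms a cluster that is the union of two smaller ones.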

Page 26

Other applications of co-expression clusters

- Extraction of regulatory motifs
  - Genes in the same expression cluster share biological functions
- Inference of functional annotation
  - Functions of unknown genes may be hypothesized from genes with known function within the same cluster
- As a molecular signature in distinguishing cell or tissue types
  - mRNA expression

Page 27

Which clustering method to use?

- There is no single best criterion for obtaining a partition because no precise and workable definition of ‘cluster’ exists.
- Clusters can be of any arbitrary shapes and sizes in a multidimensional pattern space.

Page 28

Challenge in cluster analysis

- A gene could be a member of several clusters, each reflecting a particular aspect of its function and control
- Solutions
  - Clustering methods that partition genes into non-exclusive clusters
  - Several clustering methods could be used simultaneously

Page 29

Outline

- Introduction
- A conceptual approach to complex network dynamics
- Inference of regulation through clustering of gene expression data
- Modeling methodologies
- Gene network inference: reverse engineering
- Conclusions and Outlook

Page 30

Level of biochemical detail

- Abstract
  - Boolean networks
- Concrete
  - Full biochemical interaction models with stochastic kinetics in Arkin et al. (1998)

Page 31

Forward and inverse modeling

- Forward modeling approach
- Inverse modeling, or reverse engineering
  - Given an amount of data, what can we deduce about the unknown underlying regulatory network?
  - Requires the use of a parametric model, the parameters of which are then fit to the real-world data.

Page 32

Outline

- Introduction
- A conceptual approach to complex network dynamics
- Inference of regulation through clustering of gene expression data
- Modeling methodologies
- Gene network inference: reverse engineering
- Conclusions and Outlook

Page 33

Goal of network inference

- Construct a coarse-scale model of the network of regulatory interactions between the genes
- It’s possible to reverse engineer a network from its activity profiles

Page 34

Data requirements

- To infer how a gene is regulated, we need to observe the expression of that gene under many different combinations of expression levels of its regulatory inputs
  - Use data from different sources
  - Deal with different data types

Page 35

Estimates for network models

- A sparse network model of $N$ genes, where each gene is only affected by $K$ other genes on average.
- A sparsely connected, directed graph with $N$ nodes and $NK$ edges.

Page 36

Co-expression network analysis

Page 37

Outline

- Network and network concepts
- Approximately factorizable networks
- Gene Co-expression Network
  - Eigengene Factorizability, Eigengene Conformity
  - Eigengene-based network concepts
- What can we learn from the geometric interpretation?

Page 38

Network = Adjacency Matrix

- A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes whether/how a pair of nodes is connected.
  - $A$ is a symmetric matrix with entries in $[0,1]$
  - For unweighted networks, entries are 1 or 0 depending on whether or not 2 nodes are adjacent (connected)
  - For weighted networks, the adjacency matrix reports the connection strength between node pairs
  - Our convention: diagonal elements of $A$ are all 1.

Page 39

Motivational example I: Pair-wise relationships between genes across different mouse tissues and genders

Challenge: Develop simple descriptive measures that describe the patterns.

Solution: The following network concepts are useful: density, centralization, clustering coefficient, heterogeneity

Page 40

Motivational example (continued)

Challenge: Find a simple measure for describing the relationship between gene significance and connectivity

Solution: network concept called hub gene significance

Page 41

Background

- Network concepts are also known as network statistics or network indices
  - Examples: connectivity (degree), clustering coefficient, topological overlap, etc.
- Network concepts underlie network language and systems biological modeling.
- Dozens of potentially useful network concepts are known from graph theory.

Page 42

Review of some fundamental network concepts which are defined for all networks (not just co-expression networks)

Page 43

Connectivity

- Node connectivity = row sum of the adjacency matrix
  - For unweighted networks = number of direct neighbors
  - For weighted networks = sum of connection strengths to other nodes

$$\text{Connectivity}_i = k_i = \sum_{j \neq i} a_{ij}$$

$$\text{ScaledConnectivity}_i = K_i = \frac{k_i}{\max(k)}$$
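A minimal NumPy sketch of these two quantities. The function names and the 4-node star network are illustrative; the diagonal is set to 1 following the convention stated earlier and is excluded from the row sum, as the $j \neq i$ index requires.

```python
import numpy as np

def connectivity(adj):
    """k_i = sum over j != i of a_ij (the diagonal entry is subtracted out)."""
    adj = np.asarray(adj, dtype=float)
    return adj.sum(axis=1) - np.diag(adj)

def scaled_connectivity(adj):
    """K_i = k_i / max(k)."""
    k = connectivity(adj)
    return k / k.max()

# Unweighted star network: node 0 is connected to nodes 1, 2 and 3.
star = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 0],
                 [1, 0, 0, 1]])
k = connectivity(star)
K = scaled_connectivity(star)
```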

Page 44

Density

- Density = mean adjacency
- Highly related to mean connectivity

$$\text{Density} = \frac{\sum_i \sum_{j \neq i} a_{ij}}{n(n-1)} = \frac{\text{mean}(k)}{n-1}$$

where $n$ is the number of network nodes.
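Both forms of the density formula can be checked numerically. This is a small sketch with illustrative function names and a toy 4-node star network; the diagonal is excluded in both computations.

```python
import numpy as np

def density_from_adjacency(adj):
    """Mean off-diagonal adjacency: sum_{i != j} a_ij / (n (n - 1))."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    return (adj.sum() - np.trace(adj)) / (n * (n - 1))

def density_from_connectivity(adj):
    """Equivalent form: mean(k) / (n - 1)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    k = adj.sum(axis=1) - np.diag(adj)
    return k.mean() / (n - 1)

star = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 0],
                 [1, 0, 0, 1]])
d1 = density_from_adjacency(star)
d2 = density_from_connectivity(star)
```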

Page 45

Centralization

[Figure: two example networks. The first has Centralization = 1 because it has a star topology; the second has Centralization = 0 because all nodes have the same connectivity of 2.]

$$\text{Centralization} = \frac{n}{n-2}\left(\frac{\max(k)}{n-1} - \text{Density}\right) \approx \frac{\max(k)}{n-1} - \text{Density}$$

- = 1 if the network has a star topology
- = 0 if all nodes have the same connectivity
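The two limiting cases can be verified with a short sketch of the exact formula. The function name and the two 4-node toy networks (a star and a ring in which every node has connectivity 2) are assumptions of this example.

```python
import numpy as np

def centralization(adj):
    """n/(n-2) * (max(k)/(n-1) - Density), with the diagonal excluded from k."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    k = adj.sum(axis=1) - np.diag(adj)
    density = k.sum() / (n * (n - 1))
    return n / (n - 2) * (k.max() / (n - 1) - density)

star = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 0],
                 [1, 0, 0, 1]])
ring = np.array([[1, 1, 0, 1],
                 [1, 1, 1, 0],
                 [0, 1, 1, 1],
                 [1, 0, 1, 1]])
c_star = centralization(star)
c_ring = centralization(ring)
```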

Page 46

Heterogeneity

- Heterogeneity: coefficient of variation of the connectivity
- Highly heterogeneous networks exhibit hubs

$$\text{Heterogeneity} = \frac{\sqrt{\text{variance}(k)}}{\text{mean}(k)}$$
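A small sketch of the coefficient of variation of the connectivity. Using the population variance (NumPy's default, `ddof=0`) is an assumption; the slide does not say which variance estimator is intended. The function name and toy networks are illustrative.

```python
import numpy as np

def heterogeneity(adj):
    """sqrt(variance(k)) / mean(k), using the population variance (an assumption)."""
    adj = np.asarray(adj, dtype=float)
    k = adj.sum(axis=1) - np.diag(adj)  # connectivity without the diagonal
    return np.sqrt(np.var(k)) / k.mean()

star = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 0],
                 [1, 0, 0, 1]])
ring = np.array([[1, 1, 0, 1],
                 [1, 1, 1, 0],
                 [0, 1, 1, 1],
                 [1, 0, 1, 1]])
h_star = heterogeneity(star)  # one hub, so connectivities differ
h_ring = heterogeneity(ring)  # all connectivities equal
```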

Page 47

Clustering Coefficient

Measures the cliquishness of a particular node
« A node is cliquish if its neighbors know each other »

[Figure: two example networks. In the first, the clustering coefficient of the white node = 0; in the second, the clustering coefficient = 1.]

$$\text{ClusterCoef}_i = \frac{\sum_{l \neq i} \sum_{m \neq i,l} a_{il}\, a_{lm}\, a_{mi}}{\left(\sum_{l \neq i} a_{il}\right)^2 - \sum_{l \neq i} a_{il}^2}$$

This generalizes directly to weighted networks (Zhang and Horvath 2005)
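The formula can be evaluated with matrix products once the diagonal is zeroed: the numerator for node $i$ is then the $i$-th diagonal entry of $A^3$. This is a sketch; the function name, the toy networks, and the choice to return 0 where the denominator vanishes (nodes with fewer than two neighbors) are assumptions.

```python
import numpy as np

def clustering_coefficient(adj):
    """Per-node clustering coefficient for a weighted or unweighted adjacency matrix."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)               # enforce l != i, m != i and m != l
    numer = np.diag(a @ a @ a)             # sum_{l,m} a_il a_lm a_mi
    denom = a.sum(axis=1) ** 2 - (a ** 2).sum(axis=1)
    out = np.zeros_like(numer)
    np.divide(numer, denom, out=out, where=denom > 0)  # leave 0 where undefined
    return out

triangle = np.ones((3, 3))                 # every node's neighbors are connected
star = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 0],
                 [1, 0, 0, 1]])            # the hub's neighbors are not connected
cc_triangle = clustering_coefficient(triangle)
cc_star = clustering_coefficient(star)
```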

Page 48

The topological overlap dissimilarity is used as input of hierarchical clustering

- Generalized in Zhang and Horvath (2005) to the case of weighted networks
- Generalized in Li and Horvath (2006) to multiple nodes
- Generalized in Yip and Horvath (2007) to higher order interactions

$$\text{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu}\, a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$$

$$\text{DistTOM}_{ij} = 1 - \text{TOM}_{ij}$$
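A minimal sketch of the topological overlap matrix and its dissimilarity. With the diagonal zeroed, the shared-neighbor sum over $u \neq i,j$ is the matrix product $A A$. Setting the diagonal of TOM to 1 and the 3-node path used as toy data are assumptions of this example.

```python
import numpy as np

def topological_overlap(adj):
    """TOM_ij = (sum_{u != i,j} a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)
    k = a.sum(axis=1)                      # connectivity
    shared = a @ a                         # shared-neighbor term
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)             # a node fully overlaps with itself
    return tom

# Path 0 - 1 - 2: nodes 0 and 2 are not adjacent but share neighbor 1.
path = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]])
tom = topological_overlap(path)
dist_tom = 1.0 - tom                       # dissimilarity used for clustering
```

Nodes 0 and 2 get a nonzero overlap despite not being directly connected, which is the point of using TOM rather than the raw adjacency.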

Page 49

Network Significance

- Defined as average gene significance
- We often refer to the network significance of a module network as module significance.

$$\text{NetworkSignif} = \frac{\sum_i GS_i}{n}$$

Page 50

Hub Gene Significance = slope of the regression line (intercept = 0)

$$\text{HubGeneSignif} = \frac{\sum_i GS_i\, K_i}{\sum_i (K_i)^2}$$
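The ratio above is the least-squares slope of a line through the origin when gene significance is regressed on scaled connectivity. A short sketch with made-up values, in which `GS` is constructed to be exactly twice `K`:

```python
import numpy as np

def hub_gene_significance(gs, K):
    """Slope of the no-intercept regression of GS on scaled connectivity K."""
    gs = np.asarray(gs, dtype=float)
    K = np.asarray(K, dtype=float)
    return (gs * K).sum() / (K ** 2).sum()

def network_significance(gs):
    """Average gene significance."""
    return float(np.mean(gs))

K = np.array([1.0, 0.5, 0.25])   # hypothetical scaled connectivities
GS = 2.0 * K                     # hypothetical gene significances
hgs = hub_gene_significance(GS, K)
ns = network_significance(GS)
```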

Page 51

Q: What do all of these fundamental network concepts have in common?

They are functions of the adjacency matrix $A$ and/or a gene significance measure $GS$.

Page 52

CHALLENGE

- Find relationships between these and other seemingly disparate network concepts.
  - For general networks, this is a difficult problem.
  - But a solution exists for a special subclass of networks: approximately factorizable networks

Page 53

Definition of an approximately factorizable network

Definitions: The adjacency matrix $A$ is approximately factorizable if there exists a vector $CF$ with non-negative elements such that

$$a_{ij} \approx CF_i\, CF_j \quad \text{for all } i \neq j$$

$CF_i$ is referred to as the conformity of the i-th node.

Why is this relevant?

Answer: Because modules are often approximately factorizable
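To make the definition concrete, the sketch below builds an exactly factorizable adjacency matrix from a hypothetical conformity vector and measures how far a matrix is from $CF_i\, CF_j$ off the diagonal. It does not estimate the conformity from data; the function names and values are assumptions of this example.

```python
import numpy as np

def factorizable_adjacency(cf):
    """Adjacency with a_ij = CF_i * CF_j for i != j and diagonal set to 1."""
    cf = np.asarray(cf, dtype=float)
    a = np.outer(cf, cf)
    np.fill_diagonal(a, 1.0)
    return a

def offdiag_residual(adj, cf):
    """Largest |a_ij - CF_i * CF_j| over all pairs i != j."""
    diff = np.abs(np.asarray(adj, dtype=float) - np.outer(cf, cf))
    np.fill_diagonal(diff, 0.0)  # the definition only constrains i != j
    return diff.max()

cf = np.array([0.9, 0.5, 0.2])   # hypothetical conformities
A = factorizable_adjacency(cf)
exact = offdiag_residual(A, cf)

noisy = A.copy()
noisy[0, 1] = noisy[1, 0] = A[0, 1] + 0.05  # perturb one symmetric pair
approx = offdiag_residual(noisy, cf)
```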

Page 54

Observation: Approximate relationships among network concepts in approximately factorizable networks

$$\text{mean}(\text{ClusterCoef}) \approx \left(1 + \text{Heterogeneity}^2\right)^2 \times \text{Density}$$

$$\text{TopOverlap}_{ij} \approx \frac{\max(k_i, k_j)}{n}\left(1 + \text{Heterogeneity}^2\right)$$

$$\text{TopOverlap}_{[1]j} \approx \left(\text{Centralization} + \text{Density}\right)\left(1 + \text{Heterogeneity}^2\right)$$

where $[1]$ denotes the index of the most highly connected hub

Page 55

Weighted Gene Co-expression Network

$$A = [a_{ij}] = \left[\,|\text{cor}(x_i, x_j)|^{\beta}\,\right]$$

where $x_i$ is the expression profile for gene $i$, and mathematically a vector of expression values across multiple samples; $\beta$ is the exponent of the power adjacency function.

Note: Unweighted Network is

$$A = [a_{ij}] = \left[\,I\!\left(|\text{cor}(x_i, x_j)| > \tau\right)\right]$$

where $I(\cdot)$ is an indicator function and $\tau$ is the threshold at which the correlation is dichotomized.

Steps for constructing a co-expression network

A) Microarray gene expression data

B) Measure concordance of gene expression with a Pearson correlation

C) The Pearson correlation matrix is either dichotomized to arrive at an adjacency matrix (unweighted network), or transformed continuously with the power adjacency function (weighted network)

Definition of module (cluster)

Module = cluster of highly connected nodes

Any clustering method that results in such sets is suitable

We define modules as branches of a hierarchical clustering tree using the topological overlap matrix
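
The following sketch shows one way such modules could be obtained in practice. It assumes the commonly used topological overlap formula, TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), which is not stated on this slide, and it replaces branch cutting with a simplification: average-linkage merging on the dissimilarity 1 − TOM until a requested number of clusters remains. The function names and the toy adjacency matrix are hypothetical.

```python
def topological_overlap(adj):
    """Topological overlap matrix for a symmetric adjacency with entries in [0, 1]."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def modules_by_average_linkage(tom, n_modules):
    """Merge the closest clusters (average 1 - TOM) until n_modules remain."""
    clusters = [[i] for i in range(len(tom))]
    while len(clusters) > n_modules:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(1.0 - tom[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy network: two groups of three nodes, strong links within, weak links between.
group = [0, 0, 0, 1, 1, 1]
adj = [
    [0.0 if i == j else (0.9 if group[i] == group[j] else 0.1) for j in range(6)]
    for i in range(6)
]
tom = topological_overlap(adj)
modules = modules_by_average_linkage(tom, n_modules=2)
```

Fixing the number of modules in advance is a shortcut for the example; cutting branches of the full tree, as the slide describes, does not require that choice.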

[Figure: expression of the brown module (rows = genes, columns = microarrays), shown together with the brown module eigengene across samples]

Module eigengene = measure of over-expression = average redness

The module eigengene is highly correlated with the most highly connected hub gene.
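
This claim can be explored on simulated data. The sketch below assumes that the module eigengene is the first principal component (first right-singular vector) of the standardized genes-by-samples matrix, which is a common definition but is not stated on the slides, and it finds that vector by power iteration. The simulated module, the noise level and all function names are illustrative assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def standardize(x):
    """Center a profile and scale it to unit sample variance."""
    n = len(x)
    m = sum(x) / n
    sd = sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [(v - m) / sd for v in x]


def module_eigengene(profiles, n_iter=200):
    """First right-singular vector of the standardized matrix, via power iteration."""
    z = [standardize(p) for p in profiles]
    n_samples = len(z[0])
    e = z[0][:]  # start from a gene profile; a constant vector would be annihilated
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, e)) for row in z]
        e = [sum(scores[g] * z[g][s] for g in range(len(z))) for s in range(n_samples)]
        norm = sqrt(sum(v * v for v in e))
        e = [v / norm for v in e]
    return e


# Simulated module: five genes driven by one shared signal plus noise.
random.seed(0)
signal = [random.gauss(0, 1) for _ in range(20)]
module = [[w * s + random.gauss(0, 0.2) for s in signal] for w in (1.0, 0.9, 0.8, -0.7, 0.6)]

# Intramodular connectivity k_i = sum over j != i of |cor(x_i, x_j)|.
k = [
    sum(abs(pearson(module[i], module[j])) for j in range(len(module)) if j != i)
    for i in range(len(module))
]
hub = max(range(len(module)), key=lambda i: k[i])
eigengene = module_eigengene(module)
hub_correlation = abs(pearson(eigengene, module[hub]))
```

The sign of a principal component is arbitrary, so the comparison uses the absolute correlation.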

Some insights

Intramodular hub gene = a gene that is highly correlated with the module eigengene, i.e. it is a good representative of a module

Gene screening strategies that use intramodular connectivity amount to pathway-based gene screening methods

Intramodular connectivity is a highly reproducible “fuzzy” measure of module membership.

Network concepts are useful for describing pairwise interaction patterns.

Bagging and Boosting

Bagging

Boosting

Creating a classifier sequence

Creating a 2nd training set

Creating a 3rd data set

Boosting vs Bagging

Consensus microarray analysis

Theory (Allison et al 2006 !!!)

Practical: IBD application
