
# Computer Science TRICLUSTER: An Effective Algorithm for Mining Coherent Clusters in 3D Microarray...

Date posted: 21-Dec-2015
Transcript:
• Slide 1
• Computer Science. TRICLUSTER: An Effective Algorithm for Mining Coherent Clusters in 3D Microarray Data. Mohammed J. Zaki & Lizhuang Zhao, Department of Computer Science, Rensselaer Polytechnic Institute (RPI), Troy, NY. {zhaol2, zaki}@cs.rpi.edu
• Slide 2
• Microarray Data
  • Essential source of information about gene expression within a cell
  • Typically 2D: Genes x Samples (or Genes x Time)
  • Measures the expression level of genes in different samples
  • Labeled samples: classification (cancer vs. non-cancer)
  • Non-labeled samples: clustering (bi-clusters)
  • Goal: identify the expression patterns, providing clues to the gene regulatory networks within a cell
• Slide 3
• Why Biclustering?
  • [Figure: a genes x samples expression matrix with genes g1 to g5 and samples s1 to s5. A full-space cluster groups genes (g2, g4, g5) across all five samples; a bicluster groups genes (g2, g4, g5) over samples (s2, s3, s5) only.]
  • Some genes are similarly expressed in some samples
• Slide 4
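• Different Homogeneity or Similarity Criteria (ordered from most specific to more general)
  • Constant, all values: rows (2 2 2), (2 2 2), (2 2 2)
  • Constant, column: rows (1 2 5), (1 2 5), (1 2 5)
  • Constant, row: rows (1 1 1), (2 2 2), (5 5 5)
  • Scaling: rows (1.0 1.4 2.0), (2.0 2.8 4.0), (2.5 3.5 5.0); Scale = 1.4 between the first and second columns
  • Shifting: rows (1.0 1.4 2.0), (2.0 2.4 3.0), (2.5 2.9 3.5); Shift = 0.4 between the first and second columns
  • Order preserving: rows (4 1 7), (3 2 5), (6 3 8); every row ranks the columns in the order 2, 1, 3
  • Note: small noise is allowed in all expression values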
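The criteria on this slide can be checked mechanically on a small 2D block. The following is a minimal sketch, not the TriCluster algorithm itself: the function names, the `TOL` noise tolerance, and the pairwise-column tests are illustrative assumptions, and the first entry of the second row in the scaling and shifting matrices (2.0) is inferred from the stated scale and shift factors. It assumes non-zero values for the ratio test and no ties for the ordering test.

```python
from itertools import combinations
from math import isclose

TOL = 1e-6  # hypothetical tolerance standing in for "small noise is allowed"

def columns(m):
    """Transpose a row-major matrix into a list of columns."""
    return [list(c) for c in zip(*m)]

def is_constant(m, mode="all"):
    """mode 'all': one value everywhere; 'row': each row constant; 'col': each column constant."""
    if mode == "all":
        groups = [[v for row in m for v in row]]
    elif mode == "row":
        groups = m
    elif mode == "col":
        groups = columns(m)
    else:
        raise ValueError(mode)
    return all(isclose(v, g[0], abs_tol=TOL) for g in groups for v in g)

def is_scaling(m):
    """Every pair of columns is related by a single multiplicative factor (values assumed non-zero)."""
    for a, b in combinations(columns(m), 2):
        ratios = [y / x for x, y in zip(a, b)]
        if not all(isclose(r, ratios[0], rel_tol=TOL) for r in ratios):
            return False
    return True

def is_shifting(m):
    """Every pair of columns is related by a single additive offset."""
    for a, b in combinations(columns(m), 2):
        diffs = [y - x for x, y in zip(a, b)]
        if not all(isclose(d, diffs[0], abs_tol=TOL) for d in diffs):
            return False
    return True

def is_order_preserving(m):
    """Every row ranks the columns in the same order (ties not handled)."""
    orders = [sorted(range(len(row)), key=row.__getitem__) for row in m]
    return all(o == orders[0] for o in orders)

# Example matrices from the slide (the 2.0 entries in the second rows are inferred).
CONST_ALL = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
CONST_COL = [[1, 2, 5], [1, 2, 5], [1, 2, 5]]
CONST_ROW = [[1, 1, 1], [2, 2, 2], [5, 5, 5]]
SCALING = [[1.0, 1.4, 2.0], [2.0, 2.8, 4.0], [2.5, 3.5, 5.0]]
SHIFTING = [[1.0, 1.4, 2.0], [2.0, 2.4, 3.0], [2.5, 2.9, 3.5]]
ORDER = [[4, 1, 7], [3, 2, 5], [6, 3, 8]]

if __name__ == "__main__":
    print(is_scaling(SCALING), is_shifting(SHIFTING), is_order_preserving(ORDER))
```

The scaling matrix also passes `is_order_preserving`, which matches the slide's point that each criterion is more general than the one before it.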
• Slide 5
• Why TriCluster?
  • Typical microarray data is 2D (gene x sample)
  • Temporal expression is a very important tool: how does gene expression evolve in time? Find clusters over genes x samples x time
  • Spatial expression is also of interest: how does gene expression differ in space (e.g., different regions of mouse brain)? Find clusters over genes x samples x space
  • Combine temporal and spatial expression: find clusters over genes x time x space, etc.
  • There is an emerging need to mine 3D data
• Slide 6
• TriCluster: Our Contributions
  • First algorithm to mine tri-clusters in 3D microarray data
  • Complete and deterministic
  • Mines maximal clusters satisfying given homogeneity criteria: constant (column, row, all), scaling, and shifting
  • Clusters can be overlapping; optionally delete or merge clusters having large overlap
  • Proposes a set of metrics for cluster evaluation
  • Uses Gene Ontology (GO) to assess biological significance
• Slide 7
• Definitions
  • G is a set of genes {g_0, g_1, ..., g_(n-1)}
  • S is a set of samples {s_0, s_1, ..., s_(m-1)}
  • T is a set of time courses {t_0, t_1, ..., t_(l-1)}
  • 3D real-valued dataset D = {d_ijk} over G x S x T, where d_ijk is the expression value of gene g_i in sample s_j at time t_k
  • A triCluster is a maximal submatrix of D that satisfies some homogeneity conditions: C = X x Y x Z = {c_ijk}, with X ⊆ G, Y ⊆ S, Z ⊆ T
  • Given homogeneity conditions
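To make the indexing in these definitions concrete, here is a minimal sketch that stores D as a nested list `D[i][j][k]` and extracts the cells of a candidate block C = X x Y x Z. The helper names `make_dataset` and `subcluster` and the synthetic values are assumptions for illustration only; the sketch does not test the homogeneity conditions or maximality that a triCluster must satisfy.

```python
def make_dataset(n, m, l):
    """Build a toy n x m x l dataset where d_ijk = 100*i + 10*j + k (synthetic values)."""
    return [[[100 * i + 10 * j + k for k in range(l)]
             for j in range(m)]
            for i in range(n)]

def subcluster(D, X, Y, Z):
    """Return the cells c_ijk = d_ijk for gene indices X, sample indices Y, time indices Z."""
    return [[[D[i][j][k] for k in Z] for j in Y] for i in X]

if __name__ == "__main__":
    D = make_dataset(3, 3, 2)            # 3 genes, 3 samples, 2 time points
    C = subcluster(D, [0, 2], [1, 2], [0])
    print(C)                             # block over genes {g_0, g_2}, samples {s_1, s_2}, time {t_0}
```

Index lists rather than contiguous slices are used because X, Y, and Z are arbitrary subsets of G, S, and T, not necessarily adjacent rows or columns.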
