Mining Maximally Banded Matrices in Binary Data

Post on 25-May-2015


Introduction Problem Definition Bandedness and Bi-Clustering MMBS Algorithm Experimental Results Conclusion

Faris Alqadah, Raj Bhatnagar, Anil Jegga

University of Cincinnati, Cincinnati Children’s Hospital

Outline

1 Introduction: Motivation

2 Problem Definition: Preliminaries

3 Bandedness and Bi-Clustering: Formal Concept Analysis, Concept Lattice Paths

4 MMBS Algorithm: Three Steps

5 Experimental Results: Synthetic Data, Real-World Data

6 Conclusion


Banded Matrices in Data

  A B C D E
1 1 1 1 0 0
2 0 1 1 0 0
3 0 0 1 0 0
4 0 0 1 1 0
5 0 0 0 1 1

Banded structures in binary matrices have natural interpretations:

Bioinformatics (overlapping roles of genes)

Paleontology (patterns of species in space)

Social networks (community structures)


Motivating Example

      k-means  multi-way  EM  bi-cluster  subspace
doc1     1         0       1      0          1
doc2     0         1       0      1          0
doc3     0         0       0      0          1
doc4     0         0       0      1          1
doc5     0         0       1      0          1

Permuting the rows and columns of the same matrix exposes a band:

      k-means  EM  subspace  bi-cluster  multi-way
doc1     1      1     1          0           0
doc5     0      1     1          0           0
doc3     0      0     1          0           0
doc4     0      0     1          1           0
doc2     0      0     0          1           1
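The reordering on the two Motivating Example slides can be reproduced in a few lines. This is only an illustrative sketch in plain Python; the helper name `permute` is hypothetical and not part of the presented work. It applies the row and column orders of the second slide to the matrix of the first.

```python
# Term-document matrix from the first "Motivating Example" slide.
cols = ["k-means", "multi-way", "EM", "bi-cluster", "subspace"]
rows = {
    "doc1": [1, 0, 1, 0, 1],
    "doc2": [0, 1, 0, 1, 0],
    "doc3": [0, 0, 0, 0, 1],
    "doc4": [0, 0, 0, 1, 1],
    "doc5": [0, 0, 1, 0, 1],
}


def permute(rows, cols, row_order, col_order):
    """Return the matrix with rows and columns rearranged (hypothetical helper)."""
    idx = [cols.index(c) for c in col_order]
    return [[rows[r][j] for j in idx] for r in row_order]


# Row and column orders read off the second slide.
row_order = ["doc1", "doc5", "doc3", "doc4", "doc2"]
col_order = ["k-means", "EM", "subspace", "bi-cluster", "multi-way"]

banded = permute(rows, cols, row_order, col_order)
for name, row in zip(row_order, banded):
    print(name, *row)
```

Finding such orders automatically, rather than reading them off a slide, is the problem the rest of the talk addresses.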


Bi-Clustering Problem

Banded sub-matrices are a form of bi-clusters

Bi-clustering in binary data focuses on maximal rectangles full (or almost full) of 1s


Related Work

Nestedness and segmented nestedness [6]

MBS algorithm [2]

Fix column permutations

Solve the consecutive ones problem

Only find a single band


Contributions

1 Establish correspondence between banded structures and bi-clustering in binary data

2 Introduce the novel MMBS algorithm to uncover multiple, possibly overlapping banded sub-matrices

3 Empirical evaluation verifying advantage of MMBS over previous approaches


Basic Notation

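Matrix K with row labels G and column labels M

Think of K as a formal context K = (G, M, I), where I ⊆ G × M is the incidence relation (g I m when the entry in row g, column m is 1)

π is a permutation of G and τ is a permutation of M

$K^{\pi}_{\tau}$: K with its rows ordered by π and its columns ordered by τ

$g^{\pi}_i$ and $m^{\tau}_j$: the i-th row and the j-th column under these permutations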


Fully Banded Matrix

Definition

A binary matrix K = (G, M, I) is fully banded if there exist a permutation π of G and a permutation τ of M such that (1) for every row i in $K^{\pi}_{\tau}$ the entries with 1s occur in consecutive column indices $\{m_i, m_i + 1, \ldots, m_i^{\star}\}$, and (2) the starting and ending indices of the 1s in successive rows (i and i + 1) satisfy the conditions $m_i \le m_{i+1}$ and $m_i^{\star} \le m_{i+1}^{\star}$.
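For a matrix whose rows and columns are already in a candidate order, the two conditions of the definition can be checked directly. The sketch below is a minimal illustration; the function names are hypothetical, and rejecting all-zero rows is an assumption of this sketch rather than something the definition states.

```python
def row_span(row):
    """Return (first, last) column index of the 1s, or None if they are not consecutive."""
    ones = [j for j, v in enumerate(row) if v == 1]
    if not ones:
        return None  # assumption of this sketch: an all-zero row is rejected
    first, last = ones[0], ones[-1]
    if last - first + 1 != len(ones):
        return None  # a 0 sits between two 1s, so condition (1) fails
    return first, last


def is_banded_as_ordered(matrix):
    """Check conditions (1) and (2) for the given row and column order."""
    spans = [row_span(r) for r in matrix]
    if any(s is None for s in spans):
        return False
    # Condition (2): start and end indices never decrease from one row to the next.
    return all(a[0] <= b[0] and a[1] <= b[1] for a, b in zip(spans, spans[1:]))


example = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
print(is_banded_as_ordered(example))        # True
print(is_banded_as_ordered(example[::-1]))  # False: the staircase runs the wrong way
```

This only tests one fixed ordering; deciding whether some pair (π, τ) exists is the harder search problem.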


Relaxation of Fully Banded

Real data has noise

Subspaces may encompass banded structure

$e(K^{\pi}_{\tau})$: number of 1s or 0s that must be flipped to achieve banded structure

Maximal banded sub-matrix: no more rows or columns can be added while still preserving bandedness


Problem Statement

Given a binary matrix K and a noise threshold ε, find all sub-matrices of K that are ε-banded and maximal.


Bi-clustering

Bi-clusters in binary data defined as Formal Concepts

For A ⊆ G, A′ = {m ∈ M | g I m for all g ∈ A}

For B ⊆ M, B′ = {g ∈ G | g I m for all m ∈ B}

Formal Concept: C = (A, B) such that A′ = B and B′ = A
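The two derivation operators translate almost literally into set comprehensions. The sketch below uses the 5 × 5 banded matrix from the introduction as the context; the function names `up`, `down` and `is_concept` are hypothetical labels chosen for this illustration.

```python
# Context (G, M, I) built from the 5 x 5 example matrix: row -> columns holding a 1.
rows = {1: "ABC", 2: "BC", 3: "C", 4: "CD", 5: "DE"}
G = set(rows)
M = set("ABCDE")
I = {(g, m) for g, ms in rows.items() for m in ms}


def up(A):
    """A' : the columns shared by every row in A."""
    return {m for m in M if all((g, m) in I for g in A)}


def down(B):
    """B' : the rows that contain every column in B."""
    return {g for g in G if all((g, m) in I for m in B)}


def is_concept(A, B):
    """(A, B) is a formal concept when A' = B and B' = A."""
    return up(A) == B and down(B) == A


print(sorted(up({1, 2})))             # ['B', 'C']
print(is_concept({1, 2}, {"B", "C"}))  # True
print(is_concept({1, 2, 3}, {"C"}))    # False: row 4 also contains C
```

The last call shows why maximality matters: ({1, 2, 3}, {C}) is a rectangle of 1s, but it can still be extended by row 4.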


Formal Concepts

    m1 m2 m3 m4
g1   0  1  0  1
g2   0  0  1  1
g3   0  0  0  1
g4   1  0  0  0
g5   1  1  1  0
g7   1  1  0  0
g6   0  0  1  0

Maximal rectangles of 1s

Maximal bicliques

Bi-clusters may be ordered by the subset-superset relationship and form a complete lattice

B(G,M, I) denotes the concept or bi-cluster lattice
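For a context this small, all formal concepts can be listed by brute force: close every subset of columns and collect the distinct (extent, intent) pairs. This is only an illustrative sketch. It is exponential in the number of columns and is not one of the lattice-construction algorithms cited later in the talk, and the names `extent`, `intent` and `concepts` are chosen here for illustration.

```python
from itertools import combinations

# Context from the "Formal Concepts" slide: row -> set of columns holding a 1.
context = {
    "g1": {"m2", "m4"},
    "g2": {"m3", "m4"},
    "g3": {"m4"},
    "g4": {"m1"},
    "g5": {"m1", "m2", "m3"},
    "g7": {"m1", "m2"},
    "g6": {"m3"},
}
attributes = sorted(set().union(*context.values()))


def extent(B):
    """Rows that contain every column in B."""
    return frozenset(g for g, ms in context.items() if B <= ms)


def intent(A):
    """Columns shared by every row in A (all columns when A is empty)."""
    shared = set(attributes)
    for g in A:
        shared &= context[g]
    return frozenset(shared)


# Close every subset of columns; duplicates collapse inside the set.
concepts = set()
for k in range(len(attributes) + 1):
    for B in combinations(attributes, k):
        A = extent(set(B))
        concepts.add((A, intent(A)))

for A, B in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(A), sorted(B))
```

Each printed pair is a maximal rectangle of 1s in the table above, including the two trivial concepts with an empty extent or an empty intent.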


Splintering Bands

Trivially a bi-cluster is fully banded


  A B C D E
1 1 1 1 0 0
2 0 1 1 0 0
3 0 0 1 0 0
4 0 0 1 1 0
5 0 0 0 1 1


Intuitively, any fully banded matrix can be splintered exactly into maximal rectangles of 1s, i.e. bi-clusters


Ordering Splintered Bands

Let $K^{\pi}_{\tau}$ be fully banded

Γ(g) is a mapping from row g to the bi-clusters g appears in

The union of all Γ(g) can always be ordered

n-tuple of bi-clusters $\{C_1, \ldots, C_n\}$ having total orderings $\{<_{\pi_1,\tau_1}, \ldots, <_{\pi_n,\tau_n}\}$

Define lexicographical order $<_{\pi,\tau}$ on $C_1 \times C_2 \times \cdots \times C_n$

Considering $\{C_1, \ldots, C_n\}$ in order completely specifies the permutations π and τ


Bands as Sequences of Concepts

Proposition

Given a context K, if permutations π and τ exist such that $K^{\pi}_{\tau}$ is fully banded, then there exists a sequence of bi-clusters $C_1 = (A_1, B_1), \ldots, C_n = (A_n, B_n)$ s.t.

$\pi = \{A_1, A_2 \setminus A_1, \ldots, A_n \setminus A_{n-1}\}$

$\tau = \{B_1 \setminus B_2, \ldots, B_{n-1} \setminus B_n, B_n\}$


An Example

  A B C D E
1 1 1 1 0 0
2 0 1 1 0 0
3 0 0 1 0 0
4 0 0 1 1 0
5 0 0 0 1 1

g   Γ(g)
1   {(1, ABC), (12, BC), (1234, C)}
2   {(12, BC), (1234, C)}
3   {(1234, C)}
4   {(4, CD), (45, D)}
5   {(5, DE), (45, D)}

$F(K^{\pi}_{\tau}) = \{(1, ABC) < (12, BC) < (1234, C) < (4, CD) < (45, D) < (5, DE)\}$

$\pi = \{1, 12 \setminus 1, \ldots, 5 \setminus 45\} = \{1, 2, 3, 4, 5\}$

$\tau = \{ABC \setminus BC, \ldots, D \setminus DE, DE\} = \{A, B, C, D, E\}$
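The set differences in this example can be carried out mechanically. The sketch below reads π and τ off an ordered sequence of concepts, following the two formulas of the proposition. The function name is hypothetical, and two details are assumptions of this sketch: elements within one difference are sorted to get a deterministic order, and an element that was already placed is skipped.

```python
def permutations_from_sequence(concepts):
    """concepts: list of (extent, intent) set pairs in band order; returns (pi, tau)."""
    extents = [set(A) for A, _ in concepts]
    intents = [set(B) for _, B in concepts]

    # pi = A1, A2 \ A1, ..., An \ An-1
    pi, prev = [], set()
    for A in extents:
        pi.extend(sorted(x for x in A - prev if x not in pi))
        prev = A

    # tau = B1 \ B2, ..., Bn-1 \ Bn, Bn  (the last intent is taken whole)
    tau = []
    for B, nxt in zip(intents, intents[1:] + [set()]):
        tau.extend(sorted(x for x in B - nxt if x not in tau))
    return pi, tau


sequence = [
    ({1}, set("ABC")),
    ({1, 2}, set("BC")),
    ({1, 2, 3, 4}, set("C")),
    ({4}, set("CD")),
    ({4, 5}, set("D")),
    ({5}, set("DE")),
]
pi, tau = permutations_from_sequence(sequence)
print(pi)   # [1, 2, 3, 4, 5]
print(tau)  # ['A', 'B', 'C', 'D', 'E']
```

Several differences are empty (for instance 4 \ 1234 and C \ CD), which is why six concepts yield only five rows and five columns.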


Paths in the lattice

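Represent B(G, M, I) as a graph G = (V, E) whose vertices are the concepts

Edge set defined as: $(C_1, C_2) \in E \leftrightarrow C_1 \prec C_2 \lor C_2 \prec C_1$

Concept lattice order enforces: $A_{i+1} \subseteq A_i$ and $B_i \subseteq B_{i+1}$ if $C_i \prec C_{i+1}$

Dual: $A_i \subseteq A_{i+1}$ and $B_{i+1} \subseteq B_i$ if $C_i \succ C_{i+1}$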
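The edge set can be built by testing, for every pair of concepts, whether one covers the other, that is, whether no third concept lies strictly between them. The sketch below does this by brute force for the eight concepts of the 5 × 5 example matrix. It compares extents for convenience; because the edges are treated as undirected, this choice does not change E. The function name `cover_edges` is hypothetical.

```python
# The eight concepts (extent, intent) of the 5 x 5 example matrix.
concepts = [
    (frozenset(), frozenset("ABCDE")),
    (frozenset({1}), frozenset("ABC")),
    (frozenset({5}), frozenset("DE")),
    (frozenset({4}), frozenset("CD")),
    (frozenset({1, 2}), frozenset("BC")),
    (frozenset({4, 5}), frozenset("D")),
    (frozenset({1, 2, 3, 4}), frozenset("C")),
    (frozenset({1, 2, 3, 4, 5}), frozenset()),
]


def cover_edges(concepts):
    """Return pairs (lo, hi) where hi's extent strictly contains lo's with nothing in between."""
    edges = set()
    for lo in concepts:
        for hi in concepts:
            if lo[0] < hi[0]:  # strict subset test on the extents
                between = any(lo[0] < c[0] < hi[0] for c in concepts)
                if not between:
                    edges.add((lo, hi))
    return edges


edges = cover_edges(concepts)
for lo, hi in sorted(edges, key=lambda e: (len(e[0][0]), sorted(e[0][0]), sorted(e[1][0]))):
    print(sorted(lo[0]), "--", sorted(hi[0]))
```

Walking along these edges, for example from (1, ABC) to (12, BC) to (1234, C), is what produces the partial bands discussed next.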


Construct Partial Bands Via Paths

[Figure: concept lattice of the example matrix. Nodes, written as (rows, columns): (∅, ABCDE); (1, ABC); (5, DE); (4, CD); (12, BC); (45, D); (1234, C); (12345, ∅).]


Bound on the error

Key Fact

Each individual edge in a path P is guaranteed to produce a banded structure


Proposition

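$$
e(P_n) \le
\begin{cases}
0 & \text{if } n \le 1 \\
e(P_{n-1}) + \sum_{a \in A} |a' \cap B| & \text{if } C_{n+1} \succ C_n \\
e(P_{n-1}) + \sum_{b \in B} |b' \cap A| & \text{if } C_{n+1} \prec C_n
\end{cases}
$$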


Overview

Weight the edges of the concept lattice with an upper bound on the error

Bad news: weights change depending on the path

Good news: error is monotonic along a path, so pruning with backtracking works!

Three steps:

1 Compute G

2 Search paths of G

3 Determine top bands


Compute G

Many existing algorithms [1, 5, 3, 4, 7]

Incremental vs. non-incremental

Assume availability of G


Search Paths

Potentially exponential number of paths

Any bi-cluster is a valid starting point... but initiate with the upper neighbors of the null element

At each edge, add the concept to the path using the previous procedure

Utilize backtracking, mark previously visited edges
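The search described above can be pictured as a depth-first walk that abandons a path as soon as its accumulated error bound exceeds ε. The sketch below is a generic illustration of that pruning idea, not the MMBS implementation. The names `search_paths` and `step_error` are hypothetical, `step_error(path, nxt)` stands in for the path-dependent error bound and is assumed to be non-negative, and marking edges per path (rather than globally) is an assumption of this sketch.

```python
def search_paths(adj, starts, step_error, eps):
    """Enumerate paths whose accumulated error stays within eps.

    adj: dict mapping a node to its neighbours in the lattice graph.
    step_error(path, nxt): hypothetical non-negative bound for appending nxt to path.
    """
    results = []

    def extend(path, err, used_edges):
        results.append((list(path), err))
        for nxt in adj[path[-1]]:
            edge = frozenset((path[-1], nxt))
            if edge in used_edges:
                continue  # edge already used on this path
            new_err = err + step_error(path, nxt)
            if new_err > eps:
                continue  # prune: the error can only grow along the path
            path.append(nxt)
            used_edges.add(edge)
            extend(path, new_err, used_edges)
            path.pop()  # backtrack
            used_edges.remove(edge)

    for s in starts:
        extend([s], 0, set())
    return results


# Toy chain a - b - c with a constant error of 1 per step.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(search_paths(adj, ["a"], lambda path, nxt: 1, eps=1))
# [(['a'], 0), (['a', 'b'], 1)]
```

Passing the whole path to `step_error` reflects the "bad news" from the overview: the weight of an edge depends on how the path reached it, so weights cannot be precomputed once per edge.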


Top Bands

Allow the user to specify: minRows, minCols, maxOvlp

Quality measure: q(P) = |r(P)| × |c(P)| − w × e(P), where r(P) and c(P) are the rows and columns of the band given by path P

If the overlap of two bands exceeds maxOvlp, select the higher-quality one
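The quality measure and the overlap rule suggest a simple greedy selection: filter by size, rank by q, and keep a band only if it does not overlap too much with a better one already kept. The sketch below illustrates this under stated assumptions. The slide does not define how overlap is measured, so the `overlap` function here (shared cells divided by the cells of the smaller band) is a hypothetical choice, as are the dictionary layout and the function names.

```python
def quality(rows, cols, err, w=1.0):
    """q(P) = |r(P)| * |c(P)| - w * e(P)."""
    return len(rows) * len(cols) - w * err


def overlap(b1, b2):
    """Hypothetical overlap measure: shared cells over the cells of the smaller band."""
    shared = len(b1["rows"] & b2["rows"]) * len(b1["cols"] & b2["cols"])
    smaller = min(len(b1["rows"]) * len(b1["cols"]), len(b2["rows"]) * len(b2["cols"]))
    return shared / smaller


def top_bands(bands, min_rows, min_cols, max_ovlp, w=1.0):
    """Greedy selection: best quality first, skip bands overlapping a kept band too much."""
    candidates = [
        b for b in bands
        if len(b["rows"]) >= min_rows and len(b["cols"]) >= min_cols
    ]
    candidates.sort(key=lambda b: quality(b["rows"], b["cols"], b["err"], w), reverse=True)
    kept = []
    for b in candidates:
        if all(overlap(b, k) <= max_ovlp for k in kept):
            kept.append(b)
    return kept


b1 = {"rows": {1, 2, 3}, "cols": {"A", "B", "C"}, "err": 1}
b2 = {"rows": {2, 3, 4}, "cols": {"B", "C", "D"}, "err": 0}
b3 = {"rows": {7, 8}, "cols": {"X", "Y"}, "err": 0}
selected = top_bands([b1, b2, b3], min_rows=2, min_cols=2, max_ovlp=0.1)
print([sorted(b["rows"]) for b in selected])  # [[2, 3, 4], [7, 8]]
```

Here b1 and b2 share four of nine cells, so only the higher-quality b2 survives, while the disjoint b3 is kept alongside it.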


Analysis and Improvements

Running time: O(|U| × |E| × max{X, Y})

|U|: size of initial concepts

X, Y: largest symmetric difference between neighboring concepts

Speed up by reducing the size of U

Perform simple clustering of U based on the maxOvlp parameter

Good experimental results with this speed up.


Setup

Single band and segmented bands planted in synthetic data

All experiments:

w = 1

maxOvlp = 0.1

minRows = 5

minCols = 5

ε = 99


Results

[Figure: matrix plots of the synthetic data (panel title: Planted Bands).]


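Dataset name | Dataset size | p | Num. planted bands | Algorithm | Quality (top ranked) | Num. bands mined
SynBand100_001 | 100 × 100 | 0.01 | 1 | MMBS | 3590 | 6
SynBand100_001 | 100 × 100 | 0.01 | 1 | MMBS_Fast | 3406 | 4
SynBand100_001 | 100 × 100 | 0.01 | 1 | MBS_BD | 2507 | 1
SynBand100_001 | 100 × 100 | 0.01 | 1 | MBS_SD | 438 | 1
SynBand100_005 | 100 × 100 | 0.05 | 1 | MMBS | 2278 | 9
SynBand100_005 | 100 × 100 | 0.05 | 1 | MMBS_Fast | 1503 | 8
SynBand100_005 | 100 × 100 | 0.05 | 1 | MBS_BD | 1050 | 1
SynBand100_005 | 100 × 100 | 0.05 | 1 | MBS_SD | 1201 | 1
SynBand500_001 | 500 × 500 | 0.01 | 1 | MMBS | 8918 | 7
SynBand500_001 | 500 × 500 | 0.01 | 1 | MMBS_Fast | 8261 | 6
SynBand500_001 | 500 × 500 | 0.01 | 1 | MBS_BD | 2822 | 1
SynBand500_001 | 500 × 500 | 0.01 | 1 | MBS_SD | 2145 | 1
SynMultiBand100_001 | 100 × 100 | 0.01 | 2 | MMBS | 3367 | 2
SynMultiBand100_001 | 100 × 100 | 0.01 | 2 | MMBS_Fast | 3367 | 2
SynMultiBand100_001 | 100 × 100 | 0.01 | 2 | MBS | 4101 | 1
SynMultiBand100_001 | 100 × 100 | 0.01 | 2 | MBS_SD | 4045 | 1
SynMultiBand100_001 | 100 × 100 | 0.05 | 2 | MMBS | 4054 | 2
SynMultiBand100_001 | 100 × 100 | 0.05 | 2 | MMBS_Fast | 3933 | 2
SynMultiBand100_001 | 100 × 100 | 0.05 | 2 | MBS_BD | 3910 | 1
SynMultiBand100_001 | 100 × 100 | 0.05 | 2 | MBS_SD | 3736 | 1
SynMultiBand500_001 | 500 × 500 | 0.01 | 2 | MMBS | 28242 | 8
SynMultiBand500_001 | 500 × 500 | 0.01 | 2 | MMBS_Fast | 21346 | 5
SynMultiBand500_001 | 500 × 500 | 0.01 | 2 | MBS_BD | 17498 | 1
SynMultiBand500_001 | 500 × 500 | 0.01 | 2 | MBS_SD | 430 | 1
SynRandom100_005 | 100 × 100 | 0.05 | unknown | MMBS | 3311 | 17
SynRandom100_005 | 100 × 100 | 0.05 | unknown | MMBS_Fast | 3220 | 14
SynRandom100_005 | 100 × 100 | 0.05 | unknown | MBS_BD | 2801 | 1
SynRandom100_005 | 100 × 100 | 0.05 | unknown | MBS_SD | 1949 | 1
SynRandom500_001 | 500 × 500 | 0.01 | unknown | MMBS | 18635 | 73
SynRandom500_001 | 500 × 500 | 0.01 | unknown | MMBS_Fast | 16163 | 64
SynRandom500_001 | 500 × 500 | 0.01 | unknown | MBS_BD | 16771 | 1
SynRandom500_001 | 500 × 500 | 0.01 | unknown | MBS_SD | 5229 | 1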


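Dataset | Size | Sparsity | Algorithm | Quality (top ranked) | Num. bands mined
Genes_Phenotypes | 1910 × 3965 | 0.008 | MMBS | 6665 | 56
Genes_Phenotypes | 1910 × 3965 | 0.008 | MMBS_Fast | 6665 | 43
Genes_Phenotypes | 1910 × 3965 | 0.008 | MBS_BD | 5204 | 1
Genes_Phenotypes | 1910 × 3965 | 0.008 | MBS_SD | 3578 | 1
Genes_Drugs | 1608 × 49 | 0.042 | MMBS | 6423 | 18
Genes_Drugs | 1608 × 49 | 0.042 | MMBS_Fast | 6423 | 13
Genes_Drugs | 1608 × 49 | 0.042 | MBS_BD | 5346 | 1
Genes_Drugs | 1608 × 49 | 0.042 | MBS_SD | 3047 | 1
NewsGroups_Mideast_Religion | 2000 × 890 | 0.003 | MMBS | 72906 | 42
NewsGroups_Mideast_Religion | 2000 × 890 | 0.003 | MMBS_Fast | 61410 | 31
NewsGroups_Mideast_Religion | 2000 × 890 | 0.003 | MBS_BD | 59781 | 1
NewsGroups_Mideast_Religion | 2000 × 890 | 0.003 | MBS_SD | 58713 | 1
NewsGroups_AllPC | 5000 × 2805 | 0.0001 | MMBS | 93368 | 5
NewsGroups_AllPC | 5000 × 2805 | 0.0001 | MMBS_Fast | 93368 | 5
NewsGroups_AllPC | 5000 × 2805 | 0.0001 | MBS_BD | 89106 | 1
NewsGroups_AllPC | 5000 × 2805 | 0.0001 | MBS_SD | 74125 | 1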

[Figure: Genes_Phenotypes. Row labels 1 to 10: early eyelid opening; eyelids open at birth; abnormal timing of postnatal eyelid opening; abnormal eyelid morphology; abnormal eye morphology; abnormal homeostasis; abnormal ear physiology; abnormal hearing physiology; abnormal brainstem auditory evoked potential; deafness.]

[Figure: matrix plot, Genes_Drugs]

[Figure: matrix plot, MideastReligion_SubjectLines]

[Figure: matrix plot, AllPC_SubjectLines]

Performance

[Figure: four plots of CPU time (seconds, log scale) versus epsilon (0 to 100) for MMBS_fast, MMBS, and MBS.]


Conclusion

Explored connection between bi-clustering and banded structures in matrices

Banded sub-matrices correspond to paths in the bi-cluster lattice

MMBS algorithm is based on this correspondence and the ability to bound error

Future work: More efficient search methodologies, stronger bounds on error

Future work: Quantitative measures of bandedness, different types of bands desirable in different applications


[1] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999.

[2] G. C. Garriga, E. Junttila, and H. Mannila. Banded structure in binary matrices. In KDD ’08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 292–300, New York, NY, USA, 2008. ACM.

[3] H. Bian and R. Bhatnagar. An algorithm for lattice-structured subspace clustering. Proceedings of the SIAM International Conference on Data Mining, 2005.

[4] S. O. Kuznetsov and S. A. Obiedkov. Algorithms for the construction of concept lattices and their diagram graphs. In PKDD ’01: Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 289–300, London, UK, 2001. Springer-Verlag.

[5] C. Lindig. Fast concept analysis. 8th International Conference on Conceptual Structures, 2000.

[6] H. Mannila and E. Terzi. Nestedness and segmented nestedness. In KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 480–489, New York, NY, USA, 2007. ACM.

[7] M. J. Zaki and C.-J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 2005.