Mining Maximally Banded Matrices in Binary Data

Date post: 25-May-2015
Upload: faris-alqadah
Transcript
Page 1: Mining Maximally Banded Matrices in Binary Data

Introduction Problem Definition Bandedness and Bi-Clustering MMBS Algorithm Experimental Results Conclusion

Mining Maximally Banded Matrices in Binary Data

Faris Alqadah, Raj Bhatnagar, Anil Jegga

University of Cincinnati
Cincinnati Children’s Hospital

Page 2: Mining Maximally Banded Matrices in Binary Data

Outline

1 Introduction: Motivation
2 Problem Definition: Preliminaries
3 Bandedness and Bi-Clustering: Formal Concept Analysis, Concept Lattice Paths
4 MMBS Algorithm: Three Steps
5 Experimental Results: Synthetic Data, Real-World Data
6 Conclusion

Page 4: Mining Maximally Banded Matrices in Binary Data

Banded Matrices in Data

      A  B  C  D  E
  1   1  1  1  0  0
  2   0  1  1  0  0
  3   0  0  1  0  0
  4   0  0  1  1  0
  5   0  0  0  1  1

Banded structures in binary matrices have natural interpretations:

Bioinformatics (overlapping roles of genes)
Paleontology (patterns of species in space)
Social Networks (community structures)

Page 5: Mining Maximally Banded Matrices in Binary Data

Motivating Example

        k-means  multi-way  EM  bi-cluster  subspace
  doc1     1         0       1      0          1
  doc2     0         1       0      1          0
  doc3     0         0       0      0          1
  doc4     0         0       0      1          1
  doc5     0         0       1      0          1

Page 6: Mining Maximally Banded Matrices in Binary Data

Motivating Example

        k-means  EM  subspace  bi-cluster  multi-way
  doc1     1      1     1          0           0
  doc5     0      1     1          0           0
  doc3     0      0     1          0           0
  doc4     0      0     1          1           0
  doc2     0      0     0          1           1
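The two Motivating Example slides show the same data before and after reordering rows and columns. A minimal sketch of that reordering follows; the helper name `permute` is hypothetical, and the row and column orders are copied from the second slide rather than computed.

```python
# Term-document matrix from the first "Motivating Example" slide.
terms = ["k-means", "multi-way", "EM", "bi-cluster", "subspace"]
docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]
matrix = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]


def permute(matrix, row_labels, col_labels, row_order, col_order):
    """Return the matrix with rows and columns rearranged by label order."""
    r = [row_labels.index(label) for label in row_order]
    c = [col_labels.index(label) for label in col_order]
    return [[matrix[i][j] for j in c] for i in r]


# Orders taken from the second slide (not discovered by this code).
row_order = ["doc1", "doc5", "doc3", "doc4", "doc2"]
col_order = ["k-means", "EM", "subspace", "bi-cluster", "multi-way"]

banded = permute(matrix, docs, terms, row_order, col_order)
for label, row in zip(row_order, banded):
    print(label, row)
```

Finding such orders automatically is the hard part; this sketch only applies orders that are already known.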

Page 7: Mining Maximally Banded Matrices in Binary Data

Bi-Clustering Problem

Banded sub-matrices are a form of bi-clusters

Bi-clustering in binary data focuses on maximal rectangles full (or almost full) of 1s

Page 8: Mining Maximally Banded Matrices in Binary Data

Related Work

Nestedness and segmented nestedness [6]

MBS algorithm [2]

Fix column permutations

Solve the consecutive ones problem

Only find a single band

Page 9: Mining Maximally Banded Matrices in Binary Data

Contributions

1 Establish correspondence between banded structures and bi-clustering in binary data

2 Introduce the novel MMBS algorithm to uncover multiple, possibly overlapping banded sub-matrices

3 Empirical evaluation verifying the advantage of MMBS over previous approaches

Page 13: Mining Maximally Banded Matrices in Binary Data

Basic Notation

Matrix K with row labels G and column labels M

Think of K as K = (G, M, I)

π is a permutation of G and τ is a permutation of M

$K^{\pi}_{\tau}$: K with its rows ordered by π and its columns ordered by τ

$g^{\pi}_{i}$ and $m^{\tau}_{j}$: the i-th row and the j-th column under these permutations

Page 15: Mining Maximally Banded Matrices in Binary Data

Fully Banded Matrix

Definition

A binary matrix K = (G, M, I) is fully banded if there exist a permutation π of G and a permutation τ of M such that (1) for every row i in $K^{\pi}_{\tau}$ the entries with 1s occur in consecutive column indices $\{m_i, m_i + 1, \ldots, m_i^{\star}\}$, and (2) the starting and ending indices of the 1s in successive rows (i and i + 1) satisfy the conditions $m_i \le m_{i+1}$ and $m_i^{\star} \le m_{i+1}^{\star}$.
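As an illustration of conditions (1) and (2), here is a minimal sketch that tests a matrix in the row and column order it is given. It does not search for permutations π and τ, so a `False` result only means that this particular order is not banded. The function name and the treatment of all-zero rows are assumptions, not part of the slides.

```python
def is_banded_in_order(matrix):
    """Check conditions (1) and (2) of the definition for the given order."""
    prev_start, prev_end = -1, -1
    for row in matrix:
        ones = [j for j, value in enumerate(row) if value == 1]
        if not ones:
            # Assumption: a row without 1s has no interval, so reject it.
            return False
        start, end = ones[0], ones[-1]
        # Condition (1): the 1s must fill the whole interval [start, end].
        if len(ones) != end - start + 1:
            return False
        # Condition (2): start and end indices never decrease row to row.
        if start < prev_start or end < prev_end:
            return False
        prev_start, prev_end = start, end
    return True


example = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
print(is_banded_in_order(example))
```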

Page 16: Mining Maximally Banded Matrices in Binary Data

Relaxation of Fully Banded

Real data has noise

Subspaces may encompass banded structure

$e(K^{\pi}_{\tau})$: number of 1s or 0s that must be flipped to achieve banded structure

Maximal banded sub-matrix: no more rows or columns can be added while still preserving bandedness

Page 18: Mining Maximally Banded Matrices in Binary Data

Problem Statement

Given a binary matrix K and a noise threshold ε, find all sub-matrices of K that are ε-banded and maximal.

Page 20: Mining Maximally Banded Matrices in Binary Data

Bi-clustering

Bi-clusters in binary data are defined as Formal Concepts

For A ⊆ G: A′ = {m ∈ M | gIm for all g ∈ A}

For B ⊆ M: B′ = {g ∈ G | gIm for all m ∈ B}

Formal Concept: C = (A, B) such that A′ = B and B′ = A
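The two derivation operators can be sketched directly from these definitions. The example below uses the 5 × 5 matrix from the "Banded Matrices in Data" slide, stored as a mapping from each row to the columns where it has a 1; the function names are hypothetical.

```python
# Rows 1..5 and the columns in which each row has a 1.
context = {
    1: {"A", "B", "C"},
    2: {"B", "C"},
    3: {"C"},
    4: {"C", "D"},
    5: {"D", "E"},
}
attributes = {"A", "B", "C", "D", "E"}


def common_attributes(rows):
    """A' : columns shared by every row in `rows`."""
    result = set(attributes)
    for g in rows:
        result &= context[g]
    return result


def common_rows(cols):
    """B' : rows that have a 1 in every column of `cols`."""
    return {g for g, ms in context.items() if set(cols) <= ms}


def is_formal_concept(rows, cols):
    """(A, B) is a formal concept when A' = B and B' = A."""
    return common_attributes(rows) == set(cols) and common_rows(cols) == set(rows)


print(is_formal_concept({1, 2}, {"B", "C"}))  # a maximal rectangle of 1s
print(is_formal_concept({1}, {"B", "C"}))     # not maximal: row 2 also fits
```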

Page 22: Mining Maximally Banded Matrices in Binary Data

Formal Concepts

       m1  m2  m3  m4
  g1    0   1   0   1
  g2    0   0   1   1
  g3    0   0   0   1
  g4    1   0   0   0
  g5    1   1   1   0
  g7    1   1   0   0
  g6    0   0   1   0

Maximal rectangles of 1s

Maximal bicliques

Bi-clusters may be ordered by the subset-superset relationship and form a complete lattice

B(G, M, I) denotes the concept or bi-cluster lattice

Page 25: Mining Maximally Banded Matrices in Binary Data

Splintering Bands

Trivially a bi-cluster is fully banded

      A  B  C  D  E
  1   1  1  1  0  0
  2   0  1  1  0  0
  3   0  0  1  0  0
  4   0  0  1  1  0
  5   0  0  0  1  1

Page 26: Mining Maximally Banded Matrices in Binary Data

Splintering Bands

      A  B  C  D  E
  1   1  1  1  0  0
  2   0  1  1  0  0
  3   0  0  1  0  0
  4   0  0  1  1  0
  5   0  0  0  1  1

Intuitively, any fully banded matrix can be splintered exactly into maximal rectangles of 1s or bi-clusters
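To make the splintering concrete, the sketch below enumerates every bi-cluster (formal concept) of the example matrix by brute force and checks that their rectangles cover exactly the 1s. Trying every subset of rows is only practical for tiny inputs and is not the method used by MMBS; all names here are hypothetical.

```python
from itertools import combinations

# Example matrix: each row mapped to the columns where it has a 1.
context = {
    1: {"A", "B", "C"},
    2: {"B", "C"},
    3: {"C"},
    4: {"C", "D"},
    5: {"D", "E"},
}
all_cols = frozenset("ABCDE")


def intent(rows):
    """Columns shared by all given rows."""
    cols = set(all_cols)
    for g in rows:
        cols &= context[g]
    return frozenset(cols)


def extent(cols):
    """Rows that contain all given columns."""
    return frozenset(g for g, ms in context.items() if cols <= ms)


def all_concepts():
    """Close every subset of rows; each closure is one formal concept."""
    found = set()
    rows = sorted(context)
    for k in range(len(rows) + 1):
        for subset in combinations(rows, k):
            b = intent(subset)
            found.add((extent(b), b))
    return found


concepts = all_concepts()
ones = {(g, m) for g, ms in context.items() for m in ms}
covered = {(g, m) for rows, cols in concepts for g in rows for m in cols}

for rows, cols in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(rows), "".join(sorted(cols)))
print("rectangles cover exactly the 1s:", covered == ones)
```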

Page 27: Mining Maximally Banded Matrices in Binary Data

Ordering Splintered Bands

Let $K^{\pi}_{\tau}$ be fully banded

Γ(g) is a mapping from row g to the bi-clusters in which g appears

The union of all Γ(g) can always be ordered

n-tuple of bi-clusters $\{C_1, \ldots, C_n\}$ having total orderings $\{<_{\pi_1,\tau_1}, \ldots, <_{\pi_n,\tau_n}\}$

Define the lexicographical order $<_{\pi,\tau}$ on $C_1 \times C_2 \times \cdots \times C_n$

Considering $\{C_1, \ldots, C_n\}$ in order completely specifies the permutations π and τ

Page 31: Mining Maximally Banded Matrices in Binary Data

Bands as Sequences of Concepts

Proposition

Given a context K, if permutations π and τ exist such that $K^{\pi}_{\tau}$ is fully banded, then there exists a sequence of bi-clusters $C_1 = (A_1, B_1), \ldots, C_n = (A_n, B_n)$ s.t.

$\pi = \{A_1, A_2 \setminus A_1, \ldots, A_n \setminus A_{n-1}\}$

$\tau = \{B_1 \setminus B_2, \ldots, B_{n-1} \setminus B_n, B_n\}$

Page 32: Mining Maximally Banded Matrices in Binary Data

An Example

      A  B  C  D  E
  1   1  1  1  0  0
  2   0  1  1  0  0
  3   0  0  1  0  0
  4   0  0  1  1  0
  5   0  0  0  1  1

  g   Γ(g)
  1   {(1, ABC), (12, BC), (1234, C)}
  2   {(12, BC), (1234, C)}
  3   {(1234, C)}
  4   {(4, CD), (45, D)}
  5   {(5, DE), (45, D)}

$F(K^{\pi}_{\tau})$ = {(1, ABC) < (12, BC) < (1234, C) < (4, CD) < (45, D) < (5, DE)}

π = {1, 12 \ 1, . . . , 5 \ 45} = {1, 2, 3, 4, 5}

τ = {ABC \ BC, . . . , D \ DE, DE} = {A, B, C, D, E}
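The construction of π and τ from the ordered sequence of bi-clusters can be sketched as follows. The function name is hypothetical, and two details are assumptions added for the sketch: labels inside one block are sorted to break ties, and a label that was already placed is not placed again.

```python
def orders_from_concepts(sequence):
    """Derive row and column orders from an ordered list of (rows, cols)."""
    row_order, col_order = [], []
    last = len(sequence) - 1
    for i, (rows, cols) in enumerate(sequence):
        # Rows: A_1, A_2 \ A_1, ..., A_n \ A_{n-1}
        new_rows = rows if i == 0 else rows - sequence[i - 1][0]
        # Columns: B_1 \ B_2, ..., B_{n-1} \ B_n, B_n
        new_cols = cols if i == last else cols - sequence[i + 1][1]
        # Assumptions: sorted tie-breaking, skip labels already placed.
        row_order += [g for g in sorted(new_rows) if g not in row_order]
        col_order += [m for m in sorted(new_cols) if m not in col_order]
    return row_order, col_order


# The ordered bi-clusters from the example slide.
sequence = [
    ({1}, set("ABC")),
    ({1, 2}, set("BC")),
    ({1, 2, 3, 4}, {"C"}),
    ({4}, set("CD")),
    ({4, 5}, {"D"}),
    ({5}, set("DE")),
]
pi, tau = orders_from_concepts(sequence)
print(pi, tau)
```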

Page 34: Mining Maximally Banded Matrices in Binary Data

Paths in the lattice

Represent B(G, M, I) as a graph G = (V, E)

Edge set defined as: $(C_1, C_2) \in E \leftrightarrow C_1 \prec C_2 \lor C_2 \prec C_1$

Concept lattice order enforces: $A_{i+1} \subseteq A_i$ and $B_i \subseteq B_{i+1}$ if $C_i \prec C_{i+1}$

Dual: $A_i \subseteq A_{i+1}$ and $B_{i+1} \subseteq B_i$ if $C_i \succ C_{i+1}$
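A minimal sketch of this graph for the 5 × 5 example matrix is given below. It assumes that ≺ denotes the neighbor (cover) relation of the lattice and treats edges as undirected, so the direction convention does not matter here. The eight concepts are written out by hand, and the function name is hypothetical.

```python
# The eight concepts of the example matrix as (rows, columns).
concepts = [
    (frozenset(), frozenset("ABCDE")),
    (frozenset({1}), frozenset("ABC")),
    (frozenset({1, 2}), frozenset("BC")),
    (frozenset({1, 2, 3, 4}), frozenset("C")),
    (frozenset({4}), frozenset("CD")),
    (frozenset({4, 5}), frozenset("D")),
    (frozenset({5}), frozenset("DE")),
    (frozenset({1, 2, 3, 4, 5}), frozenset()),
]


def cover_edges(concepts):
    """Undirected edges between concepts that are direct lattice neighbors."""
    edges = set()
    for low in concepts:
        for high in concepts:
            if low[0] < high[0]:  # strictly smaller row set
                between = any(low[0] < mid[0] < high[0] for mid in concepts)
                if not between:
                    edges.add(frozenset((low, high)))
    return edges


edges = cover_edges(concepts)

# The band from the example slide, read as a path through the graph.
path = [concepts[1], concepts[2], concepts[3], concepts[4], concepts[5], concepts[6]]
on_graph = all(frozenset((a, b)) in edges for a, b in zip(path, path[1:]))
print(len(edges), on_graph)
```

In this small case every consecutive pair of bi-clusters in the band is joined by an edge, which is the path view of a band used on the following slides.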

Page 36: Mining Maximally Banded Matrices in Binary Data

Construct Partial Bands Via Paths

[Figure: concept lattice of the example matrix. Node labels (columns; rows): (A,B,C,D,E; none), (A,B,C; 1), (D,E; 5), (C,D; 4), (B,C; 1,2), (D; 4,5), (C; 1,2,3,4), (none; 1,2,3,4,5).]

Page 37: Mining Maximally Banded Matrices in Binary Data

Bound on the error

Key Fact

Each individual edge in a path P is guaranteed to produce a banded structure

Page 38: Mining Maximally Banded Matrices in Binary Data

Bound on the error

Proposition

$$
e(P_n) \le
\begin{cases}
0 & \text{if } n \le 1 \\
e(P_{n-1}) + \sum_{a \in A} |a' \cap B| & \text{if } C_{n+1} \succ C_n \\
e(P_{n-1}) + \sum_{b \in B} |b' \cap A| & \text{if } C_{n+1} \prec C_n
\end{cases}
$$

Page 40: Mining Maximally Banded Matrices in Binary Data

Overview

Weigh edges of the concept lattice with an upper bound on the error

Bad news: weights change depending on the path

Good news: error is monotonic along a path, so pruning with backtracking works!

Three steps:

1 Compute G
2 Search paths of G
3 Determine top bands

Page 42: Mining Maximally Banded Matrices in Binary Data

Compute G

Many existing algorithms [1, 5, 3, 4, 7]

Incremental vs. non-incremental

Assume availability of G

Page 43: Mining Maximally Banded Matrices in Binary Data

Search Paths

Potentially exponential number of paths

Any bi-cluster is a valid starting point... but initiate with upper neighbors of the null-element

At each edge, add the concept to the path utilizing the previous procedure

Utilize backtracking, mark previously visited edges
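The search can be sketched as a depth-first traversal that abandons a path as soon as its accumulated error bound exceeds ε. This is a generic illustration, not the MMBS procedure itself: the graph is a toy adjacency dictionary, `step_error` is a hypothetical callback standing in for the per-edge bound, and the sketch tracks visited nodes on the current path rather than marking visited edges.

```python
def search_paths(graph, starts, step_error, eps):
    """Return (path, error) pairs for paths that cannot be extended within eps."""
    results = []

    def extend(path, error):
        extended = False
        for nxt in graph[path[-1]]:
            if nxt in path:
                continue  # keep paths simple
            new_error = error + step_error(path, nxt)
            if new_error > eps:
                continue  # prune: error never decreases along a path
            extended = True
            path.append(nxt)
            extend(path, new_error)
            path.pop()  # backtrack
        if not extended:
            results.append((list(path), error))

    for start in starts:
        extend([start], 0)
    return results


# Toy graph a - b - c with hypothetical, path-independent edge errors.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
weights = {frozenset("ab"): 1, frozenset("bc"): 5}


def step_error(path, nxt):
    return weights[frozenset((path[-1], nxt))]


print(search_paths(graph, ["a"], step_error, eps=3))
print(search_paths(graph, ["a"], step_error, eps=10))
```

Because the callback receives the whole path, it can also model weights that depend on the path taken so far, which is the situation the Overview slide describes.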

Page 45: Mining Maximally Banded Matrices in Binary Data

Top Bands

Allow the user to specify: minRows, minCols, maxOvlp

Quality measure: q(P) = |r(P)| × |c(P)| − w × e(P)

If two bands exceed maxOvlp, select the higher quality one
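The selection step can be sketched as a greedy filter over candidate bands sorted by q(P). The slide does not define how overlap is measured, so the `overlap` function below (shared cells divided by the cells of the smaller band) is an assumption, as are the dictionary layout and all function names.

```python
def quality(rows, cols, error, w=1.0):
    """q(P) = |r(P)| * |c(P)| - w * e(P)."""
    return len(rows) * len(cols) - w * error


def overlap(a, b):
    """Assumed measure: shared cells over the cells of the smaller band."""
    shared = len(a["rows"] & b["rows"]) * len(a["cols"] & b["cols"])
    smaller = min(len(a["rows"]) * len(a["cols"]), len(b["rows"]) * len(b["cols"]))
    return shared / smaller if smaller else 0.0


def top_bands(bands, min_rows, min_cols, max_ovlp, w=1.0):
    """Keep large-enough bands, preferring higher quality when two overlap."""
    candidates = [
        b for b in bands
        if len(b["rows"]) >= min_rows and len(b["cols"]) >= min_cols
    ]
    candidates.sort(
        key=lambda b: quality(b["rows"], b["cols"], b["error"], w), reverse=True
    )
    selected = []
    for band in candidates:
        if all(overlap(band, kept) <= max_ovlp for kept in selected):
            selected.append(band)
    return selected


# Hypothetical candidate bands for illustration only.
bands = [
    {"name": "P1", "rows": {1, 2, 3}, "cols": {"a", "b", "c"}, "error": 1},
    {"name": "P2", "rows": {2, 3, 4}, "cols": {"b", "c", "d"}, "error": 0},
    {"name": "P3", "rows": {7, 8}, "cols": {"x", "y"}, "error": 0},
]
chosen = top_bands(bands, min_rows=2, min_cols=2, max_ovlp=0.1)
print([b["name"] for b in chosen])
```

Here P1 and P2 share four cells, which exceeds the overlap limit, so only the higher-quality P2 survives alongside the disjoint P3.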

Page 46: Mining Maximally Banded Matrices in Binary Data

Analysis and Improvements

Running time: O(|U| × |E| × max{X, Y})

|U|: size of the set of initial concepts

X, Y: largest symmetric difference between neighboring concepts

Speed up by reducing |U|

Perform simple clustering of U based on the maxOvlp parameter

Good experimental results with this speed-up.

Page 49: Mining Maximally Banded Matrices in Binary Data

Setup

Single band and segmented bands planted in synthetic data

All experiments:

w = 1
maxOvlp = 0.1
minRows = 5
minCols = 5
ε = 99

Page 50: Mining Maximally Banded Matrices in Binary Data

Results

[Figure: planted bands in the synthetic matrices, three plots with axis ticks up to 500, 200, and 300. Label: Planted Bands]

Page 51: Mining Maximally Banded Matrices in Binary Data

Results

  Dataset name          Size       p     Planted bands  Algorithm   Quality (top ranked)  Bands mined
  SynBand100_001        100 × 100  0.01  1              MMBS        3590                  6
                                                        MMBS_Fast   3406                  4
                                                        MBS_BD      2507                  1
                                                        MBS_SD      438                   1
  SynBand100_005        100 × 100  0.05  1              MMBS        2278                  9
                                                        MMBS_Fast   1503                  8
                                                        MBS_BD      1050                  1
                                                        MBS_SD      1201                  1
  SynBand500_001        500 × 500  0.01  1              MMBS        8918                  7
                                                        MMBS_Fast   8261                  6
                                                        MBS_BD      2822                  1
                                                        MBS_SD      2145                  1
  SynMultiBand100_001   100 × 100  0.01  2              MMBS        3367                  2
                                                        MMBS_Fast   3367                  2
                                                        MBS         4101                  1
                                                        MBS_SD      4045                  1
  SynMultiBand100_001   100 × 100  0.05  2              MMBS        4054                  2
                                                        MMBS_Fast   3933                  2
                                                        MBS_BD      3910                  1
                                                        MBS_SD      3736                  1
  SynMultiBand500_001   500 × 500  0.01  2              MMBS        28242                 8
                                                        MMBS_Fast   21346                 5
                                                        MBS_BD      17498                 1
                                                        MBS_SD      430                   1
  SynRandom100_005      100 × 100  0.05  unknown        MMBS        3311                  17
                                                        MMBS_Fast   3220                  14
                                                        MBS_BD      2801                  1
                                                        MBS_SD      1949                  1
  SynRandom500_001      500 × 500  0.01  unknown        MMBS        18635                 73
                                                        MMBS_Fast   16163                 64
                                                        MBS_BD      16771                 1
                                                        MBS_SD      5229                  1

Page 53: Mining Maximally Banded Matrices in Binary Data

  Dataset                       Size         Sparsity  Algorithm   Quality (top ranked)  Bands mined
  Genes_Phenotypes              1910 × 3965  0.008     MMBS        6665                  56
                                                       MMBS_Fast   6665                  43
                                                       MBS_BD      5204                  1
                                                       MBS_SD      3578                  1
  Genes_Drugs                   1608 × 49    0.042     MMBS        6423                  18
                                                       MMBS_Fast   6423                  13
                                                       MBS_BD      5346                  1
                                                       MBS_SD      3047                  1
  NewsGroups_Mideast_Religion   2000 × 890   0.003     MMBS        72906                 42
                                                       MMBS_Fast   61410                 31
                                                       MBS_BD      59781                 1
                                                       MBS_SD      58713                 1
  NewsGroups_AllPC              5000 × 2805  0.0001    MMBS        93368                 5
                                                       MMBS_Fast   93368                 5
                                                       MBS_BD      89106                 1
                                                       MBS_SD      74125                 1

Page 54: Mining Maximally Banded Matrices in Binary Data

[Figure: band mined from Genes_Phenotypes. One axis has ticks from 50 to 400; the other lists ten entries.]

Labels shown in the figure: early eyelid opening; eyelids open at birth; abnormal timing of postnatal eyelid opening; abnormal eyelid morphology; abnormal eye morphology; abnormal homeostasis; abnormal ear physiology; abnormal hearing physiology; abnormal brainstem auditory evoked potential; deafness

Page 55: Mining Maximally Banded Matrices in Binary Data

[Figure: band mined from Genes_Drugs. Axis ticks from 100 to 900 and from 1 to 7.]

Page 56: Mining Maximally Banded Matrices in Binary Data

[Figure: MideastReligion_SubjectLines. Axis ticks from 10 to 80 and from 100 to 800.]

Page 57: Mining Maximally Banded Matrices in Binary Data

[Figure: AllPC_SubjectLines. Axis ticks from 10 to 80 and from 100 to 1000.]

Page 59: Mining Maximally Banded Matrices in Binary Data

Performance

[Figure: four plots of CPU Time (seconds, logarithmic scale) against epsilon (0 to 100). Legend in each plot: MMBS_fast, MMBS, MBS.]

Page 60: Mining Maximally Banded Matrices in Binary Data

Conclusion

Explored the connection between bi-clustering and banded structures in matrices

Banded sub-matrices correspond to paths in the bi-cluster lattice

MMBS algorithm is based on this correspondence and the ability to bound the error

Future work: more efficient search methodologies, stronger bounds on error

Future work: quantitative measures of bandedness, different types of bands desirable in different applications

Page 62: Mining Maximally Banded Matrices in Binary Data

References

[1] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999.

[2] G. C. Garriga, E. Junttila, and H. Mannila. Banded structure in binary matrices. In KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 292–300, New York, NY, USA, 2008. ACM.

[3] H. Bian and R. Bhatnagar. An algorithm for lattice-structured subspace clustering. Proceedings of the SIAM International Conference on Data Mining, 2005.

[4] S. O. Kuznetsov and S. A. Obiedkov. Algorithms for the construction of concept lattices and their diagram graphs. In PKDD ’01: Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 289–300, London, UK, 2001. Springer-Verlag.

[5] C. Lindig. Fast concept analysis. 8th International Conference on Conceptual Structures, 2000.

[6] H. Mannila and E. Terzi. Nestedness and segmented nestedness. In KDD ’07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 480–489, New York, NY, USA, 2007. ACM.

[7] M. J. Zaki and C.-J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 2005.

