Date post: 18-Dec-2015
Page 1: Our New Progress on Frequent/Sequential Pattern Mining We develop new frequent/sequential pattern mining methods Performance study on both synthetic and.

Our New Progress on Frequent/Sequential Pattern Mining

We develop new frequent/sequential pattern mining methods

A performance study on both synthetic and real data sets shows that our methods outperform conventional ones by wide margins

                                Our new methods        Conventional methods
Frequent pattern mining         FP-growth              Apriori, TreeProjection
Sequential pattern mining       PrefixSpan, FreeSpan   GSP
Frequent closed pattern mining  CLOSET                 A-close, ChARM
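
As a point of reference for the frequent pattern mining task, here is a minimal Python sketch of a level-wise search with candidate generation, in the spirit of Apriori-style methods. It is an illustration only, not the code benchmarked in these slides; the function name `frequent_itemsets` and the toy transactions are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: relative support} for every itemset whose support
    is at least min_support (a fraction between 0 and 1).

    Hypothetical helper for illustration: a plain level-wise search with
    candidate generation, in the spirit of Apriori.
    """
    data = [frozenset(t) for t in transactions]
    n = len(data)
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for t in data for item in t}
    result = {}
    k = 1
    while candidates:
        # One pass over the data counts the support of each candidate.
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in data if cand <= t) / n
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return result


# Toy data (assumed for the example): five small transactions.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, support in sorted(
    frequent_itemsets(toy, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), support)
```

Lowering `min_support` lets more itemsets qualify, so more candidates must be generated and counted. The runtime plots in this deck compare the methods along that same support-threshold axis.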

Page 2

Mining Complete Set of Frequent Patterns on T10I4D100k

[Figure: runtime in seconds (0 to 140) versus support threshold (0.00% to 0.15%) for Apriori, TreeProjection, and FP-growth.]

Page 3

Mining Complete Set of Frequent Patterns on T25I20D100k

[Figure: runtime in seconds (0 to 200) versus support threshold (0.00% to 1.50%) for Apriori, TreeProjection, and FP-growth.]

Page 4

Mining Complete Set of Frequent Patterns on Connect-4

[Figure: runtime in seconds (0 to 400) versus support threshold (70% to 95%) for Apriori, TreeProjection, and FP-growth.]

Page 5

Mining Sequential Patterns on C10T4S16I4

[Figure: runtime in seconds (0 to 800) versus support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Page 6

Mining Sequential Patterns on C10T8S8I8

[Figure: runtime in seconds (0 to 200) versus support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Page 7

Scalability of Mining Sequential Patterns on C10-100T8S8I8

[Figure: runtime in seconds (0 to 800) versus number of sequences (0 to 100000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Page 8

Scalability of Mining Sequential Patterns on C10-100T4S16I4

[Figure: runtime in seconds (0 to 1600) versus number of sequences (0 to 100000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Page 9

Why Is PrefixSpan Faster Than GSP?

[Figure: two panels, one for dataset C10T4S16I4 and one for dataset C10T8S8I8. Each plots two series, "# cand/pattern in GSP" and "Runtime/proj. db in PrefixSpan", against support threshold (0.00% to 2.00%) on a logarithmic vertical axis (0.001 to 100).]
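
The legend above refers to projected databases. As a rough illustration of what that means, here is a simplified Python sketch of prefix-projected pattern growth. It assumes every sequence element is a single item, and it does not reproduce either the PrefixSpan-1 or PrefixSpan-2 variant measured in these slides. The function name `prefixspan` and the toy sequences are assumptions made for the example.

```python
from collections import defaultdict


def prefixspan(sequences, min_count):
    """Simplified prefix-projected pattern growth (illustrative sketch).

    sequences: list of lists, each element a single item.
    Returns {pattern tuple: number of sequences containing the pattern}.
    """
    patterns = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs; each pair
        # stands for the suffix that remains after matching the prefix.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] += 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Build the projected database for the extended prefix: keep
            # what follows the first occurrence of item in each suffix.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((sid, pos + 1))
            grow(new_prefix, new_projected)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return patterns


# Toy data (assumed for the example): four short sequences.
toy = [list("abcb"), list("acb"), list("ab"), list("bc")]
for pattern, count in sorted(prefixspan(toy, 3).items()):
    print(pattern, count)
```

In this sketch no candidate sequences are generated and tested against the full database. Each recursive call only counts items inside a projected database, and those databases shrink as the prefix grows.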

Page 10

Mining Frequent Closed Itemsets on T25I20D100k

[Figure: runtime in seconds (0 to 100) versus support threshold (0.7% to 1.5%) for A-CLOSE, CLOSET, and ChARM.]

Page 11

Mining Frequent Closed Itemsets on Connect-4

[Figure: runtime in seconds (logarithmic axis, 1 to 10000) versus support threshold (40% to 100%) for A-CLOSE, CLOSET, and ChARM.]

Page 12

Mining Frequent Closed Itemsets on Pumsb

[Figure: runtime in seconds (0 to 300) versus support threshold (75% to 95%) for A-CLOSE, CLOSET, and ChARM.]
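
For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The Python sketch below shows that definition as a filter over itemsets that have already been counted. It is not how CLOSET, A-close, or ChARM work internally; the function name `closed_itemsets` and the toy support counts are assumptions made for the example.

```python
def closed_itemsets(supports):
    """Keep only the closed itemsets (illustrative sketch).

    supports: {frozenset itemset: support count}, assumed to contain every
    frequent itemset for the chosen threshold.
    """
    closed = {}
    for itemset, count in supports.items():
        # An itemset is absorbed if a proper superset has equal support.
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in supports.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


# Toy support counts (assumed), from transactions {a,b}, {a,b}, {a,b,c}.
toy_supports = {
    frozenset("a"): 3, frozenset("b"): 3, frozenset("c"): 1,
    frozenset("ab"): 3, frozenset("ac"): 1, frozenset("bc"): 1,
    frozenset("abc"): 1,
}
for itemset, count in closed_itemsets(toy_supports).items():
    print(sorted(itemset), count)
```

In the toy data, seven frequent itemsets reduce to two closed ones, and the support of every other itemset can be read off its smallest closed superset.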

Page 13

References

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), (to appear), 2000.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487-499, Santiago, Chile, September 1994.

J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.

J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.

R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3-17, Avignon, France, March 1996.

N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.

M.J. Zaki and C. Hsiao. ChARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.

Page 14

DBMiner Version 2.5 (Beta)

DBMiner Technology Inc., B.C., Canada

Page 15

What we had for DBMiner 2.0…

Association module on data cubes

Classification module on data cubes

Clustering module on data cubes

OLAP browser

3D Cube browser

Page 16

What we will do in DBMiner 2.5…

Keep the existing association module and classification module in version 2.0

Change the existing clustering module

Add new visual classification module, both on SQL server and OLAP

Add new sequential pattern modules on SQL server using FP algorithm

Page 17

What we have done…

We have incorporated the existing association module and added the OLAP browser module

We have added the visual classification module

We have changed the existing clustering module

We have added the sequential pattern module

We are still in the development stage

Page 18

Association module on data cubes

Page 19

New sequential pattern module on SQL Server

Page 20

New visual classification module on data cubes

Page 21

New clustering module on data cubes

