Date post: 15-Jan-2016
Association Rules: Algorithms, variations, extensions, and applications
Fernando Berzal [email protected]
Intelligent Databases and Information Systems research group, Department of Computer Science and Artificial Intelligence, E.T.S. Ingeniería Informática – Universidad de Granada (Spain)
CEDI’2005, Taller de Minería de Datos (Data Mining Workshop)
Transcript
Page 1: Intelligent Databases and Information Systems research group Department of Computer Science and Artificial Intelligence E.T.S Ingeniería Informática –

Intelligent Databases and Information Systems research group
Department of Computer Science and Artificial Intelligence
E.T.S. Ingeniería Informática – Universidad de Granada (Spain)

CEDI’2005
Taller de Minería de Datos (Data Mining Workshop)

Association Rules: Algorithms, variations, extensions, and applications

Fernando Berzal
[email protected]

Page 2: Motivation

Outline: Motivation, Definition, Discovery, Variations, Visualization, Extensions, Applications (ART, ATBAR)

Association mining searches for interesting relationships among items in a given data set.

EXAMPLES
  Diapers and six-packs are bought together, especially on Thursday evenings (a myth?)
  A sequence such as buying a digital camera first and then a memory card is a frequent (sequential) pattern
  …

Page 3: Motivation

MARKET BASKET ANALYSIS

The earliest form of association rule mining

Applications: catalog design, store layout, cross-marketing…

Page 4: Definition

Item
  In transactional databases: any of the items included in a transaction.
  In relational databases: an (attribute, value) pair.

k-itemset
  Set of k items

Itemset support
  support(I) = P(I)

Page 5: Definition

Association rule
  X ⇒ Y

Support
  support(X ⇒ Y) = support(X ∪ Y) = P(X ∪ Y)

Confidence
  confidence(X ⇒ Y) = support(X ∪ Y) / support(X) = P(Y|X)

NOTE: Both support and confidence are relative measures (fractions, not absolute counts).
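As a rough illustration of the two definitions above, the following sketch computes relative support and confidence over a small, made-up list of transactions. The function names and the sample data are assumptions introduced here, not part of the original slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X); raises ZeroDivisionError if X never occurs."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical market-basket data, for illustration only.
transactions = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
]

rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
```

Here the rule {diapers} ⇒ {beer} holds in 2 of the 5 transactions, and in 2 of the 3 transactions that contain diapers.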

Page 6: Discovery

Association rule mining
  1. Find all frequent itemsets
  2. Generate strong association rules from the frequent itemsets

Strong association rules are those that satisfy both a minimum support threshold and a minimum confidence threshold.
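Step 2 can be sketched as follows, assuming step 1 has already produced a dictionary that maps every frequent itemset to its relative support. The name `strong_rules` and the sample supports are hypothetical. The minimum support threshold is already met because only frequent itemsets are given as input, so only confidence is checked.

```python
from itertools import combinations


def strong_rules(supports, min_conf):
    """Return (antecedent, consequent, support, confidence) tuples.

    `supports` maps each frequent itemset (a frozenset) to its relative
    support. Every subset of a frequent itemset is itself frequent, so
    the antecedent lookup below is expected to succeed.
    """
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                conf = supp / supports[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, supp, conf))
    return rules


# Hypothetical supports, for illustration only.
supports = {
    frozenset({"A"}): 0.7,
    frozenset({"B"}): 0.9,
    frozenset({"A", "B"}): 0.6,
}
rules = strong_rules(supports, min_conf=0.8)
```

With these numbers only A ⇒ B survives: its confidence is 0.6 / 0.7, whereas B ⇒ A only reaches 0.6 / 0.9.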

Page 7: Discovery

Apriori

Observation:
  All non-empty subsets of a frequent itemset must also be frequent

Algorithm:
  Frequent k-itemsets are used to explore potentially frequent (k+1)-itemsets (i.e. candidates)

Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94
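A minimal, unoptimized sketch of the level-wise idea described above: frequent k-itemsets are joined into (k+1)-candidates, candidates with a non-frequent subset are pruned, and the survivors are counted against the data. It is meant to illustrate the principle rather than reproduce the implementation from the cited paper; the function name and sample transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its relative support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def frequent(candidates):
        # One pass over the data per level: count each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
    return result


# Hypothetical transactions, for illustration only.
data = [{"A", "B", "D"}, {"A", "B"}, {"B", "C"}, {"A", "B", "C"}, {"B", "D"}]
frequent_itemsets = apriori(data, min_support=0.3)
```

On this data the four single items and the pairs AB, BC and BD are frequent; all three 3-item candidates are pruned without being counted because each has a non-frequent pair as a subset.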

Page 8: Discovery

Apriori improvements (I)

Reducing the number of candidates
  Park, Chen & Yu: "An Effective Hash-Based Algorithm for Mining Association Rules", SIGMOD'95

Sampling
  Toivonen: "Sampling Large Databases for Association Rules", VLDB'96
  Park, Yu & Chen: "Mining Association Rules with Adjustable Accuracy", CIKM'97

Partitioning
  Savasere, Omiecinski & Navathe: "An Efficient Algorithm for Mining Association Rules in Large Databases", VLDB'95

Page 9: Discovery

Apriori improvements (II)

Transaction reduction
  Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94 (AprioriTID)

Dynamic itemset counting
  Brin, Motwani, Ullman & Tsur: "Dynamic Itemset Counting and Implication Rules for Market Basket Data", SIGMOD'97 (DIC)
  Hidber: "Online Association Rule Mining", SIGMOD'99 (CARMA)

Page 10: Discovery

Apriori-like algorithm: TBAR (Tree-based association rule mining)

Berzal, Cubero, Sánchez & Serrano: "TBAR: An efficient method for association rule mining in relational databases", Data & Knowledge Engineering, 2001

Page 11: Discovery: TBAR

[Figure: itemset tree with one level per itemset size]
  L1: A #7, B #9, C #7, D #8
  L2: B #6, D #5, C #6, D #7, D #5
  L3: D #5, D #5
  Annotations: 7 instances with A; 6 instances with AB; 5 instances with AD; 6 instances with BC; 5 instances with ABD
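The figure suggests a prefix tree in which each node stores the support count of the itemset spelled out by the path from the root. The sketch below is one possible way to model such a tree; it is an assumption made for illustration, not the data structure from the TBAR paper. The class names are hypothetical, and the counts are the ones annotated on the slide.

```python
class ItemsetNode:
    """One item on a path; `count` is the support count of the whole path."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class ItemsetTree:
    def __init__(self):
        self.root = ItemsetNode()

    def add(self, itemset, count):
        # Items are kept in sorted order so each itemset has a single path.
        node = self.root
        for item in sorted(itemset):
            if item not in node.children:
                node.children[item] = ItemsetNode(item)
            node = node.children[item]
        node.count = count

    def count(self, itemset):
        node = self.root
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return 0  # itemset was never stored, i.e. not frequent
        return node.count


tree = ItemsetTree()
for itemset, count in [
    ({"A"}, 7), ({"B"}, 9), ({"C"}, 7), ({"D"}, 8),
    ({"A", "B"}, 6), ({"A", "D"}, 5), ({"B", "C"}, 6),
    ({"A", "B", "D"}, 5),
]:
    tree.add(itemset, count)
```

Looking up an itemset then costs one dictionary access per item, and sibling itemsets that share a prefix share the corresponding nodes.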

Page 12: Discovery

An alternative to Apriori: compress the database representing frequent items into a frequent-pattern tree (FP-tree)…

Han, Pei & Yin: "Mining Frequent Patterns without Candidate Generation", SIGMOD'2000

Page 13: Discovery

A challenge
  When an itemset is frequent, all its subsets are also frequent

Closed itemset C:
  There exists no proper super-itemset S such that support(S) = support(C)

Maximal (frequent) itemset M:
  M is frequent and there exists no super-itemset Y such that M ⊂ Y and Y is frequent.
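The two definitions above can be checked directly on a table of frequent itemsets. The sketch below assumes a dictionary of absolute support counts (integers, to avoid floating-point equality issues); the function names and the sample counts are hypothetical. Only frequent supersets need to be inspected, because a superset with the same support as a frequent itemset is necessarily frequent as well.

```python
def closed_itemsets(counts):
    """Frequent itemsets with no proper superset of equal support."""
    return {
        s for s, c in counts.items()
        if not any(s < t and counts[t] == c for t in counts)
    }


def maximal_itemsets(counts):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in counts if not any(s < t for t in counts)}


# Hypothetical support counts of all frequent itemsets in some data set.
counts = {
    frozenset("A"): 3, frozenset("B"): 5, frozenset("C"): 2, frozenset("D"): 2,
    frozenset("AB"): 3, frozenset("BC"): 2, frozenset("BD"): 2,
}
closed = closed_itemsets(counts)
maximal = maximal_itemsets(counts)
```

In this example A is not closed because AB has the same count, while B is closed but not maximal. Every maximal itemset is also closed.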

Page 14: Variations

Based on the kinds of patterns to be mined:
  Frequent itemset mining (transactional and relational data)
  Sequential pattern mining (sequence data sets, e.g. bioinformatics)
  Structured pattern mining (structured data, e.g. graphs)

Page 15: Variations

Based on the types of values handled:
  Boolean association rules
  Quantitative association rules
  Fuzzy association rules
    Delgado, Marín, Sánchez & Vila: "Fuzzy association rules: General model and applications", IEEE Transactions on Fuzzy Systems, 2003

Page 16: Variations

More options:
  Generalized association rules (a.k.a. multilevel association rules)
  Constraint-based association rule mining
  Incremental algorithms
  Top-k algorithms

ICDM FIMI: Workshop on Frequent Itemset Mining Implementations

Page 17: Visualization

Integrated into data mining tools to help users understand data mining results:
  Table-based approach, e.g. SAS Enterprise Miner, DBMiner…
  2D matrix-based approach, e.g. SGI MineSet, DBMiner…
  Graph-based techniques, e.g. DBMiner ball graphs

Page 18: Visualization: Tables

Page 19: Visualization: Visual aids

Page 20: Visualization: 2D Matrix

Page 21: Visualization: Graphs

Page 22: Visualization: VisAR

Based on parallel coordinates (Techapichetvanich & Datta, ADMA’2005)

Page 23: Extensions

Confidence is not the best possible interestingness measure for rules,
e.g. a very frequent item will always appear in rule consequents, regardless of its true relationship with the rule antecedent:

  X went to war ⇒ X did not serve in Vietnam
  (from the US Census)
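The problem can be made concrete with a few lines of arithmetic. The supports below are invented for illustration (they are not the census figures behind the slide), and `lift` is the usual ratio between the observed support of the rule and the support expected under independence.

```python
def rule_confidence(supp_xy, supp_x):
    return supp_xy / supp_x


def lift(supp_xy, supp_x, supp_y):
    """Values below 1 indicate a negative association between X and Y."""
    return supp_xy / (supp_x * supp_y)


# Hypothetical supports: the consequent Y is present in 90% of all records.
supp_x, supp_y, supp_xy = 0.2, 0.9, 0.17

conf_value = rule_confidence(supp_xy, supp_x)
lift_value = lift(supp_xy, supp_x, supp_y)
```

The rule reaches a confidence of about 0.85 and would pass a typical confidence threshold, yet Y is actually slightly less likely when X holds than it is overall (lift of about 0.94).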

Page 24: Extensions

Desirable properties for interestingness measures (Piatetsky-Shapiro, 1991)

P1  ACC(A⇒C) = 0 when supp(A⇒C) = supp(A)supp(C)
P2  ACC(A⇒C) monotonically increases with supp(A⇒C)
P3  ACC(A⇒C) monotonically decreases with supp(A) (or supp(C))

Page 25: Extensions

Certainty factors…
  … satisfy Piatetsky-Shapiro’s properties
  … are widely used in expert systems
  … are not symmetric (as interest/lift is)
  … can substitute conviction when CF>0

Berzal, Blanco, Sánchez & Vila: "Measuring the accuracy and interest of association rules: A new framework", Intelligent Data Analysis, 2002
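For reference, a certainty factor for a rule A ⇒ C is commonly computed from the rule confidence and the support of the consequent, in the style of the certainty factors used in expert systems. The sketch below follows that common formulation; the slide does not spell out the formula, so treat the exact form and the function name as assumptions.

```python
def certainty_factor(supp_ac, supp_a, supp_c):
    """Certainty factor of A => C, in the range [-1, 1].

    Positive when A makes C more likely than its prior supp(C),
    negative when it makes C less likely, 0 under independence.
    """
    conf = supp_ac / supp_a
    if conf > supp_c:
        return (conf - supp_c) / (1 - supp_c)
    if conf < supp_c:
        return (conf - supp_c) / supp_c
    return 0.0


# Hypothetical supports, chosen so the arithmetic is easy to follow.
cf_independent = certainty_factor(supp_ac=0.25, supp_a=0.5, supp_c=0.5)
cf_positive = certainty_factor(supp_ac=0.4, supp_a=0.5, supp_c=0.5)
cf_negative = certainty_factor(supp_ac=0.17, supp_a=0.2, supp_c=0.9)
```

The first call illustrates property P1: when supp(A⇒C) = supp(A)supp(C) the measure is 0. The last call shows a rule with roughly 0.85 confidence receiving a negative certainty factor because its consequent is even more frequent than that.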

Page 26: Extensions

References:
  Hilderman & Hamilton: "Evaluation of interestingness measures for ranking discovered knowledge", PAKDD, 2001
  Tan, Kumar & Srivastava: "Selecting the right objective measure for association analysis", Information Systems, vol. 29, pp. 293-313, 2004
  Berzal, Cubero, Marín, Sánchez, Serrano & Vila: "Association rule evaluation for classification purposes", TAMIDA’2005

Page 27: Applications

Two sample applications where association rules have been successful:

Classification (ART)
  Berzal, Cubero, Sánchez & Serrano: "ART: A hybrid classification model", Machine Learning Journal, 2004

Anomaly detection (ATBAR)
  Balderas, Berzal, Cubero, Eisman & Marín: "Discovering Hidden Association Rules", KDD’2005, Chicago, Illinois, USA

Page 28: Classification

Classification models based on association rules
  Partial classification models, e.g. Bayardo
  “Associative” classification models, e.g. CBA (Liu et al.)
  Bayesian classifiers, e.g. LB (Meretakis et al.)
  Emerging patterns, e.g. CAEP (Dong et al.)
  Rule trees, e.g. Wang et al.
  Rules with exceptions, e.g. Liu et al.

Page 29: Classification

GOAL
  Simple, intelligible, and robust classification models obtained in an efficient and scalable way

MEANS
  Decision Tree Induction + Association Rule Mining = ART [Association Rule Trees]

Page 30: ART Classification Model

IDEA
  Make use of efficient association rule mining algorithms to build a decision-tree-shaped classification model.

  ART = Association Rule Tree

KEY
  Association rules + “else” branches
  Hybrid between decision trees and decision lists
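To make the "association rules + else branches" shape concrete, the following sketch models only how such a tree could be applied to an instance; it does not attempt to reproduce how ART builds the tree. All class names, attributes, and the sample tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class RuleNode:
    # Each branch pairs a rule antecedent (attribute -> value) with a subtree.
    branches: List[Tuple[Dict[str, str], "Tree"]]
    # Subtree used when no antecedent matches: the "else" branch.
    default: "Tree"


Tree = Union[Leaf, RuleNode]


def classify(tree, instance):
    """Follow the first matching rule at each level, or the else branch."""
    while isinstance(tree, RuleNode):
        for antecedent, subtree in tree.branches:
            if all(instance.get(a) == v for a, v in antecedent.items()):
                tree = subtree
                break
        else:
            tree = tree.default
    return tree.label


# Hypothetical model: multi-attribute rules at the root, then an else branch.
model = RuleNode(
    branches=[
        ({"X": "0", "Y": "0"}, Leaf("0")),
        ({"X": "1", "Y": "1"}, Leaf("0")),
    ],
    default=RuleNode(branches=[({"Z": "0"}, Leaf("0"))], default=Leaf("1")),
)
```

Each level behaves like a small decision list (rules tried in order, then "else"), while the nesting of levels gives the overall tree shape, which is the hybrid character the slide refers to.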

Page 31: ART Classification Model

SPLICE

Page 32: ART classification model

Example: ART vs. TDIDT

[Figure: two classification trees over the binary attributes X, Y and Z, one built by ART (including an "else" branch) and one built by TDIDT]

Page 33: ART classification model

Final comments

Classification models
  Acceptable accuracy
  Reduced complexity
  Attribute interactions
  Robustness (noise & primary keys)

Classifier building method
  Efficient algorithm
  Good scalability properties
  Automatic parameter selection

Page 34: Anomaly detection

It is often more interesting to find surprising non-frequent events than frequent ones.

EXAMPLES
  Abnormal network activity patterns in intrusion detection systems.
  Exceptions to “common” rules in Medicine (useful for diagnosis, drug evaluation, detection of conflicting therapies…)
  …

Page 35: Anomaly detection

Anomalous association rule
  Confident rule representing homogeneous deviations from common behavior.

Page 36: Anomaly detection

Anomalous association rule

  X ⇒ Y is frequent and confident: X usually implies Y (dominant rule)
  X ¬Y ⇒ A is confident: when X does not imply Y, then it usually implies A (the anomaly)
  X Y ⇒ ¬A is confident

Page 37: Anomaly detection

  X Y A1 Z1 …
  X Y A1 Z2 …
  X Y A2 Z3 …
  X Y A2 Z1 …
  X Y A3 Z2 …
  X Y A3 Z3 …
  X Y A Z …
  X Y3 A Z3
  X Y3 A Z …
  X Y4 A Z …

X ⇒ Y is the dominant rule
X ⇒ A when ¬Y is the anomalous rule
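The three conditions that characterize an anomalous rule (a frequent and confident dominant rule X ⇒ Y, a confident X ¬Y ⇒ A, and a confident X Y ⇒ ¬A) can be checked by brute force as sketched below. This is only a direct reading of the definition, not the ATBAR algorithm; for simplicity Y and A are single items here, and the function name, thresholds and data are assumptions.

```python
def is_anomalous(transactions, x, y, a, min_supp, min_conf):
    """Check whether 'X => A when not Y' is anomalous w.r.t. 'X => Y'."""
    x = frozenset(x)
    n = len(transactions)
    with_x = [t for t in transactions if x <= t]
    with_xy = [t for t in with_x if y in t]
    without_y = [t for t in with_x if y not in t]
    if not with_xy or not without_y:
        return False
    # X => Y must be frequent and confident (the dominant rule).
    dominant = (len(with_xy) / n >= min_supp
                and len(with_xy) / len(with_x) >= min_conf)
    # X and not Y => A must be confident.
    anomaly_conf = sum(1 for t in without_y if a in t) / len(without_y)
    # X and Y => not A must be confident.
    absent_conf = sum(1 for t in with_xy if a not in t) / len(with_xy)
    return dominant and anomaly_conf >= min_conf and absent_conf >= min_conf


# Hypothetical data: X usually comes with Y; when it does not, A shows up.
rows = [frozenset({"x", "y"})] * 8 + [frozenset({"x", "a"})] * 2

found = is_anomalous(rows, {"x"}, "y", "a", min_supp=0.5, min_conf=0.75)
not_found = is_anomalous(rows, {"x"}, "y", "b", min_supp=0.5, min_conf=0.75)
```

The item a is reported because it appears in every transaction where X occurs without Y and in none where X occurs with Y; the item b never appears, so it is not.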

Page 38: Anomaly detection

Suzuki et al.’s “Exception Rules”

X Y is an association rule

X I

X I is the reference rule

is the exception rule

¬ Y

I is the “interacting” itemset

Too many exceptions

The “cause” needs to be present


Page 39: Anomaly detection: ATBAR

Anomalous association rules

[Figure: ATBAR itemset tree]
  First scan: A #7, B #9, C #7, D #8
  Second scan: A #7 with second-level nodes B #6 and D #5, plus an extended node A*
  A*: A #7, AB #6, AC #4, AD #5, AE #3, AF #3
  Label: Non-frequent

Page 40: Anomaly detection: ATBAR

Anomalous association rules

[Figure: ATBAR itemset tree]
  First scan: A #7, B #9, C #7, D #8
  Second scan: A #7 (A*), B #9 (B*), C #7 (C*), D #8 (D*); second-level nodes B #6, D #5, C #6, D #7, D #5

Page 41: Anomaly detection: ATBAR

Anomalous association rules

Rule generation is immediate from the frequent and extended itemsets obtained by ATBAR

Page 42: Anomaly detection: Results

Experiments on health-related datasets from the UCI Machine Learning Repository

  Relatively small set of anomalous rules (typically, >90% reduction with respect to standard association rules)
  Reasonable overhead needed to obtain anomalous association rules (about 20% in ATBAR w.r.t. TBAR)

Page 43: Anomaly detection: Results

An example from the Census dataset:

  if WORKCLASS: Local-gov
  then CAPGAIN: [99999.0 , 99999.0] (7 out of 7)    (“Anomaly”)
  when not CAPGAIN: [0.0 , 20051.0]                 (usual consequent)

Page 44: Anomaly detection: Results

  Anomalous association rules (novel characterization of potentially interesting knowledge)
  An efficient algorithm for discovering anomalous association rules: ATBAR
  Some heuristics for filtering the discovered anomalous association rules

Page 45: Association Rules: Algorithms, variations, extensions, and applications

Questions, comments, and suggestions…

Fernando Berzal
[email protected]

