Generating Non-Redundant Association Rules
Mohammed J. Zaki

Posted on 20-Dec-2015

Yaeer Master©

Outline

- Introduction
- Association Rules – reminder
- Closed Frequent Itemsets
- Generating Rules
- Complexity Analysis
- Experimental Evaluation

Introduction

Association rule discovery: the set of association rules can grow unwieldy, especially as the frequency requirement (support) is lowered.

- Many rules are redundant; the number of redundant rules can be exponential in the length of the longest frequent itemset.
- For dense datasets it is not feasible to mine all frequent itemsets.

Introduction (cont.)

Solution: use closed frequent itemsets.
- The set is orders of magnitude smaller.
- No information is lost.
- They yield a “generating set” for the rules.

Algorithm for mining closed itemsets: CHARM

Association Rules

Mining Association Rules

Find all frequent itemsets: the search space contains $2^m$ itemsets for $m$ items (NP-complete). Assuming a bound on transaction length, the cost is $O(r \cdot n \cdot 2^L)$.

Generating confident rules: each itemset of size $k$ yields $2^k$ potential rules, giving a complexity of $O(f \cdot 2^L)$.

Here $r$ is the number of maximal frequent itemsets, $n$ the number of transactions, $L$ the length of the longest frequent itemset, and $f$ the number of frequent itemsets.
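For comparison with the closed-itemset framework, the traditional two-step approach can be sketched in Python. This is a deliberately naive sketch, not Apriori or any algorithm from the paper: it enumerates all $2^m$ itemsets without pruning. The six-transaction database over items A, C, D, T, W is an assumption, reconstructed so that it reproduces the tidsets quoted later in these slides (for example t(A) = 1345 and i(245) = CDW); the function names are hypothetical.

```python
from itertools import combinations

# Assumed example database (tid -> items), reconstructed to match the
# tidsets quoted in these slides; not the original figure.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = sorted(set("".join(DB.values())))


def support(itemset):
    """Absolute support: number of transactions containing every item."""
    return sum(1 for items in DB.values() if set(itemset) <= set(items))


def frequent_itemsets(minsup):
    """Brute force over all 2^m itemsets (m = len(ITEMS)); no pruning."""
    result = {}
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            s = support(combo)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result


def candidate_rules(itemset):
    """The 2^k - 2 non-trivial rules lhs -> (itemset minus lhs)."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            yield frozenset(lhs), frozenset(items) - frozenset(lhs)


def confident_rules(minsup, minconf):
    freq = frequent_itemsets(minsup)
    rules = []
    for itemset, s in freq.items():
        for lhs, rhs in candidate_rules(itemset):
            # every subset of a frequent itemset is frequent, so freq[lhs] exists
            conf = s / freq[lhs]
            if conf >= minconf:
                rules.append((lhs, rhs, conf))
    return rules


freq = frequent_itemsets(3)  # minsup = 3 of 6 transactions (50%)
print(len(freq))  # 19
print(len(list(candidate_rules(frozenset("ACTW")))))  # 14
```

Note that `candidate_rules` yields the $2^k - 2$ non-trivial rules; the $2^k$ on the slide also counts the two rules with an empty side.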

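Closed Frequent Itemsets – Defining a Galois Connection

The mappings: let $I$ be the set of items, $T$ the set of transaction ids, $X \subseteq I$ and $Y \subseteq T$, and let $x\,\delta\,y$ mean that transaction $y$ contains item $x$:
$t: P(I) \to P(T),\quad t(X) = \{ y \in T \mid \forall x \in X,\ x\,\delta\,y \}$
$i: P(T) \to P(I),\quad i(Y) = \{ x \in I \mid \forall y \in Y,\ x\,\delta\,y \}$

The pair $(t, i)$ defines a Galois connection between the partially ordered sets $P(I)$ and $P(T)$.

Galois connection: for all $a$ in $A$ and $b$ in $B$:
$b \le F(a) \iff a \le G(b)$
Here $F = t$ and $G = i$, i.e. for all $X \subseteq I$ and $Y \subseteq T$: $Y \subseteq t(X) \iff X \subseteq i(Y)$.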
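The mappings $t$ and $i$ translate directly into code. The Python sketch below assumes a six-transaction example database (reconstructed to agree with the tidsets quoted on these slides, not copied from the original figure) and checks the Galois condition exhaustively for every pair $X \subseteq I$, $Y \subseteq T$. Names such as `powerset` and `is_galois_connection` are illustrative.

```python
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())  # I
TIDS = frozenset(DB)                     # T


def t(X):
    """t: P(I) -> P(T), the transactions containing every item of X."""
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    """i: P(T) -> P(I), the items common to every transaction in Y (i({}) = I)."""
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def is_galois_connection():
    """Exhaustively check  Y <= t(X)  <=>  X <= i(Y)  for all X in P(I), Y in P(T)."""
    return all((Y <= t(X)) == (X <= i(Y))
               for X in powerset(ITEMS) for Y in powerset(TIDS))


print(is_galois_connection())  # True
```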

Galois Connection (cont.)

Properties:

1. $X_1 \subseteq X_2 \Rightarrow t(X_1) \supseteq t(X_2)$
2. $Y_1 \subseteq Y_2 \Rightarrow i(Y_1) \supseteq i(Y_2)$
3. $X \subseteq i(t(X))$ and $Y \subseteq t(i(Y))$

Galois Connection

Example

t(ACW) = t(A) ∩ t(C) ∩ t(W)
= 1345 ∩ 123456 ∩ 12345
= 1345

i(245) = CDW

ACW ⊆ ACTW ⇒ t(ACW) = 1345 ⊇ 135 = t(ACTW)
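The example arithmetic can be reproduced with a short Python sketch. The database is an assumption (reconstructed from the tidsets quoted on these slides); `t_by_intersection` is a hypothetical helper that mirrors the slide's computation of t(ACW) as an intersection of single-item tidsets.

```python
# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    """Transactions containing every item of X."""
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    """Items common to every transaction in Y."""
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def t_by_intersection(X):
    """t(X) as the intersection of single-item tidsets (X must be non-empty)."""
    return frozenset.intersection(*(t(frozenset(x)) for x in X))


def tids(Y):
    return "".join(str(tid) for tid in sorted(Y))


print(tids(t_by_intersection("ACW")))  # 1345
print("".join(sorted(i({2, 4, 5}))))   # CDW
# Property 1: ACW is a subset of ACTW, so t(ACW) is a superset of t(ACTW)
print(tids(t(frozenset("ACTW"))))      # 135
print(t(frozenset("ACW")) >= t(frozenset("ACTW")))  # True
```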

Closure Operator

A map $c: P(S) \to P(S)$ is a closure operator if it satisfies the following for all $X, Y \subseteq S$:

1. Extension: $X \subseteq c(X)$
2. Monotonicity: $X \subseteq Y \Rightarrow c(X) \subseteq c(Y)$
3. Idempotency: $c(c(X)) = c(X)$

Closure composition: $c_{it}(X) = i \circ t(X) = i(t(X))$ and $c_{ti}(Y) = t \circ i(Y) = t(i(Y))$.
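Both compositions can be checked against the three closure axioms by brute force on a small example. This Python sketch assumes a reconstructed six-transaction database and tests extension, monotonicity and idempotency over every subset of $I$ (for $c_{it}$) and of $T$ (for $c_{ti}$); it is a sanity check on one dataset, not a proof.

```python
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())
TIDS = frozenset(DB)


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def c_it(X):
    return i(t(X))


def c_ti(Y):
    return t(i(Y))


def is_closure_operator(c, universe):
    """Check extension, monotonicity and idempotency of c on P(universe)."""
    subsets = powerset(universe)
    image = {X: c(X) for X in subsets}
    extension = all(X <= image[X] for X in subsets)
    monotone = all(image[X] <= image[Y]
                   for X in subsets for Y in subsets if X <= Y)
    idempotent = all(c(image[X]) == image[X] for X in subsets)
    return extension and monotone and idempotent


print(is_closure_operator(c_it, ITEMS), is_closure_operator(c_ti, TIDS))  # True True
```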

Closure Operator – Round Trip

Closed Itemset – Definition

A closed itemset $X$ is an itemset that is the same as its closure, i.e. $c_{it}(X) = X$.

Example:

$c_{it}(AC) = i(t(AC)) = i(1345) = ACW$

Conclusion: AC is not closed.

ACW is closed.
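The definition of a closed itemset is easy to test directly. The brute-force Python sketch below is not CHARM (which avoids enumerating every itemset); it assumes a reconstructed example database and lists all closed itemsets, plus those with support of at least 3 of the 6 transactions (50%).

```python
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def is_closed(X):
    """X is closed iff X equals its closure c_it(X) = i(t(X))."""
    return i(t(X)) == X


closed = [X for X in powerset(ITEMS) if is_closed(X)]
frequent_closed = sorted(("".join(sorted(X)), len(t(X)))
                         for X in closed if len(t(X)) >= 3)  # minsup = 3 of 6
print(is_closed(frozenset("AC")), is_closed(frozenset("ACW")))  # False True
print(len(closed))  # 10
print(frequent_closed)
# [('ACTW', 3), ('ACW', 4), ('C', 6), ('CD', 4), ('CDW', 3), ('CT', 4), ('CW', 5)]
```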

Closed vs. Frequent Itemsets

Concept – Definition

For any closed itemset $X$ there exists a closed tidset $Y$ with the property $Y = t(X)$ (and, conversely, $X = i(Y)$).

The pair $X \times Y$ is called a concept.

Galois Lattice

A concept $X_1 \times Y_1$ is a sub-concept of $X_2 \times Y_2$, denoted $X_1 \times Y_1 \le X_2 \times Y_2$, if and only if $X_1 \subseteq X_2$ (equivalently, $Y_2 \subseteq Y_1$).

Let $B(\delta)$ be the set of all concepts. The ordered set $(B(\delta), \le)$ is a complete lattice, called the Galois lattice.
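The concepts and their order can be enumerated on a small example. This Python sketch, over an assumed reconstructed six-transaction database, pairs each closed itemset $X$ with its tidset $Y = t(X)$ and checks that ordering concepts by itemset inclusion is exactly the reverse of ordering them by tidset inclusion. `is_subconcept` is an illustrative name.

```python
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


# Every concept is a closed itemset X paired with its closed tidset Y = t(X).
concepts = [(X, t(X)) for X in powerset(ITEMS) if i(t(X)) == X]
assert all(t(i(Y)) == Y and i(Y) == X for X, Y in concepts)


def is_subconcept(c1, c2):
    """X1 x Y1 <= X2 x Y2 iff X1 is a subset of X2."""
    return c1[0] <= c2[0]


# Itemset order and tidset order are mirror images of each other.
order_reversed = all(is_subconcept(c1, c2) == (c2[1] <= c1[1])
                     for c1 in concepts for c2 in concepts)
print(len(concepts), order_reversed)  # 10 True
```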

Galois Lattice of Concepts

Frequent Closed Itemsets vs. Frequent Itemsets

Lattice operations:

Join: $(X_1 \times Y_1) \vee (X_2 \times Y_2) = c_{it}(X_1 \cup X_2) \times (Y_1 \cap Y_2)$

Meet: $(X_1 \times Y_1) \wedge (X_2 \times Y_2) = (X_1 \cap X_2) \times c_{ti}(Y_1 \cup Y_2)$

Frequent concept: a concept whose support is at least minsup, where the support is defined as the cardinality of the closed tidset.

Join/Meet Example

Join:

$(ACDW \times 45) \vee (CDT \times 56) = c_{it}(ACDW \cup CDT) \times (45 \cap 56) = ACDTW \times 5$

Meet:

$(ACDW \times 45) \wedge (CDT \times 56) = (ACDW \cap CDT) \times c_{ti}(45 \cup 56) = CD \times 2456$
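The join and meet formulas, and the example, can be reproduced in a few lines of Python. The database is an assumption, reconstructed to agree with the tidsets on these slides; `join`, `meet` and `show` are illustrative names.

```python
# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def join(c1, c2):
    (X1, Y1), (X2, Y2) = c1, c2
    return i(t(X1 | X2)), Y1 & Y2  # c_it(X1 ∪ X2) × (Y1 ∩ Y2)


def meet(c1, c2):
    (X1, Y1), (X2, Y2) = c1, c2
    return X1 & X2, t(i(Y1 | Y2))  # (X1 ∩ X2) × c_ti(Y1 ∪ Y2)


def show(c):
    X, Y = c
    return "".join(sorted(X)) + " x " + "".join(map(str, sorted(Y)))


acdw = (frozenset("ACDW"), frozenset({4, 5}))
cdt = (frozenset("CDT"), frozenset({5, 6}))
print(show(join(acdw, cdt)))  # ACDTW x 5
print(show(meet(acdw, cdt)))  # CD x 2456
```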

Frequent Concepts

Lemma 1: The support of an itemset $X$ is equal to the support of its closure, i.e. $\sigma(X) = \sigma(c_{it}(X))$.

Therefore all frequent itemsets are uniquely determined by the frequent closed itemsets, and can be obtained through the join operation on the frequent concepts.
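One way to see Lemma 1 in action: since $c_{it}(X)$ is the smallest closed superset of $X$, it is also the closed superset with the largest support, so every frequent itemset's support can be recovered from the frequent closed itemsets alone. The Python sketch below uses this max-over-closed-supersets formulation rather than lattice joins, on an assumed reconstructed example database with minsup = 3 transactions; `support_from_closed` is a hypothetical name.

```python
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())
MINSUP = 3


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


# Only the frequent closed itemsets and their supports are kept.
frequent_closed = {X: len(t(X)) for X in powerset(ITEMS)
                   if i(t(X)) == X and len(t(X)) >= MINSUP}


def support_from_closed(X):
    """sigma(X) = sigma(c_it(X)) = max support over frequent closed supersets.
    Returns None when X has no frequent closed superset (X is infrequent)."""
    sups = [s for C, s in frequent_closed.items() if X <= C]
    return max(sups) if sups else None


# Every itemset's support is recovered exactly from the closed ones.
assert all(support_from_closed(X) == (len(t(X)) if len(t(X)) >= MINSUP else None)
           for X in powerset(ITEMS))
print(support_from_closed(frozenset("AT")))  # 3 (its closure is ACTW)
```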

Redundant Rules

Definition: Let $R_i$ denote the rule $X_1^i \xrightarrow{p_i} X_2^i$. A rule $R_1$ is more general than a rule $R_2$, denoted $R_1 \prec R_2$, provided that $R_2$ can be generated by adding items to the antecedent or consequent of $R_1$, i.e. $X_1^1 \subseteq X_1^2$ and $X_2^1 \subseteq X_2^2$.

The non-redundant rules are the most general ones (among rules with equal confidence).

Rule Generation

Lemma 2 (Transitivity): Let $X_1, X_2, X_3$ be frequent closed itemsets with $X_1 \subseteq X_2 \subseteq X_3$. If $X_1 \xrightarrow{p} X_2$ and $X_2 \xrightarrow{q} X_3$, then $X_1 \xrightarrow{pq} X_3$.

Observation: it is sufficient to consider rules among adjacent concepts.

Rule Generation – 100% Confidence

Lemma 3: An association rule $X_1 \xrightarrow{p} X_2$ has confidence $p = 1.0$ if and only if $t(X_1) \subseteq t(X_2)$.

100% confidence rules are those directed from a super-concept to a sub-concept, i.e. down-arcs.
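Lemma 3 follows because $p = |t(X_1 \cup X_2)| / |t(X_1)|$ and $|t(X_1 \cup X_2)| = |t(X_1) \cap t(X_2)|$, which equals $|t(X_1)|$ exactly when $t(X_1) \subseteq t(X_2)$. The Python sketch below checks this exhaustively on an assumed reconstructed example database, using exact fractions for confidence.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def confidence(X1, X2):
    """conf(X1 -> X2) = |t(X1 ∪ X2)| / |t(X1)|."""
    return Fraction(len(t(X1 | X2)), len(t(X1)))


# All rules with non-empty, disjoint sides and a non-empty antecedent tidset.
pairs = [(X1, X2) for X1 in powerset(ITEMS) for X2 in powerset(ITEMS)
         if X1 and X2 and not X1 & X2 and t(X1)]
lemma3_holds = all((confidence(X1, X2) == 1) == (t(X1) <= t(X2))
                   for X1, X2 in pairs)
print(lemma3_holds)                                 # True
print(confidence(frozenset("TW"), frozenset("A")))  # 1
```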

Rule Generation – 100% Confidence (cont.)

Theorem 1: Let $R = \{R_1, \ldots, R_n\}$ be a set of rules with 100% confidence ($p_i = 1.0$ for all $i$), such that $I_1 = c_{it}(X_1^i \cup X_2^i)$ and $I_2 = c_{it}(X_2^i)$ for all rules $R_i$.

Let $R_I$ denote the 100% confidence rule $I_1 \xrightarrow{1.0} I_2$.

Then all rules $R_i \ne R_I$ are more specific than $R_I$, and thus are redundant.

Rule Generation – 100% Confidence (cont.)

Example: TW → A, TW → AC, CTW → A

$c_{it}(TW \cup A) = c_{it}(ATW) = ACTW$

$c_{it}(TW \cup AC) = ACTW$

$c_{it}(CTW \cup A) = ACTW$

The most general of these rules is TW → A.
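The example can be checked mechanically. This Python sketch, over an assumed reconstructed example database, keys each rule $X_1 \to X_2$ by the pair of closures $(c_{it}(X_1 \cup X_2), c_{it}(X_2))$ (our reading of Theorem 1), confirms that the three rules share one key, and keeps the rules that no other rule in the group is more general than. `rule_class` and `more_general` are hypothetical names.

```python
# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def rule_class(lhs, rhs):
    """Key (I1, I2) = (c_it(X1 ∪ X2), c_it(X2)) for the rule X1 -> X2."""
    X1, X2 = frozenset(lhs), frozenset(rhs)
    return i(t(X1 | X2)), i(t(X2))


def more_general(r1, r2):
    """r1 ≺ r2: r2 is r1 with extra items in its antecedent and/or consequent."""
    return r1 != r2 and set(r1[0]) <= set(r2[0]) and set(r1[1]) <= set(r2[1])


rules = [("TW", "A"), ("TW", "AC"), ("CTW", "A")]
# All three rules have 100% confidence: t(lhs) is contained in t(rhs).
assert all(t(frozenset(l)) <= t(frozenset(r)) for l, r in rules)
classes = {rule_class(lhs, rhs) for lhs, rhs in rules}
most_general = [r for r in rules if not any(more_general(o, r) for o in rules)]
print([("".join(sorted(a)), "".join(sorted(b))) for a, b in classes])  # [('ACTW', 'ACW')]
print(most_general)  # [('TW', 'A')]
```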

Rule Generation – Confidence < 100%

Rules with confidence below 100% go from sub-concepts to super-concepts, i.e. they correspond to up-arcs.

Rules between non-adjacent concepts can be derived by transitivity. For example:

C → W (p = 0.83) and W → A (q = 0.8) give

C → A (pq = 0.67)
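The transitivity example is plain arithmetic on supports: for closed itemsets $X \subseteq Y$, the rule $X \to Y$ has confidence $\sigma(Y)/\sigma(X)$, so the supports telescope. A Python sketch with exact fractions, assuming a reconstructed example database:

```python
from fractions import Fraction

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}


def sigma(X):
    """Absolute support of itemset X."""
    return sum(1 for items in DB.values() if frozenset(X) <= items)


# Closed itemsets C ⊆ CW ⊆ ACW; confidence of a rule between closed
# itemsets X ⊆ Y is sigma(Y) / sigma(X).
p = Fraction(sigma("CW"), sigma("C"))    # C -> W
q = Fraction(sigma("ACW"), sigma("CW"))  # W -> A (i.e. CW -> ACW)
pq = Fraction(sigma("ACW"), sigma("C"))  # C -> A
print(p, q, pq, p * q == pq)  # 5/6 4/5 2/3 True
```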

Rule Generation – Confidence < 100% (cont.)

Theorem 2: Let $R = \{R_1, \ldots, R_n\}$ be a set of rules with confidence $p < 1.0$ ($p_i = p$ for all $i$), such that $I_1 = c_{it}(X_1^i)$ and $I_2 = c_{it}(X_1^i \cup X_2^i)$ for all rules $R_i$.

Let $R_I$ denote the rule $I_1 \xrightarrow{p} I_2$.

Then all rules $R_i \ne R_I$ are more specific than $R_I$, and thus are redundant.
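Theorem 2 can be sanity-checked by grouping every rule with confidence below 1.0 under the key $(c_{it}(X_1), c_{it}(X_1 \cup X_2))$. The Python sketch below, over an assumed reconstructed example database with minsup = 3 transactions, confirms that all rules in a group share the confidence $\sigma(I_2)/\sigma(I_1)$ of the group's closed-itemset rule.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())
MINSUP = 3


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


groups = defaultdict(set)
for X1 in powerset(ITEMS):
    for X2 in powerset(ITEMS):
        if not X1 or not X2 or X1 & X2:
            continue
        union_sup = len(t(X1 | X2))
        if union_sup < MINSUP:
            continue
        conf = Fraction(union_sup, len(t(X1)))
        if conf < 1:
            # key (I1, I2) = (c_it(X1), c_it(X1 ∪ X2))
            groups[(i(t(X1)), i(t(X1 | X2)))].add((X1, X2, conf))

# Every member shares its class confidence sigma(I2) / sigma(I1).
consistent = all(conf == Fraction(len(t(I2)), len(t(I1)))
                 for (I1, I2), members in groups.items()
                 for _, _, conf in members)
members_cw = sorted(("".join(sorted(a)), "".join(sorted(b)))
                    for a, b, _ in groups[(frozenset("CW"), frozenset("ACW"))])
print(consistent)  # True
print(members_cw)  # [('CW', 'A'), ('W', 'A'), ('W', 'AC')]
```

Among the three members of the $(CW, ACW)$ group, W → A is the most general, which matches the W → A (0.8) rule that appears in the generating set.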

Generating Set

Combining the two sets gives a generating set for rules with minconf = 80% and minsup = 50%:

{ TW → A, A → W, W → C, T → C, D → C, W → A (0.8), C → W (0.83) }

All association rules can be derived from this set.
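The confidence < 100% half of the generating set can be sketched as the up-arcs between adjacent frequent closed itemsets. This Python sketch assumes a reconstructed example database, minsup = 3 of 6 transactions (50%) and minconf = 80%; it does not compute the 100% rules, which additionally require choosing the most general rule in each class. `adjacent` is an illustrative name.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example database (tid -> items), reconstructed from the slides.
DB = {1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
      4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT")}
ITEMS = frozenset().union(*DB.values())
MINSUP, MINCONF = 3, Fraction(4, 5)  # 50% of 6 transactions, 80%


def t(X):
    return frozenset(tid for tid, items in DB.items() if X <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


fc = {X: len(t(X)) for X in powerset(ITEMS)
      if i(t(X)) == X and len(t(X)) >= MINSUP}


def adjacent(X1, X2):
    """X1 ⊂ X2 with no frequent closed itemset strictly between them."""
    return X1 < X2 and not any(X1 < Z < X2 for Z in fc)


# Up-arcs X1 -> (X2 minus X1) between adjacent concepts with enough confidence.
up_arcs = sorted(("".join(sorted(X1)), "".join(sorted(X2 - X1)),
                  Fraction(fc[X2], fc[X1]))
                 for X1 in fc for X2 in fc
                 if adjacent(X1, X2) and Fraction(fc[X2], fc[X1]) >= MINCONF)
print(up_arcs)
# [('C', 'W', Fraction(5, 6)), ('CW', 'A', Fraction(4, 5))]
```

The arc CW → ACW is reported here with antecedent CW; the generating set writes it as W → A, using W (whose closure is CW) as the more general antecedent.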

Complexity of Rule Generation

Consider a frequent itemset of length $l$, all of whose $2^l$ subsets are frequent.

Traditional: a subset of size $i$ yields $2^i - 2$ rules, so the number of rules is
$\sum_{i=1}^{l} \binom{l}{i} (2^i - 2) \le 2^l \sum_{i=1}^{l} \binom{l}{i} = O(2^{2l})$

New framework:
- Best case: one closed itemset, no rules.
- Worst case: all frequent itemsets are closed. Rules are only needed between adjacent concepts, and a closed itemset of size $i$ has $i$ adjacent closed subsets.

Number of rules: $\sum_{i=1}^{l} i \binom{l}{i} = l \, 2^{l-1} = O(l \, 2^l)$

Reduction factor: $O(2^{2l}) / O(l \, 2^l) = O(2^l / l)$
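The counts behind the complexity comparison can be computed exactly for small $l$. This Python sketch assumes a single frequent itemset of length $l$ whose subsets are all frequent and closed (the worst case above), with one rule per edge of the subset lattice in the new framework; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def traditional_rules(l):
    """Rules from all subsets of one length-l itemset: 2^i - 2 per size-i subset."""
    return sum(comb(l, i) * (2 ** i - 2) for i in range(1, l + 1))


def closed_rules_worst_case(l):
    """One rule per lattice edge: each size-i subset has i immediate subsets."""
    return sum(i * comb(l, i) for i in range(1, l + 1))


def count_edges_brute_force(l):
    """Cross-check: sum, over all subsets S, of the number of ways to drop one item."""
    return sum(len(s) for r in range(l + 1) for s in combinations(range(l), r))


for l in range(2, 9):
    trad, new = traditional_rules(l), closed_rules_worst_case(l)
    print(l, trad, new, round(trad / new, 2))
```

The exact counts are $3^l - 2^{l+1} + 1$ and $l \cdot 2^{l-1}$, so the ratio grows roughly like $2(3/2)^l / l$, within the $O(2^l / l)$ bound; for very small $l$ (e.g. $l = 2$) the worst-case closed-itemset count can even exceed the traditional one.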

Experimental Evaluation

Number of Rules: Traditional vs. Closed Itemsets

Conclusion

The new framework based on closed itemsets can drastically reduce the size of the rule set, and the resulting rules can be presented to the user succinctly.

Future work:
- Interactive visualization and exploration of mined associations, generating rules on demand based on the user's interests.
- Finding a minimal generating set.