Date post: | 20-Dec-2015 |
Generating Non-Redundant Association Rules
Mohammed J. Zaki
Yaeer Master©
Outline
Introduction
Association Rules – reminder
Closed Frequent Itemsets
Generating Rules
Complexity Analysis
Experimental Evaluation
Introduction
Association rule discovery: the set of association rules can grow unwieldy, especially as we lower the frequency requirement (support).
Many rules are redundant.
The number of redundant rules can be exponential in the length of the longest frequent itemset.
For dense datasets it is not feasible to mine all frequent itemsets.
Introduction
Solution: use closed frequent itemsets:
The set is smaller by orders of magnitude.
No loss of information.
Creating a “generating set”.
Algorithm for mining closed itemsets: CHARM
Association Rules
Mining Association Rules
Find all frequent itemsets:
The search space over m items has size $2^{m}$; the problem is NP-complete.
Assuming a bound on transaction length, the complexity is $O(r \cdot n \cdot 2^{L})$, where r is the number of maximal frequent itemsets, n is the number of transactions, and L is the length of the longest frequent itemset.
Generating confident rules:
For each itemset of size k there are $2^{k}$ potential rules.
Complexity: $O(f \cdot 2^{L})$, where f is the number of frequent itemsets and L is the length of the longest frequent itemset.
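A small sketch of where the exponential rule count comes from. The helper name `candidate_rules` is hypothetical; it splits an itemset into a non-empty antecedent and a non-empty consequent, which gives 2^k − 2 candidates for k items, just under the 2^k bound quoted on the slide.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield (antecedent, consequent) pairs splitting `itemset` into two non-empty parts."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(x for x in items if x not in antecedent)
            yield antecedent, consequent


rules = list(candidate_rules("ACW"))
for antecedent, consequent in rules:
    print("".join(antecedent), "->", "".join(consequent))
```

Each candidate still has to be checked against the confidence threshold, which is why the cost grows with both the number of frequent itemsets and their length.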
Closed Frequent Itemsets – Defining a Galois Connection
The mappings: let I be the set of items and T the set of transaction identifiers (tids).
t(X), for an itemset X ⊆ I, is the set of all tids whose transactions contain every item of X.
i(Y), for a tidset Y ⊆ T, is the set of all items that occur in every transaction of Y.
These mappings define a Galois connection between the partially ordered sets P(I) and P(T), both ordered by set inclusion.
Galois connection: for all a in A and b in B:
b ≤ F(a) ↔ a ≤ G(b)
Galois Connection Cont.
Properties:
1. $X_1 \subseteq X_2 \Rightarrow t(X_1) \supseteq t(X_2)$
2. $Y_1 \subseteq Y_2 \Rightarrow i(Y_1) \supseteq i(Y_2)$
3. $X \subseteq i(t(X))$ and $Y \subseteq t(i(Y))$
Galois Connection
Example
t(ACW) = t(A) ∩ t(C) ∩ t(W)
= 1345 ∩ 123456 ∩ 12345
= 1345
i(245) = CDW
ACW ⊆ ACTW ⇒ t(ACW) = 1345 ⊇ 135 = t(ACTW)
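The two mappings can be sketched in a few lines. The six-transaction database below is an assumption: the original table is not part of the extracted text, so it is reconstructed from the tidsets quoted on the slides (for instance t(A) = 1345 and t(C) = 123456). The names `DB`, `t` and `i` are illustrative.

```python
# Example database, reconstructed from the tidsets quoted on the slides
# (an assumption: the original table is not in the extracted text).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def t(items):
    """Tidset of an itemset: ids of the transactions containing every item."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def i(tids):
    """Itemset of a tidset: items shared by every listed transaction."""
    rows = [set(DB[tid]) for tid in tids]
    # By convention the empty tidset maps to the full item universe.
    return frozenset(set.intersection(*rows)) if rows else frozenset("ACDTW")


print(sorted(t("ACW")))      # tidset of ACW
print(sorted(i({2, 4, 5})))  # itemset shared by tids 2, 4 and 5
```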
Closure Operator
c: P(S) → P(S) is a closure operator if it satisfies the following:
1. Extension: $X \subseteq c(X)$
2. Monotonicity: $X \subseteq Y \Rightarrow c(X) \subseteq c(Y)$
3. Idempotency: $c(c(X)) = c(X)$
Closure composition:
c_it(X) = i ∘ t(X) = i(t(X))
c_ti(Y) = t ∘ i(Y) = t(i(Y))
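As a sanity check, the three closure properties can be tested exhaustively on a small example. This is a sketch under assumptions: the database is reconstructed from the tidsets quoted on the slides, and the names `DB`, `ITEMSETS` and `c_it` are illustrative.

```python
from itertools import combinations

# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = "ACDTW"


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def i(tids):
    """Itemset of a tidset (the empty tidset maps to all items)."""
    rows = [set(DB[tid]) for tid in tids]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ITEMS)


def c_it(items):
    """Closure of an itemset: the round trip i(t(X))."""
    return i(t(items))


# Every subset of the item universe.
ITEMSETS = [frozenset(c) for k in range(len(ITEMS) + 1) for c in combinations(ITEMS, k)]

extension = all(x <= c_it(x) for x in ITEMSETS)
monotonicity = all(c_it(x) <= c_it(y) for x in ITEMSETS for y in ITEMSETS if x <= y)
idempotency = all(c_it(c_it(x)) == c_it(x) for x in ITEMSETS)
print(extension, monotonicity, idempotency)
```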
Closure Operator – Round Trip
Closed Itemset - Definition
A closed itemset X is an itemset that is the same as its closure, i.e. X = c_it(X).
Example:
c_it(AC) = i(t(AC)) = i(1345) = ACW
Conclusion: AC is not closed; ACW is closed.
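The definition translates directly into a filter over all itemsets. A minimal sketch, assuming the example database reconstructed from the tidsets quoted on the slides; `closed` and the helper names are illustrative.

```python
from itertools import combinations

# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = "ACDTW"


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def i(tids):
    """Itemset of a tidset (the empty tidset maps to all items)."""
    rows = [set(DB[tid]) for tid in tids]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ITEMS)


def c_it(items):
    """Closure of an itemset: i(t(X))."""
    return i(t(items))


ITEMSETS = [frozenset(c) for k in range(len(ITEMS) + 1) for c in combinations(ITEMS, k)]

# An itemset is closed when the round trip returns it unchanged.
closed = [x for x in ITEMSETS if c_it(x) == x]
print(sorted("".join(sorted(x)) for x in closed))
```

Brute-force enumeration is only workable for toy data; the slides name CHARM as the algorithm used for mining closed itemsets in practice.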
Closed Vs Frequent Itemsets
Concept - Definition
For any closed itemset X, there exists a closed tidset Y with the property Y = t(X).
The pair X × Y is called a concept.
Galois Lattice
A concept $X_1 \times Y_1$ is a subconcept of $X_2 \times Y_2$ if $X_1 \subseteq X_2$ (iff $Y_2 \subseteq Y_1$).
Let B(δ) be the set of all concepts.
The ordered set (B(δ), ≤) is a complete lattice, called the Galois lattice.
Galois Lattice Of Concepts
Frequent Closed Itemsets Vs. Frequent Itemsets
Lattice operations:
Join: $(X_1 \times Y_1) \vee (X_2 \times Y_2) = c_{it}(X_1 \cup X_2) \times (Y_1 \cap Y_2)$
Meet: $(X_1 \times Y_1) \wedge (X_2 \times Y_2) = (X_1 \cap X_2) \times c_{ti}(Y_1 \cup Y_2)$
Frequent concept: a concept with support greater than minsup, where the support is defined as the cardinality of the closed tidset.
Join Meet Example
Join:
(ACDW × 45) ∨ (CDT × 56) = c_it(ACDW ∪ CDT) × (45 ∩ 56) = ACDTW × 5
Meet:
(ACDW × 45) ∧ (CDT × 56) = (ACDW ∩ CDT) × c_ti(45 ∪ 56) = CD × 2456
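The join and meet of two concepts can be sketched as follows. Concepts are represented as (itemset, tidset) pairs; the database is an assumption reconstructed from the tidsets quoted on the slides, and `join`, `meet`, `c_it` and `c_ti` are illustrative names.

```python
# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def i(tids):
    """Itemset of a tidset (the empty tidset maps to all items)."""
    rows = [set(DB[tid]) for tid in tids]
    return frozenset(set.intersection(*rows)) if rows else frozenset("ACDTW")


def c_it(items):
    """Closure on itemsets: i(t(X))."""
    return i(t(items))


def c_ti(tids):
    """Closure on tidsets: t(i(Y))."""
    return t(i(tids))


def join(c1, c2):
    """Join: close the union of the itemsets, intersect the tidsets."""
    (x1, y1), (x2, y2) = c1, c2
    return c_it(x1 | x2), y1 & y2


def meet(c1, c2):
    """Meet: intersect the itemsets, close the union of the tidsets."""
    (x1, y1), (x2, y2) = c1, c2
    return x1 & x2, c_ti(y1 | y2)


a = (frozenset("ACDW"), frozenset({4, 5}))
b = (frozenset("CDT"), frozenset({5, 6}))
print(join(a, b))
print(meet(a, b))
```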
Frequent Concepts
Lemma 1: The support of an itemset X is equal to the support of its closure, i.e. σ(X) = σ(c_it(X)).
Therefore all frequent itemsets are uniquely determined by the closed itemsets and can be determined by the join operation on the frequent concepts.
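Lemma 1 can be checked exhaustively on a toy example. A sketch under assumptions: the database is reconstructed from the tidsets quoted on the slides, support is counted as the number of supporting transactions, and the helper names are illustrative.

```python
from itertools import combinations

# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = "ACDTW"


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def i(tids):
    """Itemset of a tidset (the empty tidset maps to all items)."""
    rows = [set(DB[tid]) for tid in tids]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ITEMS)


def c_it(items):
    """Closure of an itemset: i(t(X))."""
    return i(t(items))


def support(items):
    """Absolute support: number of transactions containing the itemset."""
    return len(t(items))


ITEMSETS = [frozenset(c) for k in range(len(ITEMS) + 1) for c in combinations(ITEMS, k)]

# Support never changes when an itemset is replaced by its closure.
lemma1_holds = all(support(x) == support(c_it(x)) for x in ITEMSETS)
print(lemma1_holds, support("AC"), support(c_it("AC")))
```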
Redundant Rules
Definition: Let $R_i$ denote the rule $X_1^i \xrightarrow{p_i} X_2^i$. A rule $R_1$ is more general than a rule $R_2$, denoted $R_1 \preceq R_2$, provided that $R_2$ can be generated by adding additional items to the antecedent or consequent of $R_1$.
The non-redundant rules are those that are most general (with equal confidence).
Rule Generation
Lemma 2 (transitivity): Let $X_1, X_2, X_3$ be frequent closed itemsets, with $X_1 \subseteq X_2 \subseteq X_3$.
If $X_1 \xrightarrow{p} X_2$ and $X_2 \xrightarrow{q} X_3$, then $X_1 \xrightarrow{pq} X_3$.
Observation: it is sufficient to consider rules among adjacent concepts.
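A short derivation of why the confidences multiply. It assumes the standard definition of confidence, conf(X → Y) = σ(X ∪ Y) / σ(X), which the slide does not restate, and uses the nesting X_1 ⊆ X_2 ⊆ X_3 to simplify each union:

```latex
p \cdot q
  = \frac{\sigma(X_1 \cup X_2)}{\sigma(X_1)} \cdot \frac{\sigma(X_2 \cup X_3)}{\sigma(X_2)}
  = \frac{\sigma(X_2)}{\sigma(X_1)} \cdot \frac{\sigma(X_3)}{\sigma(X_2)}
  = \frac{\sigma(X_3)}{\sigma(X_1)}
  = \frac{\sigma(X_1 \cup X_3)}{\sigma(X_1)}
```

The right-hand side is the confidence of the rule from X_1 to X_3, so rules between non-adjacent concepts need not be stored explicitly.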
Rule Generation – 100% conf.
Lemma 3: An association rule $X_1 \xrightarrow{1.0} X_2$ has confidence p = 1.0 if and only if $t(X_1) \subseteq t(X_2)$.
100% confidence rules are those directed from a super-concept to a sub-concept, i.e. down arcs.
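Lemma 3 can be verified by brute force on a small example. A sketch under assumptions: the database is reconstructed from the tidsets quoted on the slides, confidence is taken as σ(X1 ∪ X2) / σ(X1), and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = "ACDTW"


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def confidence(x1, x2):
    """conf(X1 -> X2) = sigma(X1 u X2) / sigma(X1), as an exact fraction."""
    return Fraction(len(t(set(x1) | set(x2))), len(t(x1)))


ITEMSETS = [frozenset(c) for k in range(len(ITEMS) + 1) for c in combinations(ITEMS, k)]

# Confidence is exactly 1 precisely when t(X1) is contained in t(X2).
lemma3_holds = all(
    (confidence(a, b) == 1) == (t(a) <= t(b)) for a in ITEMSETS for b in ITEMSETS
)
print(lemma3_holds, confidence("TW", "A"), confidence("W", "A"))
```

Every itemset in this toy database has non-zero support (transaction 5 contains all items), so the division is always defined here; real data would need a guard.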
Rule Generation – 100% conf
Theorem 1. Let $R = \{R_1, \ldots, R_n\}$ be a set of rules $R_i: X_1^i \xrightarrow{p_i} X_2^i$ with 100% confidence ($p_i = 1.0$ for all i), such that $I_1 = c_{it}(X_1^i \cup X_2^i)$ and $I_2 = c_{it}(X_2^i)$ for all rules $R_i$.
Let $R_I$ denote the 100% confidence rule $I_1 \xrightarrow{1.0} I_2$.
Then all rules $R_i \neq R_I$ are more specific than $R_I$, and thus are redundant.
Rule Generation – 100% conf
Example: TW→A, TW→AC, CTW→A
c_it(TW ∪ A) = c_it(ATW) = ACTW
c_it(TW ∪ AC) = ACTW
c_it(CTW ∪ A) = ACTW
The most general rule is TW→A.
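The example can be reproduced by computing both closures for each rule. A sketch, assuming the example database reconstructed from the tidsets quoted on the slides; `RULES` and `closures` are illustrative names.

```python
# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def i(tids):
    """Itemset of a tidset (the empty tidset maps to all items)."""
    rows = [set(DB[tid]) for tid in tids]
    return frozenset(set.intersection(*rows)) if rows else frozenset("ACDTW")


def c_it(items):
    """Closure of an itemset: i(t(X))."""
    return i(t(items))


RULES = [("TW", "A"), ("TW", "AC"), ("CTW", "A")]

# For each rule X1 -> X2, compute the closure of X1 u X2 and the closure of X2.
closures = [(c_it(set(x1) | set(x2)), c_it(x2)) for x1, x2 in RULES]
for (x1, x2), (i1, i2) in zip(RULES, closures):
    print(f"{x1}->{x2}: {''.join(sorted(i1))}, {''.join(sorted(i2))}")
```

All three rules map to the same pair of closed itemsets, so only the most general of them needs to be kept.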
Rule Generation – Confidence <100%
Rules from sub-concepts to super-concepts, i.e. they correspond to up-arcs.
Rules between non-adjacent concepts can be derived by transitivity.
For example:
C→W (with p = 0.83) and W→A (with q = 0.8)
⇒ C→A (pq = 0.67)
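The numbers in this example can be recomputed with exact fractions. A sketch, assuming the example database reconstructed from the tidsets quoted on the slides and the usual definition conf(X → Y) = σ(X ∪ Y) / σ(X); the helper names are illustrative.

```python
from fractions import Fraction

# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def confidence(x1, x2):
    """conf(X1 -> X2) = sigma(X1 u X2) / sigma(X1), as an exact fraction."""
    return Fraction(len(t(set(x1) | set(x2))), len(t(x1)))


p = confidence("C", "W")      # up-arc C -> W
q = confidence("W", "A")      # up-arc W -> A
direct = confidence("C", "A")  # rule between non-adjacent concepts

print(round(float(p), 2), round(float(q), 2), round(float(p * q), 2))
print(p * q == direct)
```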
Rule Generation – Confidence <100%
Theorem 2. Let $R = \{R_1, \ldots, R_n\}$ be a set of rules $R_i: X_1^i \xrightarrow{p_i} X_2^i$ with confidence p < 1.0 ($p_i = p$ for all i), such that $I_1 = c_{it}(X_1^i)$ and $I_2 = c_{it}(X_1^i \cup X_2^i)$ for all rules $R_i$.
Let $R_I$ denote the rule $I_1 \xrightarrow{p} I_2$.
Then all rules $R_i \neq R_I$ are more specific than $R_I$, and thus are redundant.
Generating Set
Combining the two sets gives us a generating set for rules with minsup = 50% and minconf = 80%:
{ TW→A, A→W, W→C, T→C, D→C, W→A (0.8), C→W (0.83) }
All association rules can be derived from this set.
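The support and confidence of each rule in the generating set can be recomputed as a check. This is a sketch under assumptions: the database is reconstructed from the tidsets quoted on the slides, and the thresholds are taken to be a 50% support threshold and an 80% confidence threshold, which is the reading consistent with the listed confidences.

```python
from fractions import Fraction

# Example database reconstructed from the slides' tidsets (an assumption).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MINSUP, MINCONF = Fraction(1, 2), Fraction(4, 5)  # assumed thresholds

GENERATING_SET = [("TW", "A"), ("A", "W"), ("W", "C"), ("T", "C"),
                  ("D", "C"), ("W", "A"), ("C", "W")]


def t(items):
    """Tidset of an itemset."""
    return frozenset(tid for tid, row in DB.items() if set(items) <= set(row))


def support(items):
    """Relative support: fraction of transactions containing the itemset."""
    return Fraction(len(t(items)), len(DB))


def confidence(x1, x2):
    """conf(X1 -> X2) = sigma(X1 u X2) / sigma(X1)."""
    return Fraction(len(t(set(x1) | set(x2))), len(t(x1)))


# Map each rule to its (support, confidence) pair.
report = {f"{x1}->{x2}": (support(set(x1) | set(x2)), confidence(x1, x2))
          for x1, x2 in GENERATING_SET}
for rule, (sup, conf) in report.items():
    print(rule, f"support={float(sup):.2f}", f"confidence={float(conf):.2f}")
```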
Complexity of Rule Generation
Traditional (l is the length of the longest frequent itemset):
$\sum_{i=0}^{l} \binom{l}{i} 2^{i} \le \sum_{i=0}^{l} \binom{l}{i} 2^{l} = 2^{l} \sum_{i=0}^{l} \binom{l}{i} = 2^{l} \cdot 2^{l} = O(2^{2l})$
New framework:
Best case: one closed itemset, no rules.
Worst case: all frequent itemsets are closed.
Number of rules: $\sum_{i=0}^{l} \binom{l}{i} (l - i) \le l \sum_{i=0}^{l} \binom{l}{i} = O(l \cdot 2^{l})$
Reduction factor: $O(2^{2l} / (l \cdot 2^{l})) = O(2^{l} / l)$
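The two worst-case counts can be compared numerically. The sketch assumes the worst case described on the slide, in which every subset of an l-item set is both frequent and closed: an itemset of size i then contributes 2^i candidate rules in the traditional framework, and l − i arcs to its immediate supersets in the lattice. The function names are illustrative.

```python
from math import comb


def traditional_rules(l):
    """Sum over itemset sizes i of C(l, i) * 2^i candidate rules."""
    return sum(comb(l, i) * 2 ** i for i in range(l + 1))


def closed_framework_rules(l):
    """Sum over itemset sizes i of C(l, i) * (l - i) arcs between adjacent itemsets."""
    return sum(comb(l, i) * (l - i) for i in range(l + 1))


for l in (5, 10, 20):
    trad, new = traditional_rules(l), closed_framework_rules(l)
    print(l, trad, new, round(trad / new, 1))
```

By the binomial theorem the first sum equals 3^l, which sits inside the O(2^{2l}) bound, and the second equals l · 2^(l−1); the ratio between them still grows exponentially in l.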
Experimental Evaluation
Number of Rules: Traditional vs. Closed Itemset
Conclusion
The new framework based on closed itemsets can drastically reduce the rule set, which can then be presented to the user in a succinct manner.
Future work:
Interactive visualization and exploration of mined associations, generating rules on demand based on the user’s interest.
Finding a minimal generating set.