1998 년 8 월 7 일 Data Engineering Lab 성 유진 1
Exploratory Mining and Pruning Optimization of Constrained
Associations Rules
1998 년 8 월 7 일 Data Engineering Lab 성 유진 2
Abstract
• Standpoint of supporting human-centered discovery of Knowledge– lack of user exploration and control– lack of focus– rigid notion of relationship
• Constrained association queries– pruning using monotonicity, succinctness
1998 년 8 월 7 일 Data Engineering Lab 성 유진 3
Introduction• Problem1 (Lack of User Exploration
and Control)– Mining Process => Black Box – (user can’t preempt and needs to wait for hours)– establish clear breakpoints to allow user
feedback
• Problem2 (Lack of Focus)– on which to focus the mining to find association between sets of items whose
types do not overlap
1998 년 8 월 7 일 Data Engineering Lab 성 유진 4
associations from item sets whose total price is at least $1,000
– provide a rich interface for the user to express focus (CAQ)
• Problem3 (Rigid notion of Relationship)– significance metrics :– separate criteria for selecting candidates for
the antecedent and consequent: association from items to sets of types pepsi => snacks
1998 년 8 월 7 일 Data Engineering Lab 성 유진 5
1998 년 8 월 7 일 Data Engineering Lab 성 유진 6
Architecture• Phase 1
– user initially specifies CAQ• includes a set of constraints C• C is applicable to the antecedent and consequent
– output: • pairs of candidates(Sa, Sc)
• Sa, Sc have support over thresholds
– user can add, delete, of modify the constraints as many times as desired
1998 년 8 월 7 일 Data Engineering Lab 성 유진 7
• Phase 2– significance metric – a threshold for the metric– whatever further conditions to be imposed ont
the antecedent and consequent classical association mining - confidence (as significance metric) - confidence threshold - require ( SaSc) be frequent
1998 년 8 월 7 일 Data Engineering Lab 성 유진 8
1998 년 8 월 7 일 Data Engineering Lab 성 유진 9
Constrained Association Queries• CAQ
– S Item : S is a set variable on the Item domain
– {(S1, S2) |C}, C is a set of constraints on S1, S2
– frequent constraints freq(Si)– trans(TID, Itemset), iteminfo(Item, Type, Price)– S.price 100 : all items in S are of price less
than of equal to $100– {snacks, sodas} S.Type
1998 년 8 월 7 일 Data Engineering Lab 성 유진 10
• CAQ Examples– {(S1, S2) | S1 Item & S2 Item & count(S1) = 1 & count(S2) = 1
& freq(S1) & freq(S2)}• S1.Type S2.Type and max(S1.Price) avg(S2.Price)
– {(S1, S2) | agg1(S1.Price) 100 & agg2(S2.Price 1000}
– {(S1, S2) | S1.Type {Snacks} & S2.Type {beers} & max(S1.Price) min(S2.Price)
• Sound/Complete– algorithm is sound if it only finds frequent sets that satisfy the
given constraints– algorithm is complete if all frequent sets satisfying the given
constraints are found
1998 년 8 월 7 일 Data Engineering Lab 성 유진 11
• Goal– to push the constraints as deeply as possible
inside the computation of frequent set– classical algorithm + test them for constraint
satisfaction => too inefficient– sound/complete : anti-monotone, succinctness
1998 년 8 월 7 일 Data Engineering Lab 성 유진 12
Anti-Monotone Constraints
• Find constraints which satisfy anti-monotone – prune away a significant num of candidates
• Definition – A 1-var constraint C is anti-monotone iff for all sets
S, S’: • S S’ & S satisfies C S’ satisfies C
• Identify which constraints are anti-monotone– Fig3– min(S) v (anti-monotone) , min(S) v (not )
1998 년 8 월 7 일 Data Engineering Lab 성 유진 13
1998 년 8 월 7 일 Data Engineering Lab 성 유진 14
Succinct Constraints• once-and-for-all (before any iteration takes place)
– not generate and test paradigm– how to
• succinctness • member generating functions
– definition• SATc(Item) : the set of item sets satisfying C , pruned space
– C1 S.Price 100 , pruned space for C1 contains only item sets such that each item in the set has a price at least $100
• selection predicate, p
1998 년 8 월 7 일 Data Engineering Lab 성 유진 15
1998 년 8 월 7 일 Data Engineering Lab 성 유진 16
Example
C1 S.Price 100 , let Item1 = price 100 (Item):
C1 is succinct because its pruned space SATc1(Item) is simply 2item1
C2 {snacks, sodas} S.Type : Let Item2, Item3 ,
Item4 be the sets type = ‘snacks’(Item), type = ‘sodas’(Item) , type ‘snacks’
type ‘sodas’ (Item)
C2 is succint SATC2(Item) can be expressed as 2item - 2item2 - 2item3 - 2item4 - 2item2 item4 - 2item3 item4
1998 년 8 월 7 일 Data Engineering Lab 성 유진 17
Example
C1 S.Price 100, MGF = {X |X Item1 & C }
C2 {snacks, sodas} S.Type, MGF = {X1 X2 X3|
X1 Item2 & X1 & X2 Item3 & X2 & X3 Item4}
1998 년 8 월 7 일 Data Engineering Lab 성 유진 18
Algorithms • Algorithm Apriori+
– computes the frequent set => among frequent set, those which satisfy constraints become answer set
• Algorithm Hybrid(m)– in case (C - Cfreq ) is more selective , apriori+ is
inefficient – First check Cfreq for m iterations – to reduce the remaining I/O cost, it switches to
checking (C- Cfreq)
1998 년 8 월 7 일 Data Engineering Lab 성 유진 19
1998 년 8 월 7 일 Data Engineering Lab 성 유진 20
CAP algorithm• 4 Cases succinct and Anti-monotone
– Replace C1 in the Apriori Algorithm by C1c
succinct but not anti-monotone
1998 년 8 월 7 일 Data Engineering Lab 성 유진 21
Anti-monotone but Non-succinct – Define Ck as in apriori algorithm, drop the candidates S if S fails
C– constraint satisfaction is tested before counting is done
neither– Induce any weaker constraint C’ from C, depending on
whether C’ is anti-monotone and /or sucinct, use the above strategies
– Once all frequent sets are generated, test them for satisfaction of C
1998 년 8 월 7 일 Data Engineering Lab 성 유진 22