Date post: | 14-Apr-2017 |
Category: |
Education |
Upload: | ayesha-ali |
Association Rule Mining
Ayesha Ali
Association Analysis
• Discovery of Association Rules – showing attribute-value conditions that occur
frequently together in a set of data, e.g. market basket
– Given a set of data, find rules that will predict the occurrence of a data item based on the occurrences of other items in the data
• A rule has the form body ⇒ head
– buys(Omar, “milk”) ⇒ buys(Omar, “sugar”)
Association Analysis
Location | Business Type
1 | Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2 | Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3 | Carpenter, Electrician, Barber, Hardware Store
4 | Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5 | Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6 | Internet Café, Gym, Games Shop, Sports Shop, Fast Food, Bakery
Association Rule: X ⇒ Y; (Fast Food, Bakery) ⇒ (Convenience Store)
Support S: fraction of locations that contain both X and Y = P(X ∪ Y)
S(Fast Food, Bakery, Convenience Store) = 2/6 = 0.33
Confidence C: how often the items in Y appear in locations that contain X = P(Y | X) = P(X ∪ Y) / P(X)
C[(Fast Food, Bakery) ⇒ (Convenience Store)] = 0.33/0.50 = 0.66
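The two measures can be checked with a few lines of Python. This is only a sketch: the function names `support` and `confidence` are illustrative, and each location is reduced to the three business types that appear in the rule, since the other types do not change these counts.

```python
from fractions import Fraction

# Locations 1-6 from the table, keeping only the three business types in the rule
# (an assumption made to keep the example short).
locations = [
    {"Bakery", "Convenience Store", "Fast Food"},  # 1
    {"Bakery", "Convenience Store", "Fast Food"},  # 2
    set(),                                         # 3
    {"Bakery"},                                    # 4
    {"Convenience Store", "Fast Food"},            # 5
    {"Fast Food", "Bakery"},                       # 6
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(x, y, transactions):
    """Estimate P(Y | X) as support(X union Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)

X = {"Fast Food", "Bakery"}
Y = {"Convenience Store"}
print(support(X | Y, locations))    # 1/3
print(confidence(X, Y, locations))  # 2/3
```

Exact fractions are used so that 2/6 and 2/3 are not blurred by rounding.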
Association Analysis
• Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
⇒ Computationally prohibitive!
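To get a feel for why listing every rule is prohibitive, the number of possible rules over d items can be counted directly. The sketch below is not from the slides: the function name `count_rules` is illustrative, and the closed form 3^d − 2^(d+1) + 1 is the standard combinatorial count of rules X ⇒ Y with non-empty, disjoint X and Y.

```python
from math import comb

def count_rules(d):
    """Count rules X => Y with X and Y non-empty and disjoint, over d items."""
    # Pick k >= 2 items for the rule, then split them into a non-empty body
    # and a non-empty head: 2**k - 2 ways for each choice of k items.
    return sum(comb(d, k) * (2 ** k - 2) for k in range(2, d + 1))

for d in (3, 6, 20):
    closed_form = 3 ** d - 2 ** (d + 1) + 1
    print(d, count_rules(d), closed_form)
```

Even three items already give 12 possible rules, and the count grows roughly as 3^d.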
Association Analysis
Location | Business Type
1 | Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2 | Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3 | Carpenter, Electrician, Barber, Hardware Store, Meat Shop
4 | Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5 | Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6 | Internet Café, Gym, Sweets Shop, Sports Shop, Fast Food, Bakery
Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store) Support S: 0.33 Confidence C: 0.66
(Convenience Store, Bakery) ⇒ (Fast Food) Support S: 0.33 Confidence C: 1.00
(Fast Food, Convenience Store) ⇒ (Bakery) Support S: 0.33 Confidence C: 0.66
(Convenience Store) ⇒ (Fast Food, Bakery) Support S: 0.33 Confidence C: 0.66
(Fast Food) ⇒ (Convenience Store, Bakery) Support S: 0.33 Confidence C: 0.50
(Bakery) ⇒ (Fast Food, Convenience Store) Support S: 0.33 Confidence C: 0.50
Association Analysis
Observations
• The rules above are binary partitions of the same itemset
• They have identical support but different confidence
• Support and confidence thresholds may be different
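The binary partitions of one itemset can be enumerated mechanically. The sketch below assumes a hypothetical helper `rules_from` and, as a simplification, reduces each of the six locations to the three business types in the itemset (the other types do not affect these counts).

```python
from itertools import combinations

# Locations 1-6, reduced to the three business types of the itemset (an assumption).
locations = [
    {"Bakery", "Convenience Store", "Fast Food"},  # 1
    {"Bakery", "Convenience Store", "Fast Food"},  # 2
    set(),                                         # 3
    {"Bakery"},                                    # 4
    {"Convenience Store", "Fast Food"},            # 5
    {"Fast Food", "Bakery"},                       # 6
]

def count(itemset):
    """Number of locations containing every item of itemset."""
    return sum(1 for t in locations if itemset <= t)

def rules_from(itemset):
    """Yield (body, head, confidence) for every binary partition of itemset."""
    whole = frozenset(itemset)
    items = sorted(whole)
    for k in range(1, len(items)):
        for body in combinations(items, k):
            body = frozenset(body)
            yield body, whole - body, count(whole) / count(body)

for body, head, conf in rules_from({"Fast Food", "Bakery", "Convenience Store"}):
    print(sorted(body), "=>", sorted(head), f"confidence = {conf:.2f}")
```

Every rule shares the numerator count(itemset), so only the body count changes the confidence; this is why support is identical across the six rules while confidence is not.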
Mining Association Rules
• Two-step approach:
Step 1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
Step 2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Note: Frequent itemset generation is still computationally expensive
Mining Association Rules
• Frequent Itemset Generation
Figure: lattice graph of all possible itemsets
Mining Association Rules
• Brute-force approach:
– Each node in the lattice graph is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– N = 6 (number of transactions/locations)
– w = 20 (number of unique items: Barber, Bakery, Convenience Store, Meat Shop, Fast Food, Bookstore, Petrol Pump, Library, Carpenter, Electrician, Hardware Store, Vegetable Market, Flower Shop, Sweets Shop, Games Shop, Hospital, Pharmacy, Sports Shop, Gym, Internet Café)
– M = 2^20 = 1,048,576 (number of candidate itemsets)
– Complexity ~ O(NMw)
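The figures in this estimate are easy to reproduce. A small sketch, with the variable names `non_empty` and `comparisons` chosen here for illustration:

```python
from math import comb

N = 6    # transactions (locations)
w = 20   # unique items (business types)

M = 2 ** w                                            # nodes in the lattice, including the empty set
non_empty = sum(comb(w, k) for k in range(1, w + 1))  # itemsets of length 1..w
comparisons = N * M * w                               # order of magnitude of the brute-force work

print(M)            # 1048576
print(non_empty)    # 1048575
print(comparisons)  # 125829120
```

Even for six locations and twenty business types, the brute-force scan is on the order of 10^8 item comparisons.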
Mining Association Rules
w = number of unique items in the item set
Mining Association Rules
• Frequent Itemset Generation
– Reduce the number of candidates (M)
– Reduce the number of transactions/locations (N)
– Reduce the number of comparisons (NM)
  • Use efficient data structures to store the candidates
  • No need to match every candidate against every transaction/location
Reducing the number of candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent
• Important support property:
– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
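The anti-monotone property can be verified exhaustively on a small dataset. In the sketch below the helper names are illustrative, and the six locations are reduced to four business types (Barber, Bakery, Convenience Store, Fast Food) purely to keep the check small.

```python
from itertools import combinations

# Locations 1-6, reduced to four business types (an assumption for brevity).
transactions = [
    {"Barber", "Bakery", "Convenience Store", "Fast Food"},  # 1
    {"Bakery", "Convenience Store", "Fast Food"},            # 2
    {"Barber"},                                              # 3
    {"Bakery"},                                              # 4
    {"Convenience Store", "Fast Food"},                      # 5
    {"Fast Food", "Bakery"},                                 # 6
]
items = sorted(set().union(*transactions))

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def anti_monotone_holds():
    """Check count(subset) >= count(itemset) for each itemset and each subset one item smaller."""
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            for subset in combinations(itemset, k - 1):
                if count(subset) < count(itemset):
                    return False
    return True

print(anti_monotone_holds())                              # True
print(count({"Bakery"}), count({"Bakery", "Fast Food"}))  # 4 3
```

Checking only subsets that are one item smaller is enough, because the inequality chains down to all smaller subsets.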
Reducing the number of candidates
Applying Apriori principle
Reducing the number of candidates
• N = 20 unique items (w in the earlier notation)
• All possible candidate sets:
– NC1 + NC2 + NC3 + … + NCN = 2^N − 1
• Minimum Occurrence Based Filtering
Set m = 2 and L = 1
While (L ≤ N) {
  Scan DB: List = Create Occurrence Frequency Table of candidate sets of Length L
  Filter out all candidate sets with Occurrence Frequency < m
  If no candidate remains in List then Break
  Create new candidate sets of Length (L = L + 1) from List
}
Filter Minimum Occurrences: remove candidates with count < m (m = 2)
Reducing the number of candidates
Scan 1 (L = 1): occurrence count of every business type
Business Type | Count
Barber | 2
Bakery | 4
Bookstore | 1
Carpenter | 1
Convenience Store | 3
Electrician | 1
Fast Food | 4
Flower Shop | 1
Gym | 1
Games Shop | 1
Hardware Store | 1
Hospital | 1
Internet Café | 1
Library | 1
Meat Shop | 1
Petrol Pump | 1
Pharmacy | 1
Sports Shop | 1
Sweets Shop | 1
Vegetable Market | 1
Filter Minimum Occurrences (remove count < 2), giving list L1:
Business Type | Count
Barber | 2
Bakery | 4
Convenience Store | 3
Fast Food | 4
Scan 2 (L = 2): pairs of two items from L1; 4C2 = 6
Business Type | Count
(Barber, Bakery) | 1
(Barber, Convenience Store) | 1
(Barber, Fast Food) | 1
(Bakery, Convenience Store) | 2
(Bakery, Fast Food) | 3
(Convenience Store, Fast Food) | 3
Filter Minimum Occurrences (remove count < 2), giving list L2:
Business Type | Count
(Bakery, Convenience Store) | 2
(Bakery, Fast Food) | 3
(Convenience Store, Fast Food) | 3
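The level-wise filtering traced in the tables above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `apriori` is hypothetical, the minimum count is 2, and the six locations are reduced to the four business types that pass the first filter (Barber, Bakery, Convenience Store, Fast Food). The prune step that checks every subset is the usual Apriori refinement and goes slightly beyond the pseudocode in the slides.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    while candidates:
        # Scan the database once per level and count every candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Filter: keep only candidates that reach the minimum count.
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent.update(level)
        # Join: unions of two surviving sets that are exactly one item longer.
        size = len(next(iter(level))) + 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: a candidate survives only if all of its smaller subsets did.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent

# Locations 1-6, reduced to four business types (an assumption for brevity).
locations = [
    {"Barber", "Bakery", "Convenience Store", "Fast Food"},  # 1
    {"Bakery", "Convenience Store", "Fast Food"},            # 2
    {"Barber"},                                              # 3
    {"Bakery"},                                              # 4
    {"Convenience Store", "Fast Food"},                      # 5
    {"Fast Food", "Bakery"},                                 # 6
]

result = apriori(locations, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this reduced data the second level keeps the same three pairs as list L2, and a third scan finds one further candidate, (Bakery, Convenience Store, Fast Food), with a count of 2.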