8/10/2019 Kdd2014 Hamalainen Webb Discovery
1/116
Tutorial KDD14 New York
STATISTICALLY SOUND PATTERN DISCOVERY
Wilhelmiina Hämäläinen, University of Eastern Finland
Geoff Webb, Monash University, Australia
[email protected]
http://www.cs.joensuu.fi/pages/whamalai/kdd14/sspdtutorial.html
SSPD tutorial KDD14 p. 1
Statistically sound pattern discovery: Problem
[Figure: ideal world vs. real world. The POPULATION (clean and accurate, usually infinite) contains the REAL PATTERNS; the SAMPLE (may contain noise) yields the PATTERNS FOUND FROM THE SAMPLE (with some tool). How do the two relate?]
SSPD tutorial KDD14 p. 2
Statistically Sound vs. Unsound DM?
Pattern-type-first: Given a desired classical pattern, invent a search method.
Method-first: Invent a new pattern type which has an easy search method,
e.g., an antimonotonic interestingness property. Tricks to sell it:
overload statistical terms; don't specify exactly
SSPD tutorial KDD14 p. 4
Statistically Sound vs. Unsound DM?
Pattern-type-first: Given a desired classical pattern, invent a search method.
+ easy to interpret correctly
+ informative
+ likely to hold in the future
− computationally demanding
Method-first: Invent a new pattern type which has an easy search method.
− difficult to interpret
− misleading information
− no guarantees on validity
+ computationally easy
SSPD tutorial KDD14 p. 5
Statistically sound pattern discovery: Scope
[Scope diagram: statistical dependency patterns. PATTERNS: dependency rules (Part I, Wilhelmiina) and correlated itemsets (Part II, Geoff). MODELS: log-linear models, classifiers. Other patterns (time series? graphs?): Discussion.]
SSPD tutorial KDD14 p. 6
Contents
Overview (statistical dependency patterns)
Part I
  Dependency rules
  Statistical significance testing
  Significance of improvement
Coffee break (10:00-10:30)
Part II
  Correlated itemsets (self-sufficient itemsets)
  Significance tests for genuine set dependencies
Discussion
SSPD tutorial KDD14 p. 7
Statistical dependence: Many interpretations!
Events (X = x) and (Y = y) are statistically independent if P(X = x, Y = y) = P(X = x)P(Y = y).
When are variables (or variable-value combinations) statistically dependent?
When is the dependency genuine? → measures for the strength and significance of dependence
How to define mutual dependence between three or more variables?
SSPD tutorial KDD14 p. 8
Statistical dependence: 3 main interpretations
Let A, B, C be binary variables. Notate ¬A ≡ (A = 0) and A ≡ (A = 1).
1. Dependency rule AB → C: it must be that δ = P(ABC) − P(AB)P(C) > 0 (positive dependence).
2. Full probability model:
δ1 = P(ABC) − P(AB)P(C),
δ2 = P(¬ABC) − P(¬AB)P(C),
δ3 = P(A¬BC) − P(A¬B)P(C) and
δ4 = P(¬A¬BC) − P(¬A¬B)P(C).
If δ1 = δ2 = δ3 = δ4 = 0, no dependence.
Otherwise decide from the δi (i = 1, ..., 4) (with some equation).
SSPD tutorial KDD14 p. 9
Statistical dependence: 3 interpretations
3. Correlated set ABC. Starting point, mutual independence:
P(A = a, B = b, C = c) = P(A = a)P(B = b)P(C = c) for all a, b, c ∈ {0, 1}.
Different variations (and names)! E.g.
(i) P(ABC) > P(A)P(B)P(C) (positive dependence), or
(ii) P(A = a, B = b, C = c) ≠ P(A = a)P(B = b)P(C = c) for some a, b, c ∈ {0, 1}
+ extra criteria.
In addition, conditional independence is sometimes useful:
P(B = b, C = c | A = a) = P(B = b | A = a)P(C = c | A = a)
SSPD tutorial KDD14 p. 10
Statistical dependence: no single correct definition
"One of the most important problems in the philosophy of natural sciences is (in addition to the well-known one regarding the essence of the concept of probability itself) to make precise the premises which would make it possible to regard any given real events as independent."
A.N. Kolmogorov
SSPD tutorial KDD14 p. 11
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 12
1. Statistical dependency rules
Requirements for a genuine statistical dependency rule X → A:
(i) Statistical dependence
(ii) Statistically significant → likely not due to chance
(iii) Non-redundant → not a side-product of another dependency; added value
Why?
SSPD tutorial KDD14 p. 13
Example: Dependency rules on atherosclerosis
1. Statistical dependencies:
smoking → atherosclerosis
sports → atherosclerosis
ABCA1-R219K → atherosclerosis?
2. Statistical significance?
spruce sprout extract → atherosclerosis?
dark chocolate → atherosclerosis
3. Redundancy?
stress, smoking → atherosclerosis
smoking, coffee → atherosclerosis?
high cholesterol, sports → atherosclerosis?
male, male pattern baldness → atherosclerosis?
SSPD tutorial KDD14 p. 14
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 15
2. Variable-based vs. Value-based interpretation
Meaning of the dependency rule X → A:
1. Variable-based: dependency between binary variables X and A.
Positive dependency X → A is the same as ¬X → ¬A.
Equally strong as the negative dependency between X and ¬A (or ¬X and A).
2. Value-based: positive dependency between values X = 1 and A = 1;
different from ¬X → ¬A, which may be weak!
SSPD tutorial KDD14 p. 16
Strength of statistical dependence
The most common measures:
1. Variable-based: leverage δ(X, A) = P(XA) − P(X)P(A)
2. Value-based: lift γ(X, A) = P(XA) / (P(X)P(A)) = P(A|X) / P(A) = P(X|A) / P(X)
P(A|X) = confidence of the rule.
Remember: X ≡ (X = 1) and A ≡ (A = 1).
SSPD tutorial KDD14 p. 17
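The measures above can be sketched directly from the contingency counts. This is an illustrative sketch with our own function and variable names, not code from the tutorial:

```python
def strength_measures(n, fr_x, fr_a, fr_xa):
    """Leverage, lift and confidence for a rule X -> A, from the counts
    n, fr(X), fr(A) and fr(XA), following the slide's definitions."""
    p_x, p_a, p_xa = fr_x / n, fr_a / n, fr_xa / n
    leverage = p_xa - p_x * p_a      # delta(X, A) = P(XA) - P(X)P(A)
    lift = p_xa / (p_x * p_a)        # gamma(X, A) = P(XA) / (P(X)P(A))
    confidence = p_xa / p_x          # P(A | X)
    return leverage, lift, confidence
```

On the apple example of slide 20 (n = 100, fr(Y) = 60, fr(A) = 55, fr(YA) = 55), this gives leverage 0.22 and lift ≈ 1.67, matching the slide.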
Contingency table
     | A                          | ¬A                           | All
X    | fr(XA) = n[P(X)P(A) + δ]   | fr(X¬A) = n[P(X)P(¬A) − δ]   | fr(X)
¬X   | fr(¬XA) = n[P(¬X)P(A) − δ] | fr(¬X¬A) = n[P(¬X)P(¬A) + δ] | fr(¬X)
All  | fr(A)                      | fr(¬A)                       | n

All value combinations have the same |δ|!
γ depends on the value combination.
fr(X) = absolute frequency of X; P(X) = relative frequency of X.
SSPD tutorial KDD14 p. 18
Example: The Apple problem
Variables: Taste, smell, colour, size, weight, variety, grower,. . .
100 apples (55 sweet + 45 bitter)
SSPD tutorial KDD14 p. 19
Rule RED → SWEET (Y → A): P(A|Y) = 0.92, P(¬A|¬Y) = 1.0; δ = 0.22, γ = 1.67.
(A = sweet, ¬A = bitter; Y = red, ¬Y = green)
Basket 1: 60 red apples (55 sweet). Basket 2: 40 green apples (all bitter).
SSPD tutorial KDD14 p. 20
Rule RED and BIG → SWEET (X → A): P(A|X) = 1.0, P(¬A|¬X) = 0.75; δ = 0.18, γ = 1.82.
(X = red ∧ big, ¬X = green ∨ small)
Basket 1: 40 large red apples (all sweet). Basket 2: 40 green + 20 small red apples (45 bitter).
SSPD tutorial KDD14 p. 21
When could the value-based interpretation be useful? An example
D = disease, X = allele combination. P(X) is small and P(D|X) = 1.0.
γ(X, D) = P(D|X) / P(D) = 1 / P(D) can be large, but
δ(X, D) = P(XD) − P(X)P(D) = P(X)(1 − P(D)) is small.
Now the dependency is strong in the value-based but weak in the variable-based interpretation!
(Usually, variable-based dependencies tend to be more reliable.)
SSPD tutorial KDD14 p. 22
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 23
3. Statistical significance of X → A
What is the probability of the observed or a stronger dependency, if X and A were independent? If the probability is small, then X → A is likely genuine (not due to chance).
A significant X → A is likely to hold in the future (in similar data sets).
How to estimate the probability? How small should the probability be?
→ Fisherian vs. Neyman-Pearsonian schools; the multiple testing problem
SSPD tutorial KDD14 p. 24
3.1 Main approaches
[Diagram: SIGNIFICANCE TESTING splits into EMPIRICAL and ANALYTIC approaches, and into FREQUENTIST and BAYESIAN (different schools, different sampling models).]
SSPD tutorial KDD14 p. 25
Analytic approaches
H0: X and A independent (null hypothesis)
H1: X and A positively dependent (research hypothesis)
Frequentist: Calculate p = P(observed or stronger dependency | H0).
Bayesian:
(i) Set P(H0) and P(H1)
(ii) Calculate P(observed or stronger dependency | H0) and P(observed or stronger dependency | H1)
(iii) Derive (with Bayes' rule) P(H0 | observed or stronger dependency) and P(H1 | observed or stronger dependency)
SSPD tutorial KDD14 p. 26
Analytic approaches: pros and cons
+ p-values relatively fast to calculate
+ can be used as search criteria
− How to define the distribution under H0? (assumptions)
− If the data is not representative, the discoveries cannot be generalized to the whole population
  → describe only the sample data or other similar samples
  → random samples are not always possible (infinite population)
SSPD tutorial KDD14 p. 27
Note: Differences between the Fisherian vs. Neyman-Pearsonian schools
significance testing vs. hypothesis testing
the role of nominal p-values (thresholds 0.05, 0.01)
many textbooks present a hybrid approach
see Hubbard & Bayarri
SSPD tutorial KDD14 p. 28
Empirical approach (randomization testing)
Generate random data sets according to H0 and test how many of them contain the observed or a stronger dependency X → A.
(i) Fix a permutation scheme (how to express H0 + which properties of the original data should hold)
(ii) Generate a random subset {d1, ..., db} of all possible permutations
(iii) p = |{di | di contains the observed or a stronger dependency}| / b
SSPD tutorial KDD14 p. 29
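Steps (i)-(iii) can be sketched as follows. This is our own minimal illustration, not the tutorial's code: the permutation scheme shuffles the A column (fixing n, fr(X) and fr(A)), and leverage is used as the strength measure.

```python
import random

def permutation_p(x, a, b=1000, seed=0):
    """Empirical p-value of the observed or a stronger positive
    dependency X -> A, measured by leverage, over b random permutations."""
    n = len(x)

    def leverage(xs, ys):
        fr_xa = sum(1 for xi, yi in zip(xs, ys) if xi and yi)
        return fr_xa / n - (sum(xs) / n) * (sum(ys) / n)

    observed = leverage(x, a)
    rng = random.Random(seed)
    perm = list(a)
    hits = 0
    for _ in range(b):
        rng.shuffle(perm)                  # one random data set d_i
        if leverage(x, perm) >= observed:  # observed or stronger dependency
            hits += 1
    return hits / b
```

With perfectly correlated 0/1 columns the empirical p-value is (nearly always) very small, as expected.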
Empirical approach: pros and cons
+ no assumptions about any underlying parametric distribution
+ can test null hypotheses for which no closed-form test exists
+ offers an approach to the multiple testing problem → later
+ the data doesn't have to be a random sample → discoveries hold for the whole population ...
− ... defined by the permutation scheme
− often not clear (but critical) how to permute the data!
− computationally heavy (b: efficiency vs. quality trade-off)
− How to apply during search??
SSPD tutorial KDD14 p. 30
Note: Randomization test vs. Fisher's exact test
When testing the significance of X → A:
a natural permutation scheme fixes N = n, N_X = fr(X), N_A = fr(A)
a randomization test generates some random contingency tables with these constraints
a full permutation test = Fisher's exact test, which studies all contingency tables
→ faster to compute (analytically), produces more reliable results
No need for randomization tests here!
SSPD tutorial KDD14 p. 31
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
    variable-based
    value-based
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 32
3.2 Sampling models
= defining the distribution under H0
What do we assume fixed?
Variable-based dependencies: classical sampling models (statistics)
Value-based dependencies: several suggestions (data mining)
SSPD tutorial KDD14 p. 33
Basic idea
Given a sampling model M, let T = the set of all possible contingency tables.
1. Define the probability P(Ti | M) for contingency tables Ti ∈ T.
2. Define an extremeness relation Ti ⪰ Tj: Ti contains at least as strong a dependency X → A as Tj does; depends on the strength measure, e.g. δ (variable-based) or γ (value-based).
3. Calculate p = Σ_{Ti ⪰ T0} P(Ti | M)   (T0 = our table).
SSPD tutorial KDD14 p. 34
Sampling models for variable-based dependencies
3 basic models:
1. Multinomial (N = n fixed)
2. Double binomial (N = n, N_X = fr(X) fixed)
3. Hypergeometric (→ Fisher's exact test) (N = n, N_A = fr(A), N_X = fr(X) fixed)
+ asymptotic measures (like χ²)
SSPD tutorial KDD14 p. 35
Multinomial model
Independence assumption: In the infinite urn, p_XA = p_X · p_A.
(p_XA = probability of red sweet apples)
[Figure: a sample of n apples drawn from an INFINITE URN]
SSPD tutorial KDD14 p. 36
Multinomial model
Ti is defined by the random variables N_XA, N_X¬A, N_¬XA, N_¬X¬A:
P(N_XA, N_X¬A, N_¬XA, N_¬X¬A | n, p_X, p_A) = C(n; N_XA, N_X¬A, N_¬XA, N_¬X¬A) · p_X^{N_X} (1 − p_X)^{n − N_X} · p_A^{N_A} (1 − p_A)^{n − N_A},
where C(n; ...) is the multinomial coefficient.
p = Σ_{Ti ⪰ T0} P(N_XA, N_X¬A, N_¬XA, N_¬X¬A | n, p_X, p_A)
p_X and p_A can be estimated from the data.
SSPD tutorial KDD14 p. 37
Double binomial model
Independence assumption: p_{A|X} = p_A = p_{A|¬X}.
[Figure: TWO INFINITE URNS: a sample of fr(X) red apples from one urn and a sample of fr(¬X) green apples from the other]
SSPD tutorial KDD14 p. 38
Double binomial model
Probability of red sweet apples:
P(N_XA | fr(X), p_A) = C(fr(X), N_XA) · p_A^{N_XA} (1 − p_A)^{fr(X) − N_XA}
Probability of green sweet apples:
P(N_¬XA | fr(¬X), p_A) = C(fr(¬X), N_¬XA) · p_A^{N_¬XA} (1 − p_A)^{fr(¬X) − N_¬XA}
SSPD tutorial KDD14 p. 39
Double binomial model
Ti is defined by the variables N_XA and N_¬XA:
P(N_XA, N_¬XA | n, fr(X), fr(¬X), p_A) = C(fr(X), N_XA) · C(fr(¬X), N_¬XA) · p_A^{N_A} (1 − p_A)^{n − N_A}
p = Σ_{Ti ⪰ T0} P(N_XA, N_¬XA | n, fr(X), fr(¬X), p_A)
SSPD tutorial KDD14 p. 40
Hypergeometric model (Fishers exact test)
How many other similar urns have at least as strong a dependency as ours?
[Figure: OUR URN with n apples (fr(A) sweet + fr(¬A) bitter, fr(X) red + fr(¬X) green) among ALL C(n, fr(A)) SIMILAR URNS]
SSPD tutorial KDD14 p. 41
Like in a full permutation test
[Figure: a full permutation test with n = 10 transactions, fr(A) = 3 and fr(X) = 6: the C(10, 3) = 120 similar urns (urn 1, urn 2, ..., urn 120) correspond to all possible placements of the three A's among the ten transactions.]
SSPD tutorial KDD14 p. 42
Hypergeometric model (Fishers exact test)
The number of all possible similar urns (fixed N = n, N_X = fr(X) and N_A = fr(A)) is
Σ_{i=0}^{fr(A)} C(fr(X), i) · C(fr(¬X), fr(A) − i) = C(n, fr(A)).
Now (Ti ⪰ T0) ⇔ (N_XA ≥ fr(XA)). Easy!
p_F = Σ_{i≥0} C(fr(X), fr(XA) + i) · C(fr(¬X), fr(¬X¬A) + i) / C(n, fr(A))
SSPD tutorial KDD14 p. 43
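The one-sided Fisher p-value p_F has a direct sketch in pure Python; function and variable names are our own, not the tutorial's:

```python
from math import comb

def fisher_p(n, fr_x, fr_a, fr_xa):
    """p_F = P(N_XA >= fr(XA)) in the hypergeometric model with the
    margins n, fr(X) and fr(A) fixed (Fisher's exact test, one-sided)."""
    total = comb(n, fr_a)  # number of all similar urns
    return sum(comb(fr_x, i) * comb(n - fr_x, fr_a - i)
               for i in range(fr_xa, min(fr_x, fr_a) + 1)) / total
```

Summing from fr(XA) = 0 covers every table, so the p-value is then 1 (a useful sanity check); `math.comb` returns 0 when a cell count would be infeasible, which silently drops impossible tables.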
Example: Comparison of p-values
[Plot: p as a function of fr(XA) in the range 15-29, for fr(X) = 50, fr(A) = 30, n = 100; curves for Fisher, double binomial and multinomial.]
SSPD tutorial KDD14 p. 44
Example: Comparison of p-values
[Plot: p as a function of fr(XA) in the range 200-250, for fr(X) = 300, fr(A) = 500, n = 1000; curves for Fisher, double binomial and multinomial.]
SSPD tutorial KDD14 p. 45
Example: Comparison of p-values
fr(XA) | multinomial | double binomial | Fisher (hypergeom.)
180    | 1.7e-05     | 1.8e-05         | 2.2e-05
200    | 2.3e-12     | 2.2e-12         | 3.0e-12
220    | 1.4e-22     | 7.3e-23         | 1.1e-22
240    | 2.9e-36     | 3.0e-37         | 4.4e-37
260    | 1.5e-53     | 4.2e-56         | 3.5e-56
280    | 1.3e-74     | 2.9e-80         | 1.6e-81
300    | 9.3e-100    | 3.5e-111        | 2.5e-119
SSPD tutorial KDD14 p. 46
Asymptotic measures
Idea: p-values are estimated indirectly.
1. Select some nicely behaving measure M, e.g. one that asymptotically follows the normal or the χ² distribution.
2. Estimate P(M ≥ val), where M = val in our data.
Easy! (look it up in statistical tables)
But the accuracy can be poor.
SSPD tutorial KDD14 p. 47
The χ²-measure
χ² = Σ_{i=0}^{1} Σ_{j=0}^{1} n (P(X = i, A = j) − P(X = i)P(A = j))² / (P(X = i)P(A = j))
   = n (P(X, A) − P(X)P(A))² / (P(X)P(¬X)P(A)P(¬A))
   = n δ² / (P(X)P(¬X)P(A)P(¬A))
− very sensitive to underlying assumptions!
− all P(X = i)P(A = j) should be sufficiently large
− the corresponding hypergeometric distribution shouldn't be too skewed
SSPD tutorial KDD14 p. 48
Mutual information
MI = log [ P(XA)^{P(XA)} P(X¬A)^{P(X¬A)} P(¬XA)^{P(¬XA)} P(¬X¬A)^{P(¬X¬A)} / ( P(X)^{P(X)} P(¬X)^{P(¬X)} P(A)^{P(A)} P(¬A)^{P(¬A)} ) ]
2n · MI = the log-likelihood ratio
→ follows asymptotically the χ²-distribution
→ usually gives more reliable results than the χ²-measure
SSPD tutorial KDD14 p. 49
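Both asymptotic measures can be sketched from the four cell counts of the 2x2 table. This is our own illustration (names included), using the n·δ² form of χ² and the equivalent sum form MI = Σ P(cell)·log(P(cell)/(P(row)P(col))):

```python
from math import log

def chi2_and_g(fr_xa, fr_xna, fr_nxa, fr_nxna):
    """Return (chi^2, G) for a 2x2 table, where G = 2n * MI is the
    log-likelihood ratio statistic; empty cells contribute 0 to MI."""
    n = fr_xa + fr_xna + fr_nxa + fr_nxna
    fr_x, fr_a = fr_xa + fr_xna, fr_xa + fr_nxa
    p_x, p_a = fr_x / n, fr_a / n
    delta = fr_xa / n - p_x * p_a
    chi2 = n * delta ** 2 / (p_x * (1 - p_x) * p_a * (1 - p_a))
    mi = sum((obs / n) * log(obs * n / (row * col))
             for obs, row, col in [(fr_xa, fr_x, fr_a),
                                   (fr_xna, fr_x, n - fr_a),
                                   (fr_nxa, n - fr_x, fr_a),
                                   (fr_nxna, n - fr_x, n - fr_a)]
             if obs > 0)
    return chi2, 2 * n * mi
```

For a perfectly independent table both statistics are 0; for the slide-21 apple table (40, 0, 15, 45) both are large.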
Comparison: Sampling models for variable-based dependencies
Multinomial: impractical, but useful for theoretical results.
Double binomial: not exchangeable, p(X → A) ≠ p(A → X) (in general).
Hypergeometric (Fisher's exact test): recommended; enables efficient search, reliable results.
Asymptotic: often sensitive to underlying assumptions.
  χ²: very sensitive, not recommended.
  MI: reliable, enables efficient search, approximates p_F.
SSPD tutorial KDD14 p. 50
Sampling models for value-based dependencies
Main choices:
1. Classical sampling models, but with a different extremeness relation
  → use lift γ to define a stronger dependency
  Multinomial and double binomial: can differ much from the variable-based versions
  Hypergeometric: leads to Fisher's exact test, again!
2. Binomial models + corresponding asymptotic measures
SSPD tutorial KDD14 p. 51
Binomial model 1 (classical binomial test)
The probability of getting exactly N_XA sweet red apples and n − N_XA green or bitter apples is
p(N_XA | n, p_XA) = C(n, N_XA) (p_XA)^{N_XA} (1 − p_XA)^{n − N_XA}
p(N_XA ≥ fr(XA) | n, p_XA) = Σ_{i=fr(XA)}^{n} C(n, i) (p_XA)^i (1 − p_XA)^{n−i}
(or i = fr(XA), ..., min{fr(X), fr(A)})
Use the estimate p_XA = P(X)P(A).
Note: N_X and N_A are unfixed.
SSPD tutorial KDD14 p. 53
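The classical binomial test above can be sketched directly; this is our own illustration, using the independence estimate p_XA = P(X)P(A) and the upper limit i = n:

```python
from math import comb

def binomial_p(n, fr_x, fr_a, fr_xa):
    """p(N_XA >= fr(XA) | n, p_XA) with p_XA estimated as P(X)P(A);
    unlike Fisher's test, N_X and N_A are left unfixed."""
    p_xa = (fr_x / n) * (fr_a / n)
    return sum(comb(n, i) * p_xa ** i * (1 - p_xa) ** (n - i)
               for i in range(fr_xa, n + 1))
```

The tail sum over the whole support is 1 (sanity check), and it shrinks strictly as the threshold fr(XA) grows.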
Corresponding asymptotic measure
z-score:
z1(X → A) = (fr(X, A) − nP(X)P(A)) / √(nP(X)P(A)(1 − P(X)P(A)))
= √n · δ(X, A) / √(P(X)P(A)(1 − P(X)P(A)))
= √(nP(XA)) · (γ(X, A) − 1) / √(γ(X, A) − P(X, A))
→ follows asymptotically the normal distribution
SSPD tutorial KDD14 p. 54
Binomial model 2 (suggested in data mining)
Like the double binomial model, but forget the other urn!
[Figure: CONSIDER ONE OF THE TWO INFINITE URNS: the sample of fr(X) red apples; the sample of fr(¬X) green apples is ignored]
SSPD tutorial KDD14 p. 55
Binomial model 2
p(N_XA ≥ fr(XA) | fr(X), P(A)) = Σ_{i=fr(XA)}^{fr(X)} C(fr(X), i) P(A)^i P(¬A)^{fr(X) − i}
Corresponding z-score:
z2 = (fr(XA) − fr(X)P(A)) / √(fr(X)P(A)P(¬A))
= √n · δ(X, A) / √(P(X)P(A)P(¬A))
= √(fr(X)) · (P(A|X) − P(A)) / √(P(A)P(¬A))
SSPD tutorial KDD14 p. 56
J-measure
A one-urn version of MI:
J = P(XA) · log[ P(XA) / (P(X)P(A)) ] + P(X¬A) · log[ P(X¬A) / (P(X)P(¬A)) ]
SSPD tutorial KDD14 p. 57
Example: Comparison of p-values
[Two plots: p as a function of fr(XA) in the range 19-25; left: fr(X) = 25, fr(A) = 75, n = 100; right: fr(X) = 75, fr(A) = 25, n = 100; curves for bin1, bin2, Fisher, double binomial and multinomial.]
SSPD tutorial KDD14 p. 58
Comparison: Sampling models for value-based dependencies
Multinomial, hypergeometric, classical binomial + its z-score: p(X → A) = p(A → X).
Double binomial, alternative binomial + its z-score: p(X → A) ≠ p(A → X) (in general).
The alternative binomial, its z-score and J can disagree with the other measures (only the X-urn vs. the whole data).
z-scores are easy to integrate into search, but may be unreliable for infrequent patterns.
The (classical) binomial test in post-pruning improves quality!
SSPD tutorial KDD14 p. 59
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 60
3.3 Multiple testing problem
The more patterns we test, the more spurious patterns we are likely to accept.
If the threshold is α = 0.05, there is a 5% probability that a spurious dependency passes the test.
If we test 10,000 rules, we are likely to accept 500 spurious rules!
SSPD tutorial KDD14 p. 61
Solutions to Multiple testing problem
1. Direct adjustment approach: adjust α (stricter thresholds)
  → easiest to integrate into the search
2. Holdout approach: save part of the data for testing → Webb
3. Randomization test approaches: estimate the overall significance of all discoveries, or adjust the individual p-values empirically
  → e.g. Gionis et al., Hanhijärvi et al.
SSPD tutorial KDD14 p. 62
Contingency table for m significance tests
                       | spurious rule (H0 true) | genuine rule (H1 true) | All
declared significant   | V (false positives)     | S (true positives)     | R
declared insignificant | U (true negatives)      | T (false negatives)    | m − R
All                    | m0                      | m − m0                 | m
SSPD tutorial KDD14 p. 63
Direct adjustment: Two approaches
(i) Control the familywise error rate = the probability of accepting at least one false discovery: FWER = P(V ≥ 1)
(ii) Control the false discovery rate = the expected proportion of false discoveries: FDR = E[V/R]
(V, S, R, U, T, m0 and m as in the contingency table for m significance tests)
SSPD tutorial KDD14 p. 64
(i) Control familywise error rate FWER
Decide α = FWER and calculate a new, stricter threshold α*.
If the tests are mutually independent: α = 1 − (1 − α*)^m
→ Šidák correction: α* = 1 − (1 − α)^{1/m}
If they are not independent: α ≤ m·α*
→ Bonferroni correction: α* = α/m
− conservative (may lose genuine discoveries)
− How to estimate m? (there may be explicit and implicit testing during the search)
The Holm-Bonferroni method is more powerful, but less suitable for the search (all p-values should be known first).
SSPD tutorial KDD14 p. 65
(ii) Control false discovery rate FDR
Benjamini-Hochberg-Yekutieli procedure:
1. Decide q = FDR.
2. Order the patterns ri by their p-values → result r1, ..., rm such that p1 ≤ ... ≤ pm.
3. Search for the largest k such that pk ≤ k·q / (m·c(m)):
  if the tests are mutually independent or positively dependent, c(m) = 1;
  otherwise c(m) = Σ_{i=1}^{m} 1/i ≈ ln(m) + 0.58.
4. Save the patterns r1, ..., rk (as significant) and reject r_{k+1}, ..., rm.
SSPD tutorial KDD14 p. 66
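Steps 1-4 of the procedure can be sketched as follows; this is our own illustration, returning only the cutoff k rather than the pattern list:

```python
def bh_accept(p_values, q, independent=True):
    """Number k of p-values declared significant by the step-up
    procedure: the largest k with p_k <= k*q/(m*c(m))."""
    m = len(p_values)
    c = 1.0 if independent else sum(1 / i for i in range(1, m + 1))
    k = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i * q / (m * c):  # threshold grows linearly with the rank
            k = i
    return k
```

Note how the Yekutieli factor c(m) makes the dependent-case thresholds harmonically smaller, so fewer patterns survive.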
Hold-out approach
Powerful because m is quite small!
[Diagram: the data is split into an exploratory set and a holdout set. Pattern discovery on the exploratory set produces candidate patterns; statistical evaluation on the holdout set (any hypothesis test + multiple testing correction, limited type-2 error) yields the sound patterns.]
SSPD tutorial KDD14 p. 67
Randomization test approaches
1. Estimate the overall significance of the discoveries at once.
E.g.: What is the probability of finding K ≥ K0 dependency rules whose strength is at least min_M?
Empirical p-value:
p_emp = (|{di | Ki ≥ K0}| + 1) / (b + 1)
d0 = the original set; d1, ..., db = random sets; K1, ..., Kb = the numbers of patterns discovered from set di
→ Gionis et al.
SSPD tutorial KDD14 p. 68
Randomization test approaches (cont.)
2. Use randomization tests to correct individual p-values.
E.g.: How many sets contained better rules than X → A?
p = (|{di | min_{(Y → B) ∈ Si} p(Y → B | di) ≤ p(X → A | d0)}| + 1) / (b + 1)
d0 = the original set; d1, ..., db = random sets; Si = the set of patterns returned from set di
→ Hanhijärvi
SSPD tutorial KDD14 p. 69
Randomization test approaches
+ dependencies between the patterns are not a problem → more powerful control over FWER
+ one can impose extra constraints (e.g. that a certain pattern holds with a given frequency and confidence)
− most techniques assume subset pivotality: the complete null hypothesis and all subsets of true null hypotheses have the same distribution of the test statistic
Remember also the points mentioned for single hypothesis testing.
SSPD tutorial KDD14 p. 70
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 71
4. Redundancy and significance of improvement
When is X → A redundant with respect to Y → A (Y ⊊ X)? When does it improve on it significantly?
Examples of redundant dependency rules:
smoking, coffee → atherosclerosis (coffee has no effect on smoking → atherosclerosis)
high cholesterol, sports → atherosclerosis (sports makes the dependency only weaker)
male, male pattern baldness → atherosclerosis (adding male gives hardly any significant improvement)
SSPD tutorial KDD14 p. 72
Redundancy and significance of improvement
Value-based: X → A is productive if P(A|X) > P(A|Y) for all Y ⊊ X.
Variable-based: X → A is redundant if there is Y ⊊ X such that M(Y → A) is better than M(X → A) with the given goodness measure M.
X → A is non-redundant if for all Y ⊊ X, M(X → A) is better than M(Y → A).
When is the improvement significant?
SSPD tutorial KDD14 p. 73
Value-based: Significance of productivity
Hypergeometric model:
Hypergeometric model:
p(YQ → A | Y → A) = Σ_i C(fr(YQ), fr(YQA) + i) · C(fr(Y¬Q), fr(Y¬QA) − i) / C(fr(Y), fr(YA))
= the probability of the observed or a stronger conditional dependency Q → A, given Y, in the value-based model
→ also asymptotic measures (χ², MI)
SSPD tutorial KDD14 p. 74
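This conditional p-value is Fisher's exact test restricted to the rows covered by Y, and can be sketched accordingly. Names are our own illustration:

```python
from math import comb

def productivity_p(fr_y, fr_ya, fr_yq, fr_yqa):
    """p(YQ -> A | Y -> A): one-sided hypergeometric tail within the
    fr(Y) rows covered by Y, with fr(YA) and fr(YQ) fixed."""
    total = comb(fr_y, fr_ya)
    return sum(comb(fr_yq, i) * comb(fr_y - fr_yq, fr_ya - i)
               for i in range(fr_yqa, min(fr_yq, fr_ya) + 1)) / total
```

On the apple example of the next slide (fr(Y) = 60 red apples, fr(YA) = 55 sweet, fr(YQ) = 40 large red, fr(YQA) = 40 all sweet), this reproduces p ≈ 0.0029.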
Apple problem: value-based
p(YQ → A | Y → A) = 0.0029   (Y = red, Q = large)
Basket 1: 40 large red apples (all sweet) + 20 small red apples (15 sweet). Basket 2: 40 green apples (all bitter).
SSPD tutorial KDD14 p. 75
Apple problem: variable-based?
p(Y → A | (YQ) → A) = 2.9e-10
Part I Contents
1. Statistical dependency rules
2. Variable- and value-based interpretations
3. Statistical significance testing
  3.1 Approaches
  3.2 Sampling models
  3.3 Multiple testing problem
4. Redundancy and significance of improvement
5. Search strategies
SSPD tutorial KDD14 p. 78
5. Search strategies
1. Search for the strongest rules (with γ, δ, etc.) that pass the significance test for productivity
  → MagnumOpus (Webb 2005)
2. Search for the most significant non-redundant rules (with Fisher's p_F etc.)
  → Kingfisher (Hämäläinen 2012)
3. Search for frequent sets, construct association rules, prune with statistical measures, and filter non-redundant rules??
  → No way!
  closed sets? → redundancy problem
  their minimal generators?
SSPD tutorial KDD14 p. 79
Main problem: non-monotonicity of statistical dependence
AB → C can express a significant dependency even if A and C, as well as B and C, are mutually independent.
In the worst case, the only significant dependency involves all attributes A1, ..., Ak (e.g. A1 ... A_{k−1} → Ak).
1) A greedy heuristic does not work!
2) Studying only the simplest dependency rules does not reveal everything!
ABCA1-R219K → alzheimer vs. ABCA1-R219K, female → alzheimer
SSPD tutorial KDD14 p. 80
End of Part I
Questions?
SSPD tutorial KDD14 p. 81
Statistically Sound Pattern Discovery
Part II: Itemsets
Wilhelmiina Hämäläinen
Geoff Webb
http://www.cs.joensuu.fi/~whamalai/ecmlpkdd13/sspdtutorial.html
Overview
intro itemsets productivity redundancy independent productivity multiple testing randomisation examples conclusion
Most association discovery techniques find rules.
Association is often conceived as a relationship between two parts, so rules provide an intuitive representation.
However, when many items are all mutually interdependent, a plethora of rules results.
Itemsets can provide a more intuitive representation in many contexts.
However, it is less obvious how to identify potentially interesting itemsets than potentially interesting rules.
Rules
bruises?=true → ring-type=pendant [Coverage=3376; Support=3184; Lift=1.93; p
stalk-surface-above-ring=smooth & ring-type=pendant → stalk-surface-below-ring=smooth [Coverage=3664; Support=3328; Lift=1.49; p=3.05E-072]
bruises?=true → stalk-surface-above-ring=smooth [Coverage=3376; Support=3232; Lift=1.50; p
bruises?=true, stalk-surface-above-ring=smooth, stalk-surface-below-ring=smooth, ring-type=pendant [Coverage=2776; Leverage=0.1143; p
An association between two items will be represented by two rules;
three items → nine rules; four items → twenty-eight rules; ...
It may not be apparent that all the resulting rules represent a single multi-item association.
Reference: Webb 2011
But how to find itemsets?
Association is conceived as deviation from independence between two parts.
Itemsets may have many parts
Main approaches
Consider all partitions
Randomisation testing
Incremental mining (models)
Information theoretic (mainly models, not statistical)
All partitions
Most rule-based measures of interest relate to the difference between the joint frequency and the expected frequency under independence between antecedent and consequent.
However, itemsets do not have an antecedent and consequent.
It does not work to consider deviation from expectation under the assumption that all items are independent of each other: if P(x,y) ≠ P(x)P(y) and P(x,y,z) = P(x,y)P(z), then P(x,y,z) ≠ P(x)P(y)P(z).
E.g. {AttendingKDD14, InNewYork, AgeIsEven}.
References: Webb, 2010; Webb & Vreeken, 2014
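A quick numeric illustration of the point above (made-up probabilities, not from the tutorial): when z is independent of an associated pair {x, y}, the triple still deviates from the all-items-independent baseline, even though it adds nothing over the pair.

```python
# Hypothetical probabilities: x and y are associated, z is independent of {x, y}.
p_x, p_y, p_z = 0.5, 0.5, 0.5
p_xy = 0.4                  # > p_x * p_y = 0.25, so x and y are associated
p_xyz = p_xy * p_z          # z adds nothing: P(x,y,z) = P(x,y) * P(z)

print(p_xyz)                # 0.2
print(p_x * p_y * p_z)      # 0.125 -- naive "all independent" baseline
# The three-item itemset deviates from the baseline yet is fully
# explained by the pair {x, y}.
```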
Productive itemsets
An itemset is unlikely to be interesting if its frequency can be predicted by assuming independence between any partition thereof.
{Pregnant, Oedema, AgeIsEven}: predicted by {Pregnant, Oedema} × {AgeIsEven}
{Male, PoorEyesight, ProstateCancer, Glasses}: predicted by {Male, ProstateCancer} × {PoorEyesight, Glasses}
References: Webb, 2010; Webb & Vreeken, 2014
Measuring degree of positive association
Measure the degree of positive association as deviation from the maximum of the expected frequency under an assumption of independence between any partition of the itemset, e.g.
leverage(I) = P(I) − max { P(X)P(Y) : X ∪ Y = I, X ∩ Y = ∅, X ≠ ∅, Y ≠ ∅ }
References: Webb, 2010; Webb & Vreeken, 2014
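The measure above can be sketched in a few lines (my own illustrative code, not the OPUS Miner implementation): enumerate every binary partition of the itemset and subtract the largest product of partition supports from the itemset's support.

```python
from itertools import combinations

def support(pattern, transactions):
    """Fraction of transactions containing every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t) / len(transactions)

def binary_partitions(itemset):
    """Yield each unordered binary partition {X, Y} of the itemset exactly once."""
    items = sorted(itemset)
    first, rest = items[0], items[1:]
    # Fixing the first item inside X avoids generating each partition twice.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            x = frozenset((first,) + extra)
            yield x, frozenset(items) - x

def itemset_leverage(itemset, transactions):
    """P(I) minus the best independence-based prediction from any partition."""
    best = max(support(x, transactions) * support(y, transactions)
               for x, y in binary_partitions(itemset))
    return support(frozenset(itemset), transactions) - best

# Toy data, not from the tutorial:
transactions = [frozenset(t) for t in
                ({'a', 'b', 'c'}, {'a', 'b'}, {'a', 'b', 'c'}, {'c'})]
print(itemset_leverage({'a', 'b', 'c'}, transactions))   # -0.0625
```

A negative value means some partition already over-predicts the itemset's frequency, so the itemset shows no positive association beyond its parts.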
Statistical test for productivity
Null hypothesis: for some partition {X, Y} of I, P(I) ≤ P(X)P(Y).
Use a Fisher exact test on every partition.
Equivalent to testing that every rule X → Y formed from the itemset is significant.
No correction for multiple testing: the null hypothesis for I is only rejected if the corresponding null hypothesis is rejected for every partition.
This increases the risk of Type 2 rather than Type 1 error.
References: Webb, 2010; Webb & Vreeken, 2014
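The per-partition test can be sketched with a one-sided Fisher exact test built from the hypergeometric distribution (an illustrative stdlib-only version; the 2×2 table layout is the standard one, not copied from the tutorial):

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for positive association in the 2x2 table
    [[a, b], [c, d]], where a = supp(X and Y), b = supp(X and not Y),
    c = supp(not X and Y), d = supp(not X and not Y)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # P(at least a co-occurrences given the margins): hypergeometric upper tail.
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, min(row1, col1) + 1)) / comb(n, row1)

def productive(tables, alpha):
    """Reject the null for itemset I only if every partition's test is significant."""
    return all(fisher_one_sided(*t) <= alpha for t in tables)

print(round(fisher_one_sided(8, 2, 2, 8), 4))   # 0.0115
```

Because all partition tests must succeed, a weaker-than-alpha result on any single partition keeps the itemset out, which is why the procedure trades Type 1 risk for Type 2 risk.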
Redundancy
If item x is a necessary consequence of another set of items Y, then {x} ∪ Y should be associated with everything with which Y is associated.
E.g. pregnant → female; the itemset {female, pregnant, oedema} is not likely to be interesting if {pregnant, oedema} is known.
Discard itemsets I for which there exist X ⊆ I and Y ⊂ X with P(Y) = P(X).
Here I = {female, pregnant, oedema}, X = {female, pregnant}, Y = {pregnant}.
Note: no statistical test.
References: Webb, 2010; Webb & Vreeken, 2014
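A brute-force sketch of this filter (illustrative code with toy data; by monotonicity of support it suffices to test one-item extensions, since supp(Y) = supp(X) for Y ⊂ X forces equality along any chain between them):

```python
from itertools import combinations

def count(pattern, transactions):
    return sum(1 for t in transactions if pattern <= t)

def is_redundant(itemset, transactions):
    """True if some item of I is a necessary consequence of other items of I."""
    items = sorted(itemset)
    for x in items:
        rest = [v for v in items if v != x]
        for r in range(1, len(rest) + 1):
            for y in map(frozenset, combinations(rest, r)):
                if count(y, transactions) == count(y | {x}, transactions):
                    return True      # x always co-occurs with y: discard I
    return False

# Toy data in which pregnant always implies female:
transactions = [frozenset(t) for t in
                ({'pregnant', 'female', 'oedema'}, {'pregnant', 'female'},
                 {'female'}, {'oedema'})]
print(is_redundant({'female', 'pregnant', 'oedema'}, transactions))  # True
print(is_redundant({'female', 'oedema'}, transactions))              # False
```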
Independent productivity
Suppose that heat, oxygen and fuel are all required for fire.
Then heat, oxygen and fuel are associated with fire.
So too are {heat, oxygen}, {heat, fuel}, {oxygen, fuel}, {heat}, {oxygen} and {fuel}.
But these six are potentially misleading given the full association.
References: Webb, 2010; Webb & Vreeken, 2014
Independent productivity
An itemset is unlikely to be interesting if its frequency can be predicted from the frequency of its specialisations.
If both X and X ∪ Y are non-redundant and productive, then X is only likely to be interesting if it also holds with respect to the data not covered by Y.
P is the set of all non-redundant and productive patterns.
References: Webb, 2010; Webb & Vreeken, 2014
Assessing independent productivity
Given {fuel, oxygen, heat, fire}:
to assess {fuel, oxygen, fire}, check whether the association holds in data without heat.
Reference: Webb, 2010
Assessing independent productivity
Given {fuel, oxygen, heat, fire}:
to assess {fuel, oxygen}, check whether the association holds in data without heat or fire.
Reference: Webb, 2010
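A sketch of this check for the two-item case, on a made-up toy dataset following the fire example (the "leverage > 0" criterion is my simplification of the significance test):

```python
def pair_leverage(x, y, transactions):
    """P(x, y) - P(x)P(y) on the given records."""
    n = len(transactions)
    px = sum(1 for t in transactions if x in t) / n
    py = sum(1 for t in transactions if y in t) / n
    pxy = sum(1 for t in transactions if x in t and y in t) / n
    return pxy - px * py

def survives_without(x, y, excluded, transactions):
    """Does the x-y association still hold once records containing
    any excluded item (e.g. heat, fire) are removed?"""
    reduced = [t for t in transactions if not (t & excluded)]
    return bool(reduced) and pair_leverage(x, y, reduced) > 0

# Toy data: fuel and oxygen co-occur only in the full fire records.
data = [frozenset(t) for t in
        [{'fuel', 'oxygen', 'heat', 'fire'}] * 4 +
        [{'fuel'}, {'oxygen'}, set(), set()]]

print(pair_leverage('fuel', 'oxygen', data))                       # 0.109375
print(survives_without('fuel', 'oxygen', {'heat', 'fire'}, data))  # False
```

On the full data {fuel, oxygen} looks associated, but the association vanishes once records containing heat or fire are removed, so the pair is not independently productive here.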
Mushroom
118 items, 8124 examples.
9676 non-redundant productive itemsets (α = 0.05).
3164 are not independently productive.
edible=e, odor=n [support=3408, leverage=0.194559, p …]
We typically want to discover associations that generalise beyond the given data.
The massive search involved in association discovery entails a massive risk of false discoveries: associations that appear to hold in the sample but do not hold in the generating process.
Reference: Webb, 2007
Bonferroni correction for multiple testing
Divide the critical value by the size of the search space.
E.g. retail, 16470 items: critical value = 0.05 / 2^16470 < 5E-4000.
Use layered critical values:
the sum of all critical values cannot exceed the familywise critical value;
allocate a separate familywise critical value to each itemset size, divided among all itemsets of size |I|;
the critical value for itemset I is thus (α / s) / C(m, |I|), for m items and s itemset sizes considered.
E.g. retail, 16470 items: critical value for itemsets of size 2 = (0.05 / 2) / C(16470, 2) ≈ 1.84E-10.
Reference: Webb 2007, 2008
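The layered allocation above is a few lines of arithmetic (a sketch of the slide's example; `n_layers` denotes the number of itemset-size layers among which α is split, here 2):

```python
from math import comb

def layered_critical_value(alpha, n_items, size, n_layers):
    """Equal share of alpha per itemset-size layer, split over
    all C(n_items, size) itemsets of that size."""
    return (alpha / n_layers) / comb(n_items, size)

# retail: 16470 items, itemset sizes 1..2 considered
cv = layered_critical_value(0.05, 16470, 2, 2)
print(f"{cv:.2e}")   # 1.84e-10
```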
Randomization testing
Randomization testing can be used to find significant itemsets.
All the advantages and disadvantages enumerated for dependency rules apply.
It is not possible to efficiently test for productivity or independent productivity using randomisation testing.
References: Megiddo & Srikant, 1998; Gionis et al., 2007
Incremental and interactive mining
Iteratively find the most informative itemset relative to those found so far.
May have a human in the loop.
Aim to model the full joint distribution:
will tend to develop more succinct collections of itemsets than self-sufficient itemsets;
will necessarily choose between one of many potential such collections.
References: Hanhijarvi et al., 2009; Lijffijt et al., 2014
Belgian lottery
{43, 44, 45}: 902 frequent itemsets (min sup = 1%).
All are closed and all are non-derivable.
KRIMP selects 232 itemsets.
MTV selects no itemsets.
DOCWORD.NIPS Top-25 leverage itemsets
kaufmann,morgan
top,bottom
trained,training
report,technical
san,mateo
mit,cambridge
descent,gradient
mateo,morgan
image,images
san,mateo,morgan
mit,press
grant,supported
morgan,advances
springer,verlag
san,morgan
kaufmann,mateo
san,kaufmann,mateo
distribution,probability
conference,international
conference,proceeding
hidden,trained
kaufmann,mateo,morgan
learn,learned
san,kaufmann,mateo,morgan
hidden,training
Reference: Webb & Vreeken, 2014
DOCWORD.NIPS Top-25 leverage rules
kaufmann → morgan
abstract,neural,morgan → kaufmann
morgan → kaufmann
abstract,morgan → kaufmann
abstract,kaufmann → morgan
references,morgan → kaufmann
references,kaufmann → morgan
abstract,references,morgan → kaufmann
abstract,references,kaufmann → morgan
system,morgan → kaufmann
system,kaufmann → morgan
neural,kaufmann → morgan
neural,morgan → kaufmann
abstract,system,kaufmann → morgan
abstract,system,morgan → kaufmann
abstract,neural,kaufmann → morgan
result,kaufmann → morgan
result,morgan → kaufmann
references,system,morgan → kaufmann
neural,references,kaufmann → morgan
neural,references,morgan → kaufmann
abstract,references,system,morgan → kaufmann
abstract,references,system,kaufmann → morgan
abstract,result,kaufmann → morgan
abstract,neural,references,kaufmann → morgan
Reference: Webb & Vreeken, 2014
DOCWORD.NIPS Top-25 frequent (closed) itemsets
abstract,references
references,system
abstract,result
references,result
abstract,function
abstract,references,result
abstract,neural
abstract,system
function,references
abstract,set
abstract,function,references
neural,references
function,result
abstract,neural,references
references,set
neural,result
abstract,function,result
abstract,introduction
abstract,references,system
abstract,references,set
result,system
result,set
abstract,neural,result
abstract,network
abstract,number
Reference: Webb & Vreeken, 2014
DOCWORD.NIPS Top-25 lift self-sufficient itemsets
duane,leapfrog
ekman,hager
americana,periplaneta
alessandro,sperduti
crippa,ghiselli
chorale,harmonization
iiiiiiii,iiiiiiiiiii
artery,coronary
kerszberg,linster
nuno,vasconcelos
brasher,krug
mizumori,postsubiculum
implantable,pickard
zag,zig
lpnn,petek
petek,schmidbauer
chorale,harmonet
deerwester,dumais
harmonet,harmonization
fodor,pylyshyn
jeremy,bonet
ornstein,uhlenbeck
nakashima,satoshi
taube,postsubiculum
iceg,implantable
Closure of duane,leapfrog (all words in all 4 documents)
abstract, according, algorithm, approach, approximation, bayesian, carlo, case, cases, computation, computer, defined, department, discarded, distribution, duane, dynamic, dynamical, energy, equation, error, estimate, exp, form, found, framework, function, gaussian, general, gradient, hamiltonian, hidden, hybrid, input, integral, iteration, keeping, kinetic, large, leapfrog, learning, letter, level, linear, log, low, mackay, marginal, mean, method, metropolis, model, momentum, monte, neal, network, neural, noise, non, number, obtained, output, parameter, performance, phase, physic, point, posterior, prediction, prior, probability, problem, references, rejection, required, result, run, sample, sampling, science, set, simulating, simulation, small, space, squared, step, system, task, term, test, training, uniformly, unit, university, values, vol, weight, zero
Itemsets Summary
More attention has been paid to finding associations efficiently than to which ones to find.
While we cannot be certain what will be interesting, the following probably won't be:
frequency explained by independence between a partition;
frequency explained by specialisations.
Statistical testing is essential.
Itemsets often provide a much more succinct summary of association than rules:
rules provide more fine-grained detail;
rules are useful if there is a specific item of interest.
Self-Sufficient Itemsets
capture all of these principles;
support comprehensible explanations for why itemsets are rejected;
can be discovered efficiently;
often find small sets of patterns (mushroom: 9,676; retail: 13,663).
Reference: Novak et al., 2009
Software
OPUS Miner can be downloaded from:
http://www.csse.monash.edu.au/~webb/Software/opus_miner.tgz
Statistically sound pattern discovery: current state and future challenges
Efficient and reliable algorithms for binary and categorical data:
branch-and-bound style;
no minimum frequencies (or harmless, like 5/n).
Numeric variables:
impact rules allow numerical consequences (Webb);
main challenge: numerical variables in the condition part of a rule and in itemsets;
how to integrate an optimal discretization into the search?
How to detect all redundant patterns?
Long patterns.
End!
Questions?
All material: http://cs.joensuu.fi/pages/whamalai/kdd14/sspdtutorial.html