So Hirai The University of Tokyo Currently NTT DATA Corp. Kenji Yamanishi The University of Tokyo...

Date post: 21-Jan-2016
So Hirai The University of Tokyo Currently NTT DATA Corp. Kenji Yamanishi The University of Tokyo WITMSE 2012, Amsterdam, Netherland Presented at KDD 2012 on Aug.13.
Transcript
Page 1:

So Hirai, The University of Tokyo (currently NTT DATA Corp.)

Kenji Yamanishi, The University of Tokyo

WITMSE 2012, Amsterdam, Netherlands

Presented at KDD 2012 on Aug. 13.

Page 2:

Contents

Problem Setting
Significance
Proposed Algorithm: Sequential Dynamic Model Selection with NML (normalized maximum likelihood) coding
How to compute the NML coding for Gaussian mixtures
Experimental Results
Marketing Applications
Conclusion

Page 3:

Problem Setting (1/2)

Clustering change detection: tracking changes of clustering structures in a sequential setting to detect novelty in data

Ex. Market analysis: the structure of customer groups changes over time

Detect changes in the number of clusters as well as in their assignments

[Figure: time axis with two points marked "Change"]

Page 4:

Problem Setting (2/2)

Examples of clustering structure changes

[Figure: points A to F grouped in several different ways, plus new points α and β]

Existing customers change their patterns

New customers emerge to form a new group

There exist various types of clustering structures

Page 5:

Related works

Clustering change detection issue

Evolutionary clustering [Chakrabarti et al., 2006]
Hypothesis testing approach [Song and Wang, 2005]
Kalman filter approach [Krempl et al., 2011]
GraphScope [Sun et al., 2007]
Variational Bayes approach [Sato, 2001]

Page 6:

Significance

A novel clustering change detection algorithm

Key idea:
・ Sequential dynamic model selection (sequential DMS)
・ NML (normalized maximum likelihood) code-length as the criterion: first formulae of NML for Gaussian mixture models

Empirical demonstration of its superiority over existing methods, shown using artificial data sets

Demonstration of its validity in market analysis, shown using real beer consumption data sets

Page 8:

Proposed Alg. – background of DMS –

Dynamic Model Selection (DMS) [Yamanishi and Maruyama, 2007]

Batch DMS criterion: total code-length = (code-length of data seq.) + (code-length of model seq.), minimized w.r.t. the model sequence

An extension of the MDL (Minimum Description Length) principle [Rissanen, 1978] to model "sequence" selection

Page 9:

Proposed Alg. – Sequential DMS –

Sequential dynamic model selection (SDMS) Alg.

At each time t, given , sequentially select for clustering

Sequential variant of the DMS criterion [Yamanishi and Maruyama, 2007]:

(code-length for data clustering ~ NML (normalized maximum likelihood) coding) + (code-length for transition of clustering structure), minimized w.r.t. Kt, Zt

s.t.
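The selection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `select_next_k`, `data_code_length`, `transition_code_length` and `k_max` are hypothetical names, and the candidate set (K stays or moves to an adjacent value) is an assumption taken from the "neighbors only" remark on the transition slide. In a real run, `data_code_length(k)` would fit a k-cluster model and return its NML code-length.

```python
from typing import Callable, Tuple


def select_next_k(
    k_prev: int,
    k_max: int,
    data_code_length: Callable[[int], float],
    transition_code_length: Callable[[int, int], float],
) -> Tuple[int, float]:
    """Pick K_t among the neighbours of K_{t-1} by minimum total code-length."""
    # Assumed constraint: K may stay the same or move to an adjacent value.
    candidates = [k for k in (k_prev - 1, k_prev, k_prev + 1) if 1 <= k <= k_max]
    # Total code-length = data code-length + transition code-length.
    totals = {
        k: data_code_length(k) + transition_code_length(k_prev, k)
        for k in candidates
    }
    best_k = min(totals, key=totals.get)
    return best_k, totals[best_k]


# Toy usage with made-up code-lengths (not taken from the slides).
toy_data_cost = {2: 10.0, 3: 9.0, 4: 6.0}
best_k, total = select_next_k(
    k_prev=3,
    k_max=5,
    data_code_length=lambda k: toy_data_cost[k],
    transition_code_length=lambda k_prev, k: 0.5 if k == k_prev else 2.0,
)
print(best_k, total)
```

In the toy call, moving to K = 4 wins because the saving in data code-length outweighs the extra cost of coding a change.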

Page 10:

Proposed Alg. – model transition –

Consider three patterns of clustering changes. Run the EM alg. with the initial values below:

Case 1: # of clusters does not change. Initial parameter values remain the same.

Case 2: # of clusters decreases (e.g., merging). Assign data in a certain cluster to other ones randomly.

Case 3: # of clusters increases (e.g., splitting). Assign data to a new cluster randomly.
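A rough sketch of how the three initializations could look when clusters are represented by integer labels. The function name `initial_assignments` is hypothetical, and two details are assumptions the slide does not specify: which cluster is dissolved in Case 2 (here, the one with the highest label) and what share of points is moved to the new cluster in Case 3 (here, about 1/K of them).

```python
import random
from typing import List


def initial_assignments(
    labels: List[int], k_prev: int, k_next: int, rng: random.Random
) -> List[int]:
    """Return initial cluster labels for EM when K moves from k_prev to k_next."""
    if k_next < 1:
        raise ValueError("k_next must be at least 1")
    if k_next == k_prev:
        # Case 1: the number of clusters is unchanged, keep the previous state.
        return list(labels)
    if k_next == k_prev - 1:
        # Case 2: dissolve one cluster (assumed: the highest label) and
        # spread its points over the remaining clusters at random.
        removed = k_prev - 1
        return [rng.randrange(k_next) if z == removed else z for z in labels]
    if k_next == k_prev + 1:
        # Case 3: open a new cluster (label k_prev) and move a random
        # subset of points into it (assumed share: 1 / k_next).
        return [k_prev if rng.random() < 1.0 / k_next else z for z in labels]
    raise ValueError("only transitions to neighbouring K are supported")


rng = random.Random(0)
labels = [0, 0, 1, 1, 2, 2]
same = initial_assignments(labels, 3, 3, rng)
merged = initial_assignments(labels, 3, 2, rng)
split = initial_assignments(labels, 3, 4, rng)
```

EM would then be started from each of these labelings, and the resulting code-lengths compared.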

Page 11:

Proposed Alg. – code-length for transition –

Model transition probability distribution: suppose K transits to neighbors only

Employ the Krichevsky-Trofimov (KT) estimate [Krichevsky and Trofimov, 1981]

Code-length of the model transition
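The formula for this code-length is missing from the transcript, so the following is only a sketch under stated assumptions. It uses the standard KT estimator for a binary event (here "K changed" versus "K stayed"), and it assumes that the probability of a change is split evenly among the admissible neighbouring values of K. The function names and the `k_max` bound are hypothetical. Code-lengths are in nats.

```python
import math


def kt_change_probability(num_changes: int, num_transitions: int) -> float:
    """KT estimate of the probability that the next transition is a change."""
    return (num_changes + 0.5) / (num_transitions + 1.0)


def transition_code_length(
    k_prev: int, k_next: int, num_changes: int, num_transitions: int, k_max: int
) -> float:
    """Code-length (nats) of moving from k_prev to k_next clusters."""
    alpha = kt_change_probability(num_changes, num_transitions)
    if k_next == k_prev:
        prob = 1.0 - alpha
    elif abs(k_next - k_prev) == 1 and 1 <= k_next <= k_max:
        # Assumption: the change probability is shared equally by the
        # neighbours of k_prev that lie inside [1, k_max].
        neighbours = [k for k in (k_prev - 1, k_prev + 1) if 1 <= k <= k_max]
        prob = alpha / len(neighbours)
    else:
        # Non-neighbouring transitions are not allowed.
        return math.inf
    return -math.log(prob)


stay = transition_code_length(3, 3, num_changes=0, num_transitions=0, k_max=10)
move = transition_code_length(3, 4, num_changes=0, num_transitions=0, k_max=10)
jump = transition_code_length(3, 5, num_changes=0, num_transitions=0, k_max=10)
```

With no history the KT estimate is 1/2, so staying costs log 2 nats and moving to either neighbour costs log 4 nats. As more transitions without change are observed, staying becomes cheaper and changing more expensive.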

Page 13:

Criteria – NML code-length –

Model (Gaussian mixture model) :

NML (normalized maximum likelihood) code-length :

Normalization term

Shortest code-length in the sense of the minimax criterion [Shtarkov, 1987]
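The formulas on this slide were lost in extraction. For orientation, the standard definition can be written as below. The symbols $x^n$ (data sequence), $\hat{\theta}$ (maximum likelihood estimator), $\mathcal{M}$ (model class) and $C(\mathcal{M})$ (normalization term) are notation assumed here, not taken from the slide.

```latex
L_{\mathrm{NML}}(x^n; \mathcal{M})
  = -\log p\bigl(x^n; \hat{\theta}(x^n)\bigr) + \log C(\mathcal{M}),
\qquad
C(\mathcal{M}) = \int p\bigl(y^n; \hat{\theta}(y^n)\bigr)\, dy^n .
```

The first term measures fit and the second term penalizes the richness of the model class. For discrete data the integral becomes a sum.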

Page 14:

For Continuous Data

Normalization term

In case of , the data ranges over all domains

Problem:

NML for Gaussian distribution: the normalization term diverges

NML for mixture distribution: the normalization term is computationally intractable. This comes from combinatorial difficulties.

Page 15:

For Continuous Data (Example)

For the one-dimensional Gaussian distribution (σ² is given)

Normalization term
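A short worked version of this example, under assumed notation (the sample mean $\bar{x}$ and the bound $R$ do not come from the slide). With $\sigma^2$ known, the MLE of the mean is $\hat{\mu} = \bar{x}$, which is distributed as $\mathcal{N}(\mu, \sigma^2/n)$ with density $g(\hat{\mu}; \mu)$. Integrating the maximized likelihood over all data then reduces to integrating $g(\hat{\mu}; \hat{\mu})$ over the MLE:

```latex
C = \int p\bigl(x^n; \hat{\mu}(x^n)\bigr)\, dx^n
  = \int g(\hat{\mu}; \hat{\mu})\, d\hat{\mu}
  = \int \sqrt{\frac{n}{2\pi\sigma^2}}\, d\hat{\mu}
  = \infty \quad (\hat{\mu} \in \mathbb{R}),
\qquad
C_R = \int_{-R}^{R} \sqrt{\frac{n}{2\pi\sigma^2}}\, d\hat{\mu}
    = 2R \sqrt{\frac{n}{2\pi\sigma^2}} .
```

The integrand is constant in $\hat{\mu}$, so the integral over the whole real line diverges. Restricting the MLE to $|\hat{\mu}| \le R$ gives a finite value, at the price of a dependence on $R$.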

Page 16:

Approximate computation (1/2)

Use sufficient statistics

g1 : Gaussian distribution
g2 : Wishart distribution

Page 17:

Criteria – NML for GMM –

Efficiently computing an approximate variant of the NML code-length for a GMM [Hirai and Yamanishi, 2011]

Restrict the range of data so that the MLE lies in a bounded range specified by a parameter

The normalization term does not diverge, but it still depends highly on the parameters :

Page 18:

NML

The normalization term is calculated as follows :

where,

: number of data, : dim. of data

Page 19:

Criteria – RNML code-length –

Modify NML to develop the re-normalized maximum likelihood coding (RNML) [Rissanen, Roos, Myllymaki 2010] [Hirai and Yamanishi, 2012]

Re-normalize around the MLE of the parameter by restricting the range of data

Less dependent on the hyper-parameter

Page 20:

Criteria – RNML code-length –

Page 21:

RNML code-length

Theorem [Hirai and Yamanishi 2012]: RNML code-length for GMM is calculated as follows :

Definition

Problem: Computing , costs .

Page 22:

Criteria – efficient computing of RNML –

Straightforward computation of RNML requires time ⇒ But we can compute it efficiently

Theorem [Kontkanen and Myllymaki, 07]
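The statement of the theorem did not survive extraction. The result usually cited from Kontkanen and Myllymäki (2007) is a linear-time recurrence for the NML normalization term of a multinomial with K values and n observations, C(K+2, n) = C(K+1, n) + (n/K) C(K, n). The sketch below assumes that this is the theorem meant on the slide. The function name is hypothetical, and the direct float arithmetic is only suitable for moderate n (a log-space version would be needed for large samples).

```python
import math


def multinomial_nml_normalizer(num_values: int, n: int) -> float:
    """Normalization term C(K, n) for a K-valued multinomial, sample size n."""
    if num_values < 1 or n < 0:
        raise ValueError("need num_values >= 1 and n >= 0")
    if n == 0:
        return 1.0
    c_prev = 1.0  # C(1, n): a single value leaves nothing to encode
    if num_values == 1:
        return c_prev
    # C(2, n) by direct summation over the count h of the first value.
    c_curr = 0.0
    for h in range(n + 1):
        c_curr += math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
    # Recurrence: C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n).
    for k in range(1, num_values - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


c_2_2 = multinomial_nml_normalizer(2, 2)
c_3_2 = multinomial_nml_normalizer(3, 2)
```

The small cases can be checked by brute force: for n = 2 and K = 3, the three constant sequences each contribute 1 and the six mixed sequences each contribute 1/4, which gives 4.5.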

Page 23:

Criteria – efficient computing of RNML –

Straightforward computation of RNML requires time ⇒ But we can compute it efficiently

Theorem [Hirai and Yamanishi, 2012]: The normalization term satisfies a recursive formula

Can compute the normalization term in for “mixture” models

Page 25:

Experimental Results – data generation –

Generate artificial data set according to GMM with

Page 26:

Experimental Results – comparison criteria –

Employ three comparison metrics

AR (accuracy rate): average rate of correctly estimating the true number of clusters over all time

IR (identification rate): probability of correctly identifying the change-points and the changes themselves

FAR (false alarm rate): ratio of the number of false alarms to the number of all detected change-points
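Two of these metrics can be sketched directly from their verbal definitions. The function names are hypothetical, and the matching rule for FAR (a detection counts as correct only if it falls exactly on a true change-point) is an assumption; the slides do not say whether a tolerance window was used. IR is left out because its exact definition cannot be recovered from the transcript.

```python
from typing import Iterable, Sequence


def accuracy_rate(true_ks: Sequence[int], estimated_ks: Sequence[int]) -> float:
    """Share of time steps at which the estimated number of clusters is correct."""
    if len(true_ks) != len(estimated_ks) or len(true_ks) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(t == e for t, e in zip(true_ks, estimated_ks))
    return hits / len(true_ks)


def false_alarm_rate(
    true_change_points: Iterable[int], detected_change_points: Iterable[int]
) -> float:
    """Share of detected change-points that are not true change-points."""
    detected = set(detected_change_points)
    if not detected:
        return 0.0  # no detections, hence no false alarms
    # Assumed matching rule: exact agreement of time indices.
    false_alarms = detected - set(true_change_points)
    return len(false_alarms) / len(detected)


ar = accuracy_rate([2, 2, 3, 3], [2, 3, 3, 3])
far = false_alarm_rate([10, 20], [10, 15, 20, 30])
```

In the toy data, three of four time steps have the right number of clusters, and two of the four detections are false alarms.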

Page 27:

Experimental Results – artificial data –

[Figure: Average Number of Clusters Over Time]

        RNML    AIC     BIC
AR      0.903   0.103   0.135
IR      0.380   0.005   0.020
FAR     0.260   0.020   0.718

AIC: Akaike's information criterion [Akaike, 1974]
BIC: Bayesian information criterion [Schwarz, 1978]

Our alg. with NML was able to detect true change-points and identify the true # of clusters with higher probability than AIC and BIC

Page 28:

Comparison w.r.t. KL-divergence

Evaluated change detection accuracies by varying the Kullback-Leibler divergence (KLD) between the distributions before and after the change-points

The larger the KLD between the GMMs before and after the change-point was, the more accurately the change was detected in terms of IR (identification rate).

Page 29:

Experimental Results – vs SW Alg. –

SW algorithm: hypothesis testing of whether clusters are identical or not, followed by splitting, merging, etc. [Song and Wang, 2005]

            AR      IR      FAR
Proposed    0.988   0.950   0.050
SW-RNML     0.369   0.300   0.503
SW-BIC      0.019   0.000   0.841

Data : size/time = 512

The sequential DMS with RNML significantly outperformed the SW alg.

Page 30:

Experimental Results – market analysis –

Data set provided by MACROMILL, Inc.: 14 kinds of beer, 78 days

Data format, shown on the slide as a stack of identical tables:

            Beer 1   Beer 2   . . .
User 1      350      700      . . .
User 2      1050     350      . . .
. . .       . . .    . . .    . . .

Clustering customers to detect their structure changes

Our alg. detected clustering changes that corresponded to the year-end demand

Page 31:

The cluster change at the change-point : 1/1,2

Many customers changed their patterns to purchase Beer-A and Third-Beer at the year's end

Page 32:

Conclusion

Page 33:

Why NML?

The shortest code-length in the sense of Shtarkov's minimax criterion [Shtarkov, 1987]

The minimum is attained by Q = NML distribution

Maximum Likelihood Estimator

For a given class :
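The criterion itself did not survive extraction. In its usual form it asks for the code whose worst-case excess length over the best-fitting model in the class is smallest. The notation below is assumed here: $Q$ ranges over all distributions on $x^n$, and $\hat{\theta}(x^n)$ is the maximum likelihood estimator for the class.

```latex
\min_{Q} \max_{x^n}
  \Bigl\{ -\log Q(x^n) - \bigl( -\log p(x^n; \hat{\theta}(x^n)) \bigr) \Bigr\}
  = \log \int p\bigl(y^n; \hat{\theta}(y^n)\bigr)\, dy^n ,
\qquad
Q^{*}(x^n) = \frac{p\bigl(x^n; \hat{\theta}(x^n)\bigr)}
                  {\int p\bigl(y^n; \hat{\theta}(y^n)\bigr)\, dy^n} .
```

The minimizing $Q^{*}$ is the NML distribution, and its excess length is the same for every data sequence, which is why it attains the minimax value.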

Page 34:

Restrict the range of data

Restrict the range of data for Shtarkov's minimax criterion [Shtarkov, 1987]

For a given class :

We change Shtarkov's minimax criterion itself

Page 35:

Comparison with non-parametric Bayes

Sequential Dynamic Model Selection works better than non-parametric Bayes (Infinite HMM, etc.)

[Sakurai and Yamanishi, Comparison of Dynamic Model Selection with Infinite HMM for Statistical Model Change Detection, to appear in ITW 2012]
