
A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data

David S. Matteson
Department of Statistical Science

Cornell University

matteson@cornell.edu
www.stat.cornell.edu/~matteson

Joint work with: Nicholas A. James, ORIE, Cornell University

Sponsorship: National Science Foundation

2014 October

David S. Matteson (matteson@cornell.edu) Change Point Analysis 2014 October 1 / 40

Introduction

Change Point Analysis

The process of detecting distributional changes within time ordered data

Framework:

- Retrospective, offline analysis
- Multivariate observations
- Estimation: number of change points and their positions
- Hierarchical algorithms

Applications:

- Genetics
- Finance
- Emergency Medical Services


Introduction

Change Point Analysis

Given independent, time-ordered observations X1, X2, ..., Xn ∈ R^d

Partition into k homogeneous, temporally contiguous subsets

- k is unknown
- Size of each subset is unknown


Cluster Analysis

Change point analysis is similar to cluster analysis

In cluster analysis we also wish to partition the observations into homogeneous subsets

- Subsets may not be contiguous in time without some constraints


Hierarchical Estimation

Apply methods from clustering to find change points

Exhaustive search is not practical: O(n^k), in general

May consider dynamic programming

We use a hierarchical, sequential approach: O(kn^2)

- Divisive: clusters are divided until each observation is its own cluster
- Agglomerative: clusters are merged until all observations belong to a single cluster


Hierarchical Estimation: Divisive Progression


Hierarchical Estimation: Agglomerative Progression


Multivariate Homogeneity

Measuring Multivariate Homogeneity

Suppose X, Y ∈ R^d with X ~ F_X independent of Y ~ F_Y

Let φ_x(t) = E(e^{i⟨t,X⟩}) and φ_y(t) = E(e^{i⟨t,Y⟩}) denote their characteristic functions

Define a divergence between F_X and F_Y as

E(X, Y; w) = ∫_{R^d} |φ_x(t) − φ_y(t)|² w(t) dt,

in which w(t) denotes an arbitrary positive weight function for which E exists


A Weight Function

A convenient choice for w(t) > 0 (Székely and Rizzo, 2005):

w(t; α) = ( [2 π^{d/2} Γ(1 − α/2)] / [α 2^α Γ((d + α)/2)] · |t|^{d+α} )^{−1}

in which Γ(·) is the gamma function

Note: for any fixed (d, α), w(t; α) ∝ |t|^{−(d+α)}


Equivalent Divergence Measures

Let X and Y be independent, and (X′, Y′) be an iid copy of (X, Y)

Theorem

Suppose that E(|X|^α + |Y|^α) < ∞ for some α ∈ (0, 2]. Then

E(X, Y; α) = ∫_{R^d} |φ_x(t) − φ_y(t)|² ( [2 π^{d/2} Γ(1 − α/2)] / [α 2^α Γ((d + α)/2)] · |t|^{d+α} )^{−1} dt
           = 2 E|X − Y|^α − E|X − X′|^α − E|Y − Y′|^α
           < ∞

- If 0 < α < 2, then E(X, Y; α) = 0 if and only if X and Y are identically distributed
- If α = 2, then E(X, Y; α) = 0 if and only if EX = EY


An Empirical Measure (U-statistics)

Let Xn = {Xi : i = 1, ..., n} and Ym = {Yj : j = 1, ..., m} be independent iid samples from the distributions of X, Y ∈ R^d, respectively, such that E|X|^α, E|Y|^α < ∞ for some α ∈ (0, 2)

Define

E(Xn, Ym; α) = (2 / mn) Σ_{i=1}^{n} Σ_{j=1}^{m} |Xi − Yj|^α
             − C(n, 2)^{−1} Σ_{1≤i<k≤n} |Xi − Xk|^α
             − C(m, 2)^{−1} Σ_{1≤j<k≤m} |Yj − Yk|^α

and

Q(Xn, Ym; α) = (mn / (m + n)) E(Xn, Ym; α)
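The empirical divergence is straightforward to compute directly from pairwise distances. Below is a minimal, illustrative Python sketch (not the ecp implementation); the names energy_divergence and q_statistic are our own.

```python
import math
from itertools import combinations

def _mean_pair_dist(S, alpha):
    # Average of |a - b|^alpha over all within-sample pairs; needs len(S) >= 2.
    return sum(math.dist(a, b) ** alpha for a, b in combinations(S, 2)) / math.comb(len(S), 2)

def energy_divergence(X, Y, alpha=1.0):
    """Empirical E(X_n, Y_m; alpha): twice the mean between-sample distance
    minus the mean within-sample distances. Points are coordinate tuples."""
    n, m = len(X), len(Y)
    between = 2.0 / (m * n) * sum(math.dist(x, y) ** alpha for x in X for y in Y)
    return between - _mean_pair_dist(X, alpha) - _mean_pair_dist(Y, alpha)

def q_statistic(X, Y, alpha=1.0):
    """Scaled statistic Q = (mn / (m + n)) * E(X_n, Y_m; alpha)."""
    n, m = len(X), len(Y)
    return (m * n) / (m + n) * energy_divergence(X, Y, alpha)

# Two well-separated univariate samples give a large divergence:
print(q_statistic([(0.0,), (1.0,)], [(10.0,), (11.0,)]))  # 18.0
```

Note that the U-statistic form uses only distinct within-sample pairs, so each sample needs at least two observations.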


Known Location: Two-Sample Homogeneity Test

By the strong law of large numbers for U-statistics (Hoeffding, 1961),

E(Xn, Ym; α) → E(X, Y; α)

almost surely, as min(m, n) → ∞.

Under the null hypothesis of equal distributions, i.e. E(X, Y; α) = 0,

Q(Xn, Ym; α) → Q(X, Y; α) = Σ_{i=1}^{∞} λ_i Q_i

in distribution, as min(m, n) → ∞. Here, the λ_i > 0 are constants that depend on α and the distributions of X and Y, and the Q_i are iid χ²₁ random variables; see Rizzo and Székely (2010).

Under the alternative hypothesis of unequal distributions, i.e. E(X, Y; α) > 0,

Q(Xn, Ym; α) → ∞ almost surely, as min(m, n) → ∞.


Single Change Point

Single Change Point: Unknown Location

Let Z1, ..., ZT ∈ R^d be an independent sequence.

Suppose the sample is heterogeneous, with observations drawn from two distributions.

Let γ ∈ (0, 1) denote the division of observations, such that Z1, ..., Z⌊γT⌋ ~ F_X and Z⌊γT⌋+1, ..., ZT ~ F_Y for every sample of size T.

Define Xτ = {Z1, Z2, ..., Zτ} and Yτ = {Zτ+1, Zτ+2, ..., ZT}.

A change point location τ̂_T is then estimated as

τ̂_T = argmax_τ Q(Xτ, Yτ; α).

Theorem

If E(X, Y; α) < ∞ and γ ∈ (0, 1), then τ̂_T / T → γ almost surely, as T → ∞.
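The argmax estimator amounts to a scan over candidate split points. A minimal Python sketch follows (illustrative only: here the right endpoint κ is fixed at T, whereas the full procedure also searches over κ, and the function names are our own):

```python
import math
from itertools import combinations

def q_stat(X, Y, alpha=1.0):
    # Q = (mn / (m + n)) * E(X, Y; alpha), computed from pairwise distances.
    n, m = len(X), len(Y)
    between = 2.0 / (m * n) * sum(math.dist(x, y) ** alpha for x in X for y in Y)
    wx = sum(math.dist(a, b) ** alpha for a, b in combinations(X, 2)) / math.comb(n, 2)
    wy = sum(math.dist(a, b) ** alpha for a, b in combinations(Y, 2)) / math.comb(m, 2)
    return (m * n) / (m + n) * (between - wx - wy)

def single_change_point(Z, alpha=1.0, min_size=2):
    """Return (tau_hat, q_hat): the split maximizing Q(Z[:tau], Z[tau:]; alpha)."""
    best_tau, best_q = None, -math.inf
    for tau in range(min_size, len(Z) - min_size + 1):
        q = q_stat(Z[:tau], Z[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q

# A clear mean shift at position 3 is recovered exactly:
Z = [(0.0,)] * 3 + [(10.0,)] * 3
print(single_change_point(Z))  # (3, 30.0)
```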


Multiple Change Points

Multiple Change Points: Unknown Locations

A generalized bisection approach for sequential estimation

For 1 ≤ τ < κ ≤ T, define:

Xτ = {Z1, Z2, ..., Zτ} and Yτ(κ) = {Zτ+1, Zτ+2, ..., Zκ}

A change point location τ̂ is then estimated as

(τ̂, κ̂) = argmax_{(τ,κ)} Q(Xτ, Yτ(κ); α).


Sequentially Estimating Multiple Change Points

Suppose k − 1 change points have been estimated: τ̂1 < · · · < τ̂k−1

This partitions the observations into k clusters C1, C2, ..., Ck

Given these clusters, we then apply the single change point procedure within each of the k clusters.

For the ith cluster Ci, denote the proposed change point location τ̂(i) and the associated constant κ̂(i)

Now let

i* = argmax_{i ∈ {1,...,k}} Q[Xτ̂(i), Yτ̂(i)(κ̂(i)); α],

in which Xτ̂(i) and Yτ̂(i)(κ̂(i)) are defined with respect to Ci

Denote the test statistic as

q̂k = Q(Xτ̂k, Yτ̂k(κ̂k); α),

where τ̂k = τ̂(i*) is the kth estimated change point, located within cluster Ci*
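The sequential procedure can be sketched as a greedy bisection loop: at each step, scan every current cluster for its best internal split and accept the split with the largest Q. This illustrative Python sketch stops after a fixed number k of change points and fixes κ at each cluster's right endpoint; the actual E-Divisive algorithm also searches over κ and stops via a permutation test. Function names are our own.

```python
import math
from itertools import combinations

def q_stat(X, Y, alpha=1.0):
    # Q = (mn / (m + n)) * E(X, Y; alpha), computed from pairwise distances.
    n, m = len(X), len(Y)
    between = 2.0 / (m * n) * sum(math.dist(x, y) ** alpha for x in X for y in Y)
    wx = sum(math.dist(a, b) ** alpha for a, b in combinations(X, 2)) / math.comb(n, 2)
    wy = sum(math.dist(a, b) ** alpha for a, b in combinations(Y, 2)) / math.comb(m, 2)
    return (m * n) / (m + n) * (between - wx - wy)

def greedy_change_points(Z, k, alpha=1.0, min_size=2):
    """Greedy bisection: repeatedly split the cluster whose best internal
    split has the largest Q, until k change points have been found."""
    cps = []
    while len(cps) < k:
        bounds = [0] + sorted(cps) + [len(Z)]
        best = None  # (q, absolute split index)
        for lo, hi in zip(bounds, bounds[1:]):
            seg = Z[lo:hi]
            if len(seg) < 2 * min_size:
                continue  # cluster too small to split
            for tau in range(min_size, len(seg) - min_size + 1):
                q = q_stat(seg[:tau], seg[tau:], alpha)
                if best is None or q > best[0]:
                    best = (q, lo + tau)
        if best is None:
            break
        cps.append(best[1])
    return sorted(cps)

# Two mean shifts, at positions 4 and 8, are both recovered:
Z = [(0.0,)] * 4 + [(10.0,)] * 4 + [(20.0,)] * 4
print(greedy_change_points(Z, k=2))  # [4, 8]
```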


The E-Divisive Algorithm: Estimating Location

Aτ = {Z1, Z2, ..., Zτ} and Bτ(κ) = {Zτ+1, Zτ+2, ..., Zκ}

Recall, a change point location τ̂ is estimated as

(τ̂, κ̂) = argmax_{(τ,κ)} Q(Aτ, Bτ(κ); α)

Thus, we maximize (mn / (n + m)) E(A, B; α) over all such subsets A and B:


The E-Divisive Algorithm: Inference via Permutation Test

The distribution of the test statistic q̂ = Q(Aτ̂, Bτ̂(κ̂); α) is unknown

The significance of a proposed change point is measured via a permutation test

Randomly permute the series, maximize (mn / (n + m)) E(A, B; α), record the result, and repeat:
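The permutation step can be sketched directly: the observed statistic is compared against the same maximization applied to random reorderings of the series. A minimal, illustrative Python version (function names our own; the ecp package's test is more general, e.g. it also searches over κ):

```python
import math
import random
from itertools import combinations

def q_stat(X, Y, alpha=1.0):
    # Q = (mn / (m + n)) * E(X, Y; alpha), computed from pairwise distances.
    n, m = len(X), len(Y)
    between = 2.0 / (m * n) * sum(math.dist(x, y) ** alpha for x in X for y in Y)
    wx = sum(math.dist(a, b) ** alpha for a, b in combinations(X, 2)) / math.comb(n, 2)
    wy = sum(math.dist(a, b) ** alpha for a, b in combinations(Y, 2)) / math.comb(m, 2)
    return (m * n) / (m + n) * (between - wx - wy)

def max_q(Z, alpha=1.0, min_size=2):
    # Best achievable Q over all admissible split points of Z.
    return max(q_stat(Z[:t], Z[t:], alpha)
               for t in range(min_size, len(Z) - min_size + 1))

def permutation_pvalue(Z, R=199, alpha=1.0, min_size=2, seed=0):
    """Approximate p-value: fraction of permuted series whose best split
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = max_q(Z, alpha, min_size)
    perm = list(Z)
    exceed = 0
    for _ in range(R):
        rng.shuffle(perm)
        if max_q(perm, alpha, min_size) >= observed:
            exceed += 1
    return (exceed + 1) / (R + 1)

# An obvious change yields a small p-value; a homogeneous series does not:
shifted = [(0.0,)] * 4 + [(10.0,)] * 4
print(permutation_pvalue(shifted, R=99) < 0.3)  # True
print(permutation_pvalue([(0.0,)] * 8, R=99))   # 1.0
```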


The E-Divisive Algorithm: Multiple Change Points

If q̂ = Q(Aτ̂, Bτ̂(κ̂); α) is insignificant: STOP

If significant, condition on the estimated location, and repeat within clusters:


The E-Divisive Algorithm: Multiple Change Points

Once again, perform the permutation test

However, permute only within each cluster:


The E-Divisive Algorithm: ecp Package

The ‘ecp’ R package (CRAN)

Signature:

e.divisive(X, sig.lvl=0.05, R=199, k=NULL, min.size=30, alpha=1)

Arguments:

- X: A T × d matrix representation of a length-T time series, with d-dimensional observations.
- sig.lvl: The significance level used for the permutation test.
- R: The maximum number of permutations to perform in the permutation test.
- k: The number of change points to return. If NULL, only the statistically significant estimated change points are returned.
- min.size: The minimum number of observations between change points.
- alpha: The index α for the test statistic.


The ‘ecp’ R package (CRAN)

Returned list:

- k.hat: Number of clusters created by the estimated change points.
- order.found: The order in which the change points were estimated.
- estimates: Locations of the statistically significant change points.
- considered.last: Location of the last change point that was not found to be statistically significant at the given significance level.
- permutations: The number of permutations performed by each sequential permutation test.
- cluster: The estimated cluster membership vector.
- p.values: Approximate p-values estimated from each permutation test.

Complexity is O(kT^2)

Simulation

Simulation Study: Rand Index

Compare E-Divisive with a generalized Wilcoxon/Mann–Whitney approach: the MultiRank procedure of Lung-Yut-Fong et al. (2011)

For two partitions U and V, the Rand index considers all pairs of observations.

Define

- {A}: Pairs in the same cluster under U and in the same cluster under V
- {B}: Pairs in different clusters under U and in different clusters under V

Rand index = (#A + #B) / C(T, 2)

An equivalent definition of the Rand index can be found in Hubert and Arabie (1985).

Adjusted Rand = (Index − Expected Index) / (Max Index − Expected Index)
              = (Rand − Expected Rand) / (1 − Expected Rand)
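Both indices can be computed directly from a pair of cluster-label vectors. A small illustrative Python sketch (function names our own; the adjusted version uses the Hubert–Arabie expected-index form via the contingency table):

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(u, v):
    """Fraction of observation pairs on which partitions u and v agree
    (same cluster in both, or different clusters in both)."""
    idx_pairs = list(combinations(range(len(u)), 2))
    agree_same = sum(1 for i, j in idx_pairs if u[i] == u[j] and v[i] == v[j])
    agree_diff = sum(1 for i, j in idx_pairs if u[i] != u[j] and v[i] != v[j])
    return (agree_same + agree_diff) / comb(len(u), 2)

def adjusted_rand_index(u, v):
    """Hubert-Arabie adjustment: (Index - E[Index]) / (Max Index - E[Index]),
    computed from the contingency table of the two partitions."""
    n = len(u)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_a = sum(comb(c, 2) for c in Counter(u).values())
    sum_b = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))           # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```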


A change in variance for univariate normal data

Method Correct k Average Adjusted Rand

MultiRank 22/100 0.504

E-Divisive 95/100 0.909


A change in correlation for bivariate normal data

Method Correct k Average Adjusted Rand

MultiRank 72/100 0.166

E-Divisive 92/100 0.997


1,000 simulations, 2 CP: N(0,1), N(µ,1), N(0,1)

                  Average Rand             Average Adj. Rand
T     µ     MultiRank   E-Divisive   MultiRank   E-Divisive
150   1     0.940       0.948        0.867       0.885
150   2     0.977       0.991        0.949       0.981
150   4     0.981       1.000        0.958       1.000
300   1     0.970       0.972        0.933       0.937
300   2     0.989       0.996        0.975       0.991
300   4     0.991       1.000        0.979       1.000
600   1     0.986       0.986        0.968       0.969
600   2     0.994       0.998        0.987       0.996
600   4     0.995       1.000        0.990       1.000


1,000 simulations, 2 CP: N(0,1), N(0, σ2), N(0,1)

                  Average Rand             Average Adj. Rand
T     σ²    MultiRank   E-Divisive   MultiRank   E-Divisive
150   2     0.731       0.902        0.471       0.785
150   5     0.764       0.976        0.521       0.948
150   10    0.764       0.989        0.519       0.975
300   2     0.744       0.924        0.490       0.834
300   5     0.759       0.990        0.511       0.978
300   10    0.759       0.995        0.512       0.989
600   2     0.742       0.970        0.488       0.933
600   5     0.753       0.996        0.500       0.990
600   10    0.753       0.998        0.501       0.995


1,000 simulations, 2 CP: N(0,1), tν(0, 1), N(0,1)

                  Average Rand             Average Adj. Rand
T     ν     MultiRank   E-Divisive   MultiRank   E-Divisive
150   16    0.632       0.798        0.327       0.564
150   8     0.651       0.830        0.353       0.631
150   2     0.679       0.846        0.395       0.666
300   16    0.640       0.755        0.341       0.492
300   8     0.639       0.769        0.338       0.522
300   2     0.680       0.809        0.396       0.596
600   16    0.655       0.735        0.365       0.469
600   8     0.653       0.727        0.359       0.458
600   2     0.697       0.813        0.420       0.608


1,000 simulations, 2 CP: N2(0, I ),N2(µ, I ),N2(0, I )

           Average Rand             Average Adj. Rand
T     µ    MultiRank   E-Divisive   MultiRank   E-Divisive
300   1    0.656       0.698        0.363       0.406
      2    0.713       0.732        0.446       0.468
      3    0.743       0.778        0.489       0.549
600   1    0.991       0.994        0.981       0.987
      2    0.995       1.000        0.989       0.999
      3    0.996       1.000        0.990       1.000
900   1    0.994       0.996        0.987       0.991
      2    0.997       1.000        0.993       0.999
      3    0.997       1.000        0.993       1.000


1,000 simulations, 2 CP: N2(0,Σ),N2(0, I ),N2(0,Σ)

Σ = ( 1  ρ )
    ( ρ  1 )

            Average Rand             Average Adj. Rand
T     ρ     MultiRank   E-Divisive   MultiRank   E-Divisive
300   0.5   0.663       0.729        0.373       0.455
      0.7   0.712       0.728        0.444       0.462
      0.9   0.745       0.743        0.491       0.488
600   0.5   0.674       0.676        0.391       0.386
      0.7   0.724       0.672        0.462       0.370
      0.9   0.745       0.834        0.492       0.673
900   0.5   0.692       0.635        0.415       0.322
      0.7   0.724       0.678        0.464       0.398
      0.9   0.747       0.966        0.494       0.928


1,000 simulations, 2 CP: Nd(0,Σ),Nd(0, I ),Nd(0,Σ)

Σ (without noise) =          Σ (with noise) =
( 1  ρ  ρ  ...  ρ )          ( 1  ρ  0  ...  0 )
( ρ  1  ρ  ...  ρ )          ( ρ  1  0  ...  0 )
( ρ  ρ  1  ...  ρ )          ( 0  0  1  ...  0 )
(         ...     )          (         ...     )
( ρ  ρ  ρ  ...  1 )          ( 0  0  0  ...  1 )

           Without Noise                With Noise
T     d    Avg. Rand   Avg. Adj. Rand   Avg. Rand   Avg. Adj. Rand
300   2    0.767       0.522            0.774       0.543
      5    0.912       0.816            0.736       0.463
      9    0.970       0.935            0.736       0.459
600   2    0.817       0.648            0.836       0.816
      5    0.993       0.984            0.631       0.626
      9    0.998       0.995            0.666       0.648
900   2    0.970       0.937            0.968       0.933
      5    0.998       0.996            0.644       0.342
      9    0.999       0.999            0.612       0.284


Applications Genetics

Genetics Data
We applied E-Divisive to the aCGH micro-array dataset of 43 individuals with a bladder tumor (Bleakley and Vert, 2011); shown is the relative hybridization intensity profile for one individual.
MultiRank (Lung-Yut-Fong et al., 2011): k = 17, adjRand = 0.677
KCPA (Arlot et al., 2012): k = 41, adjRand = 0.658
PELT (Killick et al., 2012): k = 47, adjRand = 0.853

[Figure: Relative hybridization intensity (Signal vs. Index, 0–2000) with estimated change points; panels, top to bottom: MultiRank, KCPA, PELT, E-Divisive.]

Applications Finance

Financial Data: Cisco Systems

The E-Divisive procedure was applied to the monthly log returns of the Dow 30.

Marginal analysis of Cisco Systems Inc. from April 1990 to January 2010: the procedure found change points at April 2000 and October 2002.


Applications Finance

Financial Data: S&P 500 Index

[Figure: Log returns of the S&P 500 index, May 20, 1999 – April 25, 2011.]

Agglomerative Algorithm

An Agglomerative Algorithm

Given a partition of k clusters C = {C1, C2, . . . , Ck}; clusters may or may not be single observations

Consider combining a pair of adjacent clusters

The partition that maximizes the goodness-of-fit statistic determines change point locations


An Agglomerative Algorithm: Goodness-of-Fit

Goodness-of-fit statistic S(k): sum of the E-distances between adjacent clusters.

Given clusters C = {C1, C2, . . . , Ck} with ni = #Ci, define

S(k) = Σ_{i=1}^{k−1} [ ni n_{i+1} / (ni + n_{i+1}) ] E^α_{ni, n_{i+1}}(Ci, C_{i+1}),


An Agglomerative Algorithm

The partition that maximizes S(k) is then used to estimate change point locations.

Figure: Progression of the goodness-of-fit statistic, and where it is maximized.
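The statistic S(k) and the adjacent-merge search can be sketched together. This is a minimal illustration assuming the empirical E-distance of Matteson and James (2013) with Euclidean norms; the greedy search and all function names are ours, not the ecp implementation:

```python
import numpy as np

def e_distance(x, y, alpha=1.0):
    """Empirical divergence E^alpha_{n,m}(X, Y): twice the mean between-sample
    distance^alpha minus each sample's mean within-sample distance^alpha."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def pow_dists(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2) ** alpha

    def within(a):
        if len(a) < 2:
            return 0.0
        return pow_dists(a, a)[np.triu_indices(len(a), k=1)].mean()

    return 2.0 * pow_dists(x, y).mean() - within(x) - within(y)

def goodness_of_fit(clusters, alpha=1.0):
    """S(k): E-distances of temporally adjacent clusters, each weighted by
    n_i * n_{i+1} / (n_i + n_{i+1})."""
    return sum(
        (len(a) * len(b) / (len(a) + len(b))) * e_distance(a, b, alpha)
        for a, b in zip(clusters, clusters[1:])
    )

def agglomerate(clusters, alpha=1.0):
    """Greedily merge adjacent clusters; return the partition maximizing S(k)."""
    clusters = [np.asarray(c, dtype=float).reshape(len(c), -1) for c in clusters]
    best_fit, best = goodness_of_fit(clusters, alpha), list(clusters)
    while len(clusters) > 2:
        # evaluate every adjacent merge, keep the one with the largest S(k)
        merges = [
            clusters[:i] + [np.vstack(clusters[i:i + 2])] + clusters[i + 2:]
            for i in range(len(clusters) - 1)
        ]
        fits = [goodness_of_fit(m, alpha) for m in merges]
        i = int(np.argmax(fits))
        clusters = merges[i]
        if fits[i] > best_fit:
            best_fit, best = fits[i], list(clusters)
    return best  # estimated change points are the boundaries between clusters
```

For example, agglomerate([[0.0, 0.1], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]]) merges the four initial clusters into two, placing the single boundary at the large mean shift.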


Agglomerative Algorithm Application: EMS

EMS Priority One Response for Toronto 2007


Bibliography

http://www.stat.cornell.edu/~matteson/

Bleakley, K., and Vert, J.-P. (2011), "The Group Fused Lasso for Multiple Change-Point Detection," Technical Report HAL-00602121, Bioinformatics Center (CBIO).

Hoeffding, W. (1961), "The Strong Law of Large Numbers for U-Statistics," Technical Report 302, North Carolina State University, Dept. of Statistics.

Hubert, L., and Arabie, P. (1985), "Comparing Partitions," Journal of Classification, 2(1), 193–218.

James, N. A., and Matteson, D. S. (2013), "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data," arXiv:1309.3295.

Lung-Yut-Fong, A., Lévy-Leduc, C., and Cappé, O. (2011), "Homogeneity and Change-Point Detection Tests for Multivariate Data Using Rank Statistics."

Matteson, D. S., and James, N. A. (2013), "A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data," Journal of the American Statistical Association, to appear.

Rizzo, M. L., and Székely, G. J. (2010), "DISCO Analysis: A Nonparametric Extension of Analysis of Variance," The Annals of Applied Statistics, 4(2), 1034–1055.

Székely, G. J., and Rizzo, M. L. (2005), "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, 22(2), 151–183.
