Date post: 19-Jun-2020
Detecting Change in Multivariate Data Streams Using Minimum Subgraphs Robert Koyak Operations Research Dept. Naval Postgraduate School Collaborative work with Dave Ruth, Emily Craparo, and Kevin Wood
Transcript
Page 1: Detecting Change in Multivariate Data Streams Using ...seminar/Slides/RobertKoyak.pdf · Minimum Non-bipartite Matching (MNBM) •Also known as unipartite matching, 1-factor •Rosenbaum

Detecting Change in Multivariate Data Streams Using Minimum Subgraphs

Robert Koyak

Operations Research Dept.

Naval Postgraduate School

Collaborative work with Dave Ruth, Emily Craparo, and Kevin Wood

Page 2

Basic Setup

Have observations $y_1, y_2, \dots, y_N$ assumed to be sampled independently from unknown, multivariate distributions, with $F_j$ the distribution of observation $j$.

Homogeneity Hypothesis $(H_0)$: $F_1 = F_2 = \cdots = F_N$

Heterogeneity Hypothesis $(H_1)$: There exists some $k \in \{2, \dots, N\}$ such that $F_1 = F_2 = \cdots = F_{k-1}$, $F_{k-1} \ne F_k$, and $\Delta(F_j, F_{k-1}) = \max_{k \le r \le j} \Delta(F_r, F_{k-1})$ is strictly positive and nondecreasing for $j \in \{k+1, \dots, N\}$, where $\Delta$ denotes a measure of discrepancy between distributions.

Page 3

Heterogeneity includes:

• A single change in distribution at a known change point (“two-sample problem”)

• A single change in distribution at an unknown change point

• Directional drift (in mean or other features) that begins at an unknown point in the observation sequence


Page 4

Distance Matrix

$D = (d_{ij})$ is the $N \times N$ distance matrix (Euclidean, Manhattan, etc.), with $d_{ij} = d(y_i, y_j)$.

Maa, Pearl, and Bartoszynski (1996):

$Y_1, Y_2, Y_3$ independent, $\sim F$

$Z_1, Z_2, Z_3$ independent, $\sim G$

$F = G$ if and only if $d(Y_1, Y_2) \overset{\mathcal{L}}{=} d(Z_1, Z_2) \overset{\mathcal{L}}{=} d(Y_3, Z_3)$
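
A distance matrix of this kind can be built directly from the observations. The following is a minimal sketch in plain Python; the function name `distance_matrix`, the `metric` argument, and the toy data `Y` are illustrative assumptions, not part of the talk.

```python
import math

def distance_matrix(points, metric="euclidean"):
    """Return the N x N matrix D = (d_ij) of pairwise distances."""
    def dist(u, v):
        if metric == "euclidean":
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
        if metric == "manhattan":
            return sum(abs(a - b) for a, b in zip(u, v))
        raise ValueError(f"unknown metric: {metric}")

    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy data: three bivariate observations.
Y = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(Y)                      # Euclidean by default
D_manhattan = distance_matrix(Y, "manhattan")
```

Every graph-based statistic discussed in the talk depends on the data only through a matrix like `D`, so swapping the metric changes the test without changing the rest of the procedure.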

Page 5

The distance matrix has the information needed to express departure from the homogeneity hypothesis. For the types of departure we want to detect, this information should be expressed in particular ways. How can we unlock it?

Page 6

The strategy we will explore is to fit a minimum subgraph (of some type) to the data treated as vertices in a complete, undirected graph. From the subgraph a statistic is derived that is sensitive to the departures from homogeneity that we wish to detect.

Page 7

A Graph-Theoretic Approach

Complete undirected graph: $G_N = (V_N, E_N)$, $|E_N| = N(N-1)/2$

Subgraph family (e.g. spanning trees, k-factors, Hamiltonian paths or circuits): $\mathcal{G}$, with members $G = (V, E)$, $V = V_N$, $E \subseteq E_N$

Minimum subgraph is defined by $\hat{G} = (\hat{V}, \hat{E}) = \operatorname{argmin}_{G \in \mathcal{G}} \sum_{(i,j) \in E} d_{ij}$

The test statistic is $\hat{\tau} = \tau(\hat{G})$

Page 8

Minimum Spanning Trees (MSTs)

• Friedman and Rafsky (1979) used MSTs to define a multivariate extension of the runs test in the context of the two-sample problem

• The test statistic is the number of edges in the MST that join vertices belonging to different samples

• Small values of the statistic are evidence against homogeneity
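
The MST-based statistic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the presenters' code: the MST is built with a textbook Prim's algorithm, the p-value comes from a Monte Carlo permutation of the sample labels, and the names `mst_edges`, `cross_edges`, `fr_permutation_pvalue`, and the toy data are all hypothetical.

```python
import math
import random

def mst_edges(D):
    """Prim's algorithm on the complete graph with distance matrix D."""
    n = len(D)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v], parent[v] = D[u][v], u
    return edges

def cross_edges(edges, labels):
    """Number of edges joining vertices with different sample labels."""
    return sum(labels[i] != labels[j] for i, j in edges)

def fr_permutation_pvalue(D, labels, n_perm=2000, seed=0):
    """Lower-tail Monte Carlo permutation p-value (small counts are evidence)."""
    rng = random.Random(seed)
    edges = mst_edges(D)
    observed = cross_edges(edges, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        hits += cross_edges(edges, perm) <= observed
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: two well-separated groups on a line.
x = [0, 1, 2, 3, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
D = [[abs(a - b) for b in x] for a in x]
tree = mst_edges(D)
observed, p_value = fr_permutation_pvalue(D, labels)
```

Because the tree depends only on the distances, it is fitted once and only the labels are permuted.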


Page 9

[Figure: scatterplot of Schuylkill (vertical axis, 1.0 to 1.6) versus Philadelphia (horizontal axis, 0.95 to 1.25), points labeled by year 69 to 88, with the MST edges overlaid]

MST for breast cancer mortality rates, 1969 to 1988 (N = 20), relative to 1968 base. Next, treat Sample 1 as the years 1969–1978 and Sample 2 as the years 1979–1988.

Page 10

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with the MST edges overlaid]

There are $\hat{\tau}_{MST} = 11$ edges that join vertices in different samples. The p-value, obtained by a permutation test, is about 0.41.

Page 11

Is anything really happening?

Spearman rank correlations vs. time, p-values: Philadelphia .0004, Schuylkill .01

Page 12

Minimum Non-bipartite Matching (MNBM)

• Also known as unipartite matching, 1-factor

• Rosenbaum (2005) defined a “cross-match” test using MNBM analogous to that of Friedman and Rafsky

• The test statistic is the number of edges in the MNBM that join vertices belonging to different samples

• Small values of the statistic are evidence against homogeneity
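
To make the cross-match count concrete, the sketch below finds a minimum-weight perfect matching by brute-force dynamic programming over vertex subsets and then counts the edges that join different samples. This is a stand-in for a proper matching algorithm, usable only for small N, and not the implementation used in the talk; the names `min_weight_perfect_matching`, `cross_match_count`, and the toy data are assumptions.

```python
import math
from functools import lru_cache

def min_weight_perfect_matching(D):
    """Exact minimum-weight perfect matching on the complete graph with
    distance matrix D. Exponential in N, so only practical for N <= ~20."""
    n = len(D)
    if n % 2:
        raise ValueError("a perfect matching needs an even number of vertices")

    @lru_cache(maxsize=None)
    def solve(mask):
        # mask has a 1 bit for every vertex that is still unmatched
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1   # lowest unmatched vertex
        rest = mask & ~(1 << i)
        best_cost, best_edges = math.inf, ()
        for j in range(i + 1, n):
            if (rest >> j) & 1:
                cost, edges = solve(rest & ~(1 << j))
                cost += D[i][j]
                if cost < best_cost:
                    best_cost, best_edges = cost, ((i, j),) + edges
        return best_cost, best_edges

    cost, edges = solve((1 << n) - 1)
    return cost, list(edges)

def cross_match_count(edges, labels):
    """Number of matched pairs whose members carry different labels."""
    return sum(labels[i] != labels[j] for i, j in edges)

# Hypothetical toy data: two well-separated groups on a line.
x = [0, 1, 2, 3, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
D = [[float(abs(a - b)) for b in x] for a in x]
cost, matching = min_weight_perfect_matching(D)
n_cross = cross_match_count(matching, labels)
```

With two clearly separated groups every pair is matched within its own group, so the cross-match count is at its minimum, which is the direction that counts as evidence against homogeneity.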


Page 13

Cross-match test (Rosenbaum)

$n = N/2$ (number of matching edges)

Group 1 has $k$ observations

Group 2 has $N - k$ observations

$M_C$ = number of cross-matches

$M_1$ = number of matches within Group 1

$2 M_1 + M_C = k$

$P(M_C = r) = \dfrac{2^r \, n!}{\binom{N}{k} \left(\frac{k - r}{2}\right)! \; r! \; \left(n - \frac{k + r}{2}\right)!}$

for integers $r$ with $0 \le r \le \min(k, N - k)$ and $k - r$ even.
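
The exact null distribution of the cross-match count is easy to tabulate. The sketch below evaluates Rosenbaum's formula $P(M_C = r) = 2^r n! / [\binom{N}{k} ((k-r)/2)! \, r! \, ((N-k-r)/2)!]$ with $n = N/2$; the function names `cross_match_pmf` and `lower_tail_pvalue` are illustrative assumptions.

```python
from math import comb, factorial

def cross_match_pmf(N, k):
    """Exact null pmf of the number of cross-matches M_C when N observations
    (N even) are split into groups of sizes k and N - k."""
    if N % 2:
        raise ValueError("N must be even")
    n = N // 2                                  # number of matching edges
    pmf = {}
    for r in range(0, min(k, N - k) + 1):
        if (k - r) % 2:
            continue                            # k - r must be even
        m1 = (k - r) // 2                       # matches within Group 1
        m2 = (N - k - r) // 2                   # matches within Group 2
        pmf[r] = (2 ** r * factorial(n)
                  / (comb(N, k) * factorial(m1) * factorial(r) * factorial(m2)))
    return pmf

def lower_tail_pvalue(N, k, observed):
    """P(M_C <= observed) under homogeneity; small counts are evidence."""
    return sum(p for r, p in cross_match_pmf(N, k).items() if r <= observed)

# Groups of 10 and 10 out of N = 20, with 6 observed cross-matches.
pmf = cross_match_pmf(20, 10)
p_value = lower_tail_pvalue(20, 10, 6)
print(round(p_value, 2))   # 0.87
```

For $N = 20$, $k = 10$ and six cross-matches the lower-tail probability is 160692/184756, about 0.87, so a count of six is entirely unremarkable under homogeneity.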

Page 14

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with the MNBM edges overlaid]

MNBM fit to the breast cancer mortality data. Count the number of edges that join vertices in different groups.

Page 15

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with the MNBM edges overlaid]

There are $\hat{\tau}_{CM} = 6$ edges that join vertices in different samples. The p-value, obtained from the exact null distribution, is about 0.87.

Page 16

Extensions of the Cross-Match Test

Ruth (2009) and Ruth & Koyak (2011) introduce two extensions of the cross-match test to detect departures from homogeneity in the direction of $H_1$:

(1) An exact, simultaneous cross-match (SCM) test for an unspecified change-point. The statistic $\hat{\tau}_{SCM}$ is a minimum, over candidate change points $k$ between $k_0$ and $k_1$, of a function $q_k$ of the cross-match statistic $\hat{\tau}_{CM}(k)$.

(2) A sum of (vertex) pair maxima (SPM) test:

$\hat{\tau}_{SPM} = \sum_{(i,j) \in \hat{E}} (i \vee j) = \frac{1}{2} \sum_{(i,j) \in \hat{E}} |i - j| + \frac{1}{4} N(N + 1)$
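
The sum of pair maxima can be checked numerically. Since $\max(i,j) = \tfrac12(i + j + |i - j|)$ and every index from 1 to $N$ appears in exactly one matched pair, the sum of maxima equals $\tfrac12 \sum |i - j| + \tfrac14 N(N+1)$. The sketch below verifies this and, assuming all perfect matchings are equally likely under homogeneity, computes the null mean for $N = 6$ by enumeration. The function names are illustrative assumptions.

```python
def spm_statistic(edges):
    """Sum of pair maxima over a matching on observations indexed 1..N."""
    return sum(max(i, j) for i, j in edges)

def spm_via_differences(edges):
    """Same quantity written through the index differences |i - j|."""
    N = 2 * len(edges)
    return 0.5 * sum(abs(i - j) for i, j in edges) + 0.25 * N * (N + 1)

def all_perfect_matchings(vertices):
    """Yield every perfect matching of an even-length list of vertices."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for m in all_perfect_matchings(remaining):
            yield [(first, partner)] + m

# Matching neighbours in time gives a small value; matching across time a large one.
near = [(1, 2), (3, 4), (5, 6)]
far = [(1, 6), (2, 5), (3, 4)]

# Null mean for N = 6, averaging over all 15 equally likely matchings.
matchings = list(all_perfect_matchings(list(range(1, 7))))
values = [spm_statistic(m) for m in matchings]
null_mean = sum(values) / len(values)       # equals N(N + 1)/3 = 14
```

Small values arise when observations close in time are matched together, which is what a change or drift in distribution tends to produce.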

Page 17

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with the MNBM edges overlaid]

SCM test has exact p-value of 0.59 for testing against an unspecified change-point.

SPM test has approximate p-value of 0.41.

Page 18

Some Theory

• Friedman & Rafsky's $\hat{\tau}_{MST}$

– Asymptotic normality under H0

– Universal consistency under H1 for the two-sample problem (Henze & Penrose, 1999)

• Rosenbaum's $\hat{\tau}_{CM}$

– Asymptotic normality under H0

– Consistency under restrictive assumptions

• Ruth's SPM test $\hat{\tau}_{SPM}$

– Asymptotic normality under H0

– Consistency remains to be proven

Page 19

Ensemble Tests

Problem with graph-theoretic tests: a single minimum subgraph contains very limited information about $D$ and as such these tests are not very powerful.

Tukey suggested fitting multiple "orthogonal" MSTs in Friedman & Rafsky's test and combining them (in a manner that was not specified).

Two subgraphs are orthogonal if they share no common edges.

For MSTs this is problematic: existence of a fixed number of orthogonal MSTs (even two) is not assured!

For MNBMs we are assured at least $N/2$ orthogonal subgraphs (Anderson, 1971) constructed sequentially.
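
Sequential construction of orthogonal matchings can be sketched as follows: fit a minimum-weight perfect matching, forbid its edges, and repeat. The brute-force subset dynamic program used here is an assumption chosen to keep the example self-contained and is only workable for small N; the names `min_matching` and `orthogonal_matchings` are hypothetical.

```python
import math
from functools import lru_cache

def min_matching(D, forbidden):
    """Minimum-weight perfect matching that avoids the forbidden edges.
    Returns (cost, edges); cost is inf when no such matching exists."""
    n = len(D)

    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1   # lowest unmatched vertex
        rest = mask & ~(1 << i)
        best = (math.inf, ())
        for j in range(i + 1, n):
            if (rest >> j) & 1 and (i, j) not in forbidden:
                cost, edges = solve(rest & ~(1 << j))
                if cost + D[i][j] < best[0]:
                    best = (cost + D[i][j], ((i, j),) + edges)
        return best

    return solve((1 << n) - 1)

def orthogonal_matchings(D, count):
    """Fit up to `count` matchings in sequence, each sharing no edge
    with any matching fitted before it."""
    used, ensemble = set(), []
    for _ in range(count):
        cost, edges = min_matching(D, frozenset(used))
        if math.isinf(cost):
            break                              # no edge-disjoint matching left
        ensemble.append(list(edges))
        used.update(edges)
    return ensemble

# Hypothetical toy data: six equally spaced points on a line.
x = [0, 1, 2, 3, 4, 5]
D = [[float(abs(a - b)) for b in x] for a in x]
ensemble = orthogonal_matchings(D, len(x) // 2)
```

Each later matching is forced onto longer edges, so successive matchings carry progressively coarser information about the distance matrix.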

Page 20

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with matching edges overlaid]

First MNBM Fit to the Breast Cancer Mortality Data

Page 21

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with matching edges overlaid]

First Two MNBMs Fit to the Breast Cancer Mortality Data

Page 22

[Figure: scatterplot of Schuylkill (vertical axis) versus Philadelphia (horizontal axis), points labeled by year 69 to 88, with matching edges overlaid]

First Three MNBMs Fit to the Breast Cancer Mortality Data

Page 23

Structure of Ensembles

• Ensemble pairs decompose into Hamiltonian cycles each having an even number of vertices

– Under H0 all 1-factors are equally likely but it is not true that all ensemble 2-factors are equally likely!

– However, conditional on the cyclic structure uniformity is true

– Second-order properties do not depend on the cyclic structure

• Ensemble 3-factors have more complex cyclic behavior and also exhibit triangles

– Prevalence of triangles depends on the dimensionality of the data: lower dimension = more triangles

Page 24

Ensemble Tests

Ruth (2009) proposed an Ensemble Sum of Pair Maxima (ESPM) test based on fitting a sequence of $n \le N/2$ orthogonal MNBMs and taking the cumulative sums of the SPM statistics. The test takes the following form: $\hat{\tau}_{ESPM}$ is the maximum over $k \in \{1, \dots, n\}$ of the cumulative sums, over $j = 1, \dots, k$, of the centered statistics $\hat{\tau}_{SPM}(j)$, scaled by $c_N^{-1}$, where

$c_N^2 = N (N + 1)(N - 1)^2 / 180, \quad \mu_N = N (N + 1) / 3$

Page 25

Ensemble Tests

(1) Under $H_0$ the process $B_N(t_k)$, formed from the cumulative sums over $j = 1, \dots, k$ of the centered $\hat{\tau}_{SPM}(j)$ scaled by $c_N^{-1}$, with $t_k = k/(N - 1)$, has the same first two moments as a Brownian bridge.

(2) Although the summands individually are asymptotically normal, the same is not true of the process itself!

(3) Unless the dimensionality of the observations is very large, classical Brownian bridge theory (Shorack & Wellner, 1987) produces critical values that violate the nominal level.

(4) Ruth (2009) produced critical values for different values of $N$ and dimensionality $d$ using extensive simulations.

Page 26

Simulated critical values for N = 200

Page 27

100 Simulated $B_N(t_k)$, Bivar. Normal, Homogeneous

Critical (.05) = 1.19

Page 28

100 Simulated $B_N(t_k)$, Bivar. Normal, Mean Jump

Critical (.05) = 1.19

Page 29

[Figure: normalized process $B_N(t_k)$ (vertical axis, 0.0 to 2.0) versus number of orthogonal matchings $k$ (horizontal axis, 2 to 10), with horizontal lines marking the $\alpha = .05$ and $\alpha = .01$ critical values]

$\hat{\tau}_{ESPM} = 2.24$ has p-value less than .01

Heterogeneity is signaled when six or more matchings are used

Page 30

Power simulations, N = 200, jump at observation 101, Δ = norm of mean vector after the jump, nominal .05-level tests

(a) Multivariate normal, mean, p = 5

            Jump                        Drift
  Δ     SCM   SPM   ESPM  JJS       SCM   SPM   ESPM  JJS
  0     .05   .06   .04   .05       .05   .04   .06   .07
  .5    .09   .10   .60   .52       .05   .07   .27   .22
  1.0   .33   .41   1.00  1.00      .16   .20   .84   .85

(b) Multivariate normal, mean, p = 20

            Jump                        Drift
  Δ     SCM   SPM   ESPM  JJS       SCM   SPM   ESPM  JJS
  0     .05   .05   .05   .03       .05   .05   .05   .04
  .5    .07   .09   .33   .20       .05   .07   .13   .09
  1.0   .16   .22   .95   .95       .09   .11   .56   .49

(c) Multivariate normal, covariance matrix, p = 5

            Jump                        Drift
  Δ     SCM   SPM   ESPM  JJS       SCM   SPM   ESPM  JJS
  0     .05   .06   .05   .04       .05   .05   .05   .05
  .5    .42   .51   .97   .15       .20   .27   .52   .27
  1.0   .99   .99   1.00  .24       .77   .79   1.00  .54

Page 31

Power simulations, N = 200, jump at observation 101, nominal .05-level tests

(c) Multivariate normal, covariance matrix, p = 5

            Jump                        Drift
        SCM   SPM   ESPM  JJS       SCM   SPM   ESPM  JJS
  0     .05   .06   .05   .04       .05   .05   .05   .05
  .5    .42   .51   .97   .15       .20   .27   .52   .27
  1.0   .99   .99   1.00  .24       .77   .79   1.00  .54

(d) Multivariate normal mixture, mean, p = 5

            Jump                        Drift
        SCM   SPM   ESPM  JJS       SCM   SPM   ESPM  JJS
  0     .05   .05   .04   .27       .04   .04   .06   .28
  .5    .08   .09   .56   .38       .07   .07   .21   .33
  1.0   .25   .36   .99   .85       .12   .15   .76   .55

1+ mult. norm

Page 32

Graph-theoretic Tests: Some Challenges and Possible Directions

1. Computational

2. Theoretical

3. Alternate graph-theoretic approaches

4. Adaptation to real-world problems


Page 33

Computational Challenges

Finding a MNBM requires $O(N m \log(N))$ computation time using the Blossom V algorithm (Kolmogorov, 2009). For the complete graph, $m$ is of order $N^2$. For ensemble tests the order of computation is about $O(N^4 \log(N))$, which is prohibitive with large sample sizes (e.g. $N \ge 1000$).

Possible strategies:

(1) Use a greedy algorithm

(2) Restrict the edge set (smaller $m$ relative to $N$)

(3) Try something else
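
The first strategy can be illustrated with the simplest greedy heuristic: sort all edges by length and accept an edge whenever both endpoints are still free. This is a generic textbook heuristic offered as a sketch, not a method endorsed in the talk, and the name `greedy_matching` and the toy data are assumptions. The example also shows that the greedy matching need not be a minimum-weight matching.

```python
def greedy_matching(D):
    """Greedy perfect matching on the complete graph with distance matrix D:
    repeatedly take the shortest edge whose endpoints are both unmatched.
    Sorting dominates the cost, O(N^2 log N); the result may be suboptimal."""
    n = len(D)
    pairs = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    matched, edges = set(), []
    for d, i, j in pairs:
        if i not in matched and j not in matched:
            edges.append((i, j))
            matched.update((i, j))
    return edges

def matching_cost(D, edges):
    return sum(D[i][j] for i, j in edges)

# Hypothetical toy data on a line: greedy grabs the short middle edge first
# and is then forced to pair the two outer points.
x = [0, 2, 3, 5]
D = [[abs(a - b) for b in x] for a in x]
greedy = greedy_matching(D)
greedy_cost = matching_cost(D, greedy)                    # 1 + 5 = 6
optimal_cost = matching_cost(D, [(0, 1), (2, 3)])         # 2 + 2 = 4
```

The gap between 6 and 4 on four points shows why a greedy fit changes the statistic being computed, quite apart from the difficulty of extending it to several orthogonal matchings.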

Page 34

Faster Matchings?

Simple greedy heuristics are difficult to extend to multiple matchings.

Edge restriction heuristics. Sufficient conditions for a perfect matching to exist ($N$ even) include

-- A regular graph of degree at least $N/2$

-- A connected, claw-free graph

-- A Delaunay triangulation

Necessary and sufficient conditions: Tutte's Theorem

$\mathrm{odd}(V \setminus S) \le |S|$ for all $S \subseteq V$, where $\mathrm{odd}(V \setminus S)$ is the number of connected components with an odd number of vertices that remain after removing $S$.

Page 35

Are MNBM tests universally consistent?

Asymptotic theory for MNBM is not straightforward even for a single matching, let alone ensembles.

Aldous & Steele (1992) theory for MSTs exploits perturbation localizability of MSTs (not applicable to matchings).

Interesting recent work: "Poisson Matching" (Holroyd et al., 2008)

Page 36

MNBM is a solution to the integer linear program

Minimize: $f(x) = \sum_{i < j} x_{ij} d_{ij}$

Subject to: $x_{ij} \in \{0, 1\}$, $\sum_{i \ne j} x_{ij} = 1$, $j = 1, \dots, N$

By replacing the integrality constraints with the interval constraints $0 \le x_{ij} \le 1$ a solution can be obtained using LP. A "relaxed" SPM statistic can be defined by

$\hat{\tau}_{RSPM} = \sum_{i < j} (i \vee j) \, \hat{x}_{ij} = \frac{1}{2} \sum_{i < j} |i - j| \, \hat{x}_{ij} + \frac{1}{4} N(N + 1)$

Page 37

Solutions to RNBM satisfy $\hat{x}_{ij} \in \{0, \tfrac{1}{2}, 1\}$

To fit ensembles, enforce additional constraints on the $x_{ij}$ over a sequence of problems $k = 1, \dots, n$. There is no assurance that solutions will be "nested", however, which complicates theory.

Performance of relaxed MNBM statistics compares favorably with that of regular MNBM.

What about nearest neighbors?


Page 41

Possible Applications

• Process control (off-line, on-line)

• Mechanical prognostics

• Threat detection

• Syndromic surveillance

In high-dimensional problems, it may be useful to couple graph-theoretic methods with methods to reduce dimensionality


Page 42

Dimension reduction

Consider the optimization problem: choose $w \in \{0, 1\}^p$ with $\sum_r w_r = p'$ to minimize a sum over the matched pairs $(i, j) \in E$ of the matching $x(w)$, subject to $x(w)$ being an argmin of the matching problem for the projected data.

Vector $w$ projects $X$ into a low ($p'$)-dimensional space to minimize the sum of pair index differences in the resulting minimum-weight matching.

Page 43

• Simplification 1: use Manhattan distance: $d_{ij} = \sum_r d_{ijr} w_r$, with $d_{ijr} = |y_{ir} - y_{jr}|$

• Simplification 2: use relaxed matching instead of exact matching; enforce minimum-weight matching using strong duality.

[Equation: the resulting single-level minimization problem in the selection vector $w$, the relaxed matching variables $x$, and the dual variables $\pi$]

