Changepoint Detection over Graphs with the Spectral Scan Statistic

James Sharpnack
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]

Alessandro Rinaldo
Statistics Department, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]

Aarti Singh
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]

Abstract

We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. We analyze the corresponding generalized likelihood ratio (GLR) statistic and relate it to the problem of finding a sparsest cut in a graph. We develop a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, which we call the spectral scan statistic, and analyze its properties. We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this result to explicitly derive its asymptotic properties on a few graph topologies. Finally, we demonstrate both theoretically and by simulations that the spectral scan statistic can outperform naive testing procedures based on edge thresholding and $\chi^2$ testing.

1 Introduction

In this article we are concerned with the basic but fundamental task of deciding whether a given graph, over which a noisy signal is observed, contains a cluster of anomalous or activated nodes comprising an induced subgraph. Such a problem is highly relevant in a variety of scientific areas, such as surveillance, disease outbreak detection, biomedical imaging, sensor network detection, gene network analysis, environmental monitoring and malware detection over a computer network. Recent theoretical contributions in the statistical literature (see, e.g., [4, 3, 2, 1]) have detailed the inherent difficulty of such testing problems in relatively simplified settings and under specific conditions on the graph topology. A natural algorithm for detection of anomalous clusters of activity in graphs is the generalized likelihood ratio test (GLRT) or scan statistic, a computationally intensive procedure that entails scanning all clusters in our class for anomalous activation. Unfortunately, its performance over general graphs is not well understood, and little attention has been paid to determining alternative, computationally tractable, procedures.

Appearing in Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA. Volume 31 of JMLR: W&CP 31. Copyright 2013 by the authors.

In this article we assume that the class of clusters of constant signal consists of sub-graphs of small cut size. We believe this is a natural and realistic assumption which, as we demonstrate below, allows us to explicitly incorporate into the detection problem the properties of the graph topology through its spectrum. In particular, we show that the GLRT is an integer program with a term in the objective that corresponds to the sparsest cut in a graph, a known NP-hard problem [27]. With this in mind, we propose a relaxation of the GLRT, called the spectral scan statistic, which is based on the combinatorial Laplacian of the graph and, importantly, is a computationally efficient program. As our main result, we derive theoretical guarantees for the performance of the spectral scan statistic that hold for any graph and are based on the spectrum of the combinatorial Laplacian. For comparison purposes, we derive theoretical guarantees for two simple estimators, the edge thresholding and the $\chi^2$ test. We conclude our study by applying the main result to balanced binary trees, the lattice, and Kronecker graphs, giving precise asymptotic results. Simulations for these models verify that the spectral scan statistic dominates the simple estimators. Before we elaborate on the statistical setup, we will examine two real-world examples of graph structured signals with low cut size.

Disease Detection in Human Networks. Many common experimental techniques in virology report various indicators of a virus, such as antibody protein concentrations (western blot, enzyme-linked immunosorbent assay) or direct measurements of virus concentrations (the plaque assay). One popular method, the western blot [8], reports concentrations by the shade of bands from an x-ray film darkened by a luminescent compound. Infectious diseases diffuse within human networks, so if we can exploit this network structure in the detection of infectious diseases, then we may be able to detect and localize an incipient infection under low signal-to-noise ratios (very light bands in the western blot).

Sensor Networks. Sensor networks might be deployed for detecting nuclear substances, water contaminants, or activity in video surveillance. Water supply contamination is a common cause for outbreaks of cholera, gastroenteritis, E. coli, and polio. The design of sensor networks for water supply was the subject of an engineering challenge in [31]. Because of the potential for large scale health problems, it is of interest to detect contaminated water under low signal-to-noise regimes. As we will see, by exploiting the graph structure (in this case, the pipe network for the water supply), one can detect activity in networks when the activity is very faint. Furthermore, the graph structure provides a versatile framework for modeling environmental constraints.

Contributions. Our contributions are as follows. (1) We define a new class of signals based on the notion of small cut size that reflects in a natural way the topological properties of the graph. (2) We analyze the corresponding GLR statistic and show that it is, in fact, related to the problem of finding the sparsest cut. We then develop a computationally efficient relaxation of the GLR statistic, called the spectral scan statistic, and analyze its properties. In our main theoretical result, we show that the performance of the spectral scan statistic depends explicitly on the spectral properties of the graph. (3) Using such results we are able to characterize in a very explicit form the performance of the spectral scan statistic on a few notable graph topologies and demonstrate its superiority over naive detectors, such as the edge thresholding and the $\chi^2$ test. (4) Finally, we have formulated the detection problem under more general and realistic scenarios, which involve composite null and alternative hypotheses as opposed to simple hypotheses as is customary in the theoretical statistical literature on this subject.

Related Work. Normal means testing in high dimensions is a well established and fundamental problem in statistics (see, e.g., [19]). A significant portion of the recent work in this area, [4, 3, 2, 1], has focused on incorporating structural assumptions on the signal, as a way to mitigate the effect of high-dimensionality and also because many real-life problems can be represented as instances of the normal means problem with graph-structured signals (see, for an example, [20]). These contributions have considered the GLRT when the alternative hypothesis takes on the form of a combinatorial space. However, the performance of such tests has been analyzed only for certain types of graphs, and it is unclear to what extent those analyses extend to general graph topologies. Moreover, while much is known about the theoretical performance of the GLRT, little attention is paid to its computational feasibility. Another line of research relevant to our problem is optimal fault detection with nuisance parameters and matched subspace detection in the signal processing literature (see, e.g., [34, 6, 16, 15]). Though our problem can be cast as a special case of the more general problem of optimal testing of a linear subspace under nuisance parameters considered in that line of work, the focus on a graph-structured signal, as well as the type of analysis based on the interplay between the scan statistic and the spectral properties of the graph contained in our work, are novel.

1.1 Problem Setup

In this section, we formalize the problem of detecting a change of signal from a single set of noisy observations recorded at the vertices of the graph. For a given connected, undirected, possibly weighted large graph $G = (V, E, W)$ on $|V| = n$ nodes, we observe one realization of the random vector

$$y = \beta + \epsilon, \tag{1}$$

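where $\beta \in \mathbb{R}^n$ and $\epsilon \sim N(0, \sigma^2 I_n)$, with $\sigma^2$ known. We will assume that there are two groups of constant signal for $\beta$, namely that there exists a subset $C \subset V$ such that $\beta$ is constant within both $C$ and its complement $\bar{C} = V \setminus C$. We formalize this assumption by writing

$$\beta = \mu \mathbf{1} + \delta \mathbf{1}_C, \tag{2}$$

where $\mu, \delta \in \mathbb{R}$ are unknown parameters, $\mathbf{1} \in \mathbb{R}^n$ is an $n$-dimensional vector of ones and $\mathbf{1}_C$ is the indicator function of the subset $C$. The parameter $\mu$ can be thought of as the magnitude of the background signal and is a nuisance parameter, while $\delta$ quantifies the gap in signal between the two clusters. Setting $\bar{\beta} = \mathbf{1}^\top \beta / n$, we will use $\|\beta - \bar{\beta}\|$ (with the scalar $\bar{\beta}$ subtracted from every coordinate) to measure the energy of the signal. Note that this quantity is independent of $\mu$. Here $\|\cdot\|$ always denotes the $\ell_2$ norm. We will define the signal-to-noise ratio (SNR) to be

$$\frac{\|\beta - \bar{\beta}\|}{\sigma} = \sqrt{\frac{|C||\bar{C}|}{n}} \, \frac{|\delta|}{\sigma}.$$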
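The closed form for the SNR can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `make_signal` and `signal_energy` and all parameter values are illustrative and do not come from the paper.

```python
import numpy as np

def make_signal(n, cluster, mu, delta):
    """Piecewise-constant signal beta = mu * 1 + delta * 1_C, as in equation (2)."""
    beta = np.full(n, float(mu))
    beta[list(cluster)] += delta
    return beta

def signal_energy(beta):
    """||beta - beta_bar||: l2 norm of the signal after removing its mean."""
    return np.linalg.norm(beta - beta.mean())

# Hypothetical parameter values, chosen only for illustration.
n, cluster, mu, delta, sigma = 12, [0, 1, 2, 3], 5.0, 0.8, 1.0
beta = make_signal(n, cluster, mu, delta)

rng = np.random.default_rng(0)
y = beta + sigma * rng.standard_normal(n)   # one realization of model (1)

size_c = len(cluster)
closed_form = np.sqrt(size_c * (n - size_c) / n) * abs(delta)
snr = signal_energy(beta) / sigma
```

Changing `mu` leaves `signal_energy(beta)` unchanged, which is the sense in which the energy does not depend on the nuisance parameter.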


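We will not assume any knowledge of the true clustering $(C, \bar{C})$, other than that it belongs to a given class $\mathcal{C}$ of bi-partitions $(C, \bar{C})$ of $V$ such that $C$ and $\bar{C}$ are both large and have low cut size. Formally, we define, for some $\rho > 0$,

$$\mathcal{C} = \mathcal{C}(\rho) = \left\{ C \subset V, \ C \neq \emptyset : \frac{|\partial C|}{|C||\bar{C}|} \le \frac{\rho}{n} \right\}, \tag{3}$$

where $\partial C = \{(i, j) \in E : i \in C, j \in \bar{C}\}$ is the boundary of $C$. Note that $\mathcal{C}$ is a symmetric class in the sense that $C \in \mathcal{C}$ if and only if $\bar{C} \in \mathcal{C}$. We are interested in the problem of testing whether the gap parameter $\delta$ in equation (2) is zero (i.e. the signal $\beta$ is constant) or it is non-zero for some $C \in \mathcal{C}$, regardless of the value of $\mu$. Thus, we can naturally cast our structured change-point detection problem as the following composite hypothesis testing problem:

$$H_0 : \beta \in \Theta_0 \quad \text{vs} \quad H_1 : \beta \in \Theta_1, \tag{4}$$

where $\Theta_0 = \{\mu \mathbf{1} : \mu \in \mathbb{R}\}$ and $\Theta_1 = \{\mathbf{1}\mu + \mathbf{1}_C \delta : \mu \in \mathbb{R}, \delta \in \mathbb{R} \setminus \{0\}, C \in \mathcal{C}\}$. Notice that the alternative can be written as the union of alternatives of the form

$$H_1^C : \beta \in \Theta_1^C := \{\mathbf{1}\mu + \mathbf{1}_C \delta : \mu \in \mathbb{R}, \delta \in \mathbb{R} \setminus \{0\}\}, \quad C \in \mathcal{C}.$$

Notice that it is not required that $C$ is a connected set of vertices.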
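To make the class $\mathcal{C}(\rho)$ concrete, the sketch below tests membership for a candidate cluster on a small path graph. It assumes NumPy and a dense symmetric weight matrix; the function names are illustrative, and excluding $C = V$ (where $\bar{C}$ is empty) is an assumption of this sketch.

```python
import numpy as np

def cut_size(W, cluster):
    """|dC|: total weight of edges with one endpoint in C and the other in its complement."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(cluster)] = True
    return W[np.ix_(mask, ~mask)].sum()

def in_class(W, cluster, rho):
    """Membership test for C(rho) in equation (3): |dC| / (|C||C_bar|) <= rho / n."""
    n = W.shape[0]
    k = len(cluster)
    if k == 0 or k == n:          # empty C or empty complement: not a bi-partition
        return False
    return cut_size(W, cluster) / (k * (n - k)) <= rho / n

# Path graph 0-1-2-3-4-5 with unit weights (illustrative).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
```

On this path, the balanced split `[0, 1, 2]` has a single boundary edge and the smallest cut sparsity, while a disconnected set such as `[0, 2, 4]` is still a valid candidate once `rho` is large enough.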

To make our analysis meaningful, we measure the difficulty of the detection problem in terms of the energy parameter by assuming that, for some $\eta > 0$, $\|\beta - \bar{\beta}\| \ge \eta$ for all $\beta \in \Theta_1$. Thus, we can think of $\eta$ as the minimal degree of separation between the null and alternative hypotheses. Below we will analyze asymptotic conditions under which the hypothesis testing problem described above is feasible, in a sense made precise in the next definition, when the size of the graph $n$ increases. To this end, we will further assume that the relevant parameters of the model, $\eta$, $\sigma$, $\delta$ and $\rho$, change with $n$ as well, even though we will not make such dependence explicit in our notation for ease of readability. Our results establish conditions for asymptotic distinguishability as a function of the SNR $\eta/\sigma$, $\rho$, and the spectrum of the graph $G$.

Definition 1. Let $P_\beta$ denote the distribution of $y$ induced by the model (1), where $\beta \in \Theta_0 \cup \Theta_1$. For a given statistic $S(y)$ and threshold $\tau \in \mathbb{R}$, let $T = T(y)$ be 1 if $S(y) > \tau$ and 0 otherwise. We say that the hypotheses $H_0$ and $H_1$ are asymptotically distinguished by the test $T$ if

$$\sup_{\beta \in H_0} P_\beta\{T = 1\} \to 0 \quad \text{and} \quad \sup_{\beta \in H_1} P_\beta\{T = 0\} \to 0, \tag{5}$$

where the limit is taken as $n \to \infty$. We say that $H_0$ and $H_1$ are asymptotically indistinguishable if there does not exist any test for which the above limits hold.

Notation. We will need some mathematical terminology from algebraic graph theory ([17]). A central object in our analysis is the combinatorial Laplacian matrix $L = D - W$, where $W$ is the weight matrix of the graph $G$ and $D = \mathrm{diag}\{d_v\}_{v \in V}$ is the diagonal matrix of node degrees, $d_v = \sum_{w \in V} W_{v,w}$, $v \in V$. If the graph is weighted then $W_{v,w}$ reflects this. We will denote the eigenvalues of $L$ by $\{\lambda_i\}_{i=1}^n$, which we will always take in increasing order. Since $G$ is connected, the smallest eigenvalue is $\lambda_1 = 0$, with corresponding eigenvector $\mathbf{1}$. The eigenvalue $\lambda_2$ is known as the algebraic connectivity, and it provides bounds for the minimum cut sparsity via Cheeger's inequality. Throughout this study we use Bachmann-Landau notation for asymptotic statements: if $a_n / b_n \to 0$ then $a_n = o(b_n)$ and $b_n = \omega(a_n)$. If $a_n / b_n \to c$ for some $c > 0$ then we write $a_n \asymp b_n$. When $y \in \mathbb{R}^n$ is a vector, $\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$ denotes its mean, whereas for a set $C \subseteq V$ we define $\bar{C} = V \setminus C$.
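The Laplacian and its ordered spectrum are straightforward to compute for small graphs. The following sketch assumes NumPy and a dense symmetric weight matrix; the 4-cycle is only an illustrative example.

```python
import numpy as np

def laplacian(W):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def spectrum(W):
    """Eigenvalues in increasing order and matching orthonormal eigenvectors of L."""
    return np.linalg.eigh(laplacian(W))

# Unweighted 4-cycle 0-1-2-3-0 (illustrative).
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
lam, U = spectrum(W)

# For an indicator vector x = 1_C, the quadratic form x^T L x equals the cut size |dC|.
x = np.array([1.0, 1.0, 0.0, 0.0])
cut = x @ laplacian(W) @ x
```

Here `lam[0]` is zero up to rounding with a constant eigenvector, and `lam[1]` is the algebraic connectivity.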

2 Methods

The hypothesis testing problem at hand presents two challenges: (1) the model contains a nuisance parameter $\mu \in \mathbb{R}$ and (2) the alternative hypothesis is composed of a union of hypotheses indexed by $C \in \mathcal{C}$. The existence of the nuisance parameter sets our problem further apart from existing work on structured normal means problems (see, e.g., [4, 3, 2, 1]), which relies on a simplified framework consisting of a simple null hypothesis and a composite hypothesis consisting of unions of simple alternatives. We will eliminate the interference caused by the nuisance parameter by considering test procedures that are independent of $\mu$. The formal justification for this choice is based on the theory of optimal invariant hypothesis testing (see, e.g., [23]) and of uniformly best constant power tests (see [39]). Due to space limitations we will not provide the details and refer the reader to [15, 16, 14, 13, 34, 6] and references therein for in-depth treatments of these issues related to the model at hand.

For the simpler problem of testing $H_0$ versus $H_1^C$ for some $C \subset V$, the optimal test is based on the likelihood ratio (LR) statistic (see the proof of Lemma 2 below for a derivation)

$$2 \log \Lambda_C(y) = 2 \log \left( \frac{\sup_{\beta \in \Theta_1^C} f_\beta(y)}{\sup_{\beta \in \Theta_0} f_\beta(y)} \right) = \frac{1}{\sigma^2} \frac{|V|}{|C||\bar{C}|} \left( \sum_{v \in C} \tilde{y}_v \right)^2, \tag{6}$$

where $\tilde{y} = y - \bar{y}$ and $f_\beta$ is the Lebesgue density of $P_\beta$. This test rejects $H_0$ for large values of $\Lambda_C(y)$. Optimality follows from the fact that the statistical model we consider has the monotone likelihood ratio property.
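Equation (6) for a single known cluster can be evaluated directly. This is a minimal sketch assuming NumPy; the function name `lr_statistic` is illustrative.

```python
import numpy as np

def lr_statistic(y, cluster, sigma):
    """2 log Lambda_C(y) from equation (6) for a fixed cluster C."""
    y = np.asarray(y, dtype=float)
    n = y.size
    k = len(cluster)
    y_tilde = y - y.mean()                      # centering removes the nuisance mu
    total = y_tilde[list(cluster)].sum()
    return n / (k * (n - k)) * total**2 / sigma**2
```

The value agrees with the drop in residual sum of squares, divided by $\sigma^2$, when a two-level mean (one level on $C$, one on $\bar{C}$) replaces a single common mean, and it is unchanged if $C$ is swapped with its complement or a constant is added to `y`.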

When testing against composite alternatives, as in our case, it is customary to consider instead the generalized likelihood ratio (GLR) or scan statistic, which in our case reduces to

$$g = \max_{C \in \mathcal{C}(\rho)} 2 \sigma^2 \log \Lambda_C(y).$$

Through manipulations of the likelihoods, we find that the GLR statistic has a very convenient form which is tied to the spectral properties of the graph $G$ via its Laplacian.

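Lemma 2. Let $\tilde{y} = y - \mathbf{1}\left(\frac{1}{n} \sum_{v \in V} y_v\right)$ and $K = I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top$. Then the GLR statistic is

$$g = \max_{x \in \{0,1\}^n} \frac{x^\top \tilde{y} \tilde{y}^\top x}{x^\top K x} \quad \text{s.t.} \quad \frac{x^\top L x}{x^\top K x} \le \rho, \tag{7}$$

where $L$ is the combinatorial Laplacian of the graph $G$.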
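For very small graphs, program (7) can be evaluated by exhaustive enumeration, which is useful as a reference when checking relaxations. The sketch below assumes NumPy; it skips $x = 0$ and $x = \mathbf{1}$, where $x^\top K x = 0$ and the ratio is undefined, and its cost grows as $2^n$, so it is only a sanity check and not a practical procedure.

```python
import itertools
import numpy as np

def glr_bruteforce(y, W, rho):
    """Evaluate program (7) by enumerating every x in {0,1}^n except 0 and 1."""
    y = np.asarray(y, dtype=float)
    n = y.size
    L = np.diag(W.sum(axis=1)) - W
    K = np.eye(n) - np.ones((n, n)) / n
    y_tilde = y - y.mean()
    best, best_x = -np.inf, None
    for bits in itertools.product([0.0, 1.0], repeat=n):
        x = np.array(bits)
        denom = x @ K @ x                      # equals |C||C_bar| / n
        if denom < 1e-12:                      # x = 0 or x = 1
            continue
        if (x @ L @ x) / denom <= rho + 1e-12: # cut-sparsity constraint
            val = (x @ y_tilde) ** 2 / denom
            if val > best:
                best, best_x = val, x
    return best, best_x

# Path graph on 6 vertices with a step signal (illustrative, noiseless).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
g, x_star = glr_bruteforce(y, W, rho=0.7)
```

With `rho=0.7` only the balanced split of the path is feasible, so the maximizer is the indicator of `{0, 1, 2}` or of its complement.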

The proof is provided in the appendix. The savvy reader will notice the connection between the graph constrained scan statistic (7) and the graph sparsest cut program. By Lagrangian duality, we see that the program (7) is equivalent to (for some Lagrangian parameter $\nu$)

$$\min_{C \subseteq V} \ \frac{|\partial C|}{|C||\bar{C}|} - \nu \frac{\left(\sum_{i \in C} \tilde{y}_i\right)^2}{|C||\bar{C}|},$$

the first term of which is precisely the sparsest cut objective, and the second term drives the solution $C$ to have positive within-cluster empirical correlations. The sparsest cut program is known to be NP-hard, with poly-time algorithms known for trees and planar graphs [27]. Because of this fact, approximate algorithms have been proposed over the past two decades, most notably the uniform multicommodity flow approach of [24, 37] and the semi-definite relaxation of the cut metric [5]. [18] observed that the minimum cut sparsity is bounded by the algebraic connectivity ($\lambda_2$), suggesting the Fiedler vector (i.e. the second eigenvector of $L$) to be an appropriate relaxation of the characteristic vector of the cut. Moreover, the well known Cheeger inequality shows that the minimum cut sparsity (in a regular graph) is bounded by the algebraic connectivity (see [9]). We will follow the tradition of bounding sparsity with the algebraic connectivity, and provide a surrogate estimator to the scan statistic based on this simple spectral relaxation.

Proposition 3. Define the Spectral Scan Statistic (SSS) as

$$s = \sup_{x \in \mathbb{R}^n} (x^\top \tilde{y})^2 \quad \text{s.t.} \quad x^\top L x \le \rho, \ \|x\| \le 1, \ x^\top \mathbf{1} = 0.$$

Then the GLR statistic is bounded by the SSS: $g \le s$.

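Proof. First let us notice that $K = I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top$ is the projection onto the subspace orthogonal to $\mathbf{1}$. Because $K$ is thus idempotent, $\tilde{y}^\top \mathbf{1} = 0$, and $L \mathbf{1} = 0$, we can rewrite

$$g = \max_{x \in \{0,1\}^n \setminus \{\mathbf{0}, \mathbf{1}\}} \frac{(Kx)^\top \tilde{y} \tilde{y}^\top (Kx)}{(Kx)^\top (Kx)} \quad \text{s.t.} \quad \frac{(Kx)^\top L (Kx)}{(Kx)^\top (Kx)} \le \rho.$$

So, we have the following relaxation,

$$g \le \max_{x \neq 0, \ x^\top \mathbf{1} = 0} \frac{x^\top \tilde{y} \tilde{y}^\top x}{x^\top x} \quad \text{s.t.} \quad \frac{x^\top L x}{x^\top x} \le \rho \quad = s.$$

Notice that because the domain $\mathcal{X} = \{x \in \mathbb{R}^n : x^\top L x \le \rho, \|x\| \le 1, x^\top \mathbf{1} = 0\}$ is symmetric around the origin, this is precisely the square of the solution to

$$\sqrt{s} = \sup_{x \in \mathbb{R}^n} x^\top y \quad \text{s.t.} \quad x^\top L x \le \rho, \ \|x\| \le 1, \ x^\top \mathbf{1} = 0, \tag{8}$$

where we have used the fact that $x^\top \tilde{y} = \left(\left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top\right) x\right)^\top y = x^\top y$ because $x^\top \mathbf{1} = 0$ within $\mathcal{X}$.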

Remark 4. Through a reparametrization we can show that the program (8) has a linear objective and only quadratic constraints. After forming the Lagrangian we can show that this is equivalent to

$$\inf_{\nu_0, \nu_1 \ge 0} \ \nu_0 \rho + \nu_1 + \frac{1}{4} \tilde{y}^\top [\nu_0 L + \nu_1 I]^{-1} \tilde{y},$$

which can be solved by first order interior point methods over the parameters $\nu_0, \nu_1$, where the gradient calculation requires the solution to a linear system. Furthermore, the linear systems are semidefinite and diagonally dominant; hence, by the recent work of [21], each can be solved with a running time of $O(|E| \log n)$, modulo logarithmic precision factors.
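For small dense graphs the dual in Remark 4 can be evaluated with an eigendecomposition instead of linear solves. The sketch below is one possible approach and not the solver described in the remark. It rests on a reduction that is an assumption of this sketch: substituting $\nu_0 = ta$, $\nu_1 = t(1 - a\rho)$ with $a \in [0, 1/\rho]$ and minimizing over $t > 0$ in closed form gives $s = \min_a Q(a)$, where $Q(a) = \sum_{i>1} z_i^2 / (1 + a(\lambda_i - \rho))$ and $z_i$ are the coordinates of $y$ in the Laplacian eigenbasis. $Q$ is convex on that interval, so a ternary search suffices. By the Cauchy-Schwarz inequality, every $Q(a)$ is an upper bound on $(x^\top y)^2$ for feasible $x$, so the returned value never underestimates $s$. The graph is assumed connected.

```python
import numpy as np

def spectral_scan_statistic(y, W, rho, iters=200):
    """SSS via a one-dimensional reduction of the dual in Remark 4 (sketch)."""
    y = np.asarray(y, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    # Assumes a connected graph: U[:, 0] is the constant eigenvector, which we drop.
    z = U[:, 1:].T @ y
    lam = lam[1:]

    def q(a):
        # Q(a) = sum_i z_i^2 / (1 + a (lambda_i - rho)); convex for a in [0, 1/rho]
        return np.sum(z**2 / (1.0 + a * (lam - rho)))

    lo, hi = 0.0, 1.0 / rho
    for _ in range(iters):                     # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if q(m1) <= q(m2):
            hi = m2
        else:
            lo = m1
    return q(0.5 * (lo + hi))
```

When $\rho$ exceeds the largest eigenvalue of $L$, the Laplacian constraint is inactive and the function returns $\|\tilde{y}\|^2$, the energy statistic. The full eigendecomposition costs $O(n^3)$, so this route is only reasonable for moderate $n$.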

The formulation in (8) shows that the SSS is related to the supremum of a Gaussian process over $\mathcal{X}$. This fact will turn out to be extremely convenient, as we show next.

3 Theoretical Analysis

We first derive a simple condition for asymptotic indistinguishability based on testing the null versus a single component in the alternative. A more refined analysis of the lower bound for the general hypothesis (4) is beyond the scope of this article. Recall that, under the alternative hypothesis, $\|\beta - \bar{\beta}\| \ge \eta$ uniformly over $\Theta_1$.


Theorem 5. (1) $H_0$ and $H_1$ are asymptotically indistinguishable if $\eta/\sigma = o(1)$.
(2) Suppose that there is a subset of clusters $\mathcal{C}' \subseteq 2^V$ such that all the elements of $\mathcal{C}'$ are disjoint, of the same size ($|C| = c$ for all $C \in \mathcal{C}'$), and

$$\forall C \in \mathcal{C}', \quad \frac{n |\partial C|}{|C||\bar{C}|} \le \frac{\rho}{2},$$

i.e., elements of $\mathcal{C}'$ belong to the alternative hypothesis with $\rho/2$ cut sparsity. Furthermore assume that $\frac{c |\mathcal{C}'|}{n} \to 1$. Consider the observation model (1), and the testing problem given by (4). Then $H_0$ and $H_1$ are asymptotically indistinguishable if

$$\frac{\eta}{\sigma} = o(|\mathcal{C}'|^{1/4}).$$

The proof is in the appendix. We will analyze the performance of the SSS statistic by relying on its representation (8) as the square of the supremum of a Gaussian process. We draw heavily on the theory of the generic chaining, perfected in [38], which essentially reduces the problem of computing bounds on the expected supremum of Gaussian processes to geometric properties of its index space.

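Theorem 6. The following hold with probability at least $1 - \delta$. Under the null $H_0$,

$$s \le \left( \sqrt{2 \sigma^2 \sum_{i > 1} \min\{1, \rho \lambda_i^{-1}\}} + \sqrt{2 \sigma^2 \log \frac{2}{\delta}} \right)^2,$$

while under the alternative $H_1$,

$$s \ge \left( \eta - \sqrt{2 \sigma^2 \log \frac{2}{\delta}} \right)^2.$$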

Proof. For a detailed proof, please see the appendix. We use generic chaining to control the process $\{x^\top y\}_{x \in \mathcal{X}}$ appearing in the SSS. First, we notice that the index set $\mathcal{X}$ is the intersection of an ellipsoid and the unit ball, which is the intuition behind the following lemma.

Lemma 7. Let $L$ have spectrum $\{\lambda_i\}_{i=1}^n$. Then under $H_0$,

$$\mathbb{E} \sup_{x \in \mathcal{X}} x^\top y \le \sqrt{2 \sigma^2 \sum_{i > 1} \min\{1, \rho \lambda_i^{-1}\}}.$$

The proof is provided in the appendix and is a direct result of Lemma 14 from [22]. We can then use the well known phenomenon that the supremum of a Gaussian process concentrates around its expectation. Hence, by Lemma 14 the first statement in Theorem 6 holds. The second statement follows by applying standard concentration results to the univariate Gaussian $\frac{(\beta - \bar{\beta})^\top}{\|\beta - \bar{\beta}\|} y$ and noticing that $\frac{\beta - \bar{\beta}}{\|\beta - \bar{\beta}\|} \in \mathcal{X}$ and $\mathbb{E} \frac{(\beta - \bar{\beta})^\top}{\|\beta - \bar{\beta}\|} y = \|\beta - \bar{\beta}\| \ge \eta$ under $H_1$.

As a corollary we will provide sufficient conditions for asymptotic distinguishability that depend on the spectrum of the Laplacian $L$. As we will show in the next section, these conditions can be applied to a number of graph topologies whose spectral properties are known.

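Corollary 8. The null and alternative, as described in Thm. 6, are asymptotically distinguished by the SSS, $s$, and the GLRT, $g$, if

$$\frac{\eta}{\sigma} = \omega\left( \sqrt{\sum_{i > 1} \min\{1, \rho \lambda_i^{-1}\}} \right). \tag{9}$$

Other stronger sufficient conditions are

$$\frac{\eta}{\sigma} = \omega\left( \sqrt{k + \frac{(n - k) \rho}{\lambda_{k+1}}} \right) \tag{10}$$

if $k$ is large enough that $\lambda_{k+1} > \rho$.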

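Proof. To see equation (9) we note that, due to Theorem 6, if

$$\sqrt{2 \sigma^2 \sum_{i > 1} \min\{1, \rho \lambda_i^{-1}\}} + \sqrt{2 \sigma^2 \log \frac{2}{\delta}} = o\left( \eta - \sqrt{2 \sigma^2 \log \frac{2}{\delta}} \right)$$

then we attain asymptotic distinguishability by choosing any threshold $\tau$ between, and sufficiently far from, the left and right hand side of the previous display. To show equation (10) we note that by choosing $k$ such that $\lambda_{k+1} > \rho$ we see that

$$\sum_{1 < i \le k} \min\{1, \rho \lambda_i^{-1}\} \le k \quad \text{and} \quad \sum_{i > k} \min\{1, \rho \lambda_i^{-1}\} \le (n - k) \frac{\rho}{\lambda_{k+1}}.$$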

Interestingly, there are no logarithmic terms in (9) that usually accompany uniform bounds of this type, which is attributed to the generic chaining.
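The spectral quantity in (9) and the truncated bound used for (10) are simple to evaluate once the eigenvalues are known. A minimal sketch assuming NumPy follows; the path-graph eigenvalues $2 - 2\cos(\pi j / n)$ and the parameter values are used only as an illustrative input.

```python
import numpy as np

def spectral_sum(lam, rho):
    """sum_{i>1} min{1, rho / lambda_i} for Laplacian eigenvalues lam in increasing order."""
    lam = np.asarray(lam, dtype=float)[1:]     # drop lambda_1 = 0
    return np.minimum(1.0, rho / lam).sum()

def truncated_bound(lam, rho, k):
    """k + (n - k) rho / lambda_{k+1}; requires lambda_{k+1} > rho."""
    lam = np.asarray(lam, dtype=float)
    n = lam.size
    if lam[k] <= rho:                          # lam[k] is lambda_{k+1} with 0-based indexing
        raise ValueError("need lambda_{k+1} > rho")
    return k + (n - k) * rho / lam[k]

# Path graph on n vertices (illustrative): eigenvalues 2 - 2 cos(pi j / n), j = 0..n-1.
n, rho = 50, 0.1
lam = 2.0 - 2.0 * np.cos(np.pi * np.arange(n) / n)
total = spectral_sum(lam, rho)
valid_k = [k for k in range(1, n) if lam[k] > rho]
bounds = [truncated_bound(lam, rho, k) for k in valid_k]
```

Every entry of `bounds` dominates `total`, and minimizing over the admissible `k` gives the tightest version of the bound behind condition (10).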

For comparison, we consider the performance of two naive procedures for detection: the energy detector, which rejects $H_0$ if $\|\tilde{y}\|^2$ is too large, and the edge thresholding detector, which rejects $H_0$ if $\max_{(v,w) \in E} |y_v - y_w|$ is large. The following is a classical result that can be found in [19].


Theorem 9. $H_0$ and $H_1$ are asymptotically distinguished by $\|\tilde{y}\|$ if

$$\frac{\eta}{\sigma} = \omega(n^{1/4}),$$

while $\|\tilde{y}\|$ fails to asymptotically distinguish $H_0$ from $H_1$ if

$$\frac{\eta}{\sigma} = o(n^{1/4}).$$

In [35] the authors examined the problem of exact recovery of cluster boundaries in the graph-structured normal means problem by taking differences between observations corresponding to adjacent nodes. The following result stems from Theorem 2.1 of [35], and the fact that $|C||\bar{C}|/n$ scales like $\min\{|C|, |\bar{C}|\}$ up to a factor of 2.

Theorem 10. $H_0$ and $H_1$ are asymptotically distinguished by $\max_{(v,w) \in E} |y_v - y_w|$ if

$$\frac{\eta}{\sigma} = \omega\left( \sqrt{\max_{C \in \mathcal{C}, |C| \le n/2} |C| \log n} \right).$$

Hence, if $\max_{C \in \mathcal{C}, |C| \le n/2} |C|$ is large then the edge thresholding statistic may be dominated by the SSS, because the bound in Corollary 8 is always smaller than $\sqrt{n}$.
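Both naive statistics are one-liners to compute. The sketch below assumes NumPy and a dense symmetric weight matrix whose nonzero entries mark the edges; the function names are illustrative. Under $H_0$, $\|\tilde{y}\|^2 / \sigma^2$ follows a $\chi^2$ distribution with $n - 1$ degrees of freedom, which is where the name of the $\chi^2$ test comes from.

```python
import numpy as np

def energy_statistic(y):
    """||y - y_bar||^2, the statistic used by the energy (chi-squared) detector."""
    y = np.asarray(y, dtype=float)
    return np.sum((y - y.mean()) ** 2)

def edge_threshold_statistic(y, W):
    """Maximum of |y_v - y_w| over the edges (v, w) of the graph."""
    y = np.asarray(y, dtype=float)
    rows, cols = np.nonzero(np.triu(W, k=1))   # each undirected edge listed once
    return np.max(np.abs(y[rows] - y[cols]))

# Path graph on 6 vertices with a noiseless step signal (illustrative).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([0.0, 0.0, 0.0, 3.0, 3.0, 3.0])
energy = energy_statistic(y)
edge_max = edge_threshold_statistic(y, W)
```

Neither statistic uses the class $\mathcal{C}(\rho)$: the energy statistic ignores the graph entirely, and edge thresholding looks at one edge at a time.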

4 Specific Graph Models

In this section we demonstrate the power and flexibility of Theorem 6 by analyzing in detail the performance of the spectral scan statistic over three graph topologies: balanced binary trees, the 2-dimensional lattice and the Kronecker graphs (see [26, 25]).

4.1 Balanced Binary Trees

We begin the analysis of the spectral scan statistic by applying it to the balanced binary tree (BBT) of depth $\ell$. The class of signals that we will consider has clusters of constant signal which are subtrees of size at least $c n^\alpha$ for $0 < c \le 1/2$, $0 < \alpha \le 1$. Hence, the cut size of each signal is 1 and $\rho = [c n^\alpha (1 - c n^{\alpha - 1})]^{-1}$.

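Corollary 11. Let $G$ be the balanced binary tree with $n$ vertices, and $\rho = n [c n^\alpha (n - c n^\alpha)]^{-1}$.
(a) The spectral scan statistic can asymptotically distinguish $H_0$ from $H_1$ if the SNR satisfies

$$\frac{\eta}{\sigma} = \omega\left(n^{\frac{1 - \alpha}{2}} \log n\right).$$

(b) $H_0$ and $H_1$ are asymptotically indistinguishable if

$$\frac{\eta}{\sigma} = o\left(n^{\frac{1 - \alpha}{4}}\right).$$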

The lower bound in part (b) is a direct result of Theorem 5 (2). This result shows that when $\alpha$ is close to 1 there is little gap between the upper bound of the SSS and the lower bound. To illustrate our claim, we simulate the probability of correct discovery of change-points (rejecting $H_0$ when the truth is $H_1$) versus the probability of false alarm (falsely rejecting $H_0$). We compare the following estimators: the energy statistic, edge differencing, the SSS, and the unconstrained GLRT. The unconstrained GLRT is formed by choosing the cluster $C$ without the constraint $C \in \mathcal{C}$, which amounts to ordering the elements of $\tilde{y}$ and greedily adding the components to $C$ until the objective in (7) is maximized. The resulting curves for the four estimators are shown in Figure 1, together with the performance of the SSS as $n = 2^{\ell + 1} - 1$ increases. In these simulations a subtree at level 2 (of size $n/4$) was chosen as $C$, the gap-to-noise ratio is fixed at $\delta/\sigma = 0.8$, and $\rho = 4/n$. We see that even in the low $n$ regime, exploiting the graph structure is essential to improve the power of testing $H_0$ against $H_1$. As $n$ increases with $\delta/\sigma$ fixed, the performance of the SSS dramatically increases.
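The sort-and-scan construction of the unconstrained GLRT can be sketched as follows, assuming NumPy. This version scans every prefix of the sorted values and keeps the best one, a slight variation on stopping at the first decrease; the function name is illustrative. Scanning only the descending order is enough because a set and its complement give the same value of the objective.

```python
import numpy as np

def unconstrained_glr(y):
    """Maximize n (sum_{i in C} y_tilde_i)^2 / (|C||C_bar|) over all nonempty proper C."""
    y = np.asarray(y, dtype=float)
    n = y.size
    y_tilde = y - y.mean()
    order = np.argsort(-y_tilde)                 # largest centered values first
    prefix = np.cumsum(y_tilde[order])[:-1]      # sums over the top-k sets, k = 1..n-1
    k = np.arange(1, n)
    values = n * prefix**2 / (k * (n - k))
    best = int(np.argmax(values))
    return values[best], order[: best + 1]
```

For a fixed cluster size the largest absolute sum is attained by the top-k or bottom-k entries of $\tilde{y}$, so the scan costs $O(n \log n)$ while still returning the exact maximum over all $2^n - 2$ candidate sets.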

4.2 Lattice

We will analyze the performance guarantees of the SSS over the 2-dimensional lattice graph with $p$ vertices along each dimension ($n = p^2$).

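Corollary 12. Let $G$ be the $p \times p$ square lattice ($n = p^2$), and let $\rho = C n^{-(1 - \alpha)/2}$ for $\alpha \in [0, 1)$.
(a) The spectral scan statistic can asymptotically distinguish $H_0$ from $H_1$ if the SNR satisfies

$$\frac{\eta}{\sigma} = \omega\left(n^{\frac{1 + \alpha}{4}} \sqrt{\log n}\right).$$

(b) $H_0$ and $H_1$ are asymptotically indistinguishable if the SNR is weaker than

$$\frac{\eta}{\sigma} = o\left(n^{\frac{\alpha}{4}}\right).$$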

The proof of (a) is in the appendix. Unfortunately, the upper bound in (a) is larger than that provided for the energy statistic in Theorem 9. Our experiments (Figure 1) suggest, though, that these upper bounds can be greatly improved. The lower bound, Corollary 12 (b), holds because we can form $\mathcal{C}'$ of Theorem 5 (2) from disjoint squares of size a constant multiple of $n^{1 - \alpha}$, making $|\mathcal{C}'| \asymp n^\alpha$. We demonstrate the improvement of the SSS over competing tests in Figure 1. In these simulations a $\sqrt{n}/2 \times \sqrt{n}/2$ square was chosen to be $C$ with $\rho = 4/\sqrt{n}$. Despite the weaker guarantee in Corollary 12, the SSS demonstrates the importance of exploiting the graph structure.
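The lattice spectrum that enters Corollary 8 is available in closed form, which avoids an eigendecomposition for large $p$. The sketch below assumes NumPy and a lattice with free (non-periodic) boundaries, for which the path graph on $p$ vertices has Laplacian eigenvalues $2 - 2\cos(\pi k / p)$, $k = 0, \dots, p - 1$. The function names and the choices $C = 1$, $\alpha = 0.5$ are illustrative assumptions.

```python
import numpy as np

def path_laplacian(p):
    """Laplacian of the unweighted path graph on p vertices."""
    W = np.zeros((p, p))
    idx = np.arange(p - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def lattice_laplacian(p):
    """Laplacian of the p x p grid, written as a Kronecker sum of two path Laplacians."""
    Lp, I = path_laplacian(p), np.eye(p)
    return np.kron(Lp, I) + np.kron(I, Lp)

def lattice_eigenvalues(p):
    """Closed form 4 - 2cos(pi i/p) - 2cos(pi j/p), i, j = 0..p-1, in increasing order."""
    path = 2.0 - 2.0 * np.cos(np.pi * np.arange(p) / p)
    return np.sort(np.add.outer(path, path).ravel())

p, alpha = 8, 0.5
n = p * p
rho = n ** (-(1.0 - alpha) / 2.0)                # constant C = 1, an assumption
lam = lattice_eigenvalues(p)
spectral_sum = np.minimum(1.0, rho / lam[1:]).sum()
```

Comparing `np.sqrt(spectral_sum)` with `n ** 0.25` for growing `p` gives a numerical view of how the SSS condition (9) relates to the energy detector's threshold on this graph.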


Figure 1: Above: the simulated probability of correct discovery (power) against false alarm (size) of the SSS compared to the energy detector, edge thresholding and the unconstrained GLRT for the BBT (left), lattice (middle), and Kronecker graph (right). Below: the performance of the SSS as $n$ increases.

4.3 Kronecker Graphs

Much of the research in complex networks has focused on observing statistical phenomena that are common across many data sources. Most notably, the degree distributions of real-world graphs obey a power law [11], and networks are often found to have small diameter [29]. One class of graphs that satisfies these properties, while providing a simple modeling platform, is the Kronecker graphs (see [26, 25]). Let $H_1$ and $H_2$ be graphs on $p$ vertices with Laplacians $L_1, L_2$ and edge sets $E_1, E_2$ respectively. The Kronecker product, $H_1 \otimes H_2$, is the graph over vertices $[p] \times [p]$ such that there is an edge $((i_1, i_2), (j_1, j_2))$ if $i_1 = j_1$ and $(i_2, j_2) \in E_2$, or $i_2 = j_2$ and $(i_1, j_1) \in E_1$. We will construct graphs that have a multi-scale topology using the Kronecker product. Let the multiplication of a graph by a scalar indicate that we multiply each edge weight by that scalar. First, let $H$ be a connected graph with $p$ vertices. Then the graph $G$ for $\ell > 0$ levels is defined as

$$\frac{1}{p^{\ell-1}} H \otimes \frac{1}{p^{\ell-2}} H \otimes \cdots \otimes \frac{1}{p} H \otimes H$$

The choice of multipliers ensures that it is easier to make cuts at the coarser scales. Notice that all of the previous results hold for weighted graphs.
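The construction above can be sketched in a few lines. The function below applies the stated edge rule factor by factor: two vertices are adjacent when they differ in exactly one coordinate and that pair of coordinates is an edge of $H$, with the edge weight given by the multiplier of that factor. The function name, the tuple representation of vertices, and the extension of the two-factor rule to $\ell$ factors are assumptions made for illustration; exact fractions are used so the weights can be compared without rounding.

```python
from fractions import Fraction
from itertools import product


def multiscale_product(base_edges, p, levels):
    """Weighted edges of (1/p^(l-1)) H x (1/p^(l-2)) H x ... x (1/p) H x H.

    base_edges: undirected edges (a, b) of H on vertices 0..p-1.
    Returns {frozenset({u, v}): weight}, where u and v are tuples of
    length `levels` and coordinate 0 belongs to the leftmost factor.
    """
    edges = {}
    for j in range(levels):
        # Factor j (counting from the left) is scaled by 1 / p^(levels-1-j).
        weight = Fraction(1, p ** (levels - 1 - j))
        for rest in product(range(p), repeat=levels - 1):
            for a, b in base_edges:
                # u and v agree everywhere except coordinate j.
                u = rest[:j] + (a,) + rest[j:]
                v = rest[:j] + (b,) + rest[j:]
                edges[frozenset((u, v))] = weight
    return edges


if __name__ == "__main__":
    path3 = [(0, 1), (1, 2)]  # H = path on 3 vertices (illustrative choice)
    g = multiscale_product(path3, p=3, levels=2)
    print(len(g), sum(g.values()))
```

For the path on three vertices and two levels this yields the 12 edges of a 3 × 3 grid, six of weight 1/3 (leftmost factor) and six of weight 1, which makes the lighter, coarse-scale edges the cheaper ones to cut.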

Corollary 13. Let $G$ be the Kronecker product graph described above with $n = p^{\ell}$ vertices, and consider only signals with cuts within the $k$ coarsest scales ($\rho \propto p^{2k-\ell-1}$).

(a) The spectral scan statistic can asymptotically distinguish $H_0$ from $H_1$ if the SNR satisfies

$$\frac{\eta}{\sigma} = \omega\left(p^{2} (\ell + 2)\, n^{(2k+1)/\ell}\right)$$

(b) $H_0$ and $H_1$ are asymptotically indistinguishable if

$$\frac{\eta}{\sigma} = o\left(n^{\frac{k}{4\ell}} / \sqrt{p}\right)$$

The proof and an explanation of $\rho$ are in the appendix. Again, we demonstrate the improvement of the SSS over competing tests in Figure 1. For these simulations the base graph $H$ was chosen to be two triangles ($K_3$) connected by a single edge ($p = 6$). At the coarsest scale one of the $K_3$ subgraphs was chosen to be $C$ with $\rho = 4/n$.
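The value $\rho = 4/n$ in this simulation setup can be reproduced by direct enumeration, under two assumptions that are not stated in this section: that the leftmost (lightest) factor is the coarsest scale, and that $\rho$ is read as the normalized cut $n \, w(\partial C) / (|C| \, |\bar{C}|)$, where $w(\partial C)$ is the total weight of edges leaving $C$. The vertex labelling of $H$ and all names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Base graph H: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
H_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
P = 6


def in_first_triangle(vertex):
    """C = vertices whose coarsest (leftmost) coordinate lies in {0, 1, 2}."""
    return vertex[0] < 3


def cut_weight(levels):
    """Total weight of edges with exactly one endpoint in C."""
    total = Fraction(0)
    for j in range(levels):
        weight = Fraction(1, P ** (levels - 1 - j))  # multiplier of factor j
        for rest in product(range(P), repeat=levels - 1):
            for a, b in H_EDGES:
                u = rest[:j] + (a,) + rest[j:]
                v = rest[:j] + (b,) + rest[j:]
                if in_first_triangle(u) != in_first_triangle(v):
                    total += weight
    return total


def normalized_cut(levels):
    """Assumed reading of rho: n * w(cut) / (|C| * |complement of C|)."""
    n = P ** levels
    size_c = n // 2  # half of the vertices start in the first triangle
    return Fraction(n) * cut_weight(levels) / (size_c * (n - size_c))


if __name__ == "__main__":
    for levels in (1, 2, 3):
        n = P ** levels
        print(n, cut_weight(levels), normalized_cut(levels), Fraction(4, n))
```

Only the bridge edge in the coarsest factor crosses the cut; it appears $p^{\ell-1}$ times with weight $1/p^{\ell-1}$, so the cut weight is 1 at every level and the normalized cut equals $4/n$.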

5 Discussion

We studied the problem of tractably detecting change-points in networks under Gaussian noise. To this end we developed the spectral scan statistic as a computationally feasible alternative to the generalized likelihood ratio test. We completely characterized the performance of the SSS for any graph in terms of the spectrum of the combinatorial Laplacian. For comparison purposes, we developed theoretical guarantees for two simple estimators. We applied the main result to three graph models: balanced binary trees, the lattice, and the Kronecker graph. We see that not only is it statistically suboptimal to ignore graph structure, but for coarse cuts in the balanced binary tree and the Kronecker graph the SSS gives near-optimal performance. This claim is backed by both simulation and theory.

Acknowledgements

This research is supported in part by AFOSR under grant FA9550-10-1-0382.


References

[1] L. Addario-Berry, N. Broutin, L. Devroye, and G. Lugosi. On combinatorial testing problems. The Annals of Statistics, 38(5):3063–3092, 2010.

[2] E. Arias-Castro, E. Candes, and A. Durand. Detection of an anomalous cluster in a network. The Annals of Statistics, 39(1):278–304, 2011.

[3] E. Arias-Castro, E. Candes, H. Helgason, and O. Zeitouni. Searching for a trail of evidence in a maze. The Annals of Statistics, 36(4):1726–1757, 2008.

[4] E. Arias-Castro, D. Donoho, and X. Huo. Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory, 51(7):2402–2425, 2005.

[5] S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):5, 2009.

[6] B. Baygun and A. O. Hero. Optimal simultaneous detection and estimation under a false alarm constraint. Signal Processing, IEEE Transactions on, 41(3):688–703, 1995.

[7] C. Borell. The Brunn-Minkowski inequality in Gauss space. Inventiones Mathematicae, 30(2):207–216, 1975.

[8] W. N. Burnette. Western blotting: electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Analytical Biochemistry, 112(2):195–203, 1981.

[9] F. Chung. Discrete isoperimetric inequalities. Surveys in Differential Geometry IX, International Press, pages 53–82, 2004.

[10] B. Cirelson, I. Ibragimov, and V. Sudakov. Norms of Gaussian sample functions. In Proceedings of the Third Japan-USSR Symposium on Probability Theory, pages 20–41. Springer, 1976.

[11] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, volume 29, pages 251–262. ACM, 1999.

[12] M. Fiedler. Eigenvectors of acyclic matrices. Czechoslovak Mathematical Journal, 25(4):607–618, 1975.

[13] L. Fillatre. Asymptotically uniformly minimax detection and isolation in network monitoring. To appear in Signal Processing, IEEE Transactions on.

[14] L. Fillatre and I. Nikiforov. Non-Bayesian detection and detectability of anomalies from a few noisy tomographic projections. Signal Processing, IEEE Transactions on, 55(2):401–413, 2007.

[15] M. Fouladirad, L. Freitag, and I. Nikiforov. Optimal fault detection with nuisance parameters and a general covariance matrix. International Journal of Adaptive Control and Signal Processing, 22(5):431–439, 2008.

[16] M. Fouladirad and I. Nikiforov. Optimal statistical fault detection with nuisance parameters. Automatica, 41(7):1157–1171, 2005.

[17] C. Godsil and G. Royle. Algebraic graph theory, volume 8. Springer New York, 2001.

[18] L. Hagen and A. Kahng. New spectral methods for ratio cut partitioning and clustering. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 11(9):1074–1085, 1992.

[19] Y. Ingster and I. Suslina. Nonparametric goodness-of-fit testing under Gaussian models, volume 169. Springer Verlag, 2003.

[20] L. Jacob, P. Neuvial, and S. Dudoit. Gains in power from structured two-sample tests of means on graphs. arXiv preprint arXiv:1009.5173, 2010.

[21] I. Koutis, A. Levin, and R. Peng. Faster spectral sparsification and numerical algorithms for SDD matrices. arXiv preprint arXiv:1209.5821, 2012.

[22] M. Ledoux. The concentration of measure phenomenon, volume 89. Amer Mathematical Society, 2001.

[23] E. Lehmann and J. Romano. Testing statistical hypotheses. Springer Verlag, 2005.

[24] T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In Foundations of Computer Science, 1988., 29th Annual Symposium on, pages 422–431. IEEE, 1988.

[25] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11:985–1042, 2010.

[26] J. Leskovec and C. Faloutsos. Scalable modeling of real graphs using Kronecker multiplication. In Proceedings of the 24th International Conference on Machine Learning, pages 497–504. ACM, 2007.

[27] D. Matula and F. Shahrokhi. Sparsest cuts and bottlenecks in graphs. Discrete Applied Mathematics, 27(1):113–123, 1990.

[28] R. Merris. Laplacian graph eigenvectors. Linear Algebra and its Applications, 278(1):221–236, 1998.


[29] S. Milgram. The small world problem. Psychology Today, 2(1):60–67, 1967.

[30] J. Molitierno, M. Neumann, and B. Shader. Tight bounds on the algebraic connectivity of a balanced binary tree. Electronic Journal of Linear Algebra, 6:62–71, 2000.

[31] A. Ostfeld, J. G. Uber, E. Salomons, J. W. Berry, W. E. Hart, C. A. Phillips, J.-P. Watson, G. Dorini, P. Jonkergouw, Z. Kapelan, et al. The battle of the water sensor networks (BWSN): A design challenge for engineers and algorithms. Journal of Water Resources Planning and Management, 134(6):556–568, 2008.

[32] O. Rojo. The spectrum of the Laplacian matrix of a balanced binary tree. Linear Algebra and its Applications, 349(1):203–219, 2002.

[33] O. Rojo and R. Soto. The spectra of the adjacency matrix and Laplacian matrix for some balanced trees. Linear Algebra and its Applications, 403:97–117, 2005.

[34] L. L. Scharf and B. Friedlander. Matched subspace detectors. Signal Processing, IEEE Transactions on, 42(8):2146–2157, 1994.

[35] J. Sharpnack, A. Rinaldo, and A. Singh. Sparsistency of the edge lasso over graphs. AIStats (JMLR WCP), 22:1028–1036, 2012.

[36] J. Sharpnack and A. Singh. Identifying graph-structured activation patterns in networks. In Proceedings of Neural Information Processing Systems, NIPS, 2010.

[37] D. Shmoys. Cut problems and their application to divide-and-conquer. Approximation algorithms for NP-hard problems, pages 192–235, 1997.

[38] M. Talagrand. The generic chaining. Springer, 2005.

[39] A. Wald. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of American Mathematical Society, 54:426–482, 1943.
