Statistics for dynamic network analysis
Dependence between point processes
Combining p-values

Statistical questions related to the analysis of dynamic network data

Patrick Rubin-Delanchy

University of Bristol & Heilbronn Institute for Mathematical Research

Joint work with

Nicholas A Heard (Imperial College London)
Daniel J Lawson (University of Bristol)
Axel Gandy (Imperial College London)

5th November 2014

P. Rubin-Delanchy Statistical questions related to the analysis of dynamic network data

Dynamic network data

Data recording interactions between entities through time. Examples:

social networks: Twitter, Facebook, ...

email: e.g. the Enron email corpus (made public by the Federal Energy Regulatory Commission), a store of emails sent and received by top Enron executives leading up to the scandal

collaboration networks: academic, cosponsorship of legislation [3], music

recommender systems (e.g. Netflix challenge, music recommendation)

computer networks (e.g. LANL, Imperial College London)

biological networks (e.g. neural networks)

Example in music (with thanks to Theo Dickson): MusicBrainz.org is an open-source, user-created database of song details: 40GB, 850,000 different artists, 13.5 million recordings and 722,667 collaborations.

Cyber-security

United Nations International Telecommunications Union announces [1]:

almost 3 billion internet users by end 2014

mobile-cellular subscriptions to reach almost 7.6 billion

UK figures:

1 in 5 pounds earned on the internet

81% of large corporations and 60% of small businesses reported a cyber-breach in the UK

Average cost is 600K – 1.2M pounds per breach [2]

A typical attack pattern

Taken from [5]:

A. Opportunistic infection

B. Network traversal

C. Data exfiltration

Key statistical themes in the analysis of network data

A. Point process networks

B. Information flow

C. Network anomaly detection

D. Combining p-values

E. Big data

Outline of talk

In this talk we will specifically cover:

A. Detecting dependence between two point processes. Applications: diagnose information flow in a communication network; detect tunnelling in a computer network; some biological applications (e.g. neuronal spikes, ecology, molecular biology); sport; finance, ... (2/3)

B. Combining Monte Carlo p-values. Applications: anomaly detection, change detection, feature discovery, ... (1/3)

Testing for dependence

On networks, important forms of dependence include:

A ‘causal’ relationship: do events by A trigger events by B?

Correlation: do events by B occur surprisingly close to events by A?

Anti-correlation: do A and B alternate?

Inhibition: does A inhibit B?

[Figure: the two processes A and B, with the null intensity $r_0(t)$ of B; successive builds mark the first, second and third response times.]

Null hypothesis: B is a non-homogeneous Poisson process with intensity $r_0(t)$.

Let $b_1$ = volume until first response time.
Let $b_2$ = volume until second response time.
Let $b_3$ = volume until third response time.

Lemma

Under $H_0$, the volumes $b_1, b_2, \ldots$ are the event times of a homogeneous Poisson process.
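
As a hedged illustration of the kind of statement this lemma makes, the sketch below applies the classical time-rescaling transformation: if events follow a non-homogeneous Poisson process with intensity r0, then their cumulative intensities are the event times of a unit-rate homogeneous Poisson process. This is a simpler relative of the volume construction on the slides, not the construction itself. The piecewise-constant form of r0 and the names `cumulative_intensity` and `rescale_events` are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_intensity(t, breaks, rates):
    """Integrate a piecewise-constant intensity from breaks[0] up to t.

    rates[k] is the (assumed) intensity on [breaks[k], breaks[k + 1]).
    """
    if not breaks[0] <= t <= breaks[-1]:
        raise ValueError("t lies outside the observation period")
    widths = [hi - lo for lo, hi in zip(breaks, breaks[1:])]
    # mass[k] is the integral of the intensity from breaks[0] to breaks[k].
    mass = [0.0] + list(accumulate(r * w for r, w in zip(rates, widths)))
    # Index of the piece containing t (the right endpoint uses the last piece).
    k = min(bisect_right(breaks, t) - 1, len(rates) - 1)
    return mass[k] + rates[k] * (t - breaks[k])


def rescale_events(event_times, breaks, rates):
    """Map event times to cumulative intensity ('rescaled time')."""
    return [cumulative_intensity(t, breaks, rates) for t in sorted(event_times)]


if __name__ == "__main__":
    # Hypothetical intensity: rate 2.0 on [0, 1), rate 0.5 on [1, 3].
    print(rescale_events([0.5, 1.0, 2.0, 3.0], [0.0, 1.0, 3.0], [2.0, 0.5]))
```

Under the null hypothesis the gaps between consecutive rescaled times would behave like independent unit-rate exponential variables, which is what makes a homogeneous reference process convenient for testing.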

[Figure: processes A and B with the alternative intensity $r_1(t)$, and the function $f$ plotted for each choice below.]

Alternative hypothesis: B has intensity $r_1(t) = r_0(t)\, f(t - a(t))$, where $a(t)$ is the closest A event to $t$. Three choices of $f$ are shown in turn:

$f(x) \propto \exp(-\beta x)$
$f$ is a step function with one step
$f$ is a decreasing function (the plotted $\hat{f}$ and $\hat{r}_1$ are maximum likelihood estimates)

Theorem

Under $H_0$, the maximum likelihood estimate $\hat{f}(0)$ is distributed as $n/(UT)$, where U is a uniform random variable over [0, 1] and T is the length of the observation period.

p-value for A causing B: $n/\{T \hat{f}(0)\}$

In previous example: p ≈ 0.15 (not significant)

Daily emailing behaviour of an individual in the Enron dataset:

[Figure: email times plotted by Day (1 to 365) against Time (0 to 24).]

Bayesian intensity estimation for an individual of interest (change-point model for daily binned data, and density estimation for within-day behaviour):

[Figure: estimated intensity λ(t) against Time (0 to 30).]

‘Information flow’ through JD in the Enron data. Over 2001, 12 individuals contact and are contacted back by JD. Full black means p < 0.0001, half-black means p ≤ 0.05, and white means not significant.

[Figure: 12 × 12 grid of circles, with rows and columns labelled 1 to 12.]

7 → i, i → 7: only one email on each edge; they are sent about one month apart, and appear to be unrelated judging by their subject-lines. The p-value is computed to be about 0.2.

10 → i, i → 10: 14 emails from 10 to i and 9 from i to 10, the most coincidental email times falling in July, about 3½ hours from each other. The p-value is 0.07. If time is not transformed the raw p-value is 0.0035, suggesting a significant interaction. However, upon inspecting the subject-lines of 10 → i and i → 10 it is not in fact obvious that there is consistent reciprocation. For example, the subject-lines of the two most coincidental emails are “FW: Enron Complaint” and “Dunn hearing link?”, which are not obviously related.

Based on the subject-lines of the most coincidental email times, the three black circles are correct detections.

6 → i, i → 5: The p-value is around $2 \cdot 10^{-5}$. The most coincidental email times are two hours and twenty minutes apart, and have the same subject-line, “Re: FW: SoCalGas Capacity”.

9 → i, i → 3: The p-value is around $6 \cdot 10^{-8}$. The most coincidental email times are 50 minutes apart, and have the subject-lines “California Update–Legislative Push Underway” and “Re: California Update–Legislative Push Underway”.

12 → i, i → 11: The p-value is around $2 \cdot 10^{-5}$. The most coincidental email times are 30 minutes apart and have the subject-lines “RE: CA Unbundling” and “CA Unbundling”.

Other tests of $H_0 : f \propto 1$:

Likelihood ratio test based on $H_1$: $f$ a step function, with parameters $\lambda_1 \geq \lambda_2$ and $\tau$ (“time-out”)

Linear-time recursive solution for small N (catastrophic cancellation for N ≈ 100)
Quadratic-time recursive solution for larger N
Asymptotically is a weighted upper K-S test

Or: treat reordered events $b_i$ as if they were p-values (Fisher’s Method):

$-2 \sum_{i=1}^{N} \log(b_i) \sim \chi^2_{2N}$

All implemented in the R package ‘mppa’ (available on CRAN).

Extensions

1 unknown base rate (see next week’s talk)

2 other forms of dependence (correlation, inhibition, ...)

3 chains of interaction (e.g. A causes B causes C )

4 single response model (only a few causal events)

Combining p-values

We often generate many test statistics and want to determine if there is an overall effect. Applications in cyber-security:

Detecting traffic forwarding by timing correlations (a test for each in/out pair)

Change detection in Netflow (e.g. testing port usage, connectivity histogram, data flow, ...)

Figure: Outgoing traffic from a client computer over 5 minutes, split by server IP:port. Axes: Time (minutes:seconds, 00:00 to 05:00) and Server IP address:Port.

The p-value

The p-value is a measure of the significance of an effect. Framework:

1 A null hypothesis, denoted H0, under which there is no effect

2 An alternative hypothesis, H1, under which the effect is present

3 A test statistic T for the effect, which would tend to be larger under H1 than under H0

The p-value is $p = P[T \geq t]$, where P is the probability measure under the null hypothesis and t is the observed statistic.

Example: permutation test

Let a1, ..., ak and b1, ..., bl denote two groups of data.

1 H0: all elements are exchangeable

2 H1: elements are only exchangeable within groups

3 $t = |\bar{a} - \bar{b}|$

Then $p = P[T \geq t]$, where $T = |\bar{A} - \bar{B}|$ and A, B are formed from random permutations of the indices.

Monte Carlo p-value

Often p cannot be computed exactly. Instead, it is estimated via:

$\hat{p} = \frac{1}{N} \sum_{i=1}^{N} I(T^*_i \geq t)$,

where $T^*_i$ is the ith simulated replicate of T under the null hypothesis.

Previous example (permutation test):

1 Randomly permute the indices of $a_1, \ldots, a_k$ and $b_1, \ldots, b_l$ to form $A^*_1, \ldots, A^*_k$ and $B^*_1, \ldots, B^*_l$

2 Compute $T^* = |\bar{A}^* - \bar{B}^*|$
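
The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the speaker's implementation: it assumes the statistic is the absolute difference of group means, and the function name `permutation_p_value` and its parameters `n_sim` and `seed` are hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(a, b, n_sim=1000, seed=None):
    """Monte Carlo estimate of p = P[T >= t] for a two-group permutation test.

    Assumption: the statistic is |mean(a) - mean(b)|.
    """
    rng = random.Random(seed)
    t_obs = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    k = len(a)
    exceed = 0
    for _ in range(n_sim):
        # Under H0 all elements are exchangeable, so relabel them at random.
        rng.shuffle(pooled)
        t_star = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if t_star >= t_obs:
            exceed += 1
    # Fraction of simulated replicates at least as extreme as the observation.
    return exceed / n_sim


if __name__ == "__main__":
    print(permutation_p_value([0, 0, 0, 0, 0], [10, 10, 10, 10, 10],
                              n_sim=2000, seed=1))
```

With this estimator the result can be exactly zero when no replicate reaches the observed statistic, which matters when several such estimates are later combined.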

Combining p-values

Suppose we generate m ordered p-values $p_1 \leq \ldots \leq p_m$.

Under $H_0$ all of the p-values are independent and uniformly distributed (aside from ordering).

Some ways of combining the p-values into one overall score:

$p_1 \sim \mathrm{Beta}(1, m)$ (the minimum p-value)

$-2 \sum_{i=1}^{m} \log(p_i) \sim \chi^2_{2m}$ (Fisher’s method)

$\min_i (p_i\, m / i) \sim \mathrm{Uniform}[0, 1]$ (Simes’s test)
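
The three combination rules can be turned into overall p-values with the standard library alone. The sketch below is illustrative and the function names are hypothetical. It uses the closed form of the chi-squared survival function for an even number of degrees of freedom, and it assumes every p-value is strictly positive, since Fisher's score is infinite otherwise.

```python
import math


def min_p_combined(p_values):
    """Overall p-value from the minimum: the Beta(1, m) CDF at min(p)."""
    m = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** m


def fisher_combined(p_values):
    """Overall p-value from Fisher's method.

    x = -2 * sum(log p_i) is chi-squared with 2m degrees of freedom under H0,
    and P[X >= x] = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    m = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half_x / k
        total += term
    return math.exp(-half_x) * total


def simes_combined(p_values):
    """Simes's statistic min_i(p_(i) * m / i), itself Uniform[0, 1] under H0."""
    m = len(p_values)
    ordered = sorted(p_values)
    return min(p * m / i for i, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    ps = [0.01, 0.04, 0.5]
    print(min_p_combined(ps), fisher_combined(ps), simes_combined(ps))
```

The minimum and Simes rules react mainly to the smallest few p-values, whereas Fisher's method accumulates evidence across all of them.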

Some difficulties

Why it’s a bit harder than that:

1 ‘Needle-in-a-haystack’ problems (work with NA Heard)

2 Discrete p-values (work with NA Heard)

3 Bayesian p-values (work with DJ Lawson, my talk next week)

4 Monte Carlo p-values (work with A Gandy)

Combining Monte Carlo p-values

Naïve approach: do N simulations for each.

Under $H_0$ the number of p-values estimated to be zero is a Binomial variable with success probability $1/(N + 1)$ and size m. For $N \ll m$ there is a high probability of calculating a p-value of 0!

Let $p = [p_1, \ldots, p_m]$ and f be a function $f : [0, 1]^m \to \mathbb{R}$ that combines the p-values.

Lemma

If there exists $i \in \{1, \ldots, m\}$ such that

$\min_{j \neq i} \sup_{x \in [0,1]^m} |\nabla_i f(x)| / |\nabla_j f(x)| = \infty$,

then asymptotically

$\sup_{p \in (0,1)^m} \frac{\operatorname{var}(f(\hat{p})_{\text{naïve}})}{\operatorname{var}(f(\hat{p})_{\text{opt}})} = m$.
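
To make the Binomial remark on this slide concrete, the short sketch below computes the chance that at least one of the m estimated p-values is exactly zero when each uses N simulations, taking the stated success probability 1/(N + 1) as given. The function names are hypothetical.

```python
def prob_some_zero(n_sim, m):
    """P[at least one of m estimated p-values is zero] under H0.

    Each estimate is zero with probability 1 / (n_sim + 1), independently.
    """
    return 1.0 - (1.0 - 1.0 / (n_sim + 1)) ** m


def expected_zeros(n_sim, m):
    """Mean of the Binomial(m, 1 / (n_sim + 1)) count of zero estimates."""
    return m / (n_sim + 1)


if __name__ == "__main__":
    for n_sim in (9, 99, 999, 9999):
        print(n_sim, prob_some_zero(n_sim, m=100), expected_zeros(n_sim, m=100))
```

A single zero estimate is enough to make a score such as Fisher's infinite, which is one reason a fixed simulation budget per p-value can be a poor choice.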

Our algorithm

Our algorithm looks for a smarter allocation of the simulation effort

During simulation, it adaptively identifies ‘which p-values need most work’

Our algorithm appears to reach the optimal asymptotic variance

Example: change detection in Netflow

Traffic from one computer on Imperial College’s network (with thanks to Andy Thomas) over a day

The data has an artificial changepoint where the user knowingly changed his behaviour

We split the computer’s traffic by edge (the other IP address), bin the data per hour, and throw away any edge with fewer than three bins

This results in approximately 100 time series

Example: change detection in Netflow cont’d

1 For any proposed changepoint, count the absolute difference between the number of flows before and after the changepoint on each edge, resulting in a statistic $t_i$.

2 For each edge:
  1 Randomly permute the binned data
  2 Compute the same absolute difference between the number of flows, resulting in a simulated statistic $T^*_i$.
  3 Get a running estimate of the changepoint p-value for that edge, $p_i$

3 Use our algorithm to combine the p-values using Fisher’s score
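
The procedure above can be sketched as follows. This is a deliberately naïve stand-in, not the adaptive algorithm referred to in step 3: it spends the same number of simulations on every edge. It also assumes an add-one estimator, (exceedances + 1)/(N + 1), so that every estimated p-value is positive and Fisher's score stays finite. All function names and the toy data are hypothetical.

```python
import math
import random


def abs_count_difference(bins, changepoint):
    """|flows before - flows after| the changepoint for one edge's hourly bins."""
    return abs(sum(bins[:changepoint]) - sum(bins[changepoint:]))


def edge_p_value(bins, changepoint, n_sim, rng):
    """Permutation p-value for one edge, using a fixed number of simulations."""
    t_obs = abs_count_difference(bins, changepoint)
    shuffled = list(bins)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # randomly permute the binned data
        if abs_count_difference(shuffled, changepoint) >= t_obs:
            exceed += 1
    # Assumption: add-one estimator, which can never return zero.
    return (exceed + 1) / (n_sim + 1)


def fisher_p_value(p_values):
    """Chi-squared (2m degrees of freedom) tail probability of Fisher's score."""
    m = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half_x / k
        total += term
    return math.exp(-half_x) * total


def changepoint_p_value(edges, changepoint, n_sim=999, seed=0):
    """Overall and per-edge p-values for one proposed changepoint."""
    rng = random.Random(seed)
    per_edge = [edge_p_value(bins, changepoint, n_sim, rng) for bins in edges]
    return fisher_p_value(per_edge), per_edge


if __name__ == "__main__":
    # Toy data: one edge that switches on at hour 3, one edge that never changes.
    edges = [[0, 0, 0, 10, 10, 10], [5, 5, 5, 5, 5, 5]]
    print(changepoint_p_value(edges, changepoint=3))
```

Scanning `changepoint` over every hour of the day would give a curve of overall p-values against time.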

Figure: Overall p-value in favour of a changepoint. Axes: Time (0 to 24) and p-value (log scale, 1e−07 to 1e−01).

Figure: Most significant edges (1 to 5) against Time (0 to 24). Samples taken: 15039, 14767, 11598, 7985, 6931

Figure: Least significant edges (1 to 5) against Time (0 to 24). Samples taken: 55, 56, 58, 50, 61

Extensions

1 Explicitly tackle the search and multiple testing problem (my work here with people at ACS)

2 Develop methodology for non-smooth functions (e.g. Simes’s test)

3 Handle correlated data

Conclusion

1 Probabilistic and statistical methods for analysing point process networks are becoming increasingly important.

2 We discussed testing for dependence without any regard to computational issues (e.g. the search through the graph) or the use of marks on the points (e.g. subject lines). Of course these (arguably more difficult) points still need to be addressed.

3 Monte Carlo tests are a very promising approach for detecting features/anomalies on networks. Much more work is needed on making them algorithmically feasible on Big data.

[1] ITU releases 2014 ICT figures. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/60942/THe-COST-OF-CYBER-CRIME-SUMMARY-FINAL.pdf. Accessed: 2014-06-05.

[2] Policy: keeping the UK safe in cyber space. https://www.gov.uk/government/policies/keeping-the-uk-safe-in-cyberspace. Accessed: 2014-06-05.

[3] James H Fowler. Connecting the congress: A study of cosponsorship networks. Political Analysis, 14(4):456–487, 2006.

[4] Axel Gandy and Patrick Rubin-Delanchy. An algorithm to compute the power of Monte Carlo tests with guaranteed precision. The Annals of Statistics, 41(1):125–142, 2013.

[5] Joshua Neil, Curtis Hash, Alexander Brugh, Mike Fisk, and Curtis B Storlie. Scan statistics for the online detection of locally anomalous subgraphs. Technometrics, 55(4):403–414, 2013.

[6] Patrick Rubin-Delanchy and Nicholas A Heard. A test for dependence between two point processes on the real line. arXiv preprint arXiv:1408.3845, 2014.
