Structural Analysis in Large Networks Observations and Applications Mary McGlohon Committee Christos...

Post on 28-Dec-2015


Structural Analysis in Large Networks: Observations and Applications
Mary McGlohon

Committee: Christos Faloutsos (co-chair), Alan Montgomery (co-chair), Geoffrey Gordon, David Jensen (University of Massachusetts, Amherst)

Motivation
•Network (a.k.a. graph, relational, social network) data has become ubiquitous. We want to know:
▫How do networks form and structure themselves?
▫How does information propagate through networks?
▫How do sub-communities form?

[Figure: example networks: Facebook, computer networks, IMDB actor-movie]

“Outline” for thesis
[Grid: rows 1 TOPOLOGY, 2 CASCADES, 3 COMMUNITY; columns OBSERVATIONS and APPLICATIONS/TOOLS]

Motivation: Topology
•How do these network structures form?
▫Example: identify topological properties common to many different types of graphs (citations, friendships, etc.)
▫Developing models of these properties allows for forecasting.

[Figure: two graph topologies compared (outline area 1: graph topology)]

Motivation: Cascades
•Once the networks form, how does information propagate through the graph?
▫Example: Extract, analyze, and model cascades.

[Figure: a cascade (outline area 2)]

Motivation: Community
•How do we compare communities, or sub-networks?
▫Example: For a set of online groups (Usenet), which ones continue to thrive over time?

[Figure: online groups in 2004 and in 2008? (outline area 3)]

Thesis statement
•We propose to
▫investigate how interactions in graphs occur, how these interactions lead to diffusion and community behavior, and
▫to model these behaviors and apply these findings to real-world problems.


Impact
•Understanding the relations found in networks has many applications, such as:
•Fraud/anomaly detection
▫Given typical behavior and information about nodes/edges, how “suspicious” is a node or group of nodes?
•Ad personalization/recommendation systems
▫Given some information about an individual and their friends, which ads to display?
•Resource allocation
▫Given typical patterns of network growth, how can we allocate resources (hardware, advertising budget, etc.)?

Completed Work
[Grid of publications by area, across OBSERVATIONS and APPLICATIONS/TOOLS]
▫Topology: KDD08, ICDM08
▫Cascades: SDM07, ICWSM07, ICWSM09-1*
▫Community: KDD09*, ICWSM09-2*, ICWSM09-3*
* to appear

Proposed Work
▫P1a: How do cascades compare across network structures?
▫P1b: Can we use cascades to model product adoption?
▫P2: Can we predict success/failure of groups?

The rest of the talk
•Motivation and thesis statement
•Completed work
•Proposed work
•Conclusions and impact
•Audience participation!

Completed Work
▫Topology, observations: What patterns are common to networks?

Topological Observations
•Diameter over time
•Connected components
•Edge weights
(Kevin Bacon)

Topological Observations: Data
•Analyze unipartite and bipartite networks
•Networks are evolving over time
•Networks may be weighted (repeated edges, edge weights)

[Figure: unipartite graph on nodes n1 to n7 with weighted edges. Examples: citations, blogs, router traffic]
[Figure: bipartite graph on nodes n1 to n4 and m1 to m3 with weighted edges. Examples: IMDB actor-movie, campaign contributions, …]

Topological Observations: Gelling Point
•When does a graph begin displaying expected patterns, such as the giant connected component? How can we tell when this happens?
•Observation: Most real graphs display a gelling point, where the graph begins to come together and the giant connected component forms. After that point, they exhibit typical behavior.

[Figure: diameter vs. time for IMDB; gelling point marked at t=1914]

Topological Observations: NLCCs
•In graphs a giant connected component emerges.
•We look at sizes of the next-largest connected components (NLCCs).
•After the gelling point, do they continue to grow? Do they shrink?
•Observation: After the gelling point, the giant connected component takes off, but next-largest connected components remain constant or oscillate.

[Figure: sizes of the 2nd and 3rd connected components vs. time for IMDB; gelling point marked at t=1914]
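The NLCC measurement can be sketched in a few lines. The following is a minimal illustration, not the code used for the talk: it assumes the graph arrives as a list of `(time, u, v)` edges (a hypothetical input format) and uses a union-find structure to report the largest component sizes after each edge. Recomputing the size list at every step is wasteful on large graphs, so a real analysis would sample snapshots instead.

```python
class UnionFind:
    """Disjoint-set forest that tracks component sizes."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:          # first time we see this node
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:     # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra              # attach smaller tree to larger
        self.size[ra] += self.size[rb]

    def component_sizes(self):
        roots = (r for r in self.parent if self.parent[r] == r)
        return sorted((self.size[r] for r in roots), reverse=True)


def track_components(timestamped_edges, top_k=3):
    """Return [(time, sizes of the top_k components)] after each edge."""
    uf = UnionFind()
    history = []
    for t, u, v in sorted(timestamped_edges):
        uf.union(u, v)
        history.append((t, uf.component_sizes()[:top_k]))
    return history


# Toy edge stream (illustrative only, not data from the talk).
example_edges = [(1, "a", "b"), (2, "c", "d"), (3, "b", "c"), (4, "e", "f")]
history = track_components(example_edges)
```

In each `history` entry the first size is the giant component and the remaining sizes are the NLCCs, which is the series the observation above describes as constant or oscillating.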

Topological Observations: Weights
•How are edges in a graph repeated, or otherwise weighted?
•As the number of edges increases, does the total edge weight grow linearly?
•Observation: Weight additions follow a power law with respect to the number of edges:

$W(t) \propto E(t)^{w}$

▫W(t): total weight of graph at t
▫E(t): total edges of graph at t
▫w is the power-law (PL) exponent (w>1)
•Many other weighted laws: see [KDD08, ICDM08]

[Figure: log(Weights) vs. log(Edges) for Orgs-Candidates, slope=1.3]
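One way to estimate the exponent w is a least-squares line through the points (log E(t), log W(t)), since the weight power law says log W is linear in log E with slope w. The sketch below assumes a list of snapshot totals is available; the numbers are synthetic, generated to follow the law exactly, and the function name is illustrative.

```python
import math


def fit_loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic snapshots (assumption, not data from the talk): W = 2 * E^1.3
edges = [10, 100, 1000, 10000]
weights = [2 * e ** 1.3 for e in edges]
w_exponent, log_c = fit_loglog_slope(edges, weights)
```

A fitted slope above 1 means total weight grows faster than the number of edges, which is the pattern reported for the Orgs-Candidates graph.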


Completed Work
▫Topology, observations: Gelling point, CC’s; Weighted laws
▫Topology, applications/tools: Can we develop generative models?

Topological Models: “Butterfly”
•Goals are to generate:
▫Constant/oscillating NLCC’s
▫Densification power law [Leskovec+05]
▫Shrinking diameter (after “gelling point”)
▫Power-law degree distribution
▫Emergent, local, intuitive behavior
•Main idea: Uses 3 parameters
▫“Curiosity”: how much to explore local network (~U(0,1), creates power-law degree distribution)
▫“Flyout”: how many local networks to explore (global, joins components)
▫“Friendliness”: how often to connect (global, allows new components)
•Details: see [KDD08]
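To make the three parameters concrete, here is a loose sketch of a generator in the spirit of the bullets above. It is not the published algorithm (see [KDD08] for that). The names `p_step`, `p_host`, and `p_link` are labels chosen here for “curiosity”, “flyout”, and “friendliness”, and the exact walk and stopping rules are assumptions.

```python
import random


def butterfly_sketch(n_nodes, p_host, p_link, seed=0):
    """Grow an undirected graph one node at a time.

    p_step ("curiosity"): drawn per node from U(0,1); chance to keep walking.
    p_host ("flyout"): chance to pick one more host after each walk (< 1).
    p_link ("friendliness"): chance to link to each newly visited node.
    """
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n_nodes):
        adj[new] = set()
        p_step = rng.random()
        first_host = True
        while first_host or rng.random() < p_host:
            first_host = False
            current = rng.randrange(new)      # host among existing nodes
            visited = set()
            while True:
                if current not in visited:
                    visited.add(current)
                    if rng.random() < p_link:
                        adj[new].add(current)
                        adj[current].add(new)
                neighbors = sorted(v for v in adj[current] if v != new)
                if not neighbors or rng.random() >= p_step:
                    break                     # walk ends
                current = rng.choice(neighbors)
    return adj


graph = butterfly_sketch(50, p_host=0.3, p_link=0.5, seed=1)
```

A node that picks a host but never links stays isolated and starts a new component, while a node that visits several hosts can join components, which is how this kind of model can keep small components alive alongside a giant one.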

Topological Models: “Butterfly”
[Figure, four panels of model output: power-law degree distribution (log(count) vs. log(degree), slope=-2); shrinking diameter (diameter vs. nodes); densification (log(edges) vs. log(nodes), slope=1.1); oscillating NLCCs (NLCC size vs. nodes)]


Completed Work
▫Topology, observations: Gelling point, CC’s; Weighted laws
▫Topology, applications/tools: Butterfly, RTM, Oddball
▫Cascades, observations: What are patterns of cascades in networks?

Cascade Observations: Data
•Gathered from August-September 2005*
•Used set of 44,362 blogs, traced cascades
•2.4 million posts
•245,404 blog-to-blog links

[Figure: number of posts per day over time; axis marks at Jul 4, Aug 1, Sep 29]

Cascade Observations: Prelims
[Figure: a blogosphere of blogs B1 to B4 and the cascades extracted from it over posts a to e; example shapes “star” and “chain”]
•How quickly does a link to a post occur?
•What size do cascades typically reach?
•What are typical shapes, and how often are “stars” and “chains” occurring?

Temporal Observations
•How quickly does a link to a post occur?
•Does popularity decay at a constant rate?
•With an exponential (“half life”)?
[Figure: three panels on linear-linear, log-linear, and log-log scales]

Cascade Observations: Link Popularity
•Observation: The probability that a post written at time t_p acquires a link at time t_p + Δ is:

$p(t_p + \Delta) \propto \Delta^{-1.5}$

•Similar to [Vazquez+06]

[Figure: log(# in-links) vs. log(days after post), slope=-1.5; inset on linear-linear scale]

Cascade Observations: Cascade Size
•Q: What size distribution do cascades follow? Are large cascades frequent?
•Observation: The probability of observing a cascade of n blog posts follows a Zipf distribution:

$p(n) \propto n^{-2}$

[Figure: log(Count) vs. log(Cascade size) (# of nodes), slope=-2]

•Q: What is the distribution of particular cascade shapes?
•Observation: Stars and chains in blog cascades also follow a power law, with different exponents (star -3.1, chain -8.5).

[Figure: log(Count) vs. log(Size) of chain (# nodes), a=-8.5; log(Count) vs. log(Size) of star (# nodes), a=-3.1]
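The exponents above are read off log-log plots. A common alternative for raw size samples is the maximum-likelihood estimate alpha = 1 + n / Σ ln(x_i / x_min), where the density is proportional to x^(-alpha). The sketch below uses the continuous form of that estimator, which is only an approximation for integer cascade sizes; it is offered as an illustration and is not the fitting procedure used in the talk.

```python
import math


def powerlaw_mle(samples, x_min=1.0):
    """Continuous MLE of alpha for a density proportional to x**(-alpha), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Degenerate check: if every sample equals e * x_min, each log term is 1,
# so the estimate is 1 + n / n = 2.
alpha_hat = powerlaw_mle([math.e] * 5, x_min=1.0)
```

Steeper exponents, such as the one reported for chains, mean that large instances of that shape are far rarer than large stars.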


Completed Work
▫Topology, observations: Gelling point, CC’s; Weighted laws
▫Topology, applications/tools: Butterfly, RTM, Oddball
▫Cascades, observations: Cascades laws; Cascades as features
▫Cascades, applications/tools: Can we develop predictive models for cascades?

Cascade Models: CGM
•Cascade Generation Model
•Overview: Produce realistic cascades through an emergent “viral” model
•Details: See [SDM07]

[Figure: model vs. data: most frequent cascade shapes; log(Count) vs. log(Cascade size) (# nodes); log(Count) vs. log(Star size); log(Count) vs. log(Chain size)]


Completed Work
▫Topology, observations: Gelling point, CC’s; Weighted laws
▫Topology, applications/tools: Butterfly, RTM, Oddball
▫Cascades, observations: Cascades laws; Cascades as features
▫Cascades, applications/tools: Cascade generation model; ZC model
▫Community, observations: Political Usenet study
▫Community, applications/tools: Can we detect anomalies?

Community Tools: SNARE
•Problem: Given a network and some domain knowledge about suspicious nodes (flags), determine which nodes are most risky.
•Data: Accounting transaction data. Nodes are accounts, edges are transactions between accounts.
[Figure: accounts network with Accounts Payable, Accounts Receivable, and Revenue Accts]

•Example: “Channel stuffing”
▫Some accounts overstated
▫But other accounts also involved.
▫Since many accounts are slightly affected, it is easy to cover up activity.
[Figure: the same network, with nodes shaded from “very risky” to “not risky”]

•Social Network Analytic Risk Evaluation
▫Use domain knowledge to flag certain nodes.
▫Assume homophily between nodes (“guilt by association”)
▫Then, using initial risk as initial node potentials, use belief propagation (message passing between nodes) to determine end risk scores.

Community Tools: SNARE
•Belief Propagation
▫Flags are node potentials, or “initial risk scores”
▫All nodes send messages back and forth with beliefs
▫Upon convergence, end result will reflect “riskiest” nodes.
[Figure: accounts network (Accounts Payable, Accounts Receivable, Revenue Accts) before and after belief propagation]
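The message-passing step can be illustrated with a small loopy belief propagation routine over two states (risky, not risky). This is a generic sketch rather than SNARE itself: the edge potential values, the `homophily` parameter name, and the fixed iteration count are assumptions made here, and priors are expected to lie strictly between 0 and 1.

```python
def belief_propagation(edges, prior_risk, homophily=0.1, n_iters=20):
    """Return P(risky) per node after synchronous sum-product updates.

    prior_risk[v] is the initial risk score (node potential) of node v.
    State 0 means "risky", state 1 means "not risky".
    """
    neighbors = {v: set() for v in prior_risk}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    phi = {v: (p, 1.0 - p) for v, p in prior_risk.items()}
    # Edge potential: neighbors are slightly more likely to share a state.
    psi = ((0.5 + homophily, 0.5 - homophily),
           (0.5 - homophily, 0.5 + homophily))
    msgs = {(u, v): (0.5, 0.5) for u in neighbors for v in neighbors[u]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = phi[i][xi] * psi[xi][xj]
                    for k in neighbors[i]:
                        if k != j:            # exclude the recipient
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = out[0] + out[1]
            new_msgs[(i, j)] = (out[0] / z, out[1] / z)
        msgs = new_msgs
    beliefs = {}
    for i in neighbors:
        b0, b1 = phi[i]
        for k in neighbors[i]:
            b0 *= msgs[(k, i)][0]
            b1 *= msgs[(k, i)][1]
        beliefs[i] = b0 / (b0 + b1)
    return beliefs


# Chain a - b - c where only "a" is flagged (toy data, not from the talk).
risk = belief_propagation([("a", "b"), ("b", "c")],
                          {"a": 0.9, "b": 0.5, "c": 0.5})
```

On this chain the flagged node keeps its score of 0.9 while its neighbors are pulled above 0.5, more strongly the closer they are, which is the “guilt by association” effect. Each iteration touches every directed edge once, in line with the linear-time-per-iteration property claimed for SNARE.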

Community Tools: SNARE
•Produces improvement over simply using flags
▫Up to 6.5 lift
▫Improvement especially for low false positive rate
[Figure: ROC curve for accounts data (true positive rate vs. false positive rate), showing ideal, SNARE, and baseline (flags only)]

•Accurate: Produces large improvement over simply using flags
•Flexible: Can be applied to other domains
•Scalable: One iteration of BP runs in time linear in the number of edges
•Robust: Works on large range of parameters


Completed Work
▫Topology, observations: Gelling point, CC’s; Weighted laws
▫Topology, applications/tools: Butterfly, RTM, Oddball
▫Cascades, observations: Cascades laws; Cascades as features
▫Cascades, applications/tools: Cascade generation model; ZC model
▫Community, observations: Political Usenet study
▫Community, applications/tools: SNARE


Proposed Work
•2 main problems:
▫P1: Cascades and product adoption
 How do cascades vary according to network structure?
 Can we use cascades to model product adoption?
▫P2: Predicting success/failure of online groups


P1a: Cascades & Network Structure
•In different networks, how does the starting point of an epidemic affect the epidemic size?
•What modifications to the current model change the cascades (weights, self-infection)?
•Can we reverse-engineer network properties based on observed cascades?
[Figure: example network structures: many hubs? large diameter?]


P1b: Cascades & Product Adoption
•Examine adoption of Caller Ringback Tones (CRBT)
▫User buys ringtone
▫Friend calls user, hears CRBT
•Phone call data
▫Nodes: User ID, DOB, salutation (Mr/Ms), date of joining, data plan
▫Call Edges: src/dest ID, call time, duration
▫SMS Edges: src/dest ID, time
▫CRBT purchases: purchase date, song name, cost

P1b: Cascades & Product Adoption
•Can we fit the Bass Model for different CRBT’s?
[Figure: Bass model equation with annotated terms: # adopters today, # adopters yesterday, # potential adopters, a “mass marketing” coefficient, and a “word of mouth” coefficient]
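For reference, one standard discrete-time form of the Bass model is n(t) = (p + q·N(t-1)/M)·(M - N(t-1)), where N is the cumulative number of adopters, M the number of potential adopters, p the “mass marketing” (innovation) coefficient, and q the “word of mouth” (imitation) coefficient. The slide's own equation did not survive extraction, so treating this as the intended form is an assumption, and the parameter values below are arbitrary.

```python
def bass_adoption(market_size, p, q, n_steps):
    """New adopters per step under a discrete-time Bass model."""
    cumulative = 0.0
    new_per_step = []
    for _ in range(n_steps):
        remaining = market_size - cumulative
        # External influence p plus imitation scaled by current penetration.
        new = (p + q * cumulative / market_size) * remaining
        cumulative += new
        new_per_step.append(new)
    return new_per_step


# Illustrative parameters only (not fitted to any CRBT data).
adopters = bass_adoption(market_size=1000, p=0.03, q=0.4, n_steps=30)
```

Fitting p and q per ringback tone would then give a direct way to compare how much of each song's adoption is driven by word of mouth.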

P1b: Cascades & Product Adoption
•Are some CRBT’s more “viral” than others? Does the footprint follow a skewed distribution?
•How long after purchase is a CRBT infective?
[Figure: survival function P(X>x) vs. number of downloads (per song)]

•How does the weight of a link, homophily, or other factors affect the likelihood of transmission?
•Can we explicitly test whether a purchase is a result of basic similarity of neighbors or a result of “viral” propagation?
•How can we build and verify a model for this propagation?


P2: Success & Failure of Online Groups
•Use data over 4 years from nearly 200 newsgroups. (Political Usenet)
•Many discussion groups stop posting by the third year.
•Why?

•P2 Questions:
▫If structural network characteristics can be traced to success or failure, which features are most predictive?
▫Can we test causality in the predictive characteristics?

Timeline
[Figure: timeline with marks at May ‘09, Jun ‘09, Sep ‘09, Nov ‘09, Mar ‘10, Jul ‘10, and Aug ‘10. Milestones as listed on the slide: P1 preliminaries; internship at Google; P1a: Cascades and network structure; P1b: Cascades and product adoption; P2: Success/failure of online groups; complete document; defend]

Related work
▫Topology:
 Heavy-tailed degree distributions [Faloutsos+99] [Albert+02] [Kleinberg+99]
 Shrinking diameter, densification [Leskovec+05]
 Random graphs model [Erdos+60]
 “Forest Fire” model [Leskovec+05]
 “Winners do not take all” model [Pennock+02]
▫Cascades:
 Recommendations: [Leskovec+06]
 Diffusion in blogs: [Adar+03] [Gruhl+04] [Kempe+03] [Kumar+03]
 Marketing: Product adoption [Bass69], Word-of-mouth [Godes+04]
 Virus propagation: Populations [Hethcote], Networks [Boguna, Pastor-Satorras] [Chakrabarti]
▫Communities and other applications:
 Securities fraud detection [Neville+05] [Fast+07]
 Author identification [Hill+04]
 Online group behavior [Backstrom+08]

Conclusions: Completed
•Demonstrated several properties common to networks in a wide range of domains.
▫Oscillating sizes of next-largest connected components
▫Power laws for weighted graphs
▫Butterfly model: generates properties
•Studied and modeled cascades in blogs
▫Several power laws for cascade shapes and size
▫Cascade Generation Model
•Devised SNARE for anomaly detection for accounting data (lift factor up to 6.5)

Conclusions: Proposed
•P1a: Continue cascade studies across network structures
•P1b: Use cascades to model purchases in phone-call graph
•P2: Build predictive models for success and failure in online groups

References

Topology
[KDD08] M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD. Las Vegas, Nev., August 2008.
[ICDM08] L. Akoglu, M. McGlohon, and C. Faloutsos. RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs. ICDM. Pisa, Italy, Dec. 2008.

Cascades
[SDM07] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst. Patterns of Cascading Behavior in Large Blog Graphs. SDM. Minneapolis, Minn., April 2007.
[ICWSM07] M. McGlohon, J. Leskovec, C. Faloutsos, N. Glance, and M. Hurst. Finding patterns in blog shapes and blog evolution. ICWSM. Boulder, Colo., March 2007.
[ICWSM09-1] M. Goetz, J. Leskovec, M. McGlohon, and C. Faloutsos. Modeling Blog Dynamics. ICWSM. San Jose, Calif., May 2009.

Community
[KDD09] M. McGlohon, S. Bay, M. Anderle, D. Steier, and C. Faloutsos. SNARE: A Link Analytic System for Evaluating Fraud Risk. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIG-KDD). Paris, France, June 2009.
[ICWSM09-2] M. McGlohon and M. Hurst. Community Structure and Information Flow in Usenet: Improving analysis with a thread ownership model. International Conference on Weblogs and Social Media (ICWSM). San Jose, Calif., May 2009.
[ICWSM09-3] M. McGlohon and M. Hurst. Considering the Sources: Comparing linking patterns in Usenet and blogs. International Conference on Weblogs and Social Media (ICWSM). San Jose, Calif., May 2009.

Support:
▫PricewaterhouseCoopers
▫Microsoft Live Labs
▫NSF Graduate Research Fellowship
▫Yahoo! Key Technical Challenges Grant, Pennsylvania Infrastructure Technology Alliance (PITA)
▫Hewlett-Packard
▫NSF Grants No. IIS-0705359, IIS-0534205, CNS-0721736, 0209107, SENSOR-0329549, EF-0331657, IIS-0326322
▫U.S. Department of Energy Lawrence Livermore National Laboratory contract No. W-7405-ENG-48.

Audience participation!

Talk expansion pack

P1b: Other Cascade Data
•Post data from corporate blogs
▫Demographic data on bloggers (employee ID, location, job description)
▫Read data (timestamped)
▫Write data (timestamped)
•CRBT adoption in general
▫Perhaps people do not adopt particular songs, but the CRBT mechanism
•More public blog data (spinn3r)
▫Also use edge information from blogrolls/comments

P2: Potential features to examine
•Posting behavior
▫Which users are posting, how often are they posting, and how skewed is the distribution?
•Linking behavior
▫How long are cascades (threads), in terms of posts and time?
•Content
▫Topics, keywords, sentence length, other textual features, sentiment analysis

Unipartite Networks
•Postnet: Posts in blogs, hyperlinks between
•Blognet: Aggregated Postnet, repeated edges
•Patent: Patent citations
•NIPS: Academic citations
•Arxiv: Academic citations
•NetTraffic: Packets, repeated edges
•Autonomous Systems (AS): Packets, repeated edges
[Figure: unipartite graph on nodes n1 to n7; callout: 4 million nodes, 8 million edges, 17 years]

Bipartite Networks
•IMDB: Actor-movie network
•Netflix: User-movie ratings
•DBLP: conference- repeated edges
▫Author-Keyword
▫Keyword-Conference
▫Author-Conference
•US Election Donations: $ weights, repeated edges
▫Orgs-Candidates
▫Individuals-Orgs
[Figure: bipartite graph on nodes n1 to n4 and m1 to m3; callout: 6 million nodes, 10 million edges, 22 years]

Topological Models: “Butterfly”
•New node joins, picks “host”, and iteratively random walks around neighbors, with ~ U(0,1)
▫Some nodes “friendlier” than others
•Nodes may have multiple hosts ( ).
▫Joins components
•Nodes link with probability
▫May choose host, but not link (start new component)
[Figure: a new node and its host]

Topological Models: RTM
•Recursive Tensor Model
•Goal: to introduce time and burstiness
•Main idea: Begin with a core tensor (multidimensional array), and use self-similarity to reproduce observed power laws.

Topological Models: RTM
•Self-similarity arises from the Kronecker product
•2D: [Leskovec+06]
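The 2D case can be shown directly: taking the Kronecker product of a small 0/1 matrix with itself replaces every 1 with a copy of the whole matrix, which is where the self-similarity comes from. The sketch below is plain Python on nested lists; the 2x2 core is an arbitrary example, not the initiator used in the cited work. The 3D version applies the same idea to a core tensor.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_val * b_val for a_val in a_row for b_val in b_row]
            for a_row in a for b_row in b]


def kronecker_power(core, k):
    """Kronecker product of `core` with itself, k factors in total."""
    result = core
    for _ in range(k - 1):
        result = kronecker(result, core)
    return result


core = [[1, 1],
        [1, 0]]
adjacency = kronecker_power(core, 2)   # 4 x 4 self-similar pattern
```

With this core, the number of 1 entries grows as 3^k while the side length grows as 2^k, so the pattern repeats at every scale of the matrix.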

Topological Models: RTM
•3D: Use Kronecker product on a core tensor (adjacency matrix, with time as the 3rd dimension)
•Reproduced power laws as found in ICDM08

Topological Applications: Oddball
•Main ideas:
▫Use local neighborhood of node
▫Find common patterns
▫Score how much a node deviates from common patterns
•Results
▫Identified anomalous nodes such as Ken Lay in Enron, particularly different blog posts

Cascade Models: CGM
i) Randomly pick blog to infect, add post to cascade.
ii) Infect each in-linked neighbor with some probability.
iii) Add infected neighbors’ posts to cascade.
iv) Set node infected in (i) to uninfected.
[Figure: blogs B1 to B4 at each step, with cascade posts p1,1 and p4,1]
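Steps (i) to (iv) can be read as an SIS-style simulation over the blog graph. The sketch below is one possible reading, not the implementation from [SDM07]: `in_links`, `beta`, and the `max_posts` safety cap are names and choices made here, and the rule that newly infected blogs repeat steps (ii) to (iv) is an assumption.

```python
import random
from collections import deque


def generate_cascade(in_links, beta, rng, max_posts=1000):
    """Simulate one cascade.

    in_links[b] lists the blogs that link to blog b.
    Returns (number of posts, list of (new_post_id, parent_post_id)).
    """
    start = rng.choice(sorted(in_links))      # step (i)
    links = []
    frontier = deque([(start, 0)])            # (infected blog, its post id)
    n_posts = 1
    while frontier and n_posts < max_posts:
        blog, post_id = frontier.popleft()
        for neighbor in in_links[blog]:
            if n_posts >= max_posts:
                break
            if rng.random() < beta:           # step (ii)
                links.append((n_posts, post_id))   # step (iii)
                frontier.append((neighbor, n_posts))
                n_posts += 1
        # Step (iv): `blog` has left the frontier, so it is uninfected
        # again and may be re-infected later in the same cascade.
    return n_posts, links


toy_blogs = {"B1": ["B2", "B4"], "B2": [], "B3": ["B1"], "B4": ["B3"]}
size, cascade_links = generate_cascade(toy_blogs, beta=0.3,
                                       rng=random.Random(7))
```

Running this many times and tabulating `size` and the shapes of `cascade_links` gives the kind of model-versus-data comparison shown for CGM earlier in the talk.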

Cascade Models: Zero-crossing
•Main ideas:
▫Models blogs in both network growth and network diffusion
▫Choose to post based on random walk (produces burstiness)
▫Link based on recency and popularity (reproduces “-1.5 law” and skewed degree)
▫Improvement over CGM because network is generated
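The posting rule can be illustrated with a one-dimensional random walk that triggers a post whenever it returns to zero. For a simple symmetric walk, return times have a heavy tail (the probability of a first return at step 2n decays roughly as n^(-3/2)), so posts arrive in bursts separated by occasional long silences. This sketch covers only the posting step under that assumption; the linking step and the actual model details are not reproduced here.

```python
import random


def zero_crossing_post_times(n_steps, seed=0):
    """Times at which a +/-1 random walk started at 0 returns to 0."""
    rng = random.Random(seed)
    position = 0
    post_times = []
    for t in range(1, n_steps + 1):
        position += rng.choice((-1, 1))
        if position == 0:
            post_times.append(t)          # the blog posts at this step
    return post_times


times = zero_crossing_post_times(10_000, seed=42)
# Gaps between consecutive posts (the first gap is measured from t = 0).
gaps = [b - a for a, b in zip([0] + times, times)]
```

A histogram of `gaps` on log-log axes is a quick way to see the burstiness: most gaps are short, but a few span a large share of the simulated period.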

Community Observations: Newsgroups
•Observation: Threads introduced to a group later in the thread tended to have more activity from that group.
•Observation: Discussions tended to flow from “main” groups (can.politics) into subgroups (ab.politics, bc.politics)

Community Observations: Newsgroups
•189 newsgroups (‘polit’ in name), January 2004-June 2008
•37 million posts
•Includes many countries, provinces, states, topical groups (alt.politics.guns)
•Major issue: over half are cross-posted to multiple groups. Where is conversation truly occurring?
[Figure: a thread whose posts are cross-posted to {alt.politics, us.politics} and to {alt.politics, us.politics, pa.politics}]
•Solution: Introduce “thread ownership”, by assigning threads according to where authors exclusively post.

Completed Work
▫Topology, observations: What patterns are common to networks?
▫Topology, applications/tools: Can we develop generative models and detect anomalies?
▫Cascades, observations: What are patterns of cascades in networks?
▫Cascades, applications/tools: Can we develop predictive models for cascades?
▫Community, observations: How can we compare communities?
▫Community, applications/tools: Can we detect anomalies, and predict group behavior?