HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades

Xinran He 1, Theodoros Rekatsinas 2, James Foulds 3, Lise Getoor 3 and Yan Liu 1

07/08/2015

1 University of Southern California
2 University of Maryland, College Park
3 University of California, Santa Cruz

He et al. HawkesTopic ICML 2015

Introduction

• Diffusion is an important and fundamental phenomenon: viral marketing, detection of rumors, modeling news dynamics …
• Abundant text-based cascades in a variety of social platforms

[Figure: example cascade over nodes A-G with posting times t=0, 1, 1.5, 2, 3.5]


Traditional vs Text-based Cascades

[Figure: two panels, "Traditional cascades" and "Text-based cascades", each over nodes A-G with posting times t=0, 1, 1.5, 2, 3.5]

Traditional cascades:
- Temporal information

Text-based cascades:
- Temporal information
- Content information

Incorporate content information => better model of diffusion
Incorporate temporal information => better model of documents


Network Inference

Network Inference focuses on inferring a hidden diffusion network

Related work:
- NetInf, NetRate [Gomez et al. 11,12], MMHP [Yang and Zha 13], KernelCascades [Du et al. 12]
- TopicCascades [Du et al. 13]

[Figure: text-based cascade over nodes A-G with posting times, the inferred network with edge weights, and Topics 1-3]


Topic Modeling

Topic modeling aims to discover the latent thematic topics

Related work:
- LDA [Blei et al. 03], CTM [Blei and Lafferty 06]
- Citation Influence model [Dietz et al. 07], TIR model [Foulds et al. 13]

[Figure: text-based cascade over nodes A-G with posting times, the document corpus, and Topics 1-3]


Our Contribution

HawkesTopic: joint model for simultaneous Network Inference and Topic Modeling from text-based cascades

[Figure: a text-based cascade over nodes A-G with posting times, feeding two outputs: Topics 1-3 (Topic Modeling) and a diffusion network with edge weights (Network Inference)]


HawkesTopic: Intuition

[Figure: timelines (t) for nodes v1 and v2 with the documents posted at each event]

Mutually exciting nature: A posting event can trigger future events

Content cascades: The content of a document should be similar to the document that triggers its publication


Modeling Posting Times

Mutually exciting nature captured via Multivariate Hawkes Process (MHP) [Liniger 09].

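For MHP, the intensity process takes the form:

$\lambda_v(t) = \mu_v + \sum_{e : t_e < t} A_{v_e, v} \, f_\Delta(t - t_e)$

Rate = Base intensity + Influence from previous events

$\mu_v$: base intensity of node $v$
$A_{u,v}$: influence strength from $u$ to $v$
$f_\Delta(\cdot)$: probability density function of the delay distribution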
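The intensity on this slide can be sketched in a few lines of Python. This is only an illustration: the exponential delay density, its rate `omega`, and the names `mu`, `A`, `exp_delay_pdf`, and `intensity` are assumptions, since the slides do not fix a particular delay distribution.

```python
import math

def exp_delay_pdf(dt, omega=1.0):
    """Assumed delay density f_Delta: exponential with rate omega."""
    return omega * math.exp(-omega * dt) if dt > 0 else 0.0

def intensity(v, t, events, mu, A, delay_pdf=exp_delay_pdf):
    """Rate of node v at time t: base intensity plus influence of past events.

    events: list of (time, node) pairs
    mu:     dict node -> base intensity
    A:      dict (u, v) -> influence strength from u to v
    """
    rate = mu[v]
    for t_e, v_e in events:
        if t_e < t:  # only events strictly before t contribute
            rate += A.get((v_e, v), 0.0) * delay_pdf(t - t_e)
    return rate

# Hypothetical two-node example.
mu = {"v1": 0.2, "v2": 0.1}
A = {("v1", "v2"): 0.6, ("v2", "v1"): 0.3}
events = [(0.0, "v1"), (1.0, "v2")]
rate_v2 = intensity("v2", 1.5, events, mu, A)
```

Here the event posted by v2 itself adds nothing to v2's rate because no self-influence entry is defined in `A`; only the earlier v1 event raises the rate above the base intensity.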


Generating Posting Times

[Figure: timelines (t) for nodes v1 and v2 with events grouped into Level 0, Level 1, and Level 2]

Generate events and their posting times in breadth-first order by interpreting the MHP as a clustered Poisson process [Simma 10]

This provides an explicit parent relationship for the evolution of the content information
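A rough sketch of the breadth-first generation described on this slide, under stated assumptions: spontaneous (Level 0) events come from a homogeneous Poisson process per node, each event spawns a Poisson number of children per outgoing influence entry, and delays are exponential with rate `omega`. The function names, the event dictionary layout, and the finite `horizon` are illustrative choices, not taken from the slides.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's method for Poisson draws; adequate for small means."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def generate_events(mu, A, horizon, omega=1.0, seed=0):
    """Generate events level by level; every event records its parent."""
    rng = random.Random(seed)
    events = []
    # Level 0: spontaneous events, one Poisson process per node.
    frontier = []
    for v, rate in mu.items():
        t = rng.expovariate(rate)
        while t < horizon:
            frontier.append({"time": t, "node": v, "parent": None, "level": 0})
            t += rng.expovariate(rate)
    # Levels 1, 2, ...: children triggered by the previous level.
    while frontier:
        next_frontier = []
        for e in frontier:
            e["id"] = len(events)
            events.append(e)
            for (u, v), strength in A.items():
                if u != e["node"]:
                    continue
                for _ in range(sample_poisson(strength, rng)):
                    t_child = e["time"] + rng.expovariate(omega)
                    if t_child < horizon:
                        next_frontier.append({"time": t_child, "node": v,
                                              "parent": e["id"],
                                              "level": e["level"] + 1})
        frontier = next_frontier
    return events

# Hypothetical two-node example.
mu = {"v1": 0.5, "v2": 0.3}
A = {("v1", "v2"): 0.6, ("v2", "v1"): 0.3}
events = generate_events(mu, A, horizon=10.0)
```

Because each event stores the index of its parent, the content model can later condition a triggered document on the document that triggered it.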


Modeling Documents

[Figure: timelines (t) for nodes v1 and v2 with node topic priors $\alpha_1$, $\alpha_2$, the documents posted at each event, and the topics $\beta_{1:K}$ (Topic 1, Topic 2)]

Step 1: Generate the topics $\beta_{1:K}$

Step 2: For spontaneous events (level = 0): $\eta_e \sim N(\alpha_v, \sigma^2 I)$

Step 3: For triggered events (level > 0): $\eta_e \sim N(\eta_{\mathrm{parent}[e]}, \sigma^2 I)$

Step 4: For each word in each document: $z_{e,n} \sim \mathrm{Discrete}(\pi(\eta_e))$, $x_{e,n} \sim \mathrm{Discrete}(\beta_{z_{e,n}})$
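The four steps above can be sketched as follows. Several details are assumptions made only so the example runs: the topics are drawn from a symmetric Dirichlet (via normalized gamma draws), $\pi$ is taken to be the softmax function as in correlated topic models, and every document has a fixed length `n_words`. The function and argument names are hypothetical.

```python
import math
import random

def softmax(eta):
    """Map topic parameters eta to topic proportions (assumed form of pi)."""
    m = max(eta)
    w = [math.exp(x - m) for x in eta]
    s = sum(w)
    return [x / s for x in w]

def generate_documents(events, alpha, vocab, K, n_words=8, sigma=0.5, seed=0):
    """events must be ordered so that every parent precedes its children.

    events: list of dicts with keys "node" and "parent" (index or None)
    alpha:  dict node -> length-K list, the node's topic prior mean
    """
    rng = random.Random(seed)
    # Step 1: topics beta_1..beta_K (assumed symmetric Dirichlet prior).
    beta = []
    for _ in range(K):
        g = [rng.gammavariate(0.5, 1.0) for _ in vocab]
        total = sum(g)
        beta.append([x / total for x in g])
    etas, docs = [], []
    for e in events:
        # Steps 2 and 3: center eta_e on the node prior or on the parent's eta.
        mean = alpha[e["node"]] if e["parent"] is None else etas[e["parent"]]
        eta = [rng.gauss(m, sigma) for m in mean]
        etas.append(eta)
        # Step 4: for each word, draw a topic, then a word from that topic.
        pi = softmax(eta)
        words = []
        for _ in range(n_words):
            z = rng.choices(range(K), weights=pi)[0]
            words.append(rng.choices(vocab, weights=beta[z])[0])
        docs.append(words)
    return beta, etas, docs

# Hypothetical chain: v1 posts spontaneously, v2 reacts, v1 reacts to v2.
events = [{"node": "v1", "parent": None},
          {"node": "v2", "parent": 0},
          {"node": "v1", "parent": 1}]
alpha = {"v1": [2.0, 0.0], "v2": [0.0, 2.0]}
beta, etas, docs = generate_documents(events, alpha, vocab=["a", "b", "c"], K=2)
```

Note that the second event belongs to v2 but its topic parameters are centered on the first event's, not on v2's own prior; this is how content similarity propagates along the cascade.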


Inference

Joint variational inference based on full mean-field approximation

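$Q(\boldsymbol{\eta}, \boldsymbol{z}, \boldsymbol{P}) = \prod_{e \in E} \left[ q(\eta_e \mid \hat{\eta}_e) \, q(P_e \mid r_e) \prod_{n=1}^{N_e} q(z_{e,n} \mid \phi_{e,n}) \right]$

-- Laplace approximation for non-conjugate variable: $\eta_e$
-- Other variables:

Update for the $r_{e,e'}$:

$r_{e,e'} \propto N(\hat{\eta}_e \mid \hat{\eta}_{e'}, \hat{\sigma}^2 I) \times A_{v_{e'}, v_e} \times f_\Delta(t_e - t_{e'})$

- $N(\hat{\eta}_e \mid \hat{\eta}_{e'}, \hat{\sigma}^2 I)$: similarity between document topics
- $A_{v_{e'}, v_e}$: influence between users (Hawkes process)
- $f_\Delta(t_e - t_{e'})$: proximity of events in time (Hawkes process)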
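To make the parent update concrete, the sketch below scores each earlier event as a candidate parent using the three factors named on this slide and then normalizes. It is a simplified illustration, not the authors' implementation: the exponential delay density, the fixed variance `sigma2`, and all function names are assumptions, and the option that an event has no parent (a spontaneous event) is left out because the slide does not give its form.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian N(x | mean, var * I)."""
    k = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (k * math.log(2 * math.pi * var) + sq / var)

def parent_responsibilities(e, events, eta_hat, A, sigma2=1.0, omega=1.0):
    """Normalized r_{e,e'} over earlier events e' (no-parent option omitted).

    events:  list of (time, node) pairs
    eta_hat: list of topic parameter estimates, one per event
    A:       dict (u, v) -> influence strength from u to v
    """
    t_e, v_e = events[e]
    scores = {}
    for ep, (t_ep, v_ep) in enumerate(events):
        dt = t_e - t_ep
        if dt <= 0:  # a parent must be posted strictly earlier
            continue
        topic_sim = math.exp(gaussian_logpdf(eta_hat[e], eta_hat[ep], sigma2))
        influence = A.get((v_ep, v_e), 0.0)
        proximity = omega * math.exp(-omega * dt)  # assumed exponential f_Delta
        scores[ep] = topic_sim * influence * proximity
    total = sum(scores.values())
    return {ep: s / total for ep, s in scores.items()} if total > 0 else {}

# Hypothetical example: event 2 resembles event 0 in topic but is closer to event 1 in time.
events = [(0.0, "A"), (1.0, "B"), (1.5, "C")]
eta_hat = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
A = {("A", "C"): 0.5, ("B", "C"): 0.5}
r = parent_responsibilities(2, events, eta_hat, A)
```

The example is set up so that topic similarity and temporal proximity pull in opposite directions, which shows how the update trades the two off once the influence strengths are equal.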


Experiments: setting

Evaluation metrics:
-- Topic modeling: document completion likelihood [Wallach et al. 09]
-- Network Inference: AUC against the ground truth network

“Ebola” news articles:
- ~4 months
- ~9k articles, 330 news media sites
- Copying information as ground truth

High-energy physics theory papers:
- ~12 years
- Top 50/100/200 researchers
- Citation network as ground truth
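For the network inference metric, one way to compute AUC against a ground truth network is to treat every candidate edge as a binary classification example and measure how often a true edge is scored above a non-edge. The sketch below does this by direct pairwise comparison; the function name, the edge scores, and the toy ground truth are made up for illustration and are not results from the talk.

```python
def edge_auc(scores, true_edges):
    """AUC = probability that a true edge outscores a non-edge (ties count 1/2).

    scores:     dict (u, v) -> inferred influence strength
    true_edges: set of (u, v) pairs in the ground truth network
    """
    pos = [s for edge, s in scores.items() if edge in true_edges]
    neg = [s for edge, s in scores.items() if edge not in true_edges]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical inferred strengths and ground truth.
scores = {("A", "B"): 0.6, ("A", "C"): 0.4, ("B", "D"): 0.1, ("C", "D"): 0.3}
true_edges = {("A", "B"), ("C", "D")}
auc = edge_auc(scores, true_edges)  # 3 of 4 (true, false) pairs ranked correctly
```

The pairwise form is quadratic in the number of edges, which is fine for a small example; a rank-based computation would be preferable for large networks.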


Experiments: algorithms

Algorithm   Description                                                           Topic Modeling  Network Inference
HTM         Our method with topic number K=50 (K=100 for ArXiv with 200 authors)  yes             yes
LDA         Latent Dirichlet Allocation with collapsed Gibbs sampling             yes             no
CTM         Correlated topic model with variational inference                     yes             no
Hawkes      Hawkes process considering only event posting time                    no              yes
Hawkes-LDA  Two-step approach that first infers topics with LDA                   no              yes
Hawkes-CTM  Two-step approach that first infers topics with CTM                   no              yes


Result: EventRegistry

Network Inference accuracy (AUC): 10% improvement

             Hawkes  Hawkes-LDA  Hawkes-CTM  HTM
Component 1  0.622   0.669       0.673       0.697
Component 2  0.670   0.704       0.716       0.730
Component 3  0.666   0.665       0.669       0.700

Topic modeling accuracy:

             LDA     CTM     HTM
Component 1  -42945  -42458  -42325
Component 2  -22558  -22181  -22164
Component 3  -17574  -17574  -17571



Result: ArXiv

Network Inference accuracy (AUC): 40% improvement

        Hawkes  Hawkes-LDA  Hawkes-CTM  HTM
Top50   0.594   0.656       0.645       0.807
Top100  0.588   0.589       0.614       0.687
Top200  0.618   0.630       0.629       0.659

Topic modeling accuracy:

        LDA     CTM     HTM
Top50   -11074  -10769  -10708
Top100  -15711  -15477  -15252
Top200  -27758  -27630  -27443



Conclusion

HawkesTopic model unifies Correlated Topic Model and Hawkes process:
=> infers hidden diffusion network
=> discovers thematic topics of documents

Joint model of temporal information and content information in text-based cascades gets the best result

Experiments on ArXiv and EventRegistry datasets
=> EventRegistry: 10% improvement in AUC
=> ArXiv: 40% improvement in AUC

Questions?

Thank You