
Inter and Intra Topic Structure Learning with Word Embeddings

He Zhao 1 Lan Du 1 Wray Buntine 1 Mingyuan Zhou 2

Abstract

One important task of topic modeling for text analysis is interpretability. By discovering structured topics one is able to yield improved interpretability as well as modeling accuracy. In this paper, we propose a novel topic model with a deep structure that explores both inter-topic and intra-topic structures informed by word embeddings. Specifically, our model discovers inter topic structures in the form of topic hierarchies and discovers intra topic structures in the form of sub-topics, each of which is informed by word embeddings and captures a fine-grained thematic aspect of a normal topic. Extensive experiments demonstrate that our model achieves the state-of-the-art performance in terms of perplexity, document classification, and topic quality. Moreover, with topic hierarchies and sub-topics, the topics discovered in our model are more interpretable, providing an illuminating means to understand text data.

1. Introduction

Significant research effort has been devoted to developing advanced text analysis technologies. Probabilistic topic models such as Latent Dirichlet Allocation (LDA) are popular approaches for this task, which discover latent topics from text collections. One preferred property of probabilistic topic models is interpretability: one can explain that a document is composed of topics and a topic is described by words. Although widely used, most variations of standard vanilla topic models (e.g., LDA) assume topics are independent and there are no structures among them. This limits those models' ability to explore any hierarchical thematic structures. Therefore, it is interesting to develop a model that is capable of exploring topic structures and yields not only improved modeling accuracy but also better interpretability.

1 Faculty of Information Technology, Monash University, Australia. 2 McCombs School of Business, University of Texas at Austin. Correspondence to: Lan Du <[email protected]>, Mingyuan Zhou <[email protected]>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Table 1. Example local topics with top 10 words

1  journal science biology research journals international cell psychology scientific bioinformatics
2  fitness piano guitar swimming violin weightlifting lessons training swim weight
3  taylor prince swift william jovi bon woman gala pill jon
4  san auto theft grand andreas mobile gta game rockstar december

One popular direction to explore topic structure is using the hierarchical/deep representation of text data, such as the nested hierarchical Dirichlet process (nHDP) (Paisley et al., 2015), Deep Poisson Factor Analysis (DPFA) (Gan et al., 2015), and the Gamma Belief Network (GBN) (Zhou et al., 2016; Cong et al., 2017). In general, these models assume that topics in the higher layers of a hierarchy are more general/abstract than those in the lower layers. Therefore, by revealing hierarchical correlations between topics, topic hierarchies provide an intuitive way to understand text data.

In addition to topic hierarchies, we are also interested in analyzing the fine-grained thematic structure within each individual topic. As we know, in conventional models, topics are discovered locally from the word co-occurrences in a corpus, so we refer to those topics as local topics. Due to the limited context of a target corpus, some local topics may be hard to interpret because of the following two effects: (1) they can mix words which co-occur locally in the target corpus but are less semantically related in general; (2) local topics can be dominated by specialized words, which are less interpretable without extra knowledge. For example, we show four example topics from our experiments in Table 1, where we can see: Topic 1 is composed of words from both the "scientific publication" and "biology" aspects; Topic 2 is a mixture of "sports" and "music"; Topics 3 and 4 are very specific topics about a "singer" and a "video game" respectively. We humans are able to understand those local topics in the above way because we are equipped with the global semantics of the words, which lets us go beyond the local context of the target corpus. Therefore, we are motivated to propose a model that automatically analyzes the fine-grained thematic structures of local topics, further improving the interpretability of topic modeling.


Fortunately, word embeddings such as GloVe (Pennington et al., 2014), word2vec (Mikolov et al., 2013), and FastText (Bojanowski et al., 2017) can be used as an accessible source of global semantic information for topic models. Learned from large corpora, word embeddings encode the semantics of words with their locations in a space, where more related words are closer to each other. For example, in Topic 1, according to the distances of word embeddings, the words "biology, cell, psychology, bioinformatics" should be in one cluster and "journal, science, research, international, scientific" should be in the other. Therefore, if a topic model can leverage the information in word embeddings, it may discover the fine-grained thematic structures of local topics. Furthermore, it has been demonstrated that conventional topic models suffer from data sparsity, resulting in a large performance degradation on some shorter internet-generated documents like tweets, product reviews, and news headlines (Zuo et al., 2016; Zhao et al., 2017c). In this case, word embeddings can also serve as complementary information to alleviate the sparsity issue in topic models.

In this paper, we propose a novel deep structured topic model, named the Word Embeddings Deep Topic Model (WEDTM)[1], which improves the interpretability of topic models by discovering topic hierarchies (i.e., inter topic structure) and fine-grained interpretations of local topics (i.e., intra topic structure). Specifically, the proposed model adapts a multi-layer Gamma Belief Network which generates deep representations of topics as well as documents. Moreover, WEDTM is able to split a local topic into a set of sub-topics, each of which captures one fine-grained thematic aspect of the local topic, in a way that each sub-topic is informed by word embeddings. WEDTM has the following key properties: (1) better interpretability with topic hierarchies and sub-topics informed by word embeddings; (2) state-of-the-art perplexity, document classification, and topic coherence performance, especially for sparse text data; (3) a straightforward Gibbs sampling algorithm facilitated by fully local conjugacy under data augmentation.

2. Related Work

Deep/hierarchical topic models: Several approaches have been developed to learn hierarchical representations of documents and topics. The Pachinko Allocation model (PAM) (Li & McCallum, 2006) assumes the topic structure is modeled by a directed acyclic graph (DAG), which is document specific. nCRP (Blei et al., 2010) models topic hierarchies by introducing a tree-structured prior constructed with multiple CRPs. Paisley et al. (2015); Kim et al. (2012); Ahmed et al. (2013) further extend nCRP by either softening its constraints or applying it to different problems.

[1] https://github.com/ethanhezhao/WEDTM

Poisson Factor Analysis (PFA) (Zhou et al., 2012) is a nonnegative matrix factorization model with a Poisson link, which is a popular alternative to LDA for topic modeling. The details of the close relationships between PFA and LDA can be found in Zhou (2018). There are several deep extensions of PFA for documents, such as DPFA (Gan et al., 2015), DPFM (Henao et al., 2015), and GBN (Zhou et al., 2016). Among them, GBN factorizes the factor score matrix (topic weights of documents) in PFA with nonnegative gamma-distributed hidden units connected by weights drawn from the Dirichlet distribution. From a modeling perspective, GBN is related to PAM, while GBN assumes there is a corpus-level topic hierarchy shared by all the documents. As reported by Cong et al. (2017), GBN outperforms other hierarchical models including nHDP, DPFA, and DPFM. Despite these attractive properties, these deep models barely consider intra topic structures or the sparsity issue associated with internet-generated corpora.

Word embedding topic models: Recently, there has been a growing interest in applying word embeddings to topic models, especially for sparse data. For example, WF-LDA (Petterson et al., 2010) extends LDA to model word features with the logistic-normal transform, where word embeddings are used as word features in Zhao et al. (2017b). LF-LDA (Nguyen et al., 2015) integrates word embeddings into LDA by replacing the topic-word Dirichlet multinomial component with a mixture of a Dirichlet multinomial component and a word embedding component. Due to the non-conjugacy in WF-LDA and LF-LDA, part of the inference has to be done by MAP optimization. Instead of generating tokens, Gaussian LDA (GLDA) (Das et al., 2015) directly generates word embeddings with the Gaussian distribution. The model proposed in Xun et al. (2017) further extends GLDA by modeling topic correlations. MetaLDA (Zhao et al., 2017c; 2018a) is a conjugate topic model that incorporates both document and word meta information. However, in MetaLDA, word embeddings have to be binarized, which can lose useful information. WEI-FTM (Zhao et al., 2017b) is a focused topic model where a topic focuses on a subset of words, informed by word embeddings. To our knowledge, topic hierarchies and sub-topics are not considered in most of the existing word embedding models.

3. The Proposed Model

Based on the PFA framework, WEDTM is a hierarchical model with two major components: one for discovering the inter topic hierarchies and the other for discovering intra topic structures (i.e., sub-topics) informed by word embeddings. The two components are connected by the bottom-layer topics, detailed as follows. Assume that each document j is presented as a word count vector x_j^{(1)} ∈ N_0^V, where V is the size of the vocabulary; the pre-trained L-dimensional real-valued embeddings for each word v ∈ {1, ..., V} are stored in an L-dimensional vector f_v ∈ R^L. Now we consider WEDTM with T hidden layers, where the t-th layer has K_t topics and k_t indexes the topics of that layer. In the bottom layer (t = 1), there are K_1 (local) topics, each of which is associated with S sub-topics. For clarity, we split the generative process of the model into three parts, shown as follows:

Generating documents:
$$\theta_j^{(1)} \sim \mathrm{Gam}\!\left(\Phi^{(2)} \theta_j^{(2)},\; p_j^{(2)}/(1 - p_j^{(2)})\right), \quad \phi_{k_1}^{(1)} \sim \mathrm{Dir}\!\left(\beta_{k_1}\right), \quad x_j^{(1)} \sim \mathrm{Pois}\!\left(\Phi^{(1)} \theta_j^{(1)}\right).$$

Inter structure:
$$\theta_j^{(T)} \sim \mathrm{Gam}\!\left(r,\; 1/c_j^{(T+1)}\right), \quad \theta_j^{(t)} \sim \mathrm{Gam}\!\left(\Phi^{(t+1)} \theta_j^{(t+1)},\; 1/c_j^{(t+1)}\right) \;(t < T), \quad \phi_{k_t}^{(t)} \sim \mathrm{Dir}(\eta_0 \mathbf{1}) \;(t > 1).$$

Intra structure:
$$w_{k_1}^{<s>} \sim \mathcal{N}\!\left[0,\; \mathrm{diag}(1/\sigma^{<s>})\right], \quad \alpha_{k_1}^{<s>} \sim \mathrm{Gam}\!\left(\alpha_0^{<s>}/S,\; 1/c_0^{<s>}\right), \quad \beta_{vk_1}^{<s>} \sim \mathrm{Gam}\!\left(\alpha_{k_1}^{<s>},\; e^{f_v^{\top} w_{k_1}^{<s>}}\right), \quad \beta_{vk_1} := \sum_s^S \beta_{vk_1}^{<s>},$$

where (t) is the index of the layer that a variable belongs to and <s> is the index of sub-topic s. To complete the model, we impose the following priors on the latent variables:

$$r_{k_T} \sim \mathrm{Gam}(\gamma_0/K_T,\; 1/c_0), \quad \gamma_0 \sim \mathrm{Gam}(a_0,\; 1/b_0), \quad p_j^{(t)} \sim \mathrm{Beta}(a_0, b_0),$$
$$c_j^{(t)} \sim \mathrm{Gam}(e_0,\; 1/f_0), \quad \alpha_0^{<s>} \sim \mathrm{Gam}(e_0,\; 1/f_0), \quad c_0^{<s>} \sim \mathrm{Gam}(e_0,\; 1/f_0), \quad \sigma_l^{<s>} \sim \mathrm{Gam}(a_0,\; 1/b_0).$$
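To make the generative process concrete, here is a minimal NumPy sketch that draws a small synthetic corpus from the model above. All sizes, names, and hyperparameter values are illustrative assumptions, not the authors' implementation; in particular, the Beta hyperparameters are kept moderate so the demo stays numerically stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not the paper's settings)
V, L, S, T, J = 1000, 50, 5, 3, 10       # vocab, embedding dim, sub-topics, layers, docs
K = [64, 32, 16]                         # K_1, K_2, K_3 (bottom to top)
F = rng.normal(scale=0.1, size=(L, V))   # stand-in for the pre-trained embeddings f_v

# Intra structure: sub-topic embeddings -> Dirichlet parameters beta
beta = np.zeros((V, K[0]))
for s in range(S):
    sigma_s = rng.gamma(1.0, 1.0, size=L)                          # sigma^<s>
    w_s = rng.normal(0.0, np.sqrt(1.0 / sigma_s), size=(K[0], L))  # w_{k1}^<s>
    alpha_s = rng.gamma(1.0 / S, 1.0, size=K[0])                   # alpha_{k1}^<s>
    beta += rng.gamma(alpha_s[None, :], np.exp(F.T @ w_s.T))       # beta_{v k1}^<s>

# Bottom-layer topics phi^(1)_{k1} ~ Dir(beta_{k1}); small floor keeps Dirichlet valid
Phi = [np.array([rng.dirichlet(beta[:, k] + 1e-8) for k in range(K[0])]).T]
# Inter structure: higher-layer loadings phi^(t)_{kt} ~ Dir(eta_0 * 1)
for t in range(1, T):
    Phi.append(np.array([rng.dirichlet(0.05 * np.ones(K[t - 1]))
                         for _ in range(K[t])]).T)                 # (K_{t-1}, K_t)
r = rng.gamma(1.0 / K[-1], 1.0, size=K[-1])                        # top-layer weights

# Generating documents: propagate theta down the layers, then draw word counts
X = np.zeros((V, J), dtype=int)
for j in range(J):
    theta = rng.gamma(r, 1.0)                                # theta_j^(T)
    for t in range(T - 1, 1, -1):
        theta = rng.gamma(Phi[t] @ theta, 1.0)               # theta_j^(t), scale 1/c
    p = rng.beta(2.0, 2.0)                                   # p_j^(2), moderate for the demo
    theta = rng.gamma(Phi[1] @ theta, p / (1.0 - p))         # theta_j^(1)
    X[:, j] = rng.poisson(Phi[0] @ theta)                    # x_j^(1) ~ Pois(Phi^(1) theta^(1))
```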

We first take a look at the bottom layer of the model, i.e., the process of generating the documents, which follows a PFA framework. In this part, WEDTM models the word counts x_j^{(1)} in a document by a Poisson (Pois) distribution and factorizes the Poisson parameters into a product of the factor loadings Φ^{(1)} ∈ R_+^{V×K_1} and hidden units θ_j^{(1)}. θ_j^{(1)} is the first-layer latent representation (unnormalized topic weights) of document j, each element of which is drawn from a gamma (Gam) distribution.[2] The k_1-th column of Φ^{(1)}, φ_{k_1}^{(1)} ∈ R_+^V, is the word distribution of topic k_1, drawn from a Dirichlet (Dir) distribution. We then explain the component for discovering inter topic hierarchies, which is similar to the structure of GBN (Zhou et al., 2016). Specifically, the shape parameter of θ_j^{(1)} is factorized into θ_j^{(2)} and Φ^{(2)} ∈ R_+^{K_1×K_2}, where θ_j^{(2)} is the second-layer latent representation of document j and φ_{k_2}^{(2)} ∈ R_+^{K_1} models the correlations between topic k_2 and all the first-layer topics. Note that, strictly speaking, k_2 is not a "real" topic as it is not a distribution over words, but it can be interpreted with words via Φ^{(1)} φ_{k_2}^{(2)}. By repeating this construction, we are able to build a deep structure to discover topic hierarchies.

[2] The first and second parameters of the gamma distribution are the shape and scale, respectively.
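Since a topic at layer t > 1 is a distribution over the topics one layer below rather than over words, it can be interpreted on the vocabulary by composing the loading matrices exactly as described above. A minimal sketch, continuing the illustrative names from the previous snippet:

```python
import numpy as np

def top_words(Phi, t_layer, kt, vocab, n=10):
    """Interpret topic kt at layer t_layer on the vocabulary by composing the
    loading matrices Phi^(1) ... Phi^(t_layer). Phi is a list
    [Phi^(1), ..., Phi^(T)] with Phi^(1) of shape (V, K_1) and
    Phi^(t) of shape (K_{t-1}, K_t) for t > 1."""
    weights = Phi[t_layer - 1][:, kt]        # phi_kt^(t_layer)
    for t in range(t_layer - 1, 0, -1):      # project down to the word level
        weights = Phi[t - 1] @ weights
    return [vocab[v] for v in np.argsort(weights)[::-1][:n]]
```

For a second-layer topic, `top_words(Phi, 2, k2, vocab)` computes Φ^{(1)} φ^{(2)}_{k_2}, matching the interpretation described above.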

Now we explain how sub-topics are discovered for the bottom-layer topics with the help of word embeddings. First of all, WEDTM applies individual asymmetric Dirichlet parameters β_{k_1} ∈ R_+^V for each bottom-layer (local) topic φ_{k_1}^{(1)}. We further construct β_{vk_1} = Σ_s^S β_{vk_1}^{<s>}, where β_{vk_1}^{<s>} models how strongly word v is associated with sub-topic s in local topic k_1. For each sub-topic s, we introduce an L-dimensional sub-topic embedding w_{k_1}^{<s>} ∈ R^L. As β_{vk_1}^{<s>} is gamma distributed, its scale parameter is constructed from the dot product of the embeddings of sub-topic s and word v through the exponential function.

The basic idea of our model is summarized as follows:

1. In terms of sub-topics, we assume each (local, bottom-layer) topic is associated with several sub-topics, in a way that the sub-topics contribute to the prior of the local topic via a sum model (Zhou, 2016). Therefore, if a word dominates in one or more sub-topics, it is likely that the word will still dominate in the local topic. With this construction, a sub-topic is expected to capture one fine-grained thematic aspect of the local topic, and each sub-topic can be directly interpreted with words via β_{k_1}^{<s>} ∈ R_+^V.

2. To leverage word embeddings to inform the learning of sub-topics, we introduce a sub-topic embedding for each of them, w_{k_1}^{<s>}, which directly interacts with the word embeddings. Therefore, sub-topic embeddings are learned with both the local context of the target corpus and the global information of word embeddings. According to our model construction, the probability density function of β_{vk_1} is the convolution of S covariance-dependent gamma distributions (Zhou, 2016). Therefore, if the sub-topic embedding of s and the word embedding of v are close, their dot product will be large, giving a large expectation of β_{vk_1}^{<s>}. The large expectation means that v has a large weight in sub-topic s of topic k_1. Finally, β_{vk_1}^{<s>} further contributes to the local topic's prior β_{vk_1}, informing φ_{vk_1}^{(1)} of the local topic.

3. It is also noteworthy that in the special case of WEDTM where S = 1, there are no sub-topics and each local topic k_1 is associated with one topic embedding vector w_{k_1}. Consequently, in WEDTM, there are three latent variables capturing the weights between the words and local topic k_1: e^{F^T w_{k_1}} (where F ∈ R^{L×V} stores the embeddings of all the words), β_{k_1}, and φ_{k_1}^{(1)}, each of which is a vector over words. It is interesting to analyze their connections and differences. e^{F^T w_{k_1}} is the prior of β_{k_1}, while β_{k_1} is the prior of φ_{k_1}^{(1)}. So e^{F^T w_{k_1}} is the closest to the word embeddings, i.e., the global semantic information, while φ_{k_1}^{(1)} is the closest to the data, i.e., the local document context of the target corpus. Therefore, unlike conventional topic models with φ_{k_1}^{(1)} only, the three variables of WEDTM give three different views of the same topic, from global to local (see the sketch after this list). We qualitatively show this comparison in Section 5.4.

4. Last but not least, word embeddings in WEDTM can be viewed as prior/complementary information that assists the learning of the whole model, which is especially important for sparse data.
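As a concrete illustration of the three views in item 3 (with S = 1, so each local topic has a single embedding w_{k_1}), here is a sketch that ranks the top words of one topic under each view; all names are assumptions:

```python
import numpy as np

def three_views(F, w_k, beta_k, phi_k, vocab, n=10):
    """Top-n words of a local topic k under the three views, from global to
    local: the embedding view e^{F^T w_k}, the prior view beta_k, and the
    data view phi_k^(1). F: (L, V) word embeddings; the rest are length-V."""
    views = {
        "embedding (global)": np.exp(F.T @ w_k),  # closest to the word embeddings
        "prior beta_k":       beta_k,             # informed by both sides
        "phi_k (local)":      phi_k,              # closest to the corpus data
    }
    return {name: [vocab[v] for v in np.argsort(score)[::-1][:n]]
            for name, score in views.items()}
```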

4. Inference

Unlike many other word embedding topic models, the fully local conjugacy of WEDTM facilitates the derivation of an effective Gibbs sampling algorithm. As the sampling of the latent variables in the process of generating documents and modeling inter topic structure is similar to GBN, the details can be found in Zhou et al. (2016). Here we focus on the sampling of the latent variables for modeling intra topic structure.

Assume that the latent counts for the bottom-layer local topics, x_{vjk_1}^{(1)}, have been sampled by Eq. (28) in Appendix B of Zhou et al. (2016); x_{vjk_1}^{(1)} counts how many tokens of word v in document j are allocated to local topic k_1.

Sample β_{vk_1}^{<s>}. We first sample:

$$\left(h_{vk_1}^{<1>}, \cdots, h_{vk_1}^{<S>}\right) \sim \mathrm{Mult}\!\left(h_{vk_1};\; \frac{\beta_{vk_1}^{<1>}}{\beta_{vk_1}}, \cdots, \frac{\beta_{vk_1}^{<S>}}{\beta_{vk_1}}\right), \tag{1}$$

where h_{vk_1} ∼ CRT(x_{v·k_1}^{(1)}, β_{vk_1}) (Zhou & Carin, 2015; Zhao et al., 2017a) and x_{v·k_1}^{(1)} := Σ_j x_{vjk_1}^{(1)}.[3] Then:

$$\beta_{vk_1}^{<s>} \sim \mathrm{Gam}\!\left(\alpha_{k_1}^{<s>} + h_{vk_1}^{<s>},\; \frac{1}{e^{-\pi_{vk_1}^{<s>}} + \log\frac{1}{q_{k_1}}}\right), \tag{2}$$

where q_{k_1} ∼ Beta(β_{·k_1}, x_{··k_1}^{(1)}) (Zhao et al., 2018b) and we define π_{vk_1}^{<s>} := f_v^T w_{k_1}^{<s>}.

[3] We hereafter use · in a dimension to denote the sum over that dimension.
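A sketch of this augmentation step for a single (v, k_1) pair, using the standard construction of a CRT draw as a sum of Bernoulli variables; all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_crt(n, r):
    """Draw h ~ CRT(n, r) as a sum of independent Bernoulli(r / (r + i - 1))."""
    i = np.arange(1, n + 1)
    return int((rng.random(n) < r / (r + i - 1)).sum())

def sample_beta_vk(x_vk, beta_s, alpha_s, pi_s, q_k):
    """Eqs. (1)-(2): split the CRT table count across sub-topics, then draw
    each beta_vk^<s> from its gamma conditional.
    x_vk: latent count x^(1)_{v.k1}; beta_s, alpha_s, pi_s: length-S arrays."""
    h_vk = sample_crt(x_vk, beta_s.sum())               # h_vk ~ CRT(x, beta_vk)
    h_s = rng.multinomial(h_vk, beta_s / beta_s.sum())  # Eq. (1)
    scale = 1.0 / (np.exp(-pi_s) + np.log(1.0 / q_k))   # Eq. (2) gamma scale
    return rng.gamma(alpha_s + h_s, scale)              # new beta_vk^<s>
```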

Sample α_{k_1}^{<s>}. We first sample g_{vk_1}^{<s>} ∼ CRT(h_{vk_1}^{<s>}, α_{k_1}^{<s>}), then:

$$\alpha_{k_1}^{<s>} \sim \mathrm{Gam}\!\left(\alpha_0^{<s>}/S + g_{\cdot k_1}^{<s>},\; \frac{1}{c_0^{<s>} + \sum_v \log\!\left(1 + e^{\pi_{vk_1}^{<s>}} \log\frac{1}{q_{k_1}}\right)}\right). \tag{3}$$

It is noteworthy that the hierarchical construction on α_{k_1}^{<s>} is closely related to the gamma-negative binomial process and can be considered as a (truncated) gamma process (Zhou & Carin, 2015; Zhou, 2016) with an intrinsic shrinkage mechanism on S. This means the model is able to automatically learn the number of effective sub-topics.

Sample w_{k_1}^{<s>}:

$$w_{k_1}^{<s>} \sim \mathcal{N}\!\left(\mu_{k_1}^{<s>},\; \Sigma_{k_1}^{<s>}\right),$$
$$\mu_{k_1}^{<s>} = \Sigma_{k_1}^{<s>} \left[\sum_v^V \left(\frac{h_{vk_1}^{<s>} - \alpha_{k_1}^{<s>}}{2} - \omega_{vk_1}^{<s>} \log\log\frac{1}{q_{k_1}}\right) f_v\right],$$
$$\Sigma_{k_1}^{<s>} = \left[\mathrm{diag}(1/\sigma^{<s>}) + \sum_v^V \omega_{vk_1}^{<s>} f_v f_v^{\top}\right]^{-1}, \tag{4}$$

where ω_{vk_1}^{<s>} ∼ PG(h_{vk_1}^{<s>} + α_{k_1}^{<s>}, π_{vk_1}^{<s>} + log log(1/q_{k_1})) and PG denotes the Polya gamma distribution (Polson et al., 2013). To sample from PG, we use an accurate and efficient approximate sampler in Zhou (2016).
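A sketch of this blocked update for one sub-topic embedding, assuming a Polya gamma sampler `pg_sample(b, c)` (a simple truncated-series stand-in is sketched in the supplementary material) and illustrative array names:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_w(F, h_s, alpha_k, pi_s, q_k, sigma_s, pg_sample):
    """Eq. (4): draw w_k^<s> | - ~ N(mu, Sigma) after Polya gamma augmentation.
    F: (L, V) word embeddings; h_s, pi_s: length-V arrays for this sub-topic;
    sigma_s: length-L prior precisions; pg_sample(b, c) draws from PG(b, c)."""
    loglog = np.log(np.log(1.0 / q_k))
    omega = np.array([pg_sample(h_s[v] + alpha_k, pi_s[v] + loglog)
                      for v in range(F.shape[1])])            # Eq. (15), supplement
    Sigma = np.linalg.inv(np.diag(1.0 / sigma_s) + (F * omega) @ F.T)
    mu = Sigma @ (F @ ((h_s - alpha_k) / 2.0 - omega * loglog))
    return rng.multivariate_normal(mu, Sigma)
```

Here `(F * omega) @ F.T` computes Σ_v ω_v f_v f_v^T in one shot, which keeps the covariance update an O(V L²) operation.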

Omitted derivations, details, and the overall algorithm are in the supplementary materials.

5. Experiments

We evaluate the proposed WEDTM by comparing it with several recent advances including deep topic models and word embedding topic models. The experiments were conducted on four real-world datasets including both regular and sparse texts. We report perplexity, document classification accuracy, and topic coherence scores. We also qualitatively analyze the topic hierarchies and sub-topics.

5.1. Experimental Settings

In the experiments, we used a regular text dataset (20NG) and three sparse text datasets (WS, TMN, Twitter), the details of which are as follows:

1. 20NG, 20 Newsgroups, consists of 18,774 articles with 20 categories. Following Zhou et al. (2016), we used the 2,000 most frequent terms after removing stopwords. The average document length is 76.

2. WS, Web Snippets, contains 12,237 web search snippets with 8 categories, used by Li et al. (2016); Zhao et al. (2017c;b). The vocabulary contains 10,052 tokens and there are 15 words in one snippet on average.

3. TMN, Tag My News, consists of 32,597 RSS news snippets from Tag My News with 7 categories, used by Nguyen et al. (2015); Zhao et al. (2017c;b). Each snippet contains a title and a short description. There are 13,370 tokens in the vocabulary and the average length of a snippet is 18.

4. Twitter was extracted in the 2011 and 2012 microblog tracks at the Text REtrieval Conference (TREC)[4] and preprocessed in Yin & Wang (2014). It has 11,109 tweets in total. The vocabulary size is 6,344 and a tweet contains 21 words on average.

[Figure 1: eight panels of relative per-heldout-word perplexity curves comparing WEDTM-1/2/3, GBN-1/2/3, MetaLDA, and WEI-FTM. Panels (a)-(d): WS, TMN, Twitter, and 20NG with varied K1 (absolute GBN-1 scores: WS 704, 595, 548; TMN 1498, 1299, 1190; Twitter 461, 349, 305; 20NG 631, 522, 479). Panels (e)-(h): WS, TMN, Twitter, and 20NG with varied proportion of training words (absolute GBN-1 scores: WS 1786, 994, 728, 595; TMN 3140, 1925, 1495, 1299; Twitter 699, 446, 382, 349; 20NG 704, 595, 549, 522).]

Figure 1. (a)-(d): Relative per-heldout-word perplexity[6] (the lower the better) with the varied K1 and fixed proportion (80%) of training words of each document. (e)-(h): Relative per-heldout-word perplexity[6] with the varied proportion of training words of each document and fixed K1 (100 on WS, TMN, and Twitter; 200 for 20NG). The error bars indicate the standard deviations of 5 random trials. The number attached to WEDTM and GBN indicates the number of layers (i.e., T) used.


We compared WEDTM with: 1. GBN (Zhou et al., 2016), the state-of-the-art deep topic model. 2. MetaLDA (Zhao et al., 2017c; 2018a), the state-of-the-art topic model with binary meta information about documents and/or words; word embeddings need to be binarized before being used in the model. 3. WEI-FTM (Zhao et al., 2017b), the state-of-the-art focused topic model that incorporates real-valued word embeddings.

It is noteworthy that GBN was reported (Cong et al., 2017) to have better performance than other deep (hierarchical) topic models such as nHDP (Paisley et al., 2015), DPFA (Gan et al., 2015), and DPFM (Henao et al., 2015). MetaLDA and WEI-FTM were reported to perform better than other word embedding topic models including WF-LDA (Petterson et al., 2010) and GPUDMM (Li et al., 2016), as well as short text topic models like PTM (Zuo et al., 2016). Therefore, we considered the three above competitors to WEDTM.

[4] http://trec.nist.gov/data/microblog.html

Originally, MetaLDA (when no document meta information is provided) and WEI-FTM follow the LDA framework, where the topic distribution for document j is θ_j ∼ Dir(α_0 1) and α_0 is a hyperparameter (usually set to 0.1). For a fair comparison, we replaced this part with the PFA framework with the gamma-negative binomial process (Zhou & Carin, 2015), which is equivalent to GBN when T = 1 and closely related to the hierarchical Dirichlet process LDA (HDPLDA) (Teh et al., 2012).

For all the models, we used 50-dimensional GloVe word embeddings pre-trained on Wikipedia[5]. Except for MetaLDA, where we followed the paper to binarize the word embeddings, the other three models used the original real-valued embeddings. The hyperparameter settings we used for WEDTM and GBN are a0 = b0 = 0.01, e0 = f0 = 1.0, η0 = 0.05. For MetaLDA and WEI-FTM, we collected 1,000 MCMC samples after 1,000 burn-ins; for GBN and WEDTM, we collected 1,000 MCMC samples after 1,000 burn-ins when T = 1, and 500 samples after 500 burn-ins when T > 1, to estimate the posterior mean. Due to the shrinkage effect of WEDTM on S, discussed in Section 4, we set S = 5, which is large enough for all the topics.

[5] https://nlp.stanford.edu/projects/glove/


[Figure 2: three panels of relative document classification accuracy curves comparing WEDTM-1/2/3, GBN-1/2/3, MetaLDA, and WEI-FTM on (a) WS (absolute GBN-1 accuracy: 66.6, 80.9, 83.1, 82.2), (b) TMN (72.0, 77.9, 78.8, 79.4), and (c) 20NG (64.5, 68.3, 69.6, 69.9).]

Figure 2. Relative document classification accuracy[6] (%) on WS, TMN, and 20NG with the varied proportion of training words of each training document. The results with K1 = 100 on WS and TMN, and K1 = 200 on 20NG are reported.

[Figure 3: three panels of topic coherence (NPMI) curves comparing WEDTM, WEDTM-te, GBN, MetaLDA, WEI-FTM, and WEI-FTM-te on (a) WS, (b) TMN, and (c) Twitter.]

Figure 3. Topic coherence (NPMI, the higher the better) on WS, TMN, and Twitter with the varied proportion of training words of each document. The results with K1 = 100 are reported. For WEDTM and WEI-FTM, the top words of a topic are generated by ranking the word distribution and the dot product of topic and word embeddings (denoted "-te").

5.2. Perplexity

Perplexity is a measure widely used (Wallach et al., 2009) to evaluate the modeling accuracy of topic models. Here we randomly chose a certain proportion of the word tokens in each document as training and used the remaining ones to calculate per-heldout-word perplexity. Figure 1 shows the relative perplexity[6] results of all the models on all the datasets, where we varied the number of bottom-layer topics as well as the proportion of training words. The proposed WEDTM performs significantly better than the others, especially on sparse data. There are several interesting remarks on the results: (1) The perplexity advantage of WEDTM over GBN becomes obvious when the corpus becomes sparse (e.g., WS/TMN/Twitter vs. 20NG, and 20% vs. 80% training words). This shows that using word embeddings as prior information benefits the model. (2) In general, increasing the depth of the model leads to better perplexity. However, when the data are too sparse (e.g., WS with 20% training words), the single-layer WEDTM and GBN perform better than their multi-layer counterparts. (3) Although MetaLDA and WEI-FTM leverage word embeddings as well, the proposed WEDTM outperforms them significantly, perhaps because the way WEDTM incorporates word embeddings is more effective.

[6] We subtracted the score of GBN with only one layer (GBN-1) from the score of each model; the lines plot the differences, so GBN-1 is the horizontal line at "0". The absolute score of GBN-1 is given below each figure.
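For reference, here is a minimal sketch of how per-heldout-word perplexity can be computed from a point estimate of the loadings and topic weights under the PFA parameterization; the array names are assumptions:

```python
import numpy as np

def per_heldout_word_perplexity(X_held, Phi1, Theta1):
    """exp(-(1/N) sum_{j,v} x_held[v,j] * log p[v,j]), where p[:,j] is the
    normalized reconstruction Phi^(1) theta_j^(1) of document j.
    X_held: (V, J) heldout counts; Phi1: (V, K1); Theta1: (K1, J)."""
    rates = Phi1 @ Theta1                        # unnormalized Poisson rates
    probs = rates / rates.sum(axis=0, keepdims=True)
    ll = (X_held * np.log(probs + 1e-12)).sum()  # heldout log likelihood
    return np.exp(-ll / X_held.sum())
```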

5.3. Document Classification

We consider the multi-class classification task of predicting the categories of test documents to evaluate the quality of the latent document representations (unnormalized topic weights) extracted by these models.[7] In this experiment, following Zhou et al. (2016), we ran the topic models on the training documents and trained an L2-regularized logistic regression using the LIBLINEAR package (Fan et al., 2008) with the latent representation θ_j^{(1)} as features. After training, we used the trained topic models to extract the latent representations of the test documents and the trained logistic regression to predict the categories. For all the datasets, we randomly selected 80% of the documents for training and used the remaining 20% for testing. Figure 2 shows the relative document classification accuracy[6] results for all the models. It can be observed that with word embeddings, WEDTM significantly outperforms GBN; it is the best on TMN and 20NG, and the second-best on WS. Again, we see a similar phenomenon: word embeddings help more on the sparser data, and increasing the network depth improves the accuracy.

[7] The results on Twitter are not reported because each of its documents is associated with multiple categories.

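The classification pipeline described above is straightforward to reproduce; here is a sketch using scikit-learn, whose `liblinear` solver wraps the LIBLINEAR package used in the paper (array names are assumptions):

```python
from sklearn.linear_model import LogisticRegression

def classify(theta_train, y_train, theta_test, y_test):
    """Train an L2-regularized logistic regression on the first-layer topic
    weights theta_j^(1) and report test accuracy.
    theta_train: (J_train, K1); theta_test: (J_test, K1)."""
    clf = LogisticRegression(penalty="l2", solver="liblinear")
    clf.fit(theta_train, y_train)
    return clf.score(theta_test, y_test)   # mean accuracy on the test split
```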

5.4. Topic Coherence

Topic coherence is another popular evaluation of topic models (Zuo et al., 2016; Zhao et al., 2017c;b). It measures the semantic coherence of the most significant words (top words) in a topic. Here we used the Normalized Pointwise Mutual Information (NPMI) (Aletras & Stevenson, 2013; Lau et al., 2014) to calculate the topic coherence score of the top 10 words of each topic, and we report the average score over all the topics.[8]

To compare with the other models, in this experiment we set S = 1 for WEDTM. Recall that in WEDTM, from global to local, there are three ways to interpret a topic. Here we evaluate NPMI for two of them: e^{F^T w_{k_1}} and φ_{k_1}^{(1)}. Figure 3 shows the NPMI scores for all the models on WS, TMN, and Twitter. It is not surprising to see that the top words generated by e^{F^T w_{k_1}} in WEDTM always gain the highest NPMI scores, meaning that the topics are more coherent. This is because the topic embeddings in WEDTM directly interact with word embeddings. Moreover, if we just compare the topics generated by φ_{k_1}^{(1)}, WEDTM also gives more coherent topics than the other models. This demonstrates that the proposed model is able to discover more interpretable topics.
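For reference, NPMI scores a topic's top words by their co-occurrence statistics in a reference corpus (the paper uses the Palmetto package; see footnote 8). A minimal sketch over document-level co-occurrences, with names assumed:

```python
import numpy as np
from itertools import combinations

def topic_npmi(top_words, doc_sets, eps=1e-12):
    """Average NPMI over all pairs of a topic's top words.
    doc_sets: list of sets of words, one per reference document."""
    n_docs = len(doc_sets)
    p = lambda ws: sum(all(w in d for w in ws) for d in doc_sets) / n_docs
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij, p_i, p_j = p((wi, wj)), p((wi,)), p((wj,))
        if p_ij > 0:
            pmi = np.log(p_ij / (p_i * p_j + eps))
            scores.append(pmi / -np.log(p_ij))   # normalize by -log p(wi, wj)
    return float(np.mean(scores)) if scores else 0.0
```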

5.5. Qualitative Analysis

As one of the most appealing properties of WEDTM is its interpretability, we conducted an extensive qualitative evaluation of the quality of the topics discovered by WEDTM, including topic embeddings, sub-topics, and topic hierarchies. More qualitative analysis, including topic hierarchy visualization and synthetic document generation, is shown in the supplementary materials.

Demonstration of topic embeddings: We demonstrate that WEDTM discovers more coherent topics by comparing them with those of GBN in Table 2. Here we set S = 1 as well. This demonstration further explains the numerical results in Figure 3. It is also interesting to compare the local interpretation (φ_k^{(1)}) and global interpretation (topic embeddings) of the same topic in WEDTM. For example, in the fifth set, the local interpretation (5.b) is about "networks and security," while the global interpretation (5.c) generalizes it with more general words related to "communications." We can also observe that although the local interpretation of WEDTM is not as close to the word embeddings as the global interpretation, as informed by the global interpretation, the local interpretation of WEDTM's topics is still considerably more coherent than those in GBN.

[8] We used the Palmetto package with a large Wikipedia dump to compute NPMI (http://palmetto.aksw.org).

Demonstration of sub-topics: In Figure 4, we show the sub-topics discovered by WEDTM for the topics used as examples at the beginning of the paper (Table 1). It can be observed that the intra topic structures with sub-topics clearly help to explain the local topics. For example, WEDTM successfully splits Topic 1 into sub-topics related to "journal" and "biology," and Topic 2 into "music" and "sports". Moreover, with the help of word embeddings, WEDTM discovers general sub-topics for specific topics. For example, Topics 3 and 4 are more interpretable with the sub-topics of "singer" and "game" respectively. The experiment also empirically demonstrates the shrinkage mechanism of the model: for most topics, there are fewer effective sub-topics than the maximum number S = 5.

Demonstration of topic hierarchies: Figure 5 shows an example that jointly demonstrates the inter and intra structures of WEDTM. The tree is a cluster of topics related to "health," where the topic hierarchies are discovered by ranking {Φ^{(t)}}_t, the leaf nodes are the topics in the bottom layer, and each bottom-layer topic is associated with a set of sub-topics. In WEDTM, the inter topic structures are revealed in the form of topic hierarchies while the intra topic structures are revealed in the form of sub-topics. Combining the two kinds of topic structures in this way gives a better view of the target corpus, which may further benefit other text analysis tasks.

6. Conclusion

In this paper, we have proposed WEDTM, a deep topic model that leverages word embeddings to discover inter topic structures with topic hierarchies and intra topic structures with sub-topics. Moreover, with the introduction of sub-topic embeddings, each sub-topic can be informed by the global information in word embeddings, so as to discover a fine-grained thematic aspect of a local topic. With topic embeddings, WEDTM provides different views of a topic, from global to local, which further improves the interpretability of the model. As a fully conjugate model, the inference of WEDTM can be done by a straightforward Gibbs sampling algorithm. Extensive experiments have shown that WEDTM achieves the state-of-the-art performance on perplexity, document classification, and topic quality. In addition, with topic hierarchies, sub-topics, and topic embeddings, the model can discover more interpretable structured topics, which helps one gain a better understanding of text data. Given the local conjugacy, it is possible to derive more scalable inference algorithms for WEDTM, such as stochastic variational inference and stochastic gradient MCMC, which is a good subject for future work.


Table 2. Top 10 words of five sets of example topics on the WS dataset. Each set contains the top words of 3 topics: topic 'a' is generated by φ_k^{(1)} in GBN-3; topic 'b' is generated by φ_k^{(1)} in WEDTM-3; topic 'c' is generated by e^{F^T w_{k_1}} in WEDTM-3. Topics 'a' and 'b' are matched by the Hellinger distance of φ_k^{(1)}. Topics 'b' and 'c' are different ways of interpreting one topic in WEDTM.

Topic | Top 10 words | NPMI
1a | engine car buying home diesel selling fuel automobile violin jet | -0.055
1b | engine motor diesel fuel gasoline jet electric engines gas technology | 0.202
1c | engine diesel engines gasoline steam electric fuel propulsion motors combustion | 0.224
2a | party labor democratic political socialist movement union social news australian | 0.168
2b | party political communist democratic socialist labor republican parties conservative leader | 0.188
2c | party democratic communist labour liberal socialist conservative opposition elections republican | 0.219
3a | cancer lung tobacco intelligence artificial information health symptoms smoking treatment | -0.006
3b | cancer lung tobacco information health smoking treatment gov research symptoms | 0.050
3c | cancer breast diabetes pulmonary cancers patients asthma cardiovascular cholesterol obesity | 0.050
4a | oscar academy awards swimming award winners swim oscars nominations picture | 0.020
4b | art awards oscar academy gallery museum surrealism sculpture picasso arts | 0.076
4c | paintings awards award art museum gallery sculpture painting picasso portrait | 0.087
5a | security computer network nuclear weapons networking spam virus spyware national | 0.059
5b | security network wireless access networks spam spyware networking national computer | 0.061
5c | wireless internet networks devices phone broadband users network wi-fi providers | 0.143

[Figure 4: the four example local topics from Table 1 shown with their discovered sub-topics, e.g., Topic 1 splits into a "journal/publication" sub-topic and a "biology/genetics" sub-topic, and Topic 2 splits into "music (piano, guitar, violin)" and "fitness/swimming" sub-topics.]

Figure 4. The sub-topics (red) of the example topics (blue). Larger font size indicates a larger weight (Σ_v^V β_{vk}^{<s>}) of a sub-topic to the local topic. We set S = 5 and trimmed off the sub-topics with extremely small weights.

[Figure 5: a sub-tree of topics related to "health": a third-layer node on health/cancer/information connects to second-layer nodes on health/nutrition, theoretical models, and hiv/aids, which in turn connect to bottom-layer topics (web information, news, health/nutrition/disease, theoretical models, hiv/respiratory disease), each shown with its sub-topics.]

Figure 5. One example sub-tree of the topic hierarchy discovered by WEDTM on the WS dataset with K1 = 50 and S = 5. The tree is generated in the same way as in Zhou et al. (2016). A line from node kt at layer t to node kt-1 at layer t-1 indicates that φ_{kt-1,kt}^{(t)} > 1.5/K_{t-1}, and its width indicates the value of φ_{kt-1,kt}^{(t)} (i.e., topic correlation strength). The outside border of a text box is colored orange, blue, or black if the node is at layer three, two, or one, respectively. For the leaf nodes, sub-topics are shown in the same way as in Figure 4.


References

Ahmed, A., Hong, L., and Smola, A. Nested Chinese restaurant franchise process: Applications to user tracking and document modeling. In ICML, 2013.

Aletras, N. and Stevenson, M. Evaluating topic coherence using distributional semantics. In Proc. of the 10th International Conference on Computational Semantics, pp. 13-22, 2013.

Blei, D., Griffiths, T., and Jordan, M. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM (JACM), 57(2):7, 2010.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. Enriching word vectors with subword information. TACL, 5:135-146, 2017.

Cong, Y., Chen, B., Liu, H., and Zhou, M. Deep latent Dirichlet allocation with topic-layer-adaptive stochastic gradient Riemannian MCMC. In ICML, pp. 864-873, 2017.

Das, R., Zaheer, M., and Dyer, C. Gaussian LDA for topic models with word embeddings. In ACL, pp. 795-804, 2015.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. LIBLINEAR: A library for large linear classification. JMLR, 9:1871-1874, 2008.

Gan, Z., Chen, C., Henao, R., Carlson, D., and Carin, L. Scalable deep Poisson factor analysis for topic modeling. In ICML, pp. 1823-1832, 2015.

Henao, R., Gan, Z., Lu, J., and Carin, L. Deep Poisson factor modeling. In NIPS, pp. 2800-2808, 2015.

Kim, J. H., Kim, D., Kim, S., and Oh, A. Modeling topic hierarchies with the recursive Chinese restaurant process. In CIKM, pp. 783-792. ACM, 2012.

Lau, J. H., Newman, D., and Baldwin, T. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In EACL, pp. 530-539, 2014.

Li, C., Wang, H., Zhang, Z., Sun, A., and Ma, Z. Topic modeling for short texts with auxiliary word embeddings. In SIGIR, pp. 165-174, 2016.

Li, W. and McCallum, A. Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML, pp. 577-584, 2006.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In NIPS, pp. 3111-3119, 2013.

Nguyen, D. Q., Billingsley, R., Du, L., and Johnson, M. Improving topic models with latent feature word representations. TACL, 3:299-313, 2015.

Paisley, J., Wang, C., Blei, D., and Jordan, M. Nested hierarchical Dirichlet processes. TPAMI, 37(2):256-270, 2015.

Pennington, J., Socher, R., and Manning, C. GloVe: Global vectors for word representation. In EMNLP, pp. 1532-1543, 2014.

Petterson, J., Buntine, W., Narayanamurthy, S. M., Caetano, T. S., and Smola, A. J. Word features for Latent Dirichlet Allocation. In NIPS, pp. 1921-1929, 2010.

Polson, N., Scott, J., and Windle, J. Bayesian inference for logistic models using Polya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339-1349, 2013.

Teh, Y. W., Jordan, M., Beal, M., and Blei, D. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2012.

Wallach, H. M., Mimno, D. M., and McCallum, A. Rethinking LDA: Why priors matter. In NIPS, pp. 1973-1981, 2009.

Xun, G., Li, Y., Zhao, W. X., Gao, J., and Zhang, A. A correlated topic model using word embeddings. In IJCAI, pp. 4207-4213, 2017.

Yin, J. and Wang, J. A Dirichlet multinomial mixture model-based approach for short text clustering. In SIGKDD, pp. 233-242. ACM, 2014.

Zhao, H., Du, L., and Buntine, W. Leveraging node attributes for incomplete relational data. In ICML, pp. 4072-4081, 2017a.

Zhao, H., Du, L., and Buntine, W. A word embeddings informed focused topic model. In ACML, pp. 423-438, 2017b.

Zhao, H., Du, L., Buntine, W., and Liu, G. MetaLDA: A topic model that efficiently incorporates meta information. In ICDM, pp. 635-644, 2017c.

Zhao, H., Du, L., Buntine, W., and Liu, G. Leveraging external information in topic modelling. Knowledge and Information Systems, pp. 1-33, 2018a.

Zhao, H., Rai, P., Du, L., and Buntine, W. Bayesian multi-label learning with sparse features and labels, and label co-occurrences. In AISTATS, pp. 1943-1951, 2018b.

Zhou, M. Softplus regressions and convex polytopes. arXiv preprint arXiv:1608.06383, 2016.

Zhou, M. Nonparametric Bayesian negative binomial factor analysis. Bayesian Analysis, 2018.

Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. TPAMI, 37(2):307-320, 2015.

Zhou, M., Hannah, L., Dunson, D. B., and Carin, L. Beta-negative binomial process and Poisson factor analysis. In AISTATS, pp. 1462-1471, 2012.

Zhou, M., Cong, Y., and Chen, B. Augmentable gamma belief networks. JMLR, 17(163):1-44, 2016.

Zuo, Y., Wu, J., Zhang, H., Lin, H., Wang, F., Xu, K., and Xiong, H. Topic modeling of short texts: A pseudo-document view. In SIGKDD, pp. 2105-2114, 2016.


Supplementary Material for "Inter and Intra Topic Structure Learning with Word Embeddings"

He Zhao 1 Lan Du 1 Wray Buntine 1 Mingyuan Zhou 2

1. Inference Details

Recall that the data is the word count vector x_j^{(1)} of document j. Given the PFA framework, we can apply collapsed Gibbs sampling to sample the bottom-layer topic for the i-th word in document j, v_{ji}, similar to Eq. (28) in Appendix B of Zhou et al. (2016), as follows:

$$P(z_{ji} = k_1) \propto \frac{\beta_{v_{ji} k_1} + x_{v_{ji} \cdot k_1}^{(1)-ji}}{\beta_{\cdot k_1} + x_{\cdot\cdot k_1}^{(1)-ji}} \left(x_{\cdot j k_1}^{(1)-ji} + \phi_{k_1:}^{(2)} \theta_j^{(2)}\right), \tag{1}$$

where z_{ji} is the topic index for v_{ji} and x_{vjk_1}^{(1)} := Σ_i δ(v_{ji} = v, z_{ji} = k_1) counts the number of times that term v appears in document j and is allocated to topic k_1; we use x^{-ji} to denote the count x calculated without considering word i in document j.
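A sketch of this collapsed Gibbs step for a single token, assuming the count arrays already exclude the current token (all names illustrative; `Phi2 @ theta2_j` supplies the shape passed down from the second layer):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_topic(v, beta, x_vk, x_k, x_jk, Phi2, theta2_j):
    """Eq. (1): resample z_ji for a token of word v in document j.
    beta: (V, K1) Dirichlet parameters; x_vk: (V, K1), x_k: (K1,), x_jk: (K1,)
    counts excluding the current token."""
    p = (beta[v] + x_vk[v]) / (beta.sum(axis=0) + x_k) \
        * (x_jk + Phi2 @ theta2_j)
    return rng.choice(len(p), p=p / p.sum())
```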

Given the latent counts x_{v·k_1}^{(1)}, the multinomial likelihood of φ_{k_1}^{(1)} is proportional to Π_v (φ_{vk_1}^{(1)})^{x_{v·k_1}^{(1)}}. Due to the Dirichlet-multinomial conjugacy, the joint likelihood of β_{k_1} is computed as:

$$\mathcal{L}(\beta_{k_1}) \propto \frac{\Gamma(\beta_{\cdot k_1})}{\Gamma(\beta_{\cdot k_1} + x_{\cdot\cdot k_1}^{(1)})} \prod_v^V \frac{\Gamma(\beta_{v k_1} + x_{v \cdot k_1}^{(1)})}{\Gamma(\beta_{v k_1})}. \tag{2}$$

By introducing an auxiliary beta-distributed variable:

$$q_{k_1} \sim \mathrm{Beta}\!\left(\beta_{\cdot k_1},\; x_{\cdot\cdot k_1}^{(1)}\right), \tag{3}$$

we can transform the first gamma ratio in Eq. (2) to (q_{k_1})^{β_{·k_1}} (Zhao et al., 2018). After this augmentation, one can show that the likelihood of β_{vk_1} is proportional to the negative binomial distribution.

Recall that β_{vk_1} = Σ_s^S β_{vk_1}^{<s>} and the shape parameter of β_{vk_1}^{<s>}, α_{k_1}^{<s>}, is drawn from a hierarchical gamma. This construction is closely related to the gamma-negative binomial process (Zhou & Carin, 2015; Zhou, 2016), which enables the model to automatically determine the number of effective sub-topics.

Furthermore, we introduce another auxiliary variable:

$$h_{vk_1} \sim \mathrm{CRT}\!\left(x_{v \cdot k_1}^{(1)},\; \beta_{vk_1}\right), \tag{4}$$

where h ∼ CRT(n, r) stands for the Chinese Restaurant Table distribution (Zhou & Carin, 2015) that generates the number of tables h seated by n customers in a Chinese restaurant process with concentration parameter r (Wray & Marcus, 2012). Given h_{vk_1}, the second gamma ratio can be augmented as (β_{vk_1})^{h_{vk_1}}. Finally, with the two auxiliary variables, Eq. (2) can be written as:

$$\mathcal{L}(\beta_{vk_1}, q_{k_1}, h_{vk_1}) \propto (q_{k_1})^{\beta_{vk_1}} (\beta_{vk_1})^{h_{vk_1}}. \tag{5}$$

Sample β_{vk_1}^{<s>}. Given the table counts h_{vk_1} in Eq. (5), we can sample the counts h_{vk_1}^{<s>} for each sub-topic s as follows:

$$\left(h_{vk_1}^{<1>}, \cdots, h_{vk_1}^{<S>}\right) \sim \mathrm{Mult}\!\left(h_{vk_1};\; \frac{\beta_{vk_1}^{<1>}}{\beta_{vk_1}}, \cdots, \frac{\beta_{vk_1}^{<S>}}{\beta_{vk_1}}\right). \tag{6}$$

Given h_{vk_1}^{<s>}, we can sample β_{vk_1}^{<s>} as:

$$\beta_{vk_1}^{<s>} \sim \mathrm{Gam}\!\left(\alpha_{k_1}^{<s>} + h_{vk_1}^{<s>},\; \frac{1}{e^{-\pi_{vk_1}^{<s>}} + \log\frac{1}{q_{k_1}}}\right), \tag{7}$$

where we define

$$\pi_{vk_1}^{<s>} := f_v^{\top} w_{k_1}^{<s>}. \tag{8}$$

Sample α_{k_1}^{<s>}. By integrating β_{vk_1}^{<s>} out, we sample α_{k_1}^{<s>} as:

$$\alpha_{k_1}^{<s>} \sim \mathrm{Gam}\!\left(\alpha_0^{<s>}/S + g_{\cdot k_1}^{<s>},\; \frac{1}{c_0^{<s>} + \sum_v \log\!\left(1 + e^{\pi_{vk_1}^{<s>}} \log\frac{1}{q_{k_1}}\right)}\right), \tag{9}$$

where

$$g_{\cdot k_1}^{<s>} := \sum_v^V g_{vk_1}^{<s>}, \tag{10}$$
$$g_{vk_1}^{<s>} \sim \mathrm{CRT}\!\left(h_{vk_1}^{<s>},\; \alpha_{k_1}^{<s>}\right). \tag{11}$$

According to the gamma-gamma conjugacy, α_0^{<s>} and c_0^{<s>} can be sampled in a similar way.


Sample w_{k_1}^{<s>}. After integrating β_{vk_1}^{<s>} out and ignoring unrelated terms, the joint likelihood related to w_{k_1}^{<s>} is proportional to:

$$\mathcal{L}\!\left(\pi_{vk_1}^{<s>}\right) \propto \frac{\left(e^{\pi_{vk_1}^{<s>} + \log\log\frac{1}{q_{k_1}}}\right)^{h_{vk_1}^{<s>}}}{\left(1 + e^{\pi_{vk_1}^{<s>} + \log\log\frac{1}{q_{k_1}}}\right)^{\alpha_{k_1}^{<s>} + h_{vk_1}^{<s>}}}. \tag{12}$$

The above likelihood can be augmented with an auxiliary variable ω_{vk_1}^{<s>} ∼ PG(1, 0), where PG denotes the Polya gamma distribution (Polson et al., 2013). Given ω_{vk_1}^{<s>}, we get:

$$\mathcal{L}\!\left(\pi_{vk_1}^{<s>}, \omega_{vk_1}^{<s>}\right) \propto e^{\frac{h_{vk_1}^{<s>} - \alpha_{k_1}^{<s>}}{2} \pi_{vk_1}^{<s>}} \, e^{-\frac{\omega_{vk_1}^{<s>}}{2} \left(\pi_{vk_1}^{<s>}\right)^2}. \tag{13}$$

The above likelihood results in a normal likelihood for w_{k_1}^{<s>}. Therefore, we sample it from a multivariate normal distribution as follows:

$$w_{k_1}^{<s>} \sim \mathcal{N}\!\left(\mu_{k_1}^{<s>},\; \Sigma_{k_1}^{<s>}\right),$$
$$\mu_{k_1}^{<s>} = \Sigma_{k_1}^{<s>} \left[\sum_v^V \left(\frac{h_{vk_1}^{<s>} - \alpha_{k_1}^{<s>}}{2} - \omega_{vk_1}^{<s>} \log\log\frac{1}{q_{k_1}}\right) f_v\right],$$
$$\Sigma_{k_1}^{<s>} = \left[\mathrm{diag}(1/\sigma^{<s>}) + \sum_v^V \omega_{vk_1}^{<s>} f_v f_v^{\top}\right]^{-1}, \tag{14}$$

where μ_{k_1}^{<s>} ∈ R^L and Σ_{k_1}^{<s>} ∈ R^{L×L}.

According to Polson et al. (2013), we can sample

$$\omega_{vk_1}^{<s>} \sim \mathrm{PG}\!\left(h_{vk_1}^{<s>} + \alpha_{k_1}^{<s>},\; \pi_{vk_1}^{<s>} + \log\log\frac{1}{q_{k_1}}\right). \tag{15}$$

To sample from the Polya gamma distribution, we use an accurate and efficient approximate sampler in Zhou (2016).
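The sampler of Zhou (2016) is not reproduced here; as a simple stand-in, one can truncate the series representation PG(b, c) = (1/(2π²)) Σ_{k≥1} g_k / ((k − 1/2)² + c²/(4π²)) with g_k ∼ Gam(b, 1). A sketch, where the truncation level is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(4)

def pg_sample(b, c, trunc=200):
    """Approximate draw from PG(b, c) by truncating its gamma series."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(b, 1.0, size=trunc)                 # g_k ~ Gam(b, 1)
    return (g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))).sum() \
        / (2.0 * np.pi ** 2)
```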

Finally, σ^{<s>} can be sampled from its gamma posterior.

The inference algorithm is shown in Figure 1.

2. Visualization of the Discovered Topic Hierarchies and Sub-topics

Figures 1-9 show the topic hierarchies discovered by WEDTM on WS, TMN, and Twitter, respectively.

3. Generating Synthetic Documents

Below we provide several synthetic documents generated from WEDTM, following the GBN paper (Zhou et al., 2016). Given trained {Φ^{(t)}}_t, we used the generative model shown in Figure 1 in the main paper to generate simulated topic weights θ_j^{(1)}. We show the top words ranked according to Φ^{(1)} θ_j^{(1)}. Below are some example synthetic documents generated in this manner with WEDTM trained on the TMN dataset. The generated documents are clearly interpretable.

1. study drug cancer risk health people heart women disease patients drugs researchers children brain found finds kids high doctors suggests research medical surgery food diabetes treatment blood years report men young shows year time scientists mets age linked yankees coli long care tuesday good hospital early prostate breast fda developing common adults monday outbreak older weight lower parents problems taking day studies virus aids babies life diet thursday loss levels higher wednesday home evidence make trial tests effective death sox therapy pregnancy experts prevent low patient run pressure pain alzheimer game flu type win number treat start season obesity symptoms

2. mets yankees game season sox win run baseball nfl league hit home time team red inning players day back beat phillies start pitcher innings night runs rangers giants major year games rays big manager york victory indians coach left past boston make draft dodgers hits good pitch hitter career bay tigers blue list play braves fans marlins tuesday years disabled high cubs thursday saturday lockout chicago week nationals top wednesday lead white jays twins streak los days philadelphia field long texas josh pitched ninth friday sunday francisco homer lineup put young college football times roundup pitching cleveland loss sports angeles

3. japan nuclear earthquake plant power tsunami crisis japanese oil quake radiation stocks disaster tokyo prices world friday tuesday fukushima investors hit crippled energy reactor water march monday safety thursday plants economic reactors market government week wednesday year country devastating wall high workers radioactive concerns worst street officials percent companies damage electric report massive fears earnings damaged economy impact food levels agency operator global daiichi plans markets stock chernobyl sales coast growth rise higher states atomic fuel united risk stricken health crude supply dollar low strong demand level recovery day magnitude quarter shares fell struck tepco billion commodities experts gains relief

4. wedding royal prince kate william middleton london queen britain ireland british irish friday eliza-


Require: {x(1)_j}_j, T, S, {K_t}_t, a0, b0, e0, f0, η0, MaxIteration
Ensure: The sampled values for all the latent variables
1: Randomly initialise all the latent variables according to the generative process;
2: for t ← 1 to T do
3:     for iter ← 1 to MaxIteration do
4:         if t = 1 then
5:             /* Generating documents */
6:             Sample the bottom-layer topics by Eq. (1); Calculate {x(1)_vjk1}_v,j,k1;
7:         end if
8:         /* Inter topic structure */
9:         Do the upward-downward Gibbs sampler
10:            (Algorithm 1 in Zhou et al. (2016)) for {θ(t)_j, c(t)_j, p(t)_j}_t,j, {Φ(t)}_t, r;
11:        if t = 1 then
12:            /* Intra topic structure */
13:            Sample {q_k1}_k1 by Eq. (3); Sample {h_vk1}_v,k1 by Eq. (4);
14:            for s ← 1 to S do
15:                Sample {h<s>_vk1}_v,k1 by Eq. (6);
16:                Sample {g<s>_vk1}_v,k1 by Eq. (11); Calculate {g<s>_·k1}_k1 by Eq. (10);
17:                Sample α<s>_0 and c<s>_0 from their gamma posteriors;
18:                Sample {α<s>_k1}_k1 by Eq. (9);
19:                Sample {ω<s>_vk1}_v,k1 by Eq. (15); Sample σ<s> from its gamma posterior;
20:                Sample {w<s>_k1}_k1 by Eq. (14);
21:                Calculate {π<s>_vk1}_v,k1 by Eq. (8);
22:                Sample {β<s>_vk1}_v,k1 by Eq. (7);
23:            end for
24:            Calculate {β_vk1}_v,k1;
25:        end if
26:    end for
27: end for

Figure 1. Gibbs sampling algorithm for WEDTM.
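For readability, the control flow of Figure 1 can also be written as the following Python-style skeleton. The state object and its methods are hypothetical stand-ins for the equation-specific updates listed above; this is an outline of the loop structure, not an implementation of the samplers.

def run_wedtm_gibbs(state, T, S, max_iteration):
    """Layer-wise Gibbs loop of Figure 1: layers t = 1..T are trained
    in turn, and at the bottom layer (t = 1) both the inter-topic and
    the intra-topic (sub-topic) variables are resampled each sweep."""
    for t in range(1, T + 1):
        for _ in range(max_iteration):
            if t == 1:
                state.sample_bottom_layer_topics()     # Eq. (1)
            # Inter-topic structure: upward-downward Gibbs sweep for
            # theta, c, p, Phi and r (Algorithm 1 in Zhou et al., 2016).
            state.upward_downward_sweep(t)
            if t == 1:
                # Intra-topic structure: sub-topic updates.
                state.sample_q_and_h()                 # Eqs. (3)-(4)
                for s in range(1, S + 1):
                    state.update_subtopic(s)           # Eqs. (6)-(15)
                state.calculate_beta()                 # combine sub-topics
    return state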

[Tree plot omitted from this text extraction: root topic 3 ("computer programming web software internet java network language information server"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 2. Analogous plots to Figure 6 in the main paper for a tree about “computers” on the WS dataset.


5. apple sony mobile data ipad company corp network phone billion software google iphone computer million microsoft market amazon tablet playstation maker devices wireless app technology security service sales users year smartphone android customers phones deal plans nokia report world week system online breach smartphones services amp apps group percent buy executive top thursday music information ceo computers samsung consumer location chief wednesday stores largest research business electronics growth books prices jobs china windows major selling hackers firm launch high digital tuesday friday shares intel companies systems tablets make years device city people population energy verizon blackberry profit internet bid store


[Tree plot omitted from this text extraction: root topic 14 ("party political democratic labor communist wikipedia election socialist campaign politics"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 3. Analogous plots to Figure 6 in the main paper for a tree about “movie” on the WS dataset.

[Tree plot omitted from this text extraction: root topic 49 ("war military force world revolution nuclear navy civil weapons research"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 4. Analogous plots to Figure 6 in the main paper for a tree about “war” on the WS dataset.

[Tree plot omitted from this text extraction: root topic 17 ("school year tax states state health high college prices money"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 5. Analogous plots to Figure 6 in the main paper for a tree about “daily life” on the TMN dataset.

[Tree plot omitted from this text extraction: root topic 18 ("libya obama president libyan war military gaddafi united muammar coast"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 6. Analogous plots to Figure 6 in the main paper for a tree about “politics” on the TMN dataset.

[Tree plot omitted from this text extraction: root topic 30 ("food health study found air years time scientists people recipes"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 7. Analogous plots to Figure 6 in the main paper for a tree about “food and health” on the TMN dataset.


[Tree plot omitted from this text extraction: root topic 10 ("nokia lumia blackberry bbm phone android will channel window tablet"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 8. Analogous plots to Figure 6 in the main paper for a tree about “phone” on the Twitter dataset.

[Tree plot omitted from this text extraction: root topic 37 ("thanksgiving friday black day holiday hanukkah year will time shopping"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 9. Analogous plots to Figure 6 in the main paper for a tree about “thanksgiving” on the Twitter dataset.

[Tree plot omitted from this text extraction: root topic 16 ("net game brooklyn knicks york star point jellyfish robot lakers"), its child topics, and their sub-topic word lists are shown in the figure.]

Figure 10. Analogous plots to Figure 6 in the main paper for a tree about “NBA” on the Twitter dataset.



Below are some example synthetic documents with WEDTM trained on the WS dataset.

1. research papers thesis project writing edu paper dissertation proposal reports patent technical report topics guide academic exploratory information write methods researchers publications parallel process phd issues ideas computing media topic custom projects essay help archives tools page labs web lab university communication resources ibm proposals intellectual home advice cluster dissertations survey library essays address practical links step abstract property bell systems linux performance master serve abstracts developed written department index quality exploring matters res theses documents course college collaborative processing programme materials focused preparing computer program students search guidelines online articles experience services recommendations welcome conduct funding aspects idea people

2. football team league news american fans nfl players soccer stadium sports indoor national com home statistics club history hockey baseball college fan association teams arena england game player texas website scores basketball professional fixtures espn season ncaa clubs university afl schedule tickets minnesota independent tables athletics sport welcome united leagues dedicated ice chicago giants nba stadiums online rugby stats nhl gymnastics coverage coaches levels delivers city arsenal index coach database conference standings information cstv fantasy bears articles views athletic moved rules assault bulls supporters officials kickoff division games women tournament photos body media fame covering adjacent sexual federation pages rutgers

3. health hiv prevention stress healthy aids diet nutrition hepatitis fitness food epidemic pressure infection information cdc disease people blood life gov diseases epidemiology treatment cholesterol picnic tips medical articles children safety symptoms fat topics heart foods living centers fatty help kids liver care researchers eating diagnosis who weight body virus recipes human control viral sheets com helps patterns and comprehensive diabetes risk trials int learn conditions advice influenza diets exercise affect public press consensus constant united eat infections listing unaids experts mind personality disorder causes world flu infectious guidelines global loss women avian humans population ucsf cope job brains index

4. system government parliamentary republic presidential maradona diego parliament systems westminster freedom independence democracy semi-presidential consensus constitutional armando elected congressional constitution executive america people declaration elections president czech gov australia election documents hand french political national congress authority united archives legislature kingdom country governments powers power public british representation canada argentine versus buenos separation born legislative central assembly affairs republics india peace france sri semi english branch body operates charters romania distributed presidency practice the parliaments governing house countries britain rule federal principles jury lanka foreign bill govern nunavut portuguese ireland aires immense head qld parl pub shtml politics ukraine parties

5. school research college edu graduate university education students elementary center student library district information schools program programs degree city public media papers phd master thesis law department undergraduate home project staff america regional writing scholarship calendar independent academic town paper administration news scholarships north dissertation community faculty welcome archives virginia children president union wisconsin temple proposal parents ohio pennsylvania teachers philadelphia reports guide houston nation degrees chronicle services massachusetts patent opportunities boston technical activities county study masters east report employment commission usa expo lewis michigan san academy memorial topics meeting institute campus homepage web arizona maine business announcements philly hub

References

Polson, N., Scott, J., and Windle, J. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339-1349, 2013.

Buntine, W. and Hutter, M. A Bayesian view of the Poisson-Dirichlet process. arXiv preprint arXiv:1007.0296v2 [math.ST], 2012.


Zhao, H., Rai, P., Du, L., and Buntine, W. Bayesian multi-label learning with sparse features and labels, and label co-occurrences. In AISTATS, pp. 1943-1951, 2018.

Zhou, M. Softplus regressions and convex polytopes. arXiv preprint arXiv:1608.06383, 2016.

Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. TPAMI, 37(2):307-320, 2015.

Zhou, M., Cong, Y., and Chen, B. Augmentable gamma belief networks. JMLR, 17(163):1-44, 2016.

