Latent LSTM Allocation
Manzil Zaheer, Amr Ahmed and Alexander J Smola
Presented by Akshay Budhkar & Krishnapriya Vishnubhotla
March 3, 2018
Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla)Latent LSTM Allocation March 3, 2018 1 / 22
Outline
1 Introduction
  Latent Dirichlet Allocation
  LSTMs
2 Latent LSTM Allocation
  Algorithm
  Inference
  Different Models
3 Results
4 Conclusion
Latent Dirichlet Allocation
Probabilistic graphical model
Not sequential, but easily interpretable.
LSTMs
Good for modeling sequential data, preserves temporal aspect
Too many parameters
Hard to interpret
Latent LSTM Allocation (LLA) - Algorithm
Graphical model for LLA
The marginal probability of observing a document is

  p(w_d \mid \mathrm{LSTM}, \phi) = \sum_{z_d} p(w_d, z_d \mid \mathrm{LSTM}, \phi)
                                  = \sum_{z_d} \prod_t p(w_{d,t} \mid z_{d,t}; \phi)\, p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM})    (1)
Uses a K × H dense matrix (the LSTM output projection over topics, H being the hidden size) and a V × K sparse matrix (the topic–word table φ).
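Eq. (1) can be evaluated exactly for tiny documents by summing over all K^T topic sequences. The sketch below does this in numpy; for brevity the LSTM prior p(z_t | z_{1:t-1}) is replaced by a hypothetical first-order transition table, and all parameter values are made up for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

V, K, T = 5, 3, 4                      # vocab size, topics, doc length
phi = rng.dirichlet(np.ones(V), K)     # K x V topic-word distributions
words = [0, 2, 1, 4]                   # a toy document w_{d,1:T}

# Stand-in for the LSTM prior p(z_t | z_{1:t-1}): an initial
# distribution plus a first-order transition table (hypothetical).
init = np.full(K, 1.0 / K)
trans = rng.dirichlet(np.ones(K), K)   # row z_{t-1} -> distribution over z_t

def doc_marginal(words):
    """Exact sum over all K^T topic sequences, as in Eq. (1)."""
    total = 0.0
    for zs in product(range(K), repeat=len(words)):
        p = 1.0
        for t, (z, w) in enumerate(zip(zs, words)):
            prior = init[z] if t == 0 else trans[zs[t - 1], z]
            p *= prior * phi[z, w]
        total += p
    return total

print(doc_marginal(words))
```

The exponential K^T sum is exactly why the paper resorts to sampling-based inference rather than exact marginalization.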
Inference
Stochastic Expectation Maximization is used to compute the posterior.
The Evidence Lower Bound (ELBO) can be written as:

  \sum_d \log p(w_d \mid \mathrm{LSTM}, \phi) \ge \sum_d \sum_{z_d} q(z_d) \log \frac{p(z_d; \mathrm{LSTM}) \prod_t p(w_{d,t} \mid z_{d,t}; \phi)}{q(z_d)}    (2)
The conditional probability of the topic at time step t is:

  p(z_{d,t} = k \mid w_{d,t}, z_{d,1:t-1}; \mathrm{LSTM}, \phi) \propto p(z_{d,t} = k \mid z_{d,1:t-1}; \mathrm{LSTM})\, p(w_{d,t} \mid z_{d,t} = k; \phi)    (3)
And the word likelihood is estimated from smoothed counts:

  p(w_{d,t} \mid z_{d,t} = k; \phi) = \phi_{w,k} = \frac{n_{w,k} + \beta}{n_k + V\beta}    (4)
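One stochastic-EM move per token follows directly from Eqs. (3) and (4): sample the topic from the product of the LSTM prior and the smoothed word likelihood, then update the word–topic counts. A minimal sketch, with the LSTM forward pass replaced by a fixed hypothetical prior vector:

```python
import numpy as np

rng = np.random.default_rng(1)
V, K, beta = 5, 3, 0.1

# Sufficient statistics behind Eq. (4).
n_wk = np.zeros((V, K))                # word-topic counts n_{w,k}
n_k = np.zeros(K)                      # topic counts n_k

def phi_row(w):
    """Smoothed estimate phi_{w,k} = (n_{w,k} + beta) / (n_k + V*beta)."""
    return (n_wk[w] + beta) / (n_k + V * beta)

def sem_step(w, lstm_prior):
    """One stochastic-EM move for a single token: sample the topic
    from Eq. (3) (E-step), then update the counts defining phi (M-step)."""
    p = lstm_prior * phi_row(w)        # proportional to Eq. (3)
    p /= p.sum()
    k = rng.choice(K, p=p)
    n_wk[w, k] += 1
    n_k[k] += 1
    return k

# lstm_prior stands in for p(z_t | z_{1:t-1}; LSTM) from a forward pass.
z = sem_step(w=2, lstm_prior=np.array([0.2, 0.5, 0.3]))
```

In the full algorithm the sampled topic sequences are then fed back as training targets for the LSTM, which is the M-step for the dynamics.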
Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla)Latent LSTM Allocation March 3, 2018 9 / 22
Mathematical Intuition
LDA:

  \log p(w) = \sum_t \log p(w_t \mid \mathrm{model}) = \sum_t \log \sum_{z_t} p(w_t \mid z_t)\, p(z_t \mid \mathrm{doc})    (5)

LSTM:

  \log p(w) = \sum_t \log p(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_1)    (6)

LLA:

  \log p(w) = \log \sum_{z_{1:T}} \prod_t p(w_t \mid z_t)\, p(z_t \mid z_{t-1}, z_{t-2}, \ldots, z_1)    (7)
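The relationship between Eqs. (5) and (7) can be checked numerically: if the LLA topic prior ignores history, i.e. p(z_t | z_{1:t-1}) = θ for every t, the sum over z_{1:T} factorizes and LLA collapses to LDA's per-token form. A toy numpy check (all parameter values hypothetical):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
V, K, T = 4, 2, 3
phi = rng.dirichlet(np.ones(V), K)     # K x V: p(w | z)
theta = rng.dirichlet(np.ones(K))      # document-level topic mixture

words = [1, 3, 0]

# LDA, Eq. (5): tokens are conditionally independent given theta.
lda_logp = sum(np.log(phi[:, w] @ theta) for w in words)

# LLA, Eq. (7), with the prior degenerated to p(z_t) = theta for all t:
# the sum over all K^T topic sequences then factorizes per token.
lla_p = sum(
    np.prod([theta[z] * phi[z, w] for z, w in zip(zs, words)])
    for zs in product(range(K), repeat=T)
)
print(lda_logp, np.log(lla_p))         # the two values coincide
```

With a genuine LSTM prior the topic at step t depends on all earlier topics, so this factorization no longer holds; that dependence is exactly what LLA adds over LDA.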
Different Models
Perplexity vs. Number of topics (Wikipedia)
Perplexity vs. Number of topics (User Search)
Cannot use Char LLA, since URLs lack morphological structure
LDA Ablation Study
Interpreting Cleaner Topics
Interpreting Factored Topics
LSTM Topic Embedding (Wikipedia)
Convergence Speed
Effect of Joint vs. Independent Training
Final Thoughts
Pros
Provides a knob for trading off interpretability and accuracy
Fewer parameters for a reasonable perplexity
Cleaner, factored topics
Cons
Did not compare to something like hierarchical LDA
Char LLA cannot be used for every problem
Perplexity is not a good measure of text generation accuracy
Bibliography
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., Mitchell, M., Gao, J., and Dolan, B. (2015). deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. arXiv preprint arXiv:1506.06863.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In International Conference on Machine Learning, pages 3967–3976.