Latent LSTM Allocation
Manzil Zaheer, Amr Ahmed and Alexander J Smola
Presented by Akshay Budhkar & Krishnapriya Vishnubhotla
March 3, 2018
Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla)Latent LSTM Allocation March 3, 2018 1 / 22
Outline
1 Introduction
  Latent Dirichlet Allocation
  LSTMs
2 Latent LSTM Allocation
  Algorithm
  Inference
  Different Models
3 Results
4 Conclusion
Latent Dirichlet Allocation
Probabilistic graphical model
Not sequential, but easily interpretable.
LSTMs
Good for modeling sequential data, preserves temporal aspect
Too many parameters
Hard to interpret
Latent LSTM Allocation (LLA) - Algorithm
Graphical model for LLA
The marginal probability of observing a document is

  p(w_d \mid \mathrm{LSTM}, \phi) = \sum_{z_d} p(w_d, z_d \mid \mathrm{LSTM}, \phi)
                                  = \sum_{z_d} \prod_t p(w_{d,t} \mid z_{d,t}; \phi)\, p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM})    (1)
Uses a K × H dense matrix (the LSTM output projection over topics, H being the hidden size) and a V × K sparse matrix (the topic–word table φ).
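Eq. (1) can be evaluated exactly for tiny documents by summing over all K^T topic sequences. The sketch below does this in numpy; for brevity the LSTM prior p(z_t | z_{1:t-1}) is replaced by a hypothetical first-order transition table, and all parameter values are made up for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

V, K, T = 5, 3, 4                      # vocab size, topics, doc length
phi = rng.dirichlet(np.ones(V), K)     # K x V topic-word distributions
words = [0, 2, 1, 4]                   # a toy document w_{d,1:T}

# Stand-in for the LSTM prior p(z_t | z_{1:t-1}): an initial
# distribution plus a first-order transition table (hypothetical).
init = np.full(K, 1.0 / K)
trans = rng.dirichlet(np.ones(K), K)   # row z_{t-1} -> distribution over z_t

def doc_marginal(words):
    """Exact sum over all K^T topic sequences, as in Eq. (1)."""
    total = 0.0
    for zs in product(range(K), repeat=len(words)):
        p = 1.0
        for t, (z, w) in enumerate(zip(zs, words)):
            prior = init[z] if t == 0 else trans[zs[t - 1], z]
            p *= prior * phi[z, w]
        total += p
    return total

print(doc_marginal(words))
```

The exponential K^T sum is exactly why the paper resorts to sampling-based inference rather than exact marginalization.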
Inference
Stochastic Expectation Maximization is used to compute the posterior.
The Evidence Lower Bound (ELBO) can be written as:

  \sum_d \log p(w_d \mid \mathrm{LSTM}, \phi) \ge \sum_d \sum_{z_d} q(z_d) \log \frac{p(z_d; \mathrm{LSTM}) \prod_t p(w_{d,t} \mid z_{d,t}; \phi)}{q(z_d)}    (2)
The conditional probability of the topic at time step t is:

  p(z_{d,t} = k \mid w_{d,t}, z_{d,1:t-1}; \mathrm{LSTM}, \phi) \propto p(z_{d,t} = k \mid z_{d,1:t-1}; \mathrm{LSTM})\, p(w_{d,t} \mid z_{d,t} = k; \phi)    (3)
And the word likelihood is estimated from smoothed counts:

  p(w_{d,t} \mid z_{d,t} = k; \phi) = \phi_{w,k} = \frac{n_{w,k} + \beta}{n_k + V\beta}    (4)
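One stochastic-EM move per token follows directly from Eqs. (3) and (4): sample the topic from the product of the LSTM prior and the smoothed word likelihood, then update the word–topic counts. A minimal sketch, with the LSTM forward pass replaced by a fixed hypothetical prior vector:

```python
import numpy as np

rng = np.random.default_rng(1)
V, K, beta = 5, 3, 0.1

# Sufficient statistics behind Eq. (4).
n_wk = np.zeros((V, K))                # word-topic counts n_{w,k}
n_k = np.zeros(K)                      # topic counts n_k

def phi_row(w):
    """Smoothed estimate phi_{w,k} = (n_{w,k} + beta) / (n_k + V*beta)."""
    return (n_wk[w] + beta) / (n_k + V * beta)

def sem_step(w, lstm_prior):
    """One stochastic-EM move for a single token: sample the topic
    from Eq. (3) (E-step), then update the counts defining phi (M-step)."""
    p = lstm_prior * phi_row(w)        # proportional to Eq. (3)
    p /= p.sum()
    k = rng.choice(K, p=p)
    n_wk[w, k] += 1
    n_k[k] += 1
    return k

# lstm_prior stands in for p(z_t | z_{1:t-1}; LSTM) from a forward pass.
z = sem_step(w=2, lstm_prior=np.array([0.2, 0.5, 0.3]))
```

In the full algorithm the sampled topic sequences are then fed back as training targets for the LSTM, which is the M-step for the dynamics.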
Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla)Latent LSTM Allocation March 3, 2018 9 / 22
Mathematical Intuition
LDA:

  \log p(w) = \sum_t \log p(w_t \mid \mathrm{model}) = \sum_t \log \sum_{z_t} p(w_t \mid z_t)\, p(z_t \mid \mathrm{doc})    (5)

LSTM:

  \log p(w) = \sum_t \log p(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_1)    (6)

LLA:

  \log p(w) = \log \sum_{z_{1:T}} \prod_t p(w_t \mid z_t)\, p(z_t \mid z_{t-1}, z_{t-2}, \ldots, z_1)    (7)
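The relationship between Eqs. (5) and (7) can be checked numerically: if the LLA topic prior ignores history, i.e. p(z_t | z_{1:t-1}) = θ for every t, the sum over z_{1:T} factorizes and LLA collapses to LDA's per-token form. A toy numpy check (all parameter values hypothetical):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
V, K, T = 4, 2, 3
phi = rng.dirichlet(np.ones(V), K)     # K x V: p(w | z)
theta = rng.dirichlet(np.ones(K))      # document-level topic mixture

words = [1, 3, 0]

# LDA, Eq. (5): tokens are conditionally independent given theta.
lda_logp = sum(np.log(phi[:, w] @ theta) for w in words)

# LLA, Eq. (7), with the prior degenerated to p(z_t) = theta for all t:
# the sum over all K^T topic sequences then factorizes per token.
lla_p = sum(
    np.prod([theta[z] * phi[z, w] for z, w in zip(zs, words)])
    for zs in product(range(K), repeat=T)
)
print(lda_logp, np.log(lla_p))         # the two values coincide
```

With a genuine LSTM prior the topic at step t depends on all earlier topics, so this factorization no longer holds; that dependence is exactly what LLA adds over LDA.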
Different Models
Perplexity vs. Number of topics (Wikipedia)
Perplexity vs. Number of topics (User Search)
Cannot use Char LLA, since URLs lack morphological structure
LDA Ablation Study
Interpreting Cleaner Topics
Interpreting Factored Topics
LSTM Topic Embedding (Wikipedia)
Convergence Speed
Effect of Joint vs. Independent Training
Final Thoughts
Pros
Provides a knob for trading off interpretability and accuracy
Fewer parameters for a reasonable perplexity
Cleaner, factored topics
Cons
Did not compare to something like hierarchical LDA
Char LLA cannot be used for every problem
Perplexity is not a good measure of text generation accuracy
Bibliography
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., Mitchell, M., Gao, J., and Dolan, B. (2015). deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. arXiv preprint arXiv:1506.06863.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In International Conference on Machine Learning, pages 3967–3976.