Page 1: Latent LSTM Allocation · 2018-12-30 · Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent lstm allocation: Joint clustering and non-linear dynamic modeling of sequence data.

Latent LSTM Allocation

Manzil Zaheer, Amr Ahmed and Alexander J Smola

Presented by Akshay Budhkar & Krishnapriya Vishnubhotla

March 3, 2018

Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla) Latent LSTM Allocation March 3, 2018 1 / 22


Outline

1 Introduction
    Latent Dirichlet Allocation
    LSTMs

2 Latent LSTM Allocation
    Algorithm
    Inference
    Different Models

3 Results

4 Conclusion


Latent Dirichlet Allocation

Probabilistic graphical model

Not sequential, but easily interpretable.


LSTMs

Good for modeling sequential data; preserve temporal order

Too many parameters

Hard to interpret


Latent LSTM Allocation (LLA) - Algorithm


Graphical model for LLA


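Marginal probability of observing a document is

$$
\begin{aligned}
p(w_d \mid \mathrm{LSTM}, \phi) &= \sum_{z_d} p(w_d, z_d \mid \mathrm{LSTM}, \phi) \\
&= \sum_{z_d} \prod_t p(w_{d,t} \mid z_{d,t}; \phi)\, p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM})
\end{aligned}
\tag{1}
$$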
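The factorization in equation (1) can be read as a sampling recipe: draw a topic from a history-dependent prior, then draw a word from that topic. The sketch below is only an illustration under stated assumptions: `update_state` is a toy tanh recurrence standing in for the LSTM cell (it is not an LSTM), and the names `W_in`, `W_out`, and `generate_document` are hypothetical, not taken from the slides or the paper.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_prior(state, W_out):
    """p(z_t | z_{1:t-1}): softmax over K topics computed from the recurrent state."""
    return softmax([sum(w * s for w, s in zip(row, state)) for row in W_out])

def update_state(state, z, W_in, decay=0.5):
    """Toy recurrent update standing in for the LSTM cell (assumption, NOT a real LSTM)."""
    return [math.tanh(decay * s + W_in[z][h]) for h, s in enumerate(state)]

def generate_document(T, phi, W_in, W_out, rng):
    """Sample (z_t, w_t) pairs following the factorization in equation (1)."""
    state = [0.0] * len(W_in[0])
    topics, words = [], []
    for _ in range(T):
        prior = topic_prior(state, W_out)                        # p(z_t | z_{1:t-1})
        z = rng.choices(range(len(prior)), weights=prior)[0]
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]   # p(w_t | z_t; phi)
        topics.append(z)
        words.append(w)
        state = update_state(state, z, W_in)                     # history now includes z_t
    return topics, words

rng = random.Random(0)
K, H, V = 3, 4, 6  # illustrative sizes only
W_in = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(K)]   # K x H
W_out = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(K)]  # K x H
# phi is stored here as phi[k][w], i.e. the transpose of the V x K layout
phi = [softmax([rng.uniform(-2, 2) for _ in range(V)]) for _ in range(K)]
topics, words = generate_document(8, phi, W_in, W_out, rng)
```

Note that the recurrence only ever sees topic indices, never words, which is what keeps the sequence model small relative to the vocabulary.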

Uses a K × H dense matrix and a V × K sparse matrix, where K is the number of topics, H is the LSTM hidden size, and V is the vocabulary size.
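To make the matrix sizes concrete, here is a small arithmetic sketch. The sizes are made-up illustrative values, and the comparison against a V × H dense output layer for a word-level LSTM is an assumption added for contrast; the slide itself only states the K × H and V × K shapes.

```python
def output_layer_sizes(V, K, H):
    """Entry counts for the matrices discussed above (hypothetical helper)."""
    word_lstm_dense = V * H   # assumed word-level LSTM output layer, for contrast
    lla_dense = K * H         # dense matrix used by LLA
    lla_sparse_max = V * K    # upper bound; the matrix is sparse, so far fewer entries are stored
    return word_lstm_dense, lla_dense, lla_sparse_max

word_dense, lla_dense, lla_sparse_max = output_layer_sizes(V=50_000, K=250, H=500)
```

With these sizes the dense part shrinks from 25,000,000 entries to 125,000, while the V × K matrix has at most 12,500,000 entries before sparsity is taken into account.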


Inference

Stochastic Expectation Maximization is used to compute the posterior.

The Evidence Lower Bound (ELBO) can be written as:

$$
\sum_d \log p(w_d \mid \mathrm{LSTM}, \phi) \ge \sum_d \sum_{z_d} q(z_d) \log \frac{p(z_d; \mathrm{LSTM}) \prod_t p(w_{d,t} \mid z_{d,t}; \phi)}{q(z_d)}
\tag{2}
$$
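The bound follows from Jensen's inequality. A short derivation for a single document, assuming q(z_d) is any distribution over topic sequences that is non-zero wherever the joint is non-zero, and writing p(z_d; LSTM) for the product over t of p(z_{d,t} | z_{d,1:t-1}; LSTM):

```latex
\begin{aligned}
\log p(w_d \mid \mathrm{LSTM}, \phi)
  &= \log \sum_{z_d} q(z_d)\, \frac{p(w_d, z_d \mid \mathrm{LSTM}, \phi)}{q(z_d)} \\
  &\ge \sum_{z_d} q(z_d) \log \frac{p(w_d, z_d \mid \mathrm{LSTM}, \phi)}{q(z_d)} \\
  &= \sum_{z_d} q(z_d) \log \frac{p(z_d; \mathrm{LSTM}) \prod_t p(w_{d,t} \mid z_{d,t}; \phi)}{q(z_d)}
\end{aligned}
```

Summing over documents d gives the bound on the corpus log-likelihood.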

Conditional probability of the topic at time step t is:

$$
p(z_{d,t} = k \mid w_{d,t}, z_{d,1:t-1}; \mathrm{LSTM}, \phi) \propto p(z_{d,t} = k \mid z_{d,1:t-1}; \mathrm{LSTM})\, p(w_{d,t} \mid z_{d,t} = k; \phi)
\tag{3}
$$

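And, with w = w_{d,t},

$$
p(w_{d,t} \mid z_{d,t} = k; \phi) = \phi_{w,k} = \frac{n_{w,k} + \beta}{n_k + V\beta}
\tag{4}
$$

Here n_{w,k} counts the assignments of word w to topic k, n_k is the total count for topic k, V is the vocabulary size, and β is the smoothing parameter.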
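Equations (3) and (4) together describe one sampling step for a single token: weight each topic by the LSTM prior and by the smoothed word-topic estimate, normalize, and sample. The following is a minimal sketch under assumptions: `lstm_prior` is a placeholder vector standing in for the LSTM output, the counts are toy values, the function names are hypothetical, and details such as whether the current token is excluded from the counts are not specified on the slide.

```python
import random

def word_given_topic(n_wk, n_k, w, k, beta, V):
    """Equation (4): smoothed estimate phi_{w,k} from count statistics."""
    return (n_wk[w][k] + beta) / (n_k[k] + V * beta)

def topic_posterior(lstm_prior, n_wk, n_k, w, beta):
    """Equation (3): normalize prior(k) * phi_{w,k} over the K topics."""
    V = len(n_wk)
    unnorm = [lstm_prior[k] * word_given_topic(n_wk, n_k, w, k, beta, V)
              for k in range(len(lstm_prior))]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def sample_topic(posterior, rng):
    """Draw one topic index from the normalized posterior."""
    return rng.choices(range(len(posterior)), weights=posterior)[0]

# Toy counts: V = 3 words, K = 2 topics; n_wk[w][k] = times word w was assigned to topic k.
n_wk = [[4, 0], [1, 1], [0, 5]]
n_k = [sum(row[k] for row in n_wk) for k in range(2)]
lstm_prior = [0.5, 0.5]  # placeholder for p(z_t = k | z_{1:t-1}; LSTM)
posterior = topic_posterior(lstm_prior, n_wk, n_k, w=0, beta=0.1)
z = sample_topic(posterior, random.Random(0))
```

In a full stochastic EM loop, the sampled assignments would update the counts, and the LSTM would then be trained on the sampled topic sequences.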


Mathematical Intuition

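LDA

$$
\log p(w) = \sum_t \log p(w_t \mid \text{model}) = \sum_t \log \sum_{z_t} p(w_t \mid z_t)\, p(z_t \mid \text{doc})
\tag{5}
$$

LSTM

$$
\log p(w) = \sum_t \log p(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_1)
\tag{6}
$$

LLA

$$
\log p(w) = \log \sum_{z_{1:T}} \prod_t p(w_t \mid z_t)\, p(z_t \mid z_{t-1}, z_{t-2}, \ldots, z_1)
\tag{7}
$$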
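The difference between equations (5) and (7) can be checked numerically on a toy example. The sketch below evaluates equation (7) by brute force over all K^T topic sequences, which is only feasible for tiny T; `sticky_prior` is a hypothetical stand-in for the LSTM, and all numbers are illustrative.

```python
import itertools
import math

def lda_log_likelihood(words, phi, theta):
    """Equation (5): each word marginalizes its own topic independently."""
    return sum(math.log(sum(phi[k][w] * theta[k] for k in range(len(theta))))
               for w in words)

def lla_log_likelihood(words, phi, prior_fn, K):
    """Equation (7): sum over all K**T topic sequences (tiny T only)."""
    total = 0.0
    for zs in itertools.product(range(K), repeat=len(words)):
        p = 1.0
        for t, (z, w) in enumerate(zip(zs, words)):
            # p(w_t | z_t) * p(z_t | z_{1:t-1})
            p *= phi[z][w] * prior_fn(zs[:t])[z]
        total += p
    return math.log(total)

def sticky_prior(history, K=2, stay=0.8):
    """Hypothetical stand-in for the LSTM: favor repeating the previous topic."""
    if not history:
        return [1.0 / K] * K
    return [stay if k == history[-1] else (1 - stay) / (K - 1) for k in range(K)]

phi = [[0.7, 0.2, 0.1],   # phi[k][w]: topic 0 over 3 words
       [0.1, 0.2, 0.7]]   # topic 1 over 3 words
words = [0, 0, 2]
lla = lla_log_likelihood(words, phi, sticky_prior, K=2)
lda = lda_log_likelihood(words, phi, theta=[0.5, 0.5])
```

If the prior ignores the history (for example, always returning `[0.5, 0.5]`), the sum in equation (7) factorizes and reduces to equation (5); the history dependence is what the sequence model contributes.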


Different Models


Perplexity vs. Number of topics (Wikipedia)
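The slides do not spell out how perplexity is computed, so the sketch below assumes the standard per-token definition: the exponential of the negative average log-likelihood per token. The function name and inputs are hypothetical.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-(sum of document log-likelihoods) / (total number of tokens))."""
    total_tokens = sum(doc_lengths)
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)

# Sanity check: a model that is uniform over 10 words has perplexity 10.
uniform_ppl = perplexity([-5 * math.log(10), -3 * math.log(10)], [5, 3])
```

Lower is better: a perplexity of P means the model is, on average, as uncertain as a uniform choice among P words.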


Perplexity vs. Number of topics (User Search)

Cannot use Char LLA, since URLs lack morphological structure


LDA Ablation Study


Interpreting Cleaner Topics


Interpreting Factored Topics


LSTM Topic Embedding (Wikipedia)


Convergence Speed


Effect of Joint vs. Independent Training


Final Thoughts

Pros

Provides a knob for trading off interpretability and accuracy

Fewer parameters for a reasonable perplexity

Cleaner, factored topics

Cons

No comparison against models such as hierarchical LDA

Cannot use Char LLA for every problem

Perplexity is not a good measure of text generation accuracy


Bibliography

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.

Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., Mitchell, M., Gao, J., and Dolan, B. (2015). deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. arXiv preprint arXiv:1506.06863.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.

Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In International Conference on Machine Learning, pages 3967–3976.
