Deep Learning Applications
Fall 2018
Outline
► Problem Definition
► Background & Related Works
► Proposed Method
► Experimental Results
► Conclusion & Future Works
Problem Definition
Introduction
► What is Multimodal Data?
► Multiple channels of input
► Multiple views of the same concept
Example: "A Red Bird in a jungle." The same concept, "Red Bird," rendered in Persian (پرنده‌ی قرمز), Arabic (الطائر الأحمر), Chinese (红鸟), and Russian (Красная Птица).
Applications
► Helps inter-modal retrieval.
► Helps intra-modal retrieval.
► Helps classification or clustering.
Example: when you type "مطمئن" (Persian for "confident"), the Google search engine retrieves this image. When you search for images similar to the left image, the Google search engine retrieves the right image. (Figure labels: Sport, Delicious)
Challenges
► Distinct modality-specific properties.
► High correlation between modalities.
► Higher intra-modality than inter-modality correlation.
Figure examples: "Man eating apes," "Man-eating apes," "Red apple on the book" (illustrating more vs. less correlation).
Problem Formulation
► Inputs:
► Two modalities, such as X and Z
► Goals:
► Extracting the most informative representation from X and Z
► The ability to generate the missing modality from the one that is present
Background
Deep Neural Networks
► Traditional neural networks scaled up with:
► More training data
► Deeper architectures
► Better optimization algorithms
► Popular deep neural networks:
► Stacked denoising auto-encoders
► Recurrent neural networks
► Generative adversarial networks
De-noising Auto-encoders
► Corrupt the input data with noise.
► Find a representation of the corrupted data whose reconstruction retains the most information about the clean input.
Figure: the clean input 𝑿 is corrupted to 𝑿෩, encoded to 𝒀, and reconstructed as 𝒁 so as to maximize 𝑰(𝑿, 𝒁).
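As a toy illustration of this idea, the sketch below trains a one-layer denoising auto-encoder that corrupts its input with Gaussian noise but is penalized for reconstruction error against the clean input (sizes, noise level, and learning rate are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

class DenoisingAutoencoder:
    """One-layer denoising auto-encoder with tied weights, trained by
    plain gradient descent on the squared reconstruction error against
    the CLEAN input (all sizes and hyperparameters are illustrative)."""

    def __init__(self, n_in, n_hidden, lr=0.05):
        self.W = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b = np.zeros(n_hidden)
        self.c = np.zeros(n_in)
        self.lr = lr

    def encode(self, x):
        return np.tanh(x @ self.W + self.b)

    def decode(self, y):
        return y @ self.W.T + self.c  # tied weights

    def train_step(self, x_clean, noise_std=0.1):
        x_tilde = x_clean + noise_std * rng.standard_normal(x_clean.shape)
        y = self.encode(x_tilde)      # representation of the corrupted input
        x_hat = self.decode(y)
        err = x_hat - x_clean         # compare with the clean input
        dy = (err @ self.W) * (1 - y ** 2)
        self.W -= self.lr * (np.outer(x_tilde, dy) + np.outer(err, y))
        self.b -= self.lr * dy
        self.c -= self.lr * err
        return 0.5 * float(err @ err)
```

Repeated calls to `train_step` on the same clean vector drive the reconstruction error down toward the noise floor.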
Stacking Auto-encoders (SAE)
► Extract high-level representations by stacking auto-encoders in a deep manner.
Figure: stacked auto-encoders encode 𝑿 → 𝒀 → 𝒁, with reconstructions 𝒀′ and 𝑿′.
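A minimal sketch of greedy layer-wise stacking, assuming each layer is a tied-weight tanh auto-encoder trained on the output of the layer below (all sizes and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden, lr=0.05, epochs=200):
    """Train one tied-weight tanh auto-encoder layer on data matrix X
    and return its encoder weights (toy batch gradient descent)."""
    n_in = X.shape[1]
    W = 0.1 * rng.standard_normal((n_in, n_hidden))
    for _ in range(epochs):
        Y = np.tanh(X @ W)
        err = Y @ W.T - X
        dY = (err @ W) * (1 - Y ** 2)
        W -= lr * (X.T @ dY + err.T @ Y) / len(X)
    return W

def stack_autoencoders(X, layer_sizes):
    """Greedy layer-wise stacking: each new auto-encoder is trained on
    the representation produced by the layers below it."""
    weights, H = [], X
    for n_hidden in layer_sizes:
        W = train_autoencoder(H, n_hidden)
        weights.append(W)
        H = np.tanh(H @ W)  # this representation feeds the next layer
    return weights, H
```

After greedy pretraining, the whole stack is typically fine-tuned end to end.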
Recurrent Neural Networks (RNNs)
► Feedforward networks with additional recurrent edges
► Powerful for sequential data such as sentences
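A minimal vanilla-RNN forward pass showing how the recurrent edge carries state across a sequence; the word vectors and dimensions below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_forward(xs, Wxh, Whh, bh):
    """Vanilla RNN: each hidden state mixes the current input with the
    previous state through the recurrent weight matrix Whh."""
    h = np.zeros(Whh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        states.append(h)
    return states

# Process a toy "sentence" of 5 word vectors of dimension 4.
d_in, d_h = 4, 3
Wxh = 0.1 * rng.standard_normal((d_h, d_in))
Whh = 0.1 * rng.standard_normal((d_h, d_h))
bh = np.zeros(d_h)
sentence = [rng.standard_normal(d_in) for _ in range(5)]
states = rnn_forward(sentence, Wxh, Whh, bh)
```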
Generative Adversarial Networks (GANs) [3]
[3] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
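For reference, [3] trains a generator G against a discriminator D with the two-player minimax objective:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```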
Related Works
Multimodal Deep Learning [1]
► Uses two modality-specific auto-encoders with a joint layer on top of them.
► Trains the network to reconstruct each modality from both the other modality and itself.
[1] Ngiam, Jiquan, et al. "Multimodal deep learning." Proceedings of the 28th international conference on machine learning (ICML-11). 2011.
MDL-CW: A multimodal deep learning framework with cross weights [2]
[2] Rastegar, Sarah, et al. "Mdl-cw: A multimodal deep learning framework with cross weights." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Generative Adversarial Text to Image Synthesis [4]
[4] Reed, Scott, et al. "Generative adversarial text to image synthesis." arXiv preprint arXiv:1605.05396 (2016).
Prior Works
Approach | Pros | Cons
SAE (Ngiam 2011, Sohn 2014) | Simple implementation | Discards low-level interactions
MDL-CW (Rastegar 2016) | Considers lower-level interactions | Non-generative
RNN (Socher 2013, Karpathy 2014, Karpathy 2015) | Considers sentence structure | Convergence problems
GAN (Reed 2016) | Generative | Memorization
Proposed Method
Shadow Networks
► Train a network to detect when a certain class is absent
Relativeness
► Two relative data items are similar in one particular sense
► Binary relativeness
► Fuzzy relativeness
► Relativeness is a function of the representation level
Representation Binding by Degree K
► For each of the K final layers, the representations of two relative data items are the same
► Relativeness is a function of level, so two data items that are relative at one level can be irrelative at other levels
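One way to express binding by degree K as a training penalty, assuming each network exposes its per-layer representations as a list ordered from first to last layer (an illustrative sketch, not the exact loss from the slides):

```python
import numpy as np

def binding_loss(reps_a, reps_b, k):
    """Binding by degree K: penalize the squared distance between the
    representations of two relative inputs at the K final layers only;
    earlier layers are left free to differ. Lists are assumed ordered
    from first to last layer."""
    return sum(float(np.sum((ra - rb) ** 2))
               for ra, rb in zip(reps_a[-k:], reps_b[-k:]))
```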
Binding Representations for Both Networks
► Main network:
► For each level, choose the nearest neighbors among relatives from the higher layer
► Bind the representations in this layer for these relatives
Layer | Relatives
Final | Horses
Before final | Dark Horses
Two before final | Dark Arabic Horses
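The neighbor-selection step could be sketched as below; the distance metric and candidate list are assumptions. The same helper with `farthest=True` matches the shadow network's rule:

```python
import numpy as np

def pick_relatives(rep, candidate_reps, n, farthest=False):
    """Select n relatives by distance in the higher layer's representation
    space: the main network binds to the NEAREST candidates, while the
    shadow network binds to the FARTHEST ones (illustrative sketch)."""
    d = np.array([np.linalg.norm(rep - c) for c in candidate_reps])
    order = np.argsort(d)
    if farthest:
        order = order[::-1]
    return [int(i) for i in order[:n]]
```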
Binding Representations for Both Networks
► Shadow network:
► For each level, choose the farthest neighbors among relatives from the higher layer
► Bind the representations in this layer for these relatives
Layer | Relatives
Final | Non-Horses
Before final | Dog, Plane, Table, …
Cross Edges
► Learn cross-edge weights between the shadow and main networks
Representation Gating
► Three representations are available from the lower layer
► Modality presence signals are used to deduce the final representation
Figure: a gate combines the same-modality, cross-modality, and cross-modality-shadow representations, conditioned on the modality presence signals, to produce the higher-level same-modality and cross-modality representations.
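A toy sketch of the gating step, assuming the gate coefficients have already been derived from the modality presence signals (the actual gate in this work is learned):

```python
import numpy as np

def gated_representation(reps, gate_weights):
    """Combine the candidate representations (same-modality,
    cross-modality, cross-modality-shadow) with gate coefficients
    derived from the modality presence signals; weights are
    renormalized so the output stays on the same scale."""
    w = np.asarray(gate_weights, dtype=float)
    w = w / w.sum()
    return sum(wi * r for wi, r in zip(w, reps))
```

For example, zeroing the coefficient of the shadow path keeps the final representation from being corrupted by a weaker candidate.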
Experimental Results
Experimental Results
► We used the PASCAL-Sentence dataset for this section:
► Each image is annotated with 5 sentences
► 500 training and 500 test images
► 1408 textual features
► 260 visual features
PASCAL-Sentence Dataset Experiments
Figure: retrieval results for text to whole (image and text) and image to whole (image and text).
Compared methods: [1], [6], [7], [9], [12], [10].
Qualitative Results
Figure: image to whole (image & text) and text to whole (image & text).
Conclusion
Conclusions
► Using shadow networks allows us to detect the non-existence of topics
► Using representation binding leads to better generalization
► Gating representations preserves the informative representation and does not corrupt it with weaker representations
Future Works
Creation and Deception
Figure: the main network is paired with a main generator (creation), and the shadow network with a shadow generator (deception).
Creation
► The main generator generates synthetic data that carries the desired label
Deception
► The shadow generator generates data that deceives the main network into assigning a wrong label
Future Works
►Neuron augmentation
►Using RNNs to distinguish between creation and deception
►Implementing brain cognitive functions
►Implementing social interactions between networks
Thank You!
References
1. J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal deep learning,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 689–696.
2. S. Rastegar, M. Soleymani, H. R. Rabiee, and S. M. Shojaee, “MDL-CW: A multimodal deep learning framework with cross weights,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2601–2609.
3. N. Srivastava and R. R. Salakhutdinov, “Multimodal learning with deep boltzmann machines,” in Advances in neural information processing systems, 2012, pp. 2222–2230.
4. R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, “Grounded compositional semantics for finding and describing images with sentences,” Transactions of the Association for Computational Linguistics, vol. 2, pp. 207–218, 2014.
5. K. Sohn, W. Shang, and H. Lee, “Improved multimodal deep learning with variation of information,” in Advances in Neural Information Processing Systems, 2014, pp. 2141–2149.
6. A. Karpathy, A. Joulin, and L. Fei-Fei, “Deep fragment embeddings for bidirectional image sentence mapping,” in Advances in neural information processing systems, 2014, pp. 1889–1897.
7. R. Socher, C. C. Lin, C. Manning, and A. Y. Ng, “Parsing natural scenes and natural language with recursive neural networks,” in Proceedings of the 28th international conference on machine learning (ICML-11), 2011, pp. 129–136.
8. M. Rastegari, J. Choi, S. Fakhraei, D. Hal, and L. Davis, “Predictable dual-view hashing,” in Proceedings of The 30th International Conference on Machine Learning, 2013, pp. 1328–1336.
9. B. Ozdemir and L. S. Davis, “A probabilistic framework for multimodal retrieval using integrative indian buffet process,” in Advances in Neural Information Processing Systems, 2014, pp. 2384–2392.
10. P. L. Lai and C. Fyfe, “Kernel and nonlinear canonical correlation analysis,” International Journal of Neural Systems, vol. 10, no. 05, pp. 365–377, 2000.
11. Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing,” in Advances in neural information processing systems, 2009, pp. 1753–1760.
References
11. Y. Gong and S. Lazebnik, “Iterative quantization: A procrustean approach to learning binary codes,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2011, pp. 817–824.
12. A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, and T. Mikolov, “DeViSE: A deep visual-semantic embedding model,” in Advances in Neural Information Processing Systems, 2013, pp. 2121–2129.
13. A. Gionis, P. Indyk, R. Motwani et al., “Similarity search in high dimensions via hashing,” in VLDB, vol. 99, 1999, pp. 518–529.
14. G. Madjarov, D. Kocev, D. Gjorgjevikj, and S. Džeroski, “An extensive experimental comparison of methods for multilabel learning,” Pattern Recognition, vol. 45, no. 9, pp. 3084–3104, 2012.
Multimodal Deep Boltzmann Machine [1]
MDL-CL: A Multimodal Deep Learning Framework with Cross Layers