APS360 Fundamentals of AI Lisa Zhang Lecture 9; June 6, 2019
Page 1:

APS360 Fundamentals of AI

Lisa Zhang

Lecture 9; June 6, 2019

Page 2:

Agenda

Last time:

- Preventing Overfitting
- Transpose Convolutions
- Autoencoders

Today:

- Lab 4; one-hot encoding of categorical variables
- Review autoencoder
- Word Embeddings

Page 3:

Lab 4

Page 4:

Lab 4 Task

- The task in Lab 4 is to train a variation of an autoencoder on categorical and continuous features.
- Not images!
- Machine learning practitioners like using images because humans have good intuitions about images, and can verify neural network results.

Page 5:

One-hot encoding

A way to convert categorical features into numerical features:

Example: At UofT, the categorical feature “Term” can take on three possible values: Fall, Winter, Summer.

Fall   -> [1, 0, 0]
Winter -> [0, 1, 0]
Summer -> [0, 0, 1]

Use three numerical features to represent the categorical variable “Term”.

We already used one-hot encodings in multi-class classification, to encode the target label.
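As an illustration only (not from the slides), one way to build these vectors in PyTorch is with torch.nn.functional.one_hot; the term_to_index mapping and the small batch below are assumed examples, and Lab 4 may construct its one-hot features differently.

```python
import torch
import torch.nn.functional as F

# Hypothetical mapping from each category to an integer index.
term_to_index = {"Fall": 0, "Winter": 1, "Summer": 2}

terms = ["Fall", "Summer", "Winter"]                   # a small batch of values
indices = torch.tensor([term_to_index[t] for t in terms])

one_hot = F.one_hot(indices, num_classes=3).float()    # shape: (3, 3)
print(one_hot)
# tensor([[1., 0., 0.],
#         [0., 0., 1.],
#         [0., 1., 0.]])
```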

Page 8:

Review Autoencoder

Page 9:

Autoencoder Representation

The output of the encoder is a reduced-dimension representation of some data (e.g. the image of an MNIST digit).

Each point in this embedding space (latent space) represents an MNIST digit, which can be recovered using the decoder.
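For intuition, a minimal sketch of such an encoder-decoder pair for flattened 28x28 MNIST images is shown below; the layer sizes and the 2-dimensional latent space are assumptions for illustration, not the architecture used in the labs.

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim=2):            # latent_dim chosen for illustration
        super().__init__()
        # Encoder: flattened image -> low-dimensional embedding
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: embedding -> reconstructed image
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid(),                         # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)       # x: (batch, 784) -> z: (batch, latent_dim)
        return self.decoder(z)    # reconstruction: (batch, 784)
```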

Page 10:

Structure in the Autoencoder Representation

- Points that are close to each other in the latent embedding space will have similar reconstructions (continuity of the decoder)
- So, the encoder will learn to map “similar” images close to each other (due to the bottleneck)

Therefore, distances in the autoencoder representation will become meaningful.

- Example: interpolation example from last class
- Example: https://www.youtube.com/watch?v=XNZIN7Jh3Sg

Page 11:

Interpolating in the Latent Embedding Space

- Compared with interpolating in the pixel space
- When we interpolate the pixels of an image, the interpolated images do not look like elements of the training set
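A sketch of the contrast, assuming the hypothetical Autoencoder sketched earlier has been trained and x1, x2 are two flattened images of shape (1, 784):

```python
import torch

def interpolate_latent(model, x1, x2, steps=8):
    """Decode points along the line between the embeddings of x1 and x2."""
    z1, z2 = model.encoder(x1), model.encoder(x2)
    frames = []
    for alpha in torch.linspace(0, 1, steps):
        z = (1 - alpha) * z1 + alpha * z2      # blend in the latent space
        frames.append(model.decoder(z))        # decoded images still look like digits
    return torch.cat(frames)

def interpolate_pixels(x1, x2, steps=8):
    """Blend raw pixels directly; intermediate images look like two digits
    superimposed rather than like elements of the training set."""
    return torch.cat([(1 - a) * x1 + a * x2 for a in torch.linspace(0, 1, steps)])
```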

Page 12:

Size of the latent space: too small

The size of the latent space is the number of output neurons that the encoder has.

Q: What if the size of the latent space of the autoencoder is too small?

A: Poor reconstruction (underfitting)

Page 14:

Size of the latent space: too large

Q: What if the size of the latent space of the autoencoder is too large?

Hint: What if the size of the latent space is equal to the size of the training set?

A: The decoder memorizes the training set (overfitting)

Page 16:

Autoencoder Reconstruction

Q: Will an arbitrary image (very different from images in the training set) have a good reconstruction?

Page 17:

Autoencoder Use Cases

- Generate new data (using the decoder)
- Transfer learning (using the encoder, similar to the way we used AlexNet)
  - encoder weights are trained to retain a lot of information (for reconstruction)
  - AlexNet weights are trained to retain only information relevant to classification
- Clustering in the latent space (using the encoder)
- Denoising an image (encoder-decoder)

Page 18:

Denoising Autoencoder Example

https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html

Page 19:

Embedding of Books

https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526

Page 20:

Embedding of Molecules

https://openreview.net/pdf?id=BkSqjHqxg

Page 21:

How to train embeddings

- Encoder: data -> embedding
- Decoder: embedding -> data

Page 22:

How to train embeddings (alternative encoder-decoder architecture)

- Encoder: data -> embedding
- Decoder: embedding -> some feature of the data

Page 23:

How to train embeddings (denoising)

- Encoder: noisy data -> embedding
- Decoder: embedding -> denoised data
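A sketch of one training loop for this setup, assuming the hypothetical Autoencoder from earlier and a loader that yields batches of clean, flattened images (both assumptions for illustration): noise is added to the input, but the loss compares the reconstruction against the clean image.

```python
import torch
import torch.nn as nn

model = Autoencoder()                    # the sketch defined earlier (assumed)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for clean in loader:                     # loader of clean, flattened images (assumed)
    noisy = clean + 0.3 * torch.randn_like(clean)    # corrupt the input
    reconstruction = model(noisy)
    loss = criterion(reconstruction, clean)          # target is the *clean* image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```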

Page 24:

Word Embeddings

Page 25:

History

- The term “word embedding” coined in 2003 (Bengio et al.)
- word2vec model proposed in 2013 (Mikolov et al.)
- GloVe vectors released in 2014 (Pennington et al.)

Page 26:

Architecture for Training Word Embedding

- Encoder: word (??) -> embedding
- Decoder: embedding -> ???

How do we encode the word?

What is our target?

Page 27:

One-hot encoding of words

- Each word has its own “index”
- If there are 10,000 words, there are 10,000 features

Page 28:

One-hot embedding as input to the encoder

- Encoder: one-hot embedding -> low-dim embedding
- Decoder: low-dim embedding -> ???
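One detail worth noting (an aside, not from the slides): because the input is one-hot, the first layer of the encoder amounts to selecting one row of its weight matrix, which is exactly the lookup that nn.Embedding implements. The vocabulary size and embedding dimension below are assumed for illustration.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 10_000, 128     # assumed sizes for illustration

# Option 1: explicit one-hot vector multiplied by a weight matrix.
W = torch.randn(vocab_size, embedding_dim)
one_hot = torch.zeros(vocab_size)
one_hot[42] = 1.0                           # the word with index 42
vec_a = one_hot @ W                         # picks out row 42 of W

# Option 2: nn.Embedding does the same row lookup directly from the index.
embedding = nn.Embedding(vocab_size, embedding_dim)
vec_b = embedding(torch.tensor([42]))       # shape (1, 128)
```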

Page 29:

Text as sequences

Key idea: the meaning of a word depends on its context, or the other words that appear nearby.

There is evidence that children learn new words based on their surrounding words.

Page 30:

Architecture of a word2vec model

- Encoder: one-hot embedding -> low-dim embedding
- Decoder: low-dim embedding -> nearby words
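A minimal sketch of this encoder-decoder pair in PyTorch, loosely in the spirit of the skip-gram model; the sizes are assumed, and real word2vec training uses additional tricks (e.g. negative sampling) that are not shown and not covered in this course.

```python
import torch.nn as nn

class SkipGramSketch(nn.Module):
    def __init__(self, vocab_size=10_000, embedding_dim=128):   # assumed sizes
        super().__init__()
        # Encoder: word index (an implicit one-hot) -> low-dim embedding
        self.encoder = nn.Embedding(vocab_size, embedding_dim)
        # Decoder: low-dim embedding -> scores over the vocabulary, trained to
        # put high probability on words that appear nearby
        self.decoder = nn.Linear(embedding_dim, vocab_size)

    def forward(self, centre_word_idx):
        z = self.encoder(centre_word_idx)
        return self.decoder(z)   # pair with CrossEntropyLoss on context-word indices
```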

Page 31:

Architecture Example

https://jaxenter.com/deep-learning-search-word2vec-147782.html

Page 32:

Architecture Example: Skip-Gram Model

https://arxiv.org/pdf/1301.3781.pdf

Page 33:

Structure of the Embedding Space

- Words that have similar context words will be mapped to similar embeddings

Page 34:

GloVe Embeddings

- word2vec is a family of architectures used to learn word embeddings (i.e. a word2vec model)

- GloVe is a set of word embeddings that someone else already trained (i.e. like AlexNet weights)

Page 35:

Course Coverage

- We will not train our own word embeddings in this course
- We will not discuss the specifics of word2vec models and their variations
- Instead, we will use pre-trained GloVe embeddings

You are not expected to know about specific word2vec model architectures.

You are expected to have intuition about GloVe embeddings, which we will talk about now...

Page 36:

GloVe Embeddings

Let’s look at some!
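For instance, one common way to load pre-trained GloVe vectors is through torchtext; the vector set "6B" and dimension 50 below are one possible choice, and whether the labs use this exact interface is an assumption here.

```python
import torch
from torchtext.vocab import GloVe   # assumes torchtext is installed

# 50-dimensional vectors trained on 6B tokens; downloads on first use.
glove = GloVe(name="6B", dim=50)

cat, dog, car = glove["cat"], glove["dog"], glove["car"]

cos = torch.nn.functional.cosine_similarity
print(cos(cat.unsqueeze(0), dog.unsqueeze(0)))   # related words: higher similarity
print(cos(cat.unsqueeze(0), car.unsqueeze(0)))   # less related words: lower similarity
```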

