Date posted: 29-Jan-2018
Category: Data & Analytics
Uploaded by: kobkrit-viriyayudhakorn
TensorFlow + NLP: Language Vector Space Model (Word2Vec) Tutorial
Goal of this tutorial
• Learn how to do NLP in TensorFlow.
• Learn word embeddings that can extract relationships between discrete atomic symbols (words) from a textual corpus.
Words in a Text Corpus
NLP in Deep Learning
• Word embeddings are needed for deep learning NLP. Why?
• Images and audio already provide useful information about the relationships between their elements (pixels, frames).
• A pixel value of #FF0000 is very similar to #FE0000, since both are red; we can compute the difference automatically.
• Text does not provide useful information about the relationships between individual symbols.
• If 'cat' is represented as Id537 and 'dog' as Id143, the computer doesn't know any relationship between Id537 and Id143.
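To make the contrast concrete, here is a minimal sketch (the embedding values are hand-picked toy numbers, not learned): arbitrary IDs like 537 and 143 carry no similarity information, but dense vectors let us measure relatedness with cosine similarity.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: near 1.0 = similar direction, near 0.0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Arbitrary IDs say nothing about meaning: |537 - 143| is meaningless.
# Toy embeddings (hand-picked values) place related words nearby.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "mat": np.array([0.1, 0.2, 0.9]),
}

print(cosine(embeddings["cat"], embeddings["dog"]))  # high: both animals
print(cosine(embeddings["cat"], embeddings["mat"]))  # low: unrelated words
```

Learning such vectors automatically from a corpus is exactly what Word2Vec does.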
Vector Space Model
• Find the relationships between discrete symbols (in this case, words).
• Two proposed methods:
• Count-based methods:
• How often a word co-occurs with its neighbor words in a large text corpus (e.g., Latent Semantic Analysis).
• Predictive methods:
• Try to predict a word from its neighbors (e.g., neural probabilistic language models).
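The count-based idea can be sketched in a few lines (the function name is mine): build a table of how often each word appears near each other word, which methods like LSA then factorize into low-dimensional vectors.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=1):
    # Count how often each word appears within `window` positions of another word.
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1
    return dict(counts)

tokens = "the cat sits on the mat".split()
counts = cooccurrence_counts(tokens)
print(counts[("cat", "sits")])  # 1
```

Predictive methods like Word2Vec skip the explicit count matrix and learn the vectors directly by predicting neighbors.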
Word2Vec
• A computationally efficient predictive model for learning word embeddings from raw text.
• Created by Tomas Mikolov at Google.
• 2 flavors:
• Continuous Bag-of-Words (CBOW)
• Skip-Gram model
CBOW
• Continuous Bag-of-Words (CBOW)
• Predicts the target word from the source context words.
• Input: "The cat sits on the ______"
• Output: mat
• Example: 3-gram CBOW = (the, cat) => sits, (cat, sits) => on, (sits, on) => the, (on, the) => mat
• Better for small datasets.
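The 3-gram pairs above can be generated with a short sketch. Note this follows the slide's formulation, where the two preceding words form the context; symmetric windows around the target are also common. The function name is mine.

```python
def cbow_trigram_pairs(tokens):
    # 3-gram CBOW as in the slide: the two preceding words predict the next word.
    return [((tokens[i - 2], tokens[i - 1]), tokens[i])
            for i in range(2, len(tokens))]

tokens = "the cat sits on the mat".split()
print(cbow_trigram_pairs(tokens))
# [(('the', 'cat'), 'sits'), (('cat', 'sits'), 'on'),
#  (('sits', 'on'), 'the'), (('on', 'the'), 'mat')]
```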
Skip-Gram Model
• Predicts the source context words from the target word.
• Input: sits
• Output: "The cat ____ on the mat"
• Example: 1-skip 3-gram Skip-Gram = (the, sits) => cat, (cat, on) => sits, (sits, the) => on, (on, mat) => the
• Better for large datasets. We use this model in these slides.
Noise-Contrastive Training for the Vector Space Model
• We use gradient descent on a binary logistic regression model to learn the word-relationship model (a neural network).
• To discriminate the real target words (which occur in the skip-gram data) from imaginary noise words (which do not), we maximize the following objective function.
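The objective itself appeared as an image in the original slides. Assuming the slides follow the standard TensorFlow Word2Vec tutorial, the negative-sampling (NCE-style) objective to maximize is:

```latex
J = \log Q_\theta(D = 1 \mid w_t, h)
    + k \, \mathbb{E}_{\tilde{w} \sim P_{\mathrm{noise}}}
      \left[ \log Q_\theta(D = 0 \mid \tilde{w}, h) \right]
```

Here $Q_\theta(D = 1 \mid w, h)$ is the model's probability that word $w$ observed in context $h$ came from the real data rather than from the noise distribution $P_{\mathrm{noise}}$, and $k$ noise words are sampled per training example.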
Negative Sampling
Input
• Batch training. Example with window size = 9:
• "the quick brown fox jumped over the lazy dog"
• 1-skip 3-gram Skip-Gram = (the, brown) => quick, (quick, fox) => brown, (brown, jumped) => fox, ...
• Dataset: (quick, the), (quick, brown), (brown, quick), (brown, fox), ...
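The dataset of (target, context) pairs above can be generated mechanically (the function name is mine):

```python
def skipgram_pairs(tokens, window=1):
    # For each interior word, emit (target, context) pairs for every
    # neighbor within `window` positions, as in the slide's dataset.
    pairs = []
    for i in range(window, len(tokens) - window):
        center = tokens[i]
        for j in range(i - window, i + window + 1):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the quick brown fox jumped over the lazy dog".split()
print(skipgram_pairs(tokens)[:4])
# [('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ('brown', 'fox')]
```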
Loop
• (quick, the), (quick, brown), (brown, quick), (brown, fox), ...
• In each iteration, randomly pick a word that is not in the window as the negative sample. Then stochastic gradient descent adjusts the weights to maximize the objective function above.
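A minimal NumPy sketch of this loop for a single (target, context) pair (all names and hyperparameters are mine; in TensorFlow this whole step is encapsulated by the built-in `tf.nn.nce_loss`): sample negatives from outside the window, then take SGD steps on the logistic objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(center_vecs, context_vecs, c, pos, negs, lr=0.1):
    # One stochastic gradient ascent step on
    #   log sigmoid(v_c . u_pos) + sum_neg log sigmoid(-v_c . u_neg)
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    v = center_vecs[c]
    grad_v = np.zeros_like(v)
    for w, label in [(pos, 1.0)] + [(n, 0.0) for n in negs]:
        u = context_vecs[w]
        g = label - sigmoid(np.dot(v, u))  # gradient of the log-likelihood
        grad_v += g * u
        context_vecs[w] += lr * g * v      # ascend: maximize the objective
    center_vecs[c] += lr * grad_v

vocab = "the quick brown fox jumped over lazy dog".split()
dim = 8
center_vecs = rng.normal(scale=0.1, size=(len(vocab), dim))
context_vecs = rng.normal(scale=0.1, size=(len(vocab), dim))

# Train on the pair (quick -> brown); negatives come from outside the window.
c, pos = vocab.index("quick"), vocab.index("brown")
window = {vocab.index(w) for w in ("the", "quick", "brown")}
candidates = [i for i in range(len(vocab)) if i not in window]
negs = rng.choice(candidates, size=2, replace=False)

for _ in range(50):
    sgd_step(center_vecs, context_vecs, c, pos, negs)

# After training, the real context word scores higher than the noise words.
print(np.dot(center_vecs[c], context_vecs[pos]))
```

In practice the negatives are resampled on every step and drawn from a unigram-based noise distribution; a fixed sample is used here only to keep the sketch short.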
TensorFlow code
10,000 news articles
Clean Data
[Figures: embedding visualizations at step 0 and at step 30,000]