Date posted: 15-Apr-2017
lda2vec (word2vec, and lda)
Christopher Moody @ Stitch Fix
About
@chrisemoody: Caltech Physics PhD in astrostats, supercomputing; sklearn t-SNE contributor; Data Labs at Stitch Fix; github.com/cemoody
Gaussian Processes t-SNE
chainer deep learning
Tensor Decomposition
1. word2vec
2. lda
3. lda2vec
1. king - man + woman = queen
2. Huge splash in NLP world
3. Learns from raw text
4. Pretty simple algorithm
5. Comes pretrained
word2vec
1. Set up an objective function
2. Randomly initialize vectors
3. Do gradient descent
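Those three steps can be sketched end-to-end on a toy corpus. Everything below (dimensions, learning rate, window size, a single negative sample per pair) is illustrative, not the reference word2vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

corpus = "the fox jumped over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

dim = 8
# Step 2: randomly initialize word (input) and context (output) vectors.
W = rng.normal(scale=0.1, size=(len(vocab), dim))  # word vectors
C = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Steps 1 + 3: objective = push observed (word, context) dot products up
# and random (negative) pairs down, via gradient descent.
lr = 0.05
for _ in range(300):
    for pos, word in enumerate(corpus):
        for off in (-2, -1, 1, 2):
            if 0 <= pos + off < len(corpus):
                w, c = idx[word], idx[corpus[pos + off]]
                neg = rng.integers(len(vocab))  # may collide; fine for a sketch
                # gradients of log sigma(c.w) + log sigma(-c_neg.w)
                g_pos = 1.0 - sigmoid(W[w] @ C[c])
                g_neg = -sigmoid(W[w] @ C[neg])
                grad_w = g_pos * C[c] + g_neg * C[neg]
                C[c] += lr * g_pos * W[w]
                C[neg] += lr * g_neg * W[w]
                W[w] += lr * grad_w
```

After training, observed pairs like (lazy, dog) score higher than pairs that never co-occur, like (fox, dog).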
word2vec
word2vec: learn word vector w from its surrounding context
“The fox jumped over the lazy dog”
Maximize the likelihood of seeing the words given the word “over”:
P(the|over) P(fox|over)
P(jumped|over) P(the|over) P(lazy|over) P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
P(fox|over)
What should this be?
It should depend on the word vectors: P(vfox | vover)
“The fox jumped over the lazy dog”
P(w|c)
Extract pairs from context window around every input word.
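The pair extraction above can be sketched in a few lines (the window size of 2 is illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (word, context) pairs from a window around every input word."""
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

tokens = "the fox jumped over the lazy dog".split()
pairs = skipgram_pairs(tokens)
# ("over", "jumped") and ("over", "lazy") appear; ("over", "dog") is
# outside the +/- 2 window, so it does not.
```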
objective
How should we define P(w|c)? Measure loss between w and c?
w · c
w · c ≈ 1 (vcanada · vsnow ≈ 1)
w · c ≈ 0 (vcanada · vdesert ≈ 0)
w · c ≈ -1
w · c ∈ [-1, 1]
But we’d like to measure a probability:
σ(c·w) ∈ [0, 1]
σ(c·w) ≈ 1 when w and c are similar; σ(c·w) ≈ 0 when they are dissimilar.
Loss function:
L = σ(c·w)
Logistic (binary) choice: is the (context, word) pair from our dataset?
word2vec: the skip-gram negative-sampling model
With L = σ(c·w) alone, the trivial solution is context = word for all vectors.
L = σ(c·w) + σ(-c·wneg)
Draw random (negative) words from the vocabulary.
Multiple negatives: discriminate positive from negative samples.
L = σ(c·w) + σ(-c·wneg) + … + σ(-c·wneg)
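The objective can be checked numerically. Below it is written as a negative log-likelihood (lower is better), which is the usual training form of the slide's L; the vectors are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(w, c, negatives):
    """Negative log of the skip-gram negative-sampling objective:
    -log sigma(c.w) - sum_k log sigma(-c.w_neg_k)."""
    loss = -np.log(sigmoid(c @ w))
    for w_neg in negatives:
        loss -= np.log(sigmoid(-(c @ w_neg)))
    return loss

c = np.array([1.0, 0.0])
w_similar = np.array([1.0, 0.0])   # c.w = 1  -> low loss as positive pair
w_dissim = np.array([-1.0, 0.0])   # used here as the negative sample
loss = sgns_loss(w_similar, c, [w_dissim])
```

Swapping the roles (treating the dissimilar vector as the positive) yields a strictly higher loss, which is exactly the discrimination the model learns.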
The SGNS model …is extremely similar to matrix factorization, i.e. ‘traditional’ NLP (Levy & Goldberg 2014):
L = σ(c·w) + σ(-c·wneg)
ci·wj = PMI(Mij) - log k
The SGNS model (Levy & Goldberg 2014):
L = σ(c·w) + Σ σ(-c·w)
ci·wj = log [ (#(ci,wj)/n) / (k · (#(ci)/n) · (#(wj)/n)) ]
      = log [ (popularity of c,w) / (k · (popularity of c) · (popularity of w)) ]
PMI: 99% of word2vec is counting. And you can count words in SQL.
PMI:
Count how many times you saw (c, w) together.
Count how many times you saw c.
Count how many times you saw w.
…and this takes ~5 minutes to compute on a single core. The SVD comes from a completely standard math library.
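A sketch of this counting view: build the shifted positive PMI matrix from window counts, then factorize it with an off-the-shelf SVD. The window size, the shift k, and the 3-dimensional truncation are illustrative choices:

```python
import numpy as np
from collections import Counter

tokens = "the fox jumped over the lazy dog".split()
window = 2

# Step 1: counting -- the "99% of word2vec" part, doable in SQL.
pair_counts = Counter()
for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            pair_counts[(w, tokens[j])] += 1

w_marg, c_marg = Counter(), Counter()
for (w, c), cnt in pair_counts.items():
    w_marg[w] += cnt
    c_marg[c] += cnt

vocab = sorted({w for w, _ in pair_counts})
idx = {w: i for i, w in enumerate(vocab)}
n = sum(pair_counts.values())
k = 5  # shift corresponding to k negative samples

# Step 2: shifted positive PMI matrix, M_ij = max(0, PMI(w_i, c_j) - log k).
M = np.zeros((len(vocab), len(vocab)))
for (w, c), cnt in pair_counts.items():
    pmi = np.log((cnt / n) / ((w_marg[w] / n) * (c_marg[c] / n)))
    M[idx[w], idx[c]] = max(0.0, pmi - np.log(k))

# Step 3: factorize with a completely standard SVD.
U, S, _ = np.linalg.svd(M)
word_vectors = U[:, :3] * np.sqrt(S[:3])
```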
word2vec
ITEM_3469 + ‘Pregnant’ = ITEM_701333, ITEM_901004, ITEM_800456
What about LDA?
LDA on Client Item Descriptions
LDA on Item Descriptions (with Jay)
lda vs word2vec
Bayesian graphical model vs. ML neural model
word2vec is local: one word predicts a nearby word
“I love finding new designer brands for jeans”
But text is usually organized.
In LDA, documents globally predict words.
doc 7681
typical 5D LDA document vector: [ 0%, 9%, 78%, 11% ] (sparse; all sum to 100%; dimensions are absolute)
typical 5D word2vec vector: [ -0.75, -1.25, -0.55, -0.12, +2.2 ] (dense; all real values; dimensions relative)
100D LDA document vector: [ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11% ] (sparse; sums to 100%; dimensions absolute; similar in fewer ways, so more interpretable)
100D word2vec vector: [ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2 ] (dense; all real values; dimensions relative; similar in 100D ways, so very flexible)
+mixture +sparse
Can we do both? lda2vec
[lda2vec architecture: skip grams from sentences (“Lufthansa is a German airline and when …” predicting “German”) feed a word vector (#hidden units). A document vector is added to the word vector to form a context vector, which is trained with the same negative sampling loss. The document vector is built from a document weight vector (#topics, e.g. [0.34, -0.1, 0.17]), softmaxed into a document proportion (e.g. [41%, 26%, 34%]), and multiplied by a topic matrix (#topics × #hidden units).]

word2vec predicts locally: one word predicts a nearby word.
The document vector predicts a word from a global context.
But we’re missing mixtures & sparsity!
Now it’s a mixture.
topic 1 = “religion”: Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication
topic 2 = “politics”: Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic
Sparsity!
Document proportions sparsify over training time:
t=0: [34%, 32%, 34%] → t=10: [41%, 26%, 34%] → t=∞: [99%, 1%, 0%]
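The forward pass of this architecture can be sketched in a few lines of numpy. The random topic matrix and word vector are placeholders; in lda2vec they are learned jointly against the negative-sampling loss:

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, n_hidden = 3, 5

# Free parameters: unconstrained document weights (one row per document).
# [0.34, -0.1, 0.17] is the example weight vector from the slides.
doc_weights = np.array([0.34, -0.1, 0.17])            # size: #topics
topic_matrix = rng.normal(size=(n_topics, n_hidden))  # #topics x #hidden units
word_vector = rng.normal(size=n_hidden)               # the pivot word's vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Document proportion: the softmax makes the mixture sum to 100%
# (here roughly 41% / 26% / 34%, matching the slides).
doc_proportion = softmax(doc_weights)

# Document vector: a mixture of topic vectors.
doc_vector = doc_proportion @ topic_matrix

# Context vector: word vector + document vector; this feeds the same
# skip-gram negative-sampling loss as plain word2vec.
context_vector = word_vector + doc_vector
```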
+ API docs + Examples + GPU + Tests
@chrisemoody
lda2vec.com
@chrisemoody
Example: Hacker News comments
Topics: http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/lda2vec.ipynb
Word vectors: https://github.com/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/word_vectors.ipynb
If you want…
…human-interpretable doc topics, use LDA.
…machine-useable word-level features, use word2vec.
…to experiment a lot, with topics over user / doc / region / etc. features (and you have a GPU), use lda2vec.
Credit
Large swathes of this talk are from previous presentations by:
• Tomas Mikolov • David Blei • Christopher Olah • Radim Rehurek • Omer Levy & Yoav Goldberg • Richard Socher • Xin Rong • Tim Hopper
Can we model topics to sentences? lda2lstm
“PS! Thank you for such an awesome idea” (doc_id=1846)
@chrisemoody
Can we model topics to images? lda2ae
TJ Torres
Fun Stuff: and now for something completely crazy
translation
(using just a rotation matrix)
Mikolov 2013
English
Spanish
Matrix Rotation
deepwalk
Perozzi et al. 2014
learn word vectors from sentences
“The fox jumped over the lazy dog”
‘words’ are graph vertices ‘sentences’ are random walks on the graph
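Under those definitions, deepwalk reduces to generating random walks and handing them to any word2vec implementation (the tiny graph below is made up for illustration):

```python
import random

random.seed(0)

# A tiny undirected graph: vertices play the role of 'words'.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, length):
    """One 'sentence': a random walk over the graph."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# A corpus of 'sentences' ready for any skip-gram trainer,
# e.g. gensim.models.Word2Vec(walks).
walks = [random_walk(graph, v, 5) for v in graph for _ in range(10)]
```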
word2vec
Playlists at Spotify
sequence learning
‘words’ are song indices ‘sentences’ are playlists
Playlists at Spotify
Erik Bernhardsson
Great performance on ‘related artists’
Fixes at Stitch Fix
sequence learning
Let’s try: ‘words’ are items ‘sentences’ are fixes
Fixes at Stitch Fix
context
Learn similarity between styles because they co-occur
Learn ‘coherent’ styles
sequence learning
Fixes at Stitch Fix?
sequence learning
Got lots of structure!
Nearby regions are consistent ‘closets’
context dependent
Levy & Goldberg 2014
Australian scientist discovers star with telescope (context: +/- 2 words)
context dependent
BoW vs. DEPS: topically-similar vs. ‘functionally’ similar (Levy & Goldberg 2014)
Crazy Approaches
Paragraph Vectors (Just extend the context window)
Content dependency (Change the window grammatically)
Social word2vec (deepwalk) (Sentence is a walk on the graph)
Spotify (Sentence is a playlist of song_ids)
Stitch Fix (Sentence is a shipment of five items)
CBOW
“The fox jumped over the lazy dog”
Guess the word given the context
~20x faster. (this is the alternative.)
SkipGram
“The fox jumped over the lazy dog”
Guess the context given the word
Better at syntax. (this is the one we went over)
lda2vec
vDOC = a vtopic1 + b vtopic2 +…
Let’s make vDOC sparse
lda2vec
Let’s say that vDOC adds: vDOC = topic0 + topic1
This works! 😀 But vDOC isn’t as interpretable as the topic vectors. 😔
lda2vec
softmax(vOUT · (vIN + vDOC))
theory of lda2vec
lda2vec
pyLDAvis of lda2vec
lda2vec
LDA Results
Topic “Great Stylist / Perfect”: “I loved every choice in this fix!! Great job!”
Topic “Body Fit”: “My measurements are 36-28-32, if that helps. I like wearing some clothing that is fitted. Very hard for me to find pants that fit right.”
Topic “Sizing”: “Really enjoyed the experience and the pieces, sizing for tops was too big. Looking forward to my next box!” (Excited for next)
Topic “Almost Bought”: “It was a great fix. Loved the two items I kept and the three I sent back were close!” (Perfect)
All of the following ideas will change what ‘words’ and ‘context’ represent.
paragraph vector
What about summarizing documents?
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
paragraph vector
Normal skipgram extends C words before, and C words after.
paragraph vector
A document vector simply extends the context to the whole document.
from gensim.models import Doc2Vec
fn = "item_document_vectors"
model = Doc2Vec.load(fn)
matches = model.most_similar('pregnant')
matches = list(filter(lambda x: 'SENT_' in x[0], matches))

# ['...I am currently 23 weeks pregnant...',
#  "...I'm now 10 weeks pregnant...",
#  '...not showing too much yet...',
#  '...15 weeks now. Baby bump...',
#  '...6 weeks postpartum!...',
#  '...12 weeks postpartum and am nursing...',
#  '...I have my baby shower that...',
#  '...am still breastfeeding...',
#  '...I would love an outfit for a baby shower...']
sentence search