
Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data

August 9, 2015

Hao Zhang (1), Gunhee Kim (2), Eric P. Xing (1)

(1) School of Computer Science, Carnegie Mellon University
(2) Department of Computer Science and Engineering, Seoul National University

Outline

• Introduction

• Model

• Learning and Inference

• Evaluation

• Visualization: Dynamics and Competitions

• Conclusion

Background

The increasing pervasiveness of the Internet has led to a wealth of consumer-created data across a multitude of online platforms.

What can we learn?

• The general public’s experience with different companies’ products and services

• Performance evaluations under different market conditions (time, location, etc.)

What do marketers want to see?

• Detection: Listen in on consumers’ opinions about their products and their competitors’

• Summarization: Summarize and visualize how a shared market is occupied by different brands

• Dynamics: Monitor the changes in market competition over time

Problem Statement

• Super Bowl + beer: Corona, Bud Light, and Guinness compete

• Watch + luxury: Rolex, Omega, and Burberry compete

(a) Input: Tweets and associated images of competing brands (Chanel, Gucci, Prada), for example:

• #Style #Prada Black Leather & Nylon Tessuto Saffiano Shoulder #Bag http://dlvr.it/8WZKM2 #Forsale #Auction

• Coat from @ASOS , top from @FreePeople, jeans from Rag & Bone, boots from #ChristianLouboutin & bag from @Prada .

• What is the most beautifully-designed perfume bottle? Tell us on the blog here: http://smarturl.it/ie2fka and win Gucci

• The latest crop of #Chanel Pre-Spring bags have arrived! See the full collection now: http://bit.ly/1z3PnKG

• Pretty In Pink: From @Chanel to @nailsinc, the best petal-hued make-up launches this spring http://vogue.uk/8p6UOi

• Designer Kate Spade, Invicta, Gucci & More Watches from $22 & Extra 20% Off http://www.dealsplus.com/t/1zr85Y

(b) Output: Temporal evolution of topics and brands’ proportions over the topics

[Figure columns: topics (text / visual words) and brands over topics, along a timeline from t to t + 1]

• Time t
– watch+diamond: rolex, watch, gold, dial, mens, datejust, ladies, steel, diamond, oyster, stainless, 18k
– glasses: chanel, giorgio, sunglasses, classic, glasses, reading, women's, #burberrygifts
– bags: bag, leather, gucci, handbag, tote, clothing, shoulder, canvas, reading, women's

• Time t + 1
– watch+diamond: watch, gold, white, date, ladies, dial, gift, rolex, #deals_us, blue, vintage, bracelet, omega
– glasses: chanel, sunglasses, listen, green, funny, dark, xmas, womens, Armani, excellent, Havana, lacoste
– bags: authentic, leather, bag, shoes, gucci, handbag, prada, tote, deals, brown, wallet

Our Approach: Joint Analysis of Text and Images

Why is joint interpretation of text and images helpful for online market intelligence?

Take advantage of the pervasiveness of images on social media

• No previous attempts to jointly leverage text and pictures for online market intelligence

• A large portion of tweets simply show images and links without any meaningful text. Images therefore play an important role in representing topics in this type of document.

Example tweet: "Oh, it’s really the most beautifully-designed perfume bottle I have ever seen!!!"

[Figure: shares of tweets with and without external links, and with and without directly attached images; the values shown are 72%, 30%, 70%, and 28%]

• Many users prefer to use images to deliver their ideas more clearly and broadly, so topic detection with images better reflects users’ intents.

5.5 million tweets, 6.6 million images

• The joint use of images with text also helps marketers interpret the discovered topics.

– Tweets are short (140-character limit), e.g., "What a wonderfulllllllll night!!!!!"
– Example topic words: winter, dior, nude, nutrition, hydrations
– Marketers may need to see the associated images to understand the key ideas of tweets more easily and quickly

Related Work

Online Market Intelligence

• BrandPulse [KDD05]
• Market-Structure [2012]
• Brand Monitoring [2011]
• Competitive Intelligence [2011]
• Show me the money! [KDD 2007]

This work adds:
– Competitive brands on latent topics
– Jointly leverage text and images

Topic Model for Econometrics

• Financial TM [2009]
• Purchase Behavior [2009]
• Topic Sentiment Mixture [2007]
• Online Reviews TM [2008]
• Geo TM [2013]

This work adds:
– Modeling brands and competitions
– Jointly leverage text and images

Related Work

Dynamic and Multi-view Topic Models

• Dynamic TM [2006]
• Latent Subspace Learning [2012]
• Topic Models for Image Annotation and Text Illustration [2010]
• Bilateral Correspondence Model [2014]

This work adds:
– Directly modeling the competition of multiple entities (e.g., brands) over shared topic spaces
– Modeling the interaction between multiple brands and entities

Model

• Input:

– $\mathcal{B} = \{\mathcal{B}_1, \dots, \mathcal{B}_L\}$: a set of competing brands of interest

– $\mathcal{B}_l$: the set of documents related to brand $l$

– $d = \{\boldsymbol{u}_d, \boldsymbol{v}_d, \boldsymbol{g}_d\} \in \mathcal{B}_l$: a document consisting of text and images

– $\boldsymbol{u}_d$: vector representation of the text

– $\boldsymbol{v}_d$: vector representation of the images

– $\boldsymbol{g}_d \in \mathbb{R}^L$: vector indicating which brands are associated with document $d$

[Example: two tweets mentioning Prada, each represented as $d = \{\boldsymbol{u}_d, \boldsymbol{v}_d, \boldsymbol{g}_d\}$]

Dataset

• We collect raw tweets and associated images using the Twitter REST API

• Two groups of brands: Luxury (13 brands) and Beer (12 brands)

• In total, 6.6 million tweets and 7.5 million images, ranging from 10/20/2014 to 02/01/2015

• Get the vector representations

– Text: raw text → tokenize → TF-IDF → TF-IDF vector
  Vocabulary: $U = \{1, \dots, G\}$
  Vector: $\boldsymbol{u}_d = (u_{d1}, u_{d2}, \dots, u_{d|N|})^T$

– Images: VGGNet (CNN-128) → quantization → normalization → VGGNet feature vector
  Visual vocabulary: $V = \{1, \dots, H\}$
  Vector: $\boldsymbol{v}_d = (v_{d1}, v_{d2}, \dots, v_{d|M|})^T$

– Alignment between the text and image representations
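The text branch of the pipeline above can be sketched as follows. The slides do not say which TF-IDF variant is used, so this is only a minimal illustration that assumes term frequency is the count divided by the document length and inverse document frequency is log(N / df). The function name `tfidf_vectors` and the three toy tweets are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[tok])
            vec[index[tok]] = tf * idf
        vectors.append(vec)
    return vocab, vectors

# Toy tweets, tokenized by whitespace (illustrative only).
tweets = [
    "gucci leather bag".split(),
    "prada leather bag sale".split(),
    "rolex gold watch".split(),
]
vocab, U = tfidf_vectors(tweets)
```

Each row of `U` plays the role of one text vector: words that occur in fewer documents receive a larger weight than words shared by many documents.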

Model

• Base Model: Sparse Topical Coding

[Graphical model: document code $\theta_d$, word codes $z_{dn}$, word counts $u_{dn}$, topic dictionary $\beta_k$; plates $d = 1{:}D$, $k = 1{:}K$]

Advantages:

• We encourage each document to be associated with only a small number of strong topics, for better analysis of the interaction between multiple brands

• Sparsity leads to a more robust text/image representation in topic space, especially for short documents like tweets (140-character limit)
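To see why an L1 penalty leaves only a few strong topics, consider a simplified one-dimensional subproblem: minimize nu*(x - t)^2 + rho*|x| over x >= 0. Its closed-form minimizer is max(0, t - rho/(2*nu)), so small entries are set exactly to zero. The sketch below applies this entrywise. It is an illustration only: it ignores the reconstruction loss of the full model, and the names `sparse_code`, `nu`, and `rho` are assumptions made for this example.

```python
def sparse_code(theta, nu, rho):
    """Entrywise minimizer of nu*(x - t)**2 + rho*abs(x) subject to x >= 0."""
    shift = rho / (2.0 * nu)
    return [max(0.0, t - shift) for t in theta]

# A document code with two strong and two weak topics (illustrative values).
theta = [0.90, 0.05, 0.40, 0.02]
code = sparse_code(theta, nu=1.0, rho=0.2)
active_topics = [k for k, x in enumerate(code) if x > 0.0]
```

With these values the two weak entries fall below the threshold and become exactly zero, so only topics 0 and 2 stay active.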

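Model

• Multi-view Extension

– Both text words and image words share the same document code $\boldsymbol{\theta}_d$

– $\boldsymbol{\gamma}$: visual topic-word matrix

[Graphical model: document code $\theta_d$ (sparsity on document code); word codes $z_{dn}$, $y_{dm}$ (sparsity on word code); observations $u_{dn}$, $v_{dm}$ (exponential family); dictionaries $\beta_k$, $\gamma_k$; plates $d = 1{:}D$, $k = 1{:}K$]

• Define the distributions as follows:

– Sample the document code from the prior:

$$p(\boldsymbol{\theta}) \propto \exp\left(-\lambda \|\boldsymbol{\theta}\|_1\right)$$

– Sample the word codes:

$$p(\boldsymbol{z}_{dn} \mid \boldsymbol{\theta}_d) \propto \exp\left(-\delta_u \|\boldsymbol{z}_{dn} - \boldsymbol{\theta}_d\|_2^2 - \rho_u \|\boldsymbol{z}_{dn}\|_1\right)$$

$$p(\boldsymbol{y}_{dm} \mid \boldsymbol{\theta}_d) \propto \exp\left(-\delta_v \|\boldsymbol{y}_{dm} - \boldsymbol{\theta}_d\|_2^2 - \rho_v \|\boldsymbol{y}_{dm}\|_1\right)$$

– Sample the word counts:

$$p(u_{dn} \mid \boldsymbol{z}_{dn}, \boldsymbol{\beta}) \propto N\left(u_{dn};\ \boldsymbol{z}_{dn}^T \boldsymbol{\beta}_{\cdot n},\ \sigma_u^2 \boldsymbol{I}\right)$$

$$p(v_{dm} \mid \boldsymbol{y}_{dm}, \boldsymbol{\gamma}) \propto N\left(v_{dm};\ \boldsymbol{y}_{dm}^T \boldsymbol{\gamma}_{\cdot m},\ \sigma_v^2 \boldsymbol{I}\right)$$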
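Taking the negative logarithm of the product of these densities gives, up to additive constants, an energy with one L1 term for the document code plus a coupling term, an L1 term, and a squared reconstruction error per word in each view. The sketch below evaluates that energy for a single document. It is a hedged illustration: the function names and the tuple layout `(codes, observations, dictionary, delta, rho, sigma)` are assumptions, and it assumes the j-th code is reconstructed with the j-th column of the dictionary.

```python
def view_energy(theta, codes, obs, dictionary, delta, rho, sigma):
    """Negative log-density terms of one view (text or image), up to constants."""
    total = 0.0
    for j, (code, x) in enumerate(zip(codes, obs)):
        column = [row[j] for row in dictionary]  # j-th column of the K x W dictionary
        recon = sum(c * w for c, w in zip(code, column))
        total += delta * sum((c - t) ** 2 for c, t in zip(code, theta))
        total += rho * sum(abs(c) for c in code)
        total += (x - recon) ** 2 / (2.0 * sigma ** 2)
    return total

def document_energy(theta, lam, text, image):
    """lam * ||theta||_1 plus the text-view and image-view energies."""
    prior = lam * sum(abs(t) for t in theta)
    return prior + view_energy(theta, *text) + view_energy(theta, *image)

# Toy document: K = 2 topics, one text word, no image words (illustrative values).
theta = [1.0, 0.0]
text = ([[1.0, 0.0]], [0.5], [[0.5], [0.5]], 1.0, 1.0, 1.0)
image = ([], [], [[], []], 1.0, 1.0, 1.0)
energy = document_energy(theta, 1.0, text, image)
```

Because both views are coupled to the same `theta`, lowering the energy pulls the text codes and the image codes toward one shared document code.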

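Model

• Dynamic extension

– Based on the discrete dTM [Blei06]

– Divide a corpus of documents into sequential groups, so that $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ change over time

– State space model with Gaussian noise:

$$p(\boldsymbol{\beta}_{k\cdot}^{t} \mid \boldsymbol{\beta}_{k\cdot}^{t-1}) = N(\boldsymbol{\beta}_{k\cdot}^{t-1}, \sigma_\beta^2 I)$$

$$p(\boldsymbol{\gamma}_{k\cdot}^{t} \mid \boldsymbol{\gamma}_{k\cdot}^{t-1}) = N(\boldsymbol{\gamma}_{k\cdot}^{t-1}, \sigma_\gamma^2 I)$$

[Graphical model: the multi-view model replicated for each time slice $t = 1{:}T$, with chains $\beta_k^t \to \beta_k^{t+1}$ and $\gamma_k^t \to \gamma_k^{t+1}$; plates $d = 1{:}D$, $k = 1{:}K$]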
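The Gaussian state-space step can be illustrated with a random walk on a topic-word matrix. The sketch below draws the next matrix from the current one and computes the negative log transition density up to a constant, which is a squared difference scaled by 1/(2*sigma^2). The names `evolve` and `transition_penalty` are hypothetical, and the sketch does not re-impose any constraint on the rows after adding noise.

```python
import random

def evolve(prev, sigma, rng):
    """Draw the next-slice matrix by adding N(0, sigma^2) noise to every entry."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in prev]

def transition_penalty(curr, prev, sigma):
    """Negative log of N(curr; prev, sigma^2 I), up to an additive constant."""
    sq = sum((c - p) ** 2
             for c_row, p_row in zip(curr, prev)
             for c, p in zip(c_row, p_row))
    return sq / (2.0 * sigma ** 2)

rng = random.Random(0)                    # fixed seed for reproducibility
beta_prev = [[0.7, 0.2, 0.1],             # K = 2 topics over 3 words (toy values)
             [0.1, 0.3, 0.6]]
beta_next = evolve(beta_prev, sigma=0.05, rng=rng)
penalty = transition_penalty(beta_next, beta_prev, sigma=0.05)
```

A small `sigma` keeps consecutive slices close to each other, which is what lets a topic be tracked as the same topic while its word weights drift.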

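Model

• Competition Extension

– Competition: $\boldsymbol{\phi} \in \mathbb{R}^{K \times L}$, proportions of brands on latent topics; $\boldsymbol{g}_d \in \mathbb{R}^L$, brand vector for document $d$; $\boldsymbol{r}_{db} \in \mathbb{R}^K$, brand code in topic space

– Distributions:

$$p(\boldsymbol{r}_{db} \mid \boldsymbol{\theta}_d) \propto \exp\left(-\delta_b \|\boldsymbol{r}_{db} - \boldsymbol{\theta}_d\|_2^2 - \rho_b \|\boldsymbol{r}_{db}\|_1\right)$$

$$p(g_{db} \mid \boldsymbol{r}_{db}, \boldsymbol{\phi}) \propto N\left(g_{db};\ \boldsymbol{r}_{db}^T \boldsymbol{\phi}_{\cdot b},\ \sigma_b^2 \boldsymbol{I}\right)$$

– Dynamics: $\boldsymbol{\phi}$ evolves over time under a Gaussian state space model:

$$p(\boldsymbol{\phi}_{k\cdot}^{t} \mid \boldsymbol{\phi}_{k\cdot}^{t-1}) = N(\boldsymbol{\phi}_{k\cdot}^{t-1}, \sigma_\phi^2 I)$$

[Graphical model: the dynamic multi-view model extended with brand codes $r_{db}$ and brand observations $g_{db}$ for each document, and a chain $\phi_k^t \to \phi_k^{t+1}$; one link is annotated "bridge"; plates $d = 1{:}D$, $k = 1{:}K$, $t = 1{:}T$]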
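Once a brand-topic matrix has been estimated, each of its K rows can be read as the shares that the L brands hold on one topic. The sketch below picks the leading brand per topic. The matrix values and the helper name `leading_brands` are purely illustrative and are not results from the slides.

```python
def leading_brands(phi, brands):
    """Return, for each topic (row of phi), the brand with the largest share."""
    leaders = []
    for row in phi:
        best = max(range(len(row)), key=lambda b: row[b])
        leaders.append(brands[best])
    return leaders

brands = ["Chanel", "Gucci", "Prada"]
# Hypothetical K = 2 by L = 3 occupation matrix; each row sums to 1.
phi = [
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
leaders = leading_brands(phi, brands)
```

Comparing such rows across time slices is one simple way to read off which brand gains or loses ground on a topic.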


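Learning and Inference

• MAP Formulation

• Joint probability (for one document):

$$p(\boldsymbol{\theta}, \boldsymbol{z}, \boldsymbol{u}, \boldsymbol{y}, \boldsymbol{v}, \boldsymbol{r}, \boldsymbol{g} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\phi}) = p(\boldsymbol{\theta}) \prod_{n \in N} p(\boldsymbol{z}_n \mid \boldsymbol{\theta})\, p(u_n \mid \boldsymbol{z}_n, \boldsymbol{\beta}) \prod_{m \in M} p(\boldsymbol{y}_m \mid \boldsymbol{\theta})\, p(v_m \mid \boldsymbol{y}_m, \boldsymbol{\gamma}) \prod_{b \in B} p(\boldsymbol{r}_b \mid \boldsymbol{\theta})\, p(g_b \mid \boldsymbol{r}_b, \boldsymbol{\phi})$$

• Denote $\Theta^t = \{\boldsymbol{\theta}_d^t, \boldsymbol{z}_d^t, \boldsymbol{y}_d^t, \boldsymbol{r}_d^t\}_{d=1}^{D_t}$ (i.e., add the superscript $t$)

• Negative log posterior:

$$-\log p\left(\Theta^t, \boldsymbol{\beta}^t, \boldsymbol{\gamma}^t, \boldsymbol{\phi}^t \mid \{\boldsymbol{u}_d^t, \boldsymbol{v}_d^t, \boldsymbol{g}_d^t\}_{d=1}^{D_t}\right) \propto -\log p\left(\Theta^t, \{\boldsymbol{u}_d^t, \boldsymbol{v}_d^t, \boldsymbol{g}_d^t\}_{d=1}^{D_t} \mid \boldsymbol{\beta}^t, \boldsymbol{\gamma}^t, \boldsymbol{\phi}^t\right)$$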

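Learning and Inference

• Minimize the negative log posterior:

$$
\begin{aligned}
\min_{\{\Theta^t, \boldsymbol{\beta}^t, \boldsymbol{\gamma}^t, \boldsymbol{\phi}^t\}_{t=1}^{T}} \quad
& \sum_{t=1}^{T} \sum_{d=1}^{D_t} \lambda \|\boldsymbol{\theta}_d^t\|_1
&& \text{(sparse term for document code)} \\
& + \sum_{t=1}^{T} \left( \pi_1 \|\boldsymbol{\beta}^t - \boldsymbol{\beta}^{t-1}\|_2^2 + \pi_2 \|\boldsymbol{\gamma}^t - \boldsymbol{\gamma}^{t-1}\|_2^2 + \pi_3 \|\boldsymbol{\phi}^t - \boldsymbol{\phi}^{t-1}\|_2^2 \right)
&& \text{(evolving chain)} \\
& + \sum_{t=1}^{T} \sum_{d=1}^{D_t} \sum_{n \in N_d^t} \left( \nu_1 \|\boldsymbol{z}_{dn}^t - \boldsymbol{\theta}_d^t\|_2^2 + \rho_1 \|\boldsymbol{z}_{dn}^t\|_1 + L(\boldsymbol{z}_{dn}^t, \boldsymbol{\beta}^t) \right)
&& \text{(text)} \\
& + \sum_{t=1}^{T} \sum_{d=1}^{D_t} \sum_{m \in M_d^t} \left( \nu_2 \|\boldsymbol{y}_{dm}^t - \boldsymbol{\theta}_d^t\|_2^2 + \rho_2 \|\boldsymbol{y}_{dm}^t\|_1 + L(\boldsymbol{y}_{dm}^t, \boldsymbol{\gamma}^t) \right)
&& \text{(image)} \\
& + \sum_{t=1}^{T} \sum_{d=1}^{D_t} \sum_{b \in B_d^t} \left( \nu_3 \|\boldsymbol{r}_{db}^t - \boldsymbol{\theta}_d^t\|_2^2 + \rho_3 \|\boldsymbol{r}_{db}^t\|_1 + L(\boldsymbol{r}_{db}^t, \boldsymbol{\phi}^t) \right)
&& \text{(brand)} \\
\text{s.t.} \quad
& \boldsymbol{\theta}_d^t > 0,\ \forall d, t; \qquad \boldsymbol{z}_{dn}^t, \boldsymbol{y}_{dm}^t, \boldsymbol{r}_{db}^t > 0,\ \forall d, n, m, b, t \\
& \boldsymbol{\beta}_k^t \in P_U,\ \boldsymbol{\gamma}_k^t \in P_V,\ \boldsymbol{\phi}_k^t \in P_B,\ \forall k, t
&& \text{(simplex constraint)}
\end{aligned}
$$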
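The slides do not state how the simplex constraints on the topic rows are enforced. One common option, shown here purely as an assumption, is to take an unconstrained update of a row and then project it back onto the probability simplex with the standard sort-based Euclidean projection. The function name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        candidate = (cumulative - 1.0) / i
        # The condition holds for a prefix of the sorted entries only,
        # so the last assignment corresponds to the largest valid index.
        if ui - candidate > 0.0:
            threshold = candidate
    return [max(x - threshold, 0.0) for x in v]

# A topic row after an unconstrained update (toy values).
row = [0.9, 0.4, -0.1]
projected = project_to_simplex(row)
```

The projected row is non-negative and sums to one, so it can again be read as a distribution over words, visual words, or brands.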




Model Evaluation

As a Topic Model: Topic Quality Evaluation

– Argument 1: Lower perplexity ≠ higher quality [J. Chang 2009]

– Argument 2: Perplexity is not a fair metric for models with different distributions

• We directly evaluate the Coherence and Validity of the learned topics [Xie 2013]

– Define the Coherence Measure (CM):

$$CM = \frac{\#\ \text{of relevant words}}{\#\ \text{of words in valid topics}}$$

– Define the Validity Measure (VM):

$$VM = \frac{\#\ \text{of valid topics}}{\#\ \text{of topics}}$$

• Both textual and visual topics are evaluated on Amazon Mechanical Turk
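The two ratios can be computed directly from annotator judgments. The sketch below assumes each topic comes with a valid/invalid flag, a count of words judged relevant, and a total word count, and that relevant words are counted only within valid topics. That last point, the function names, and the toy annotations are assumptions for illustration.

```python
def validity_measure(topic_is_valid):
    """VM = number of valid topics / number of topics."""
    return sum(topic_is_valid) / len(topic_is_valid)

def coherence_measure(relevant_counts, words_per_topic, topic_is_valid):
    """CM = relevant words / words in valid topics (assumed: valid topics only)."""
    valid = [i for i, ok in enumerate(topic_is_valid) if ok]
    relevant = sum(relevant_counts[i] for i in valid)
    total = sum(words_per_topic[i] for i in valid)
    return relevant / total

# Toy annotations for three topics of ten words each (not data from the slides).
topic_is_valid = [True, False, True]
relevant_counts = [8, 3, 6]
words_per_topic = [10, 10, 10]

vm = validity_measure(topic_is_valid)
cm = coherence_measure(relevant_counts, words_per_topic, topic_is_valid)
```

In this toy case two of three topics are valid, and 14 of the 20 words in those two topics are relevant.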

Model Evaluation

As a Topic Model: Topic Quality Evaluation

• Average VM/CM on text topics

Method           VM (Beer / Luxury)    CM (Beer / Luxury)
dLDA             0.53 / 0.68           0.55 / 0.52
STC + dyn        0.44 / 0.66           0.57 / 0.57
cdSTC + multi    0.51 / 0.70           0.63 / 0.59
cdSTC + text     0.605 / 0.71          0.61 / 0.59

• Average VM/CM on visual topics

Method           VM (Beer / Luxury)    CM (Beer / Luxury)
Kmeans           0.39 / 0.56           0.59 / 0.64
LDA + multi      0.57 / 0.63           0.51 / 0.69
cdSTC + multi    0.57 / 0.65           0.66 / 0.71

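Model Evaluation

As a Topic Model: Evaluation on Prediction

• Task I: Given a novel tweet, can we predict its most associated brand?

– Supervised dSTC (sdSTC): infer the most associated brand

$$
\begin{aligned}
\min_{\{\Theta^t, \mathcal{M}^t, \boldsymbol{\eta}^t\}_{t=1}^{T}} \quad
& \sum_{t=1}^{T} \left( f(\Theta^t, \mathcal{M}^t, D^t) + C\, R(\Theta^t, \boldsymbol{\eta}^t) + \frac{1}{2} \|\boldsymbol{\eta}^t\|_2^2 \right) \\
\text{s.t.} \quad
& \boldsymbol{\theta}_d^t > 0,\ \forall d, t; \qquad \boldsymbol{z}_{dn}^t, \boldsymbol{y}_{dm}^t > 0,\ \forall d, n, m, t \\
& \boldsymbol{\beta}_k^t \in P_U,\ \boldsymbol{\gamma}_k^t \in P_V,\ \forall k, t
\end{aligned}
$$

where $R$ is the multi-class hinge loss.

• Solved using coordinate descent

[Diagram: novel tweets → model → inferred brand (e.g., Gucci)]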
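The slides name the loss only as a multi-class hinge loss. The sketch below uses one common form (the Crammer-Singer variant) as an assumption: the loss for one document is the largest violation of a unit margin between the true brand's score and any other brand's score, where each score is a linear function of the document code. The names `multiclass_hinge`, `predict_brand`, and the weight matrix `eta` with one row per brand are hypothetical.

```python
def brand_scores(theta, eta):
    """Linear score of each brand: one row of weights per brand."""
    return [sum(w * t for w, t in zip(row, theta)) for row in eta]

def multiclass_hinge(theta, eta, label):
    """max(0, max over other brands b of 1 + score_b - score_label)."""
    scores = brand_scores(theta, eta)
    margins = [1.0 + s - scores[label]
               for b, s in enumerate(scores) if b != label]
    return max(0.0, max(margins))

def predict_brand(theta, eta):
    """Index of the brand with the highest score."""
    scores = brand_scores(theta, eta)
    return max(range(len(scores)), key=lambda b: scores[b])

theta = [1.0, 0.0]                 # toy document code over K = 2 topics
eta = [[2.0, 0.0],                 # weights for brand 0
       [0.0, 1.0]]                 # weights for brand 1
loss_if_brand0 = multiclass_hinge(theta, eta, label=0)
loss_if_brand1 = multiclass_hinge(theta, eta, label=1)
predicted = predict_brand(theta, eta)
```

The loss is zero when the true brand wins by at least the margin and grows linearly otherwise, which is what ties the document codes to brand prediction during training.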

Model Evaluation


• Task I-I:

– Randomly split the data in every time slice into 90% for training and 10% for testing

– Motivation: let the model see data in every time slice

– Text and images complement each other to detect more representative topics

[Figure: results for (a) Beer and (b) Luxury]

Model Evaluation


• Task I-II:

– Use the data in $[1, t-1]$ for training and $[t-1, t]$ for testing

– Motivation: let the model see only data from past time slices

– Image data is very helpful for predicting the future

[Figure: results for (a) Beer and (b) Luxury]

Model Evaluation


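• Task II: Given an unseen past document, can we predict which time slice it most likely belongs to?

$$\arg\max_t\ p(d \mid \mathcal{M}^t), \quad \text{where} \quad p(d \mid \mathcal{M}^t) = \prod_{n \in N_d} p(u_n \mid \boldsymbol{\beta}^t) \prod_{m \in M_d} p(v_m \mid \boldsymbol{\gamma}^t) \prod_{b \in B_d} p(g_b \mid \boldsymbol{\phi}^t)$$

[Diagram: the model locates a past tweet on the timeline at the time slice t when it was sent]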
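The localization rule is an argmax over time slices of a per-slice document likelihood that factorizes over text words, visual words, and brands. The sketch below shows only that structure. As a loudly simplified stand-in for the model's actual likelihood terms, each slice is represented by plain categorical probabilities per view, and unseen items get a small floor probability. All names and numbers are hypothetical.

```python
import math

FLOOR = 1e-12  # assumed probability for items unseen in a slice

def log_likelihood(doc, model):
    """Sum of log-probabilities over text words, visual words, and brands."""
    total = 0.0
    for view in ("text", "image", "brand"):
        for item in doc[view]:
            total += math.log(model[view].get(item, FLOOR))
    return total

def locate(doc, models):
    """Index t of the slice whose model gives the document the highest likelihood."""
    return max(range(len(models)), key=lambda t: log_likelihood(doc, models[t]))

# Two toy time slices with categorical stand-ins for the per-slice models.
models = [
    {"text": {"halloween": 0.6, "bag": 0.4}, "image": {}, "brand": {"gucci": 0.5, "prada": 0.5}},
    {"text": {"christmas": 0.6, "bag": 0.4}, "image": {}, "brand": {"gucci": 0.5, "prada": 0.5}},
]
doc = {"text": ["christmas", "bag"], "image": [], "brand": ["gucci"]}
best_slice = locate(doc, models)
```

Working in log space turns the product over words, visual words, and brands into a sum, which avoids numerical underflow for longer documents.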

Model Evaluation


• Task II

– Randomly split the data of every time slice into 90% for training and 10% for the localization test

– The explicit modeling of brand information does help improve the performance

[Figure: results for (a) Beer and (b) Luxury]

Model Evaluation

An Interesting Prediction Task

• Task III: What if we want to predict future competition trends from past data?

• How? Given past data, we evolve the occupation matrix $\boldsymbol{\phi}$ over time

– Evaluated using the KL divergence

[Diagram: an occupation matrix is learned from the data in [1, t-1] and evolved forward in time; a ground-truth matrix is obtained by counting, and the two are compared]

[Figure: ground truth vs. prediction for the topics Bags, Perfume, and Watch; the values shown are 0.4019, 0.2615, and 0.0739]
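A minimal sketch of the comparison step, assuming the ground-truth and predicted brand proportions for one topic are given as two probability vectors over the same brands. The direction KL(ground truth || prediction), the floor `eps`, and the toy numbers are assumptions, since the slides only name the metric.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same brands."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)

# Toy brand proportions on one topic (not values from the slides).
ground_truth = [0.5, 0.3, 0.2]
prediction = [0.4, 0.4, 0.2]
score = kl_divergence(ground_truth, prediction)
```

The divergence is zero only when the predicted proportions match the counted ones exactly, so lower values indicate a better forecast of brand occupation.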

Visualization: Brand Competition

Monitoring Competitions and Dynamics

As a monitor, we aim to answer (ordered from easy to difficult):

• Static: How do brands occupy the market in one time slice?

• Dynamic:

– How does each textual/visual topic evolve over time?

– How does each brand’s occupation change over time? (local)

– What do the competition trends among multiple brands look like over time? (global)



Topic: beauty

• t=1 (2014-10-22): beauty, makeup, lip, pink, gloss, glow, color, optimum, draw, plumper

• t=2 (2014-10-30): lip, beauty, makeup, color, skin, pink, gloss, eye, dioraddict, palatte

• t=3 (2014-11-06): dior, beauty, men, cologne, makeup, women, perfume, chanel, care, eye

• t=4 (2014-11-20): beauty, hot, care, makeup, lip, eye, pink, color, gloss, mascara



Topic: beauty (continued)

• t=5 (2014-11-20): beauty, care, dior, #Diorshow, designer, offers, eye, flow, chanel, mascara

• t=6 (2014-11-30): deals, health, glow, #sale, body, #diorskin, clothes, burberry, BlackFridday, all-in-1

• t=7 (2014-12-08): healthy, beauty, order, skincare, winter, dior, nude, nutrition, makeup, hydrations

• t=8 (2014-12-15): skin, dior, glasses, eye, winter, hydra, collagen, protection, beauty, eyeglass



Topic: fake+bad

• t=1 (2014-10-22): fake, quality, bought, real, locked, issues, worrying, reason, don’t, bad

• t=2 (2014-10-30): quality, wtf, bad, fake, call, store, check, italy, gucci, worried

• t=3 (2014-11-06): check, fake, france, bad, left, compare, safety, droppin, called, trap

• t=4 (2014-11-20): cheap, break, fake, bought, hard, compare, back, trust, drop, hell



Topic: fake+bad (continued)

• t=5 (2014-11-20): fake, don’t, risk, leather, told, issues, worrying, damn, wait, price

• t=6 (2014-11-30): mixtape, issues, authentic, fake, price, cheap, back, money, risk, mad

• t=7 (2014-12-08): fake, call, stop, wait, bad, support, back, hard, change, online

• t=8 (2014-12-15): fake, support, leave, quality, care, sales, back, issues, shop, change

[Figures: visualizations for the topics "fake+bad", "woman + dress", and "girl + waste"]

Conclusion

• We propose a novel dynamic topic model to address three major challenges:

– Multi-view representation of text and images

– Modeling of latent topics that are competitively shared by multiple brands

– Tracking the temporal evolution of the topics and brand occupations

• First attempt so far to propose a principled topic model to

– Discover the topics that are competitively shared between multiple brands

– Track the temporal evolution of the dominance of brands over topics by leveraging both text and image data

Conclusion

• We evaluate our algorithm on a newly collected Twitter dataset covering October 2014 to February 2015:

– 10 million tweets with 8 million associated images

– Superior performance for dynamic topic modeling and three prediction tasks:

• Prediction of the most associated brands

• Most likely creation time

• Competition trends for unseen tweets

– Visualizations of competition trends extracted from large-scale data

• Various potential applications

– Social media monitoring and visualization

– Joint analysis of online multi-modal data

– Online market intelligence

Project page

Thank You!

http://www.cs.cmu.edu/~hzhang2/projects/BrandCompetition/brand

competition.html

Q & A
