Chang Liu, Insight 2014

Transcript

True Fit Skin Care

Chang Liu, Fellow at Insight Data Science, 2014

So many products…

What makes it so hard? Overwhelming information

So many products… So many reviews…

Reviews can be so long…

So many ingredients…

Time spent

Money wasted

Happiness


32k reviewers • each with 2+ reviews

~1200 products • ~80 brands • 8 categories

184k reviews • Rating [1-5] • Review text • Quick take

Collaborative Filter Using User Reviews from Sephora.com

[Figure: review matrix, reviews (rows) × products X, Y, … (columns)]

Algorithm: • Item-centric collaborative filter • Pearson's correlation coefficients to measure pairwise similarity


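Similarity:

$$c_{XY} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$$

Recommendation score:

$$\text{score}_{ui} = \frac{\sum_j r_{uj} \, c_{ij}}{\sum_j |c_{ij}|}$$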
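The similarity formula above can be sketched in plain Python as follows. This is a minimal illustration, not the author's code; it assumes each product's ratings are held in a hypothetical dict mapping reviewer to rating, and that the sums and means run only over reviewers who rated both products.

```python
import math


def pearson_similarity(ratings_x, ratings_y):
    """Pearson correlation c_XY between two products.

    ratings_x, ratings_y: hypothetical dicts {reviewer: rating}.
    Only reviewers who rated both products are used (an assumption).
    """
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    xs = [ratings_x[r] for r in common]
    ys = [ratings_y[r] for r in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant ratings: correlation undefined, treat as no signal
    return cov / math.sqrt(var_x * var_y)


x = {"ann": 5, "bob": 3, "cy": 4}
y = {"ann": 4, "bob": 2, "cy": 3, "dee": 1}
print(pearson_similarity(x, y))  # → 1.0
```

Returning 0.0 for tiny or constant overlaps is a design choice made here so that such pairs contribute nothing to a weighted score; the slides do not say how these cases were handled.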


Cross Validation • 5-fold over reviewers • Leave-one-out over products • Accuracy = 86.3% ± 1%
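The validation scheme on this slide (5 folds over reviewers, leave-one-out over each test reviewer's products) can be sketched as a split generator. The slides do not define how a prediction is scored as accurate, so this sketch only produces the splits and deliberately leaves the scoring rule out. The function name and the `ratings` layout (reviewer to {product: rating}) are hypothetical.

```python
import random


def cross_validation_splits(ratings, n_folds=5, seed=0):
    """Yield (train_reviewers, test_reviewer, held_out_product) triples.

    ratings: hypothetical dict {reviewer: {product: rating}}.
    Reviewers are shuffled and dealt into n_folds folds; for each reviewer
    in the test fold, every rated product is held out once in turn.
    """
    reviewers = sorted(ratings)
    random.Random(seed).shuffle(reviewers)
    folds = [reviewers[k::n_folds] for k in range(n_folds)]
    for k, test_fold in enumerate(folds):
        # Training reviewers: everyone outside the current test fold.
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        for reviewer in test_fold:
            for product in ratings[reviewer]:
                yield train, reviewer, product
```

Each held-out rating would then be compared with a prediction built from similarities computed on `train` plus the test reviewer's remaining ratings.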


Visualize the similarity matrix

White = high similarity, black = low similarity

Products sorted alphabetically by brand

A white square block =

Users' reviews are similar for all products in a brand

= Strong customer loyalty
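The slide reads brand loyalty visually from the matrix. One way to put a number on a "white square", offered here only as an assumption and not as the author's method, is the average pairwise similarity among products of the same brand. The input dicts below are hypothetical.

```python
from statistics import mean


def brand_loyalty_scores(similarity, brand_of):
    """Average off-diagonal similarity within each brand's block.

    similarity: hypothetical dict {(product_a, product_b): c_ab}, symmetric.
    brand_of: hypothetical dict {product: brand}.
    A higher score corresponds to a whiter square in the sorted matrix.
    Brands with a single product (no pairs) are skipped.
    """
    by_brand = {}
    for product, brand in brand_of.items():
        by_brand.setdefault(brand, []).append(product)
    scores = {}
    for brand, products in by_brand.items():
        pairs = [similarity[(a, b)] for a in products for b in products
                 if a != b and (a, b) in similarity]
        if pairs:
            scores[brand] = mean(pairs)
    return scores
```

Ranking brands by this score would give one candidate list for the marketing-email idea on the next slides.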

There are structures! For example…

“Organic & Natural”

Expensive!

Cost-effective


Actionable insight for Sephora.com: send marketing emails to new customers of brands with stronger customer loyalty!


Chang Liu • Ph.D. in Civil Engineering @ CMU • J8D8L5@gmail.com • linkedin.com/in/changliucmu • github.com/R4trtry

Is the rating a good measure of reviewers' perspective?

• Trained a Naïve Bayes classifier for sentiment analysis

• With 250 thousand reviews from Birchbox.com

• Birchbox.com: a website that sends out free samples from smaller brands and gathers a large volume of user reviews
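A Naïve Bayes sentiment classifier of the kind described above can be sketched from scratch as a multinomial model with add-one smoothing. This is an illustration only: the slides do not say which library, tokenizer, or smoothing was used, so whitespace tokenization, Laplace smoothing, and the toy training set below are all assumptions.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns (log_prior, log_likelihood)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {lab: math.log(n / len(docs)) for lab, n in label_counts.items()}
    log_lik = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + len(vocab)  # add-one (Laplace) smoothing
        log_lik[label] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_lik


def classify(tokens, log_prior, log_lik):
    """Return the label with the highest log-posterior; unseen words are ignored."""
    scores = {}
    for label in log_prior:
        scores[label] = log_prior[label] + sum(
            log_lik[label][w] for w in tokens if w in log_lik[label])
    return max(scores, key=scores.get)


train = [
    ("worth every penny gorgeous".split(), "pos"),
    ("glowing skin effortless".split(), "pos"),
    ("garbage mediocre trash".split(), "neg"),
    ("worthless unimpressive".split(), "neg"),
]
prior, lik = train_naive_bayes(train)
print(classify("gorgeous glowing".split(), prior, lik))  # → pos
```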

Most common words        Most informative features
Word       Count         Negative        Positive
skin       91,349        re-wash         Penny
product    82,481        garbage         hook
use        64,044        mediocre        gorgeous
love       55,691        ketchup         perk
feel       47,879        trash           stock
face       42,615        unimpressive    glowing
like       41,427        survey          splurge
great      34,155        ineffective     effortless
really     31,672        gag             Christmas
smell      27,621        worthless       happily

Metric       Review text   Quick take
Precision    95.3%         85.4%
Recall       89.8%         93.1%

Worth every penny!
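The precision and recall figures above follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The slide does not say which sentiment class was treated as positive, so the `positive` argument in this minimal sketch is a hypothetical parameter.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall of y_pred against y_true for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```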

Another Validation


[Figure: Product X vs. Product Y, similarity 87.4%]

[Figure: review matrix, reviews (rows) × products X, Y, … (columns)]

M products reviewed by N reviewers

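Pairwise similarities are measured by Pearson's correlation coefficients:

$$c_{XY} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$$

Then weight the ratings by the correlation coefficients:

$$\text{Score}_i = \frac{\sum_j c_{ij} \, r_{uj}}{\sum_j |c_{ij}|}$$

$r_{uj}$: user u's preference (rating) for item j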
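The weighted score above can be sketched for a single user and a single candidate item. The dict layouts and the function name are hypothetical; the code just evaluates the numerator and the absolute-value denominator over the items the user has already rated.

```python
def recommendation_score(user_ratings, similarities, item):
    """Score_i = sum_j c_ij * r_uj / sum_j |c_ij| for one user.

    user_ratings: hypothetical dict {item_j: r_uj} for items the user rated.
    similarities: hypothetical dict {(item_i, item_j): c_ij}.
    Returns None when no rated item has a nonzero similarity to `item`.
    """
    num = 0.0
    den = 0.0
    for j, r_uj in user_ratings.items():
        if j == item:
            continue  # never use the target item's own rating
        c_ij = similarities.get((item, j), 0.0)
        num += c_ij * r_uj
        den += abs(c_ij)
    return num / den if den else None
```

Using $|c_{ij}|$ in the denominator keeps it positive even when some Pearson coefficients are negative, so negatively correlated items pull the score down instead of flipping its sign.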

Algorithm: Item-centric collaborative filter

