Post on 19-Jul-2015
transcript
True Fit Skin Care
Chang Liu, Fellow at Insight Data Science 2014
What makes it so hard? Overwhelming information
So many products… So many reviews…
Reviews can be so long…
So many ingredients…
Time spent
Money wasted
Happiness
Collaborative Filter using User Reviews from Sephora.com
32k Reviewers • w/ 2+ reviews
~1200 Products • ~80 brands • 8 categories
184k Reviews • Rating [1-5] • Review text • Quick take
[Figure: rating matrix with reviewers 1 … N as rows and products X, Y, … as columns]
Algorithm: • Item-centric collaborative filter • Pearson's correlation coefficients to measure pairwise similarity
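Similarity:
$$c_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
Recommendation score:
$$\text{score}_{ui} = \frac{\sum_{j=1}^{M} r_{uj}\, c_{ij}}{\sum_{j=1}^{M} |c_{ij}|}$$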
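The two formulas above can be sketched in a few lines of plain Python. This is only an illustration, not the code behind the slides: the toy `ratings` data, the function names, the choice to correlate two products over the reviewers they share, and the fallback of 0.0 for undefined correlations are all assumptions made for the example.

```python
from math import sqrt

# Toy data (made up): product -> {reviewer: rating on a 1-5 scale}
ratings = {
    "cleanser": {"u1": 5, "u2": 4, "u3": 1, "u4": 2},
    "toner":    {"u1": 4, "u2": 5, "u3": 2, "u4": 1},
    "serum":    {"u1": 5, "u2": 5, "u3": 1},
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a constant rating column counts as uncorrelated
    return cov / (norm_x * norm_y)

def item_similarity(ratings, a, b):
    """Similarity of products a and b over the reviewers who rated both (assumption)."""
    shared = sorted(set(ratings[a]) & set(ratings[b]))
    if len(shared) < 2:
        return 0.0  # assumption: too little overlap to measure similarity
    return pearson([ratings[a][u] for u in shared],
                   [ratings[b][u] for u in shared])

def recommendation_score(ratings, user, item):
    """Similarity-weighted average of the user's ratings of the other products."""
    num = den = 0.0
    for other, by_user in ratings.items():
        if other == item or user not in by_user:
            continue
        c = item_similarity(ratings, item, other)
        num += c * by_user[user]
        den += abs(c)
    return num / den if den else None

print(recommendation_score(ratings, "u4", "serum"))
```

Here reviewer `u4` disliked the two products whose ratings track the serum closely, so the predicted score for the serum comes out low as well.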
Cross Validation • 5-fold for reviewers • Leave-one-out for products • Accuracy = 86.3% ± 1%
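One possible reading of this protocol: split the reviewers into five folds, and for each held-out reviewer hide one product rating at a time so that it can be predicted from the rest. The sketch below only generates those test cases. The function names and toy data are hypothetical, and the slide does not say how a prediction was judged correct, so no accuracy rule is implemented here.

```python
import random

def reviewer_folds(reviewers, k=5, seed=0):
    """Shuffle the reviewers and deal them into k roughly equal folds."""
    shuffled = list(reviewers)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def leave_one_out_cases(ratings_by_user, test_users):
    """For each test reviewer, hide one rated product at a time.

    Yields (user, held_out_product, true_rating, visible_ratings).
    """
    for user in test_users:
        rated = ratings_by_user[user]
        for product, rating in rated.items():
            visible = {p: r for p, r in rated.items() if p != product}
            yield user, product, rating, visible

# Toy data (made up): reviewer -> {product: rating}
ratings_by_user = {
    "u1": {"cleanser": 5, "toner": 4},
    "u2": {"cleanser": 2, "serum": 1, "toner": 3},
}
for case in leave_one_out_cases(ratings_by_user, ["u1"]):
    print(case)
```

Each fold would serve once as the test set, with item similarities computed from the remaining four folds.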
Visualize the similarity matrix
White = high similarity Black = low similarity
Sorted by brand, alphabetically
White in a square =
User reviews are similar for all products in a brand
= Strong customer loyalty
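A numeric counterpart to reading white squares off the picture: order the products by brand, then average the pairwise similarity inside each brand block. This is a minimal sketch under stated assumptions. The product names, brand names, and similarity values are invented, and using the block mean as a loyalty indicator is an illustrative choice rather than something stated on the slide.

```python
def brand_block_means(products, brand_of, sim):
    """Return the brand-sorted product order and the mean within-brand similarity.

    sim maps frozenset({a, b}) -> similarity of products a and b.
    """
    order = sorted(products, key=lambda p: (brand_of[p], p))
    blocks = {}
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if brand_of[a] == brand_of[b]:
                blocks.setdefault(brand_of[a], []).append(sim[frozenset((a, b))])
    means = {brand: sum(vals) / len(vals) for brand, vals in blocks.items()}
    return order, means

# Toy data (made up)
brand_of = {"a_cream": "Alpha", "a_mask": "Alpha", "b_oil": "Beta", "b_wash": "Beta"}
sim = {
    frozenset(("a_cream", "a_mask")): 0.9,   # bright square: Alpha
    frozenset(("b_oil", "b_wash")): 0.2,     # dark square: Beta
    frozenset(("a_cream", "b_oil")): 0.1,
    frozenset(("a_cream", "b_wash")): 0.1,
    frozenset(("a_mask", "b_oil")): 0.1,
    frozenset(("a_mask", "b_wash")): 0.1,
}
order, means = brand_block_means(list(brand_of), brand_of, sim)
print(order)
print(means)
```

A brand with a high block mean corresponds to a bright square on the diagonal of the sorted matrix.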
There are structures! For example…
“Organic & Natural”
Expensive!
Cost effective
Actionable Insights for Sephora.com: Send marketing emails to new customers of brands with stronger customer loyalty!
Chang Liu • PhD in Civil Engineering @ CMU • J8D8L5@gmail.com • linkedin.com/in/changliucmu • github.com/R4trtry
Another Validation
Is the rating a good measure of reviewers' perspective?
• Trained a Naïve Bayes classifier for sentiment analysis
• W/ 250 thousand reviews from Birchbox.com
• A website that sends out free samples from smaller brands and gathers massive user reviews

Most common words      Most informative features
Word       Count       Negative        Positive
skin       91349       re-wash         Penny
product    82481       garbage         hook
use        64044       mediocre        gorgeous
love       55691       ketchup         perk
feel       47879       trash           stock
face       42615       unimpressive    glowing
like       41427       survey          splurge
great      34155       ineffective     effortless
really     31672       gag             Christmas
smell      27621       worthless       happily

             Text     Quick take
Precision    95.3%    85.4%
Recall       89.8%    93.1%

Worth every penny!
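To make the sentiment check concrete, here is a small Naïve Bayes text classifier with a precision/recall helper, written with the standard library only. It is a sketch, not the classifier from the slides: the multinomial variant, the add-one smoothing, the tokenizer, the `pos`/`neg` labels, and the four training sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing (assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_logp = None, -math.inf
        for c in self.classes:
            logp = math.log(self.doc_counts[c] / total_docs)  # class prior
            denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    logp += math.log((self.word_counts[c][word] + self.alpha) / denom)
            if logp > best_logp:
                best, best_logp = c, logp
        return best

def precision_recall(y_true, y_pred, positive="pos"):
    """Precision and recall for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy training set (made up), labelled by sentiment
train = [
    ("worth every penny, my skin is glowing", "pos"),
    ("gorgeous and effortless", "pos"),
    ("mediocre and ineffective, total garbage", "neg"),
    ("worthless trash, had to re-wash my face", "neg"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("glowing skin, worth it"))
```

Comparing predicted sentiment against the star rating of the same review, as the precision and recall figures above do, is one way to check whether the rating reflects what the reviewer wrote.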
[Figure: Product X and Product Y, similarity 87.4%]
[Figure: rating matrix with reviewers 1 … N as rows and products X, Y, … as columns]
M products reviewed by N reviewers
Pairwise similarities are measured by Pearson's correlation coefficients:
$$c_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
Then weight the ratings based on the correlation coefficients:
$$\text{Score}_i = \frac{\sum_{j=1}^{M} c_{ij}\, r_{uj}}{\sum_{j=1}^{M} |c_{ij}|}$$
$r_{uj}$: user u's preference on item j
Algorithm: Item-centric collaborative filter